When moving XFS volumes between kernels that have 96f859d52 and kernels that lack it, the filesystem can crash if the AGFL has wrapped (flfirst > fllast). Depending on which filesystem is affected, this can take down the whole machine.
This happens when upgrading from the stock CentOS 7 3.10 kernel to the kernel.org stable kernels (via elrepo). Another likely boundary crossing I noticed is from early Ubuntu v4.4 kernels to recent v4.4 kernels. We've been hitting this crash roughly once a week in our cloud, and it produces the stack trace below.
The fix resets the AGFL and leaks a few blocks rather than shutting down the filesystem. The leaked blocks can be recovered by running xfs_repair.
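To illustrate why only a wrapped AGFL is affected, here is a minimal sketch of the consistency check involved. It is not the attached patch: the function names are hypothetical, and it assumes the two kernels disagree by one slot on the AGFL size (118 versus 119 are illustrative values only).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: number of active AGFL
 * slots implied by the first/last indexes for a given AGFL size.
 */
static uint32_t agfl_active_count(uint32_t flfirst, uint32_t fllast,
                                  uint32_t flcount, uint32_t agfl_size)
{
    if (flcount == 0)
        return 0;
    if (fllast >= flfirst)              /* not wrapped: size is irrelevant */
        return fllast - flfirst + 1;
    /* wrapped: the result depends on where the array ends */
    return agfl_size - flfirst + fllast + 1;
}

/*
 * Hypothetical check: true when the on-disk count disagrees with the
 * active range, or when an index falls outside this kernel's AGFL size.
 */
static bool agfl_needs_reset(uint32_t flfirst, uint32_t fllast,
                             uint32_t flcount, uint32_t agfl_size)
{
    if (flfirst >= agfl_size || fllast >= agfl_size)
        return true;
    if (flcount > agfl_size)
        return true;
    return agfl_active_count(flfirst, fllast, flcount, agfl_size) != flcount;
}
```

Under these assumed sizes, a wrapped list with flfirst=100, fllast=9 and flcount=29 is consistent when the AGFL has 119 slots but appears one block short when it has 118, so the check asks for a reset. An unwrapped list yields the same count under either size, which is why the problem only shows up once the AGFL has wrapped.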
The attached patch is a backport of a27ba2607, made necessary by a78ee256c. It is intended for and tested on the v4.4 stream, but should apply to all kernels that lack upstream a78ee256c.
Thanks, Dave Chiluk
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs]
CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0 12/17/2015
Call Trace:
 dump_stack+0x63/0x87
 xfs_error_report+0x3b/0x40 [xfs]
 ? xfs_free_ag_extent+0x35d/0x7a0 [xfs]
 xfs_btree_insert+0x1b0/0x1c0 [xfs]
 xfs_free_ag_extent+0x35d/0x7a0 [xfs]
 xfs_free_extent+0xbb/0x150 [xfs]
 xfs_trans_free_extent+0x4f/0x110 [xfs]
 ? xfs_trans_add_item+0x5d/0x90 [xfs]
 xfs_extent_free_finish_item+0x26/0x40 [xfs]
 xfs_defer_finish+0x149/0x410 [xfs]
 xfs_remove+0x281/0x330 [xfs]
 xfs_vn_unlink+0x55/0xa0 [xfs]
 vfs_rmdir+0xb6/0x130
 do_rmdir+0x1b3/0x1d0
 SyS_rmdir+0x16/0x20
 do_syscall_64+0x67/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f85d8d92397
RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397
RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70
RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002
R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640
R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40
XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffa028f087
XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
XFS (dm-4): Please umount the filesystem and rectify the problem(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^