On Thu, May 31, 2018 at 06:41:46PM -0500, Dave Chiluk wrote:
When moving xfs volumes between kernels with and without 96f859d52, the filesystem can crash if the agfl has wrapped (flfirst > fllast). Depending on which filesystem is affected, this can take down the whole machine.
Such is the case when upgrading from the stock CentOS 7 3.10 kernel to the kernel.org stable kernels (via elrepo). Another common boundary crossing I noticed is going from early Ubuntu v4.4 kernels to recent v4.4 ones. We've been hitting this crash roughly once a week in our cloud, and it has produced the stack trace below.
The fix prefers to reset the agfl and leak a few blocks rather than shut down the filesystem. The leaked blocks can be recovered with xfs_repair.
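For readers unfamiliar with the wrap problem, here is a simplified, user-space sketch of the kind of consistency check that decides whether to reset the agfl. It is not the kernel code: the struct and function names (agf_freelist, agfl_active_slots, agfl_needs_reset) are hypothetical, and it assumes, purely as an illustration, that two kernels disagree on the number of agfl slots by one. A wrapped list's count only matches the ring size it was written with, so a kernel using a different size sees an inconsistent count and, under this approach, resets the list instead of shutting down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the AGF free-list header fields. */
struct agf_freelist {
    uint32_t flfirst;  /* index of the first active slot */
    uint32_t fllast;   /* index of the last active slot */
    uint32_t flcount;  /* on-disk count of active slots */
};

/* Active slots implied by flfirst/fllast in a ring of agfl_size slots. */
static uint32_t agfl_active_slots(const struct agf_freelist *fl,
                                  uint32_t agfl_size)
{
    assert(agfl_size > 0);
    if (fl->flcount == 0)
        return 0;                              /* empty list */
    if (fl->fllast >= fl->flfirst)
        return fl->fllast - fl->flfirst + 1;   /* contiguous range */
    /* Wrapped (flfirst > fllast): flfirst..end, then 0..fllast. */
    return agfl_size - fl->flfirst + fl->fllast + 1;
}

/*
 * True if the on-disk state is inconsistent with this kernel's ring size.
 * A caller would then reset the list (leaking the blocks it referenced,
 * recoverable with xfs_repair) instead of shutting down the filesystem.
 */
static bool agfl_needs_reset(const struct agf_freelist *fl,
                             uint32_t agfl_size)
{
    if (fl->flfirst >= agfl_size || fl->fllast >= agfl_size)
        return true;
    if (fl->flcount > agfl_size)
        return true;
    return agfl_active_slots(fl, agfl_size) != fl->flcount;
}
```

With illustrative ring sizes of 118 and 119 slots, a list written as flfirst = 110, fllast = 5, flcount = 14 is consistent at 118 slots but computes 15 active slots at 119, so only the second kernel would reset it. An unwrapped list (flfirst <= fllast) gives the same count under either size, which is why only wrapped lists trip over the mismatch.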
The attached patch is a backport of upstream a27ba2607, adjusted because the target kernels lack a78ee256c. It is intended for and tested on the v4.4 stream, but should apply to all kernels that lack upstream a78ee256c.
Thanks, now queued up.
greg k-h