From: Brian Foster bfoster@redhat.com
commit 5ca5916b6bc93577c360c06cb7cdf71adb9b5faf upstream.
If writeback I/O to a COW extent fails, the COW fork blocks are punched out and the data fork blocks left alone. It is possible for COW fork blocks to overlap non-shared data fork blocks (due to cowextsz hint prealloc), however, and writeback unconditionally maps to the COW fork whenever blocks exist at the corresponding offset of the page undergoing writeback. This means it's quite possible for a COW fork extent to overlap delalloc data fork blocks, writeback to convert and map to the COW fork blocks, writeback to fail, and finally for ioend completion to cancel the COW fork blocks and leave stale data fork delalloc blocks around in the inode. The blocks are effectively stale because writeback failure also discards dirty page state.
If this occurs, it is likely to trigger assert failures, free space accounting corruption and failures in unrelated file operations. For example, a subsequent reflink attempt of the affected file to a new target file will trip over the stale delalloc in the source file and fail. Several of these issues are occasionally reproduced by generic/648, but are reproducible on demand with the right sequence of operations and timely I/O error injection.
To fix this problem, update the ioend failure path to also punch out underlying data fork delalloc blocks on I/O error. This is analogous to the writeback submission failure path in xfs_discard_page() where we might fail to map data fork delalloc blocks and consistent with the successful COW writeback completion path, which is responsible for unmapping from the data fork and remapping in COW fork blocks.
Fixes: 787eb485509f ("xfs: fix and streamline error handling in xfs_end_io") Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_aops.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -145,6 +145,7 @@ xfs_end_ioend( struct iomap_ioend *ioend) { struct xfs_inode *ip = XFS_I(ioend->io_inode); + struct xfs_mount *mp = ip->i_mount; xfs_off_t offset = ioend->io_offset; size_t size = ioend->io_size; unsigned int nofs_flag; @@ -160,18 +161,26 @@ xfs_end_ioend( /* * Just clean up the in-memory strutures if the fs has been shut down. */ - if (XFS_FORCED_SHUTDOWN(ip->i_mount)) { + if (XFS_FORCED_SHUTDOWN(mp)) { error = -EIO; goto done; }
/* - * Clean up any COW blocks on an I/O error. + * Clean up all COW blocks and underlying data fork delalloc blocks on + * I/O error. The delalloc punch is required because this ioend was + * mapped to blocks in the COW fork and the associated pages are no + * longer dirty. If we don't remove delalloc blocks here, they become + * stale and can corrupt free space accounting on unmount. */ error = blk_status_to_errno(ioend->io_bio->bi_status); if (unlikely(error)) { - if (ioend->io_flags & IOMAP_F_SHARED) + if (ioend->io_flags & IOMAP_F_SHARED) { xfs_reflink_cancel_cow_range(ip, offset, size, true); + xfs_bmap_punch_delalloc_range(ip, + XFS_B_TO_FSBT(mp, offset), + XFS_B_TO_FSB(mp, size)); + } goto done; }