On 05/03/2022 21:35, Greg KH wrote:
On Thu, Mar 03, 2022 at 09:30:31PM +0800, Anand Jain wrote:
From: Filipe Manana fdmanana@suse.com
Commit f0bfa76a11e93d0fe2c896fcb566568c5e8b5d3f upstream.
When doing a direct IO write against a file range that either has preallocated extents in that range or has regular extents and the file has the NOCOW attribute set, the write fails with -ENOSPC when all of the following conditions are met:
There are no data blocks groups with enough free space matching the size of the write;
There's not enough unallocated space for allocating a new data block group;
The extents in the target file range are not shared, neither through snapshots nor through reflinks.
This is wrong because a NOCOW write can be done in such case, and in fact it's possible to do it using a buffered IO write, since when failing to allocate data space, the buffered IO path checks if a NOCOW write is possible.
The failure in direct IO write path comes from the fact that early on, at btrfs_dio_iomap_begin(), we try to allocate data space for the write and if it that fails we return the error and stop - we never check if we can do NOCOW. But later, at btrfs_get_blocks_direct_write(), we check if we can do a NOCOW write into the range, or a subset of the range, and then release the previously reserved data space.
Fix this by doing the data reservation only if needed, when we must COW, at btrfs_get_blocks_direct_write() instead of doing it at btrfs_dio_iomap_begin(). This also simplifies a bit the logic and removes the inneficiency of doing unnecessary data reservations.
The following example test script reproduces the problem:
$ cat dio-nocow-enospc.sh #!/bin/bash
DEV=/dev/sdj MNT=/mnt/sdj
# Use a small fixed size (1G) filesystem so that it's quick to fill # it up. # Make sure the mixed block groups feature is not enabled because we # later want to not have more space available for allocating data # extents but still have enough metadata space free for the file writes. mkfs.btrfs -f -b $((1024 * 1024 * 1024)) -O ^mixed-bg $DEV mount $DEV $MNT
# Create our test file with the NOCOW attribute set. touch $MNT/foobar chattr +C $MNT/foobar
# Now fill in all unallocated space with data for our test file. # This will allocate a data block group that will be full and leave # no (or a very small amount of) unallocated space in the device, so # that it will not be possible to allocate a new block group later. echo echo "Creating test file with initial data..." xfs_io -c "pwrite -S 0xab -b 1M 0 900M" $MNT/foobar
# Now try a direct IO write against file range [0, 10M[. # This should succeed since this is a NOCOW file and an extent for the # range was previously allocated. echo echo "Trying direct IO write over allocated space..." xfs_io -d -c "pwrite -S 0xcd -b 10M 0 10M" $MNT/foobar
umount $MNT
When running the test:
$ ./dio-nocow-enospc.sh (...)
Creating test file with initial data... wrote 943718400/943718400 bytes at offset 0 900 MiB, 900 ops; 0:00:01.43 (625.526 MiB/sec and 625.5265 ops/sec)
Trying direct IO write over allocated space... pwrite: No space left on device
A test case for fstests will follow, testing both this direct IO write scenario as well as the buffered IO write scenario to make it less likely to get future regressions on the buffered IO case.
Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com
fs/btrfs/inode.c | 142 ++++++++++++++++++++++++++--------------------- 1 file changed, 78 insertions(+), 64 deletions(-)
A normal "cherry pick" worked here, right?
Yes. A normal cherry pick worked.
Also this is needed for 5.16.
Right. It applies to 5.16 as well.
Thanks, Anand
thanks,
greg k-h