Commit 93bba24d4b5ad1e5cd8b43f64e66ff9d6355dd20 upstream.
Function btrfs_trim_fs() doesn't handle errors in a consistent way. If error happens when trimming existing block groups, it will skip the remaining blocks and continue to trim unallocated space for each device.
The return value will only reflect the final error from device trimming.
This patch will fix such behavior by:
1) Recording the last error from block group or device trimming The return value will also reflect the last error during trimming. Make developer more aware of the problem.
2) Continuing trimming if possible If we failed to trim one block group or device, we could still try the next block group or device.
3) Report number of failures during block group and device trimming It would be less noisy, but still gives user a brief summary of what's going wrong.
Such behavior can avoid confusion for cases like failure to trim the first block group and then only unallocated space is trimmed.
Reported-by: Chris Murphy lists@colorremedies.com CC: stable@vger.kernel.org # 4.9 Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com [ add bg_ret and dev_ret to the messages ] Signed-off-by: David Sterba dsterba@suse.com --- fs/btrfs/extent-tree.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 6661116c47d9..4f28fef9dc7f 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11124,6 +11124,10 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range) u64 end; u64 trimmed = 0; u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy); + u64 bg_failed = 0; + u64 dev_failed = 0; + int bg_ret = 0; + int dev_ret = 0; int ret = 0;
/* @@ -11134,7 +11138,7 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range) else cache = btrfs_lookup_block_group(fs_info, range->start);
- while (cache) { + for (; cache; cache = next_block_group(fs_info->tree_root, cache)) { if (cache->key.objectid >= (range->start + range->len)) { btrfs_put_block_group(cache); break; @@ -11148,13 +11152,15 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range) if (!block_group_cache_done(cache)) { ret = cache_block_group(cache, 0); if (ret) { - btrfs_put_block_group(cache); - break; + bg_failed++; + bg_ret = ret; + continue; } ret = wait_block_group_cache_done(cache); if (ret) { - btrfs_put_block_group(cache); - break; + bg_failed++; + bg_ret = ret; + continue; } } ret = btrfs_trim_block_group(cache, @@ -11165,28 +11171,40 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
trimmed += group_trimmed; if (ret) { - btrfs_put_block_group(cache); - break; + bg_failed++; + bg_ret = ret; + continue; } } - - cache = next_block_group(fs_info->tree_root, cache); }
+ if (bg_failed) + btrfs_warn(fs_info, + "failed to trim %llu block group(s), last error %d", + bg_failed, bg_ret); mutex_lock(&root->fs_info->fs_devices->device_list_mutex); devices = &root->fs_info->fs_devices->alloc_list; list_for_each_entry(device, devices, dev_alloc_list) { ret = btrfs_trim_free_extents(device, range->minlen, &group_trimmed); - if (ret) + if (ret) { + dev_failed++; + dev_ret = ret; break; + }
trimmed += group_trimmed; } mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
+ if (dev_failed) + btrfs_warn(fs_info, + "failed to trim %llu device(s), last error %d", + dev_failed, dev_ret); range->len = trimmed; - return ret; + if (bg_ret) + return bg_ret; + return dev_ret; }
/*
Commit 6ba9fc8e628becf0e3ec94083450d089b0dec5f5 upstream.
[BUG] fstrim on some btrfs only trims the unallocated space, not trimming any space in existing block groups.
[CAUSE] Before fstrim_range passed to btrfs_trim_fs(), it gets truncated to range [0, super->total_bytes). So later btrfs_trim_fs() will only be able to trim block groups in range [0, super->total_bytes).
While for btrfs, any bytenr aligned to sectorsize is valid, since btrfs uses its logical address space, there is nothing limiting the location where we put block groups.
For filesystem with frequent balance, it's quite easy to relocate all block groups and bytenr of block groups will start beyond super->total_bytes.
In that case, btrfs will not trim existing block groups.
[FIX] Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs() can get the unmodified range, which is normally set to [0, U64_MAX].
Reported-by: Chris Murphy lists@colorremedies.com Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl") CC: stable@vger.kernel.org # v4.9 Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com --- fs/btrfs/extent-tree.c | 10 +--------- fs/btrfs/ioctl.c | 11 +++++++---- 2 files changed, 8 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4f28fef9dc7f..e2b8877c1a47 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11123,21 +11123,13 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range) u64 start; u64 end; u64 trimmed = 0; - u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy); u64 bg_failed = 0; u64 dev_failed = 0; int bg_ret = 0; int dev_ret = 0; int ret = 0;
- /* - * try to trim all FS space, our block group may start from non-zero. - */ - if (range->len == total_bytes) - cache = btrfs_lookup_first_block_group(fs_info, range->start); - else - cache = btrfs_lookup_block_group(fs_info, range->start); - + cache = btrfs_lookup_first_block_group(fs_info, range->start); for (; cache; cache = next_block_group(fs_info->tree_root, cache)) { if (cache->key.objectid >= (range->start + range->len)) { btrfs_put_block_group(cache); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index cbf512b64597..4f49132fb0b6 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -380,7 +380,6 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg) struct fstrim_range range; u64 minlen = ULLONG_MAX; u64 num_devices = 0; - u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy); int ret;
if (!capable(CAP_SYS_ADMIN)) @@ -404,11 +403,15 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg) return -EOPNOTSUPP; if (copy_from_user(&range, arg, sizeof(range))) return -EFAULT; - if (range.start > total_bytes || - range.len < fs_info->sb->s_blocksize) + + /* + * NOTE: Don't truncate the range using super->total_bytes. Bytenr of + * block group is in the logical address space, which can be any + * sectorsize aligned bytenr in the range [0, U64_MAX]. + */ + if (range.len < fs_info->sb->s_blocksize) return -EINVAL;
- range.len = min(range.len, total_bytes - range.start); range.minlen = max(range.minlen, minlen); ret = btrfs_trim_fs(fs_info->tree_root, &range); if (ret < 0)
On Mon, Nov 12, 2018 at 08:32:41AM +0800, Qu Wenruo wrote:
Commit 93bba24d4b5ad1e5cd8b43f64e66ff9d6355dd20 upstream.
Function btrfs_trim_fs() doesn't handle errors in a consistent way. If error happens when trimming existing block groups, it will skip the remaining blocks and continue to trim unallocated space for each device.
The return value will only reflect the final error from device trimming.
This patch will fix such behavior by:
Recording the last error from block group or device trimming The return value will also reflect the last error during trimming. Make developer more aware of the problem.
Continuing trimming if possible If we failed to trim one block group or device, we could still try the next block group or device.
Report number of failures during block group and device trimming It would be less noisy, but still gives user a brief summary of what's going wrong.
Such behavior can avoid confusion for cases like failure to trim the first block group and then only unallocated space is trimmed.
Reported-by: Chris Murphy lists@colorremedies.com CC: stable@vger.kernel.org # 4.9 Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com [ add bg_ret and dev_ret to the messages ] Signed-off-by: David Sterba dsterba@suse.com
fs/btrfs/extent-tree.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-)
Does not apply to the latest 4.9.y tree :(
Can you please rebase and resend both of these?
thanks,
greg k-h
On 2018/11/19 下午8:30, Greg KH wrote:
On Mon, Nov 12, 2018 at 08:32:41AM +0800, Qu Wenruo wrote:
Commit 93bba24d4b5ad1e5cd8b43f64e66ff9d6355dd20 upstream.
Function btrfs_trim_fs() doesn't handle errors in a consistent way. If error happens when trimming existing block groups, it will skip the remaining blocks and continue to trim unallocated space for each device.
The return value will only reflect the final error from device trimming.
This patch will fix such behavior by:
Recording the last error from block group or device trimming The return value will also reflect the last error during trimming. Make developer more aware of the problem.
Continuing trimming if possible If we failed to trim one block group or device, we could still try the next block group or device.
Report number of failures during block group and device trimming It would be less noisy, but still gives user a brief summary of what's going wrong.
Such behavior can avoid confusion for cases like failure to trim the first block group and then only unallocated space is trimmed.
Reported-by: Chris Murphy lists@colorremedies.com CC: stable@vger.kernel.org # 4.9 Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com [ add bg_ret and dev_ret to the messages ] Signed-off-by: David Sterba dsterba@suse.com
fs/btrfs/extent-tree.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-)
Does not apply to the latest 4.9.y tree :(
Sorry, it looks like I rebased it to the wrong branch.
Now I know which branch it should go, will send the updated patch soon.
Thanks, Qu
Can you please rebase and resend both of these?
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org