I hit a stuck relocation on btrfs/061 during my overnight testing. This turned out to be because we had left over extent entries in our extent root for a data reloc inode that no longer existed. This happened because in btrfs_drop_extents() we only update refs if we have SHAREABLE set or we are the tree_root. This regression was introduced by aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree") where we stopped setting SHAREABLE for the data reloc tree.
The problem here is we actually do want to update extent references for data extents in the data reloc tree, in fact we only don't want to update extent references if the file extents are in the log tree. Update this check to only skip updating references in the case of the log tree.
This is relatively rare, because you have to be running scrub at the same time, which is what btrfs/061 does. The data reloc inode has its extents pre-allcated, and then we copy the extent into the pre-allocated chunks. We theoretically should never be calling btrfs_drop_extents() on a data reloc inode. The exception of course is with scrub, if our pre-allocated extent falls inside of the block group we are scrubbing, then the block group will be marked read only and we will be forced to cow that extent. This means we will call btrfs_drop_extents() on that range when we cow that file extent.
This isn't really problematic if we do this, the data reloc inode requires that our extent lengths match exactly with the extent we are copying, thankfully we validate the extent is correct with get_new_location(), so if we happen to cow only part of the extent we won't link it in when we do the relocation, so we are safe from any other shenanigans that arise because of this interaction with scrub.
cc: stable@vger.kernel.org Fixes: aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree") Signed-off-by: Josef Bacik josef@toxicpanda.com --- fs/btrfs/file.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 04e29b40a38e..b7d3559efcf7 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -734,8 +734,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, if (args->start >= inode->disk_i_size && !args->replace_extent) modify_tree = 0;
- update_refs = (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) || - root == fs_info->tree_root); + update_refs = root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID; while (1) { recow = 0; ret = btrfs_lookup_file_extent(trans, root, path, ino,
On 2021/10/2 01:57, Josef Bacik wrote:
I hit a stuck relocation on btrfs/061 during my overnight testing. This turned out to be because we had left over extent entries in our extent root for a data reloc inode that no longer existed. This happened because in btrfs_drop_extents() we only update refs if we have SHAREABLE set or we are the tree_root. This regression was introduced by aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree") where we stopped setting SHAREABLE for the data reloc tree.
The problem here is we actually do want to update extent references for data extents in the data reloc tree, in fact we only don't want to update extent references if the file extents are in the log tree. Update this check to only skip updating references in the case of the log tree.
This is relatively rare, because you have to be running scrub at the same time, which is what btrfs/061 does. The data reloc inode has its extents pre-allcated, and then we copy the extent into the pre-allocated chunks. We theoretically should never be calling btrfs_drop_extents() on a data reloc inode. The exception of course is with scrub, if our pre-allocated extent falls inside of the block group we are scrubbing, then the block group will be marked read only and we will be forced to cow that extent. This means we will call btrfs_drop_extents() on that range when we cow that file extent.
Oh my god, I forgot the corner case here!
This isn't really problematic if we do this, the data reloc inode requires that our extent lengths match exactly with the extent we are copying, thankfully we validate the extent is correct with get_new_location(), so if we happen to cow only part of the extent we won't link it in when we do the relocation, so we are safe from any other shenanigans that arise because of this interaction with scrub.
But this makes me wonder, can we just leave scrub and balance exclusive? There are already quite some limitations, like balance and send.
Adding balance and scrub to be exclusive to each other shouldn't cause too much hassle, and can remove these checks.
cc: stable@vger.kernel.org Fixes: aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree") Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Qu Wenruo wqu@suse.com
Thanks for the cause analyse and fix! Qu
fs/btrfs/file.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 04e29b40a38e..b7d3559efcf7 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -734,8 +734,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, if (args->start >= inode->disk_i_size && !args->replace_extent) modify_tree = 0;
- update_refs = (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) ||
root == fs_info->tree_root);
- update_refs = root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID; while (1) { recow = 0; ret = btrfs_lookup_file_extent(trans, root, path, ino,
On Sat, Oct 02, 2021 at 09:11:19AM +0800, Qu Wenruo wrote:
But this makes me wonder, can we just leave scrub and balance exclusive? There are already quite some limitations, like balance and send.
Adding balance and scrub to be exclusive to each other shouldn't cause too much hassle, and can remove these checks.
This was suggested in the past in the btrfsmaintenance project, but rather for performance reasons. Some people run scrub and balance and let it finish when the time comes, so no perf concerns. As scrub is read-only the strict exclusion is not necessary as with send where it can lead to bugs.
I'd like to keep it working as it is now, unless we find a stronger reason to make it exclusive. As eg. zoned now uses relocation on the background we'd have either scrub failing or bg reclaim not working. It's a trade off, there will be always some problematic case. Not allowing to do eg. a quick balance filter while scrub is running could be pretty annoying.
On Fri, Oct 01, 2021 at 01:57:18PM -0400, Josef Bacik wrote:
I hit a stuck relocation on btrfs/061 during my overnight testing. This turned out to be because we had left over extent entries in our extent root for a data reloc inode that no longer existed. This happened because in btrfs_drop_extents() we only update refs if we have SHAREABLE set or we are the tree_root. This regression was introduced by aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree") where we stopped setting SHAREABLE for the data reloc tree.
The problem here is we actually do want to update extent references for data extents in the data reloc tree, in fact we only don't want to update extent references if the file extents are in the log tree. Update this check to only skip updating references in the case of the log tree.
This is relatively rare, because you have to be running scrub at the same time, which is what btrfs/061 does. The data reloc inode has its extents pre-allcated, and then we copy the extent into the pre-allocated chunks. We theoretically should never be calling btrfs_drop_extents() on a data reloc inode. The exception of course is with scrub, if our pre-allocated extent falls inside of the block group we are scrubbing, then the block group will be marked read only and we will be forced to cow that extent. This means we will call btrfs_drop_extents() on that range when we cow that file extent.
This isn't really problematic if we do this, the data reloc inode requires that our extent lengths match exactly with the extent we are copying, thankfully we validate the extent is correct with get_new_location(), so if we happen to cow only part of the extent we won't link it in when we do the relocation, so we are safe from any other shenanigans that arise because of this interaction with scrub.
cc: stable@vger.kernel.org Fixes: aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree") Signed-off-by: Josef Bacik josef@toxicpanda.com
Added to misc-next, thanks.
linux-stable-mirror@lists.linaro.org