Re: [PATCH] btrfs: make sure no dirty metadata write is submitted after btrfs_stop_all_workers()

24 Oct 2025

      在 2025/10/24 18:37, Filipe Manana 写道:
...
On Thu, Oct 23, 2025 at 9:44 PM Qu Wenruo wqu@suse.com wrote:
...
在 2025/10/24 01:56, Filipe Manana 写道:
...
On Thu, Oct 23, 2025 at 10:33 AM Qu Wenruo wqu@suse.com wrote:
[...]
...
So two suggestions:

Move this into btrfs_error_commit_super(), to have all code related

to fs in error state in one single place.

Instead of of calling filemap_write_and_wait(), make this simpler

by doing the iput() of the btree inode right before calling
btrfs_stop_all_workers() and removing the call to
invalidate_inode_pages2() which is irrelevant since the final iput()
removes everything from the page cache except dirty pages (but the
iput() already triggered writeback of them).
In fact for this scenario the call to invalidate_inode_pages2() must
be returning -EBUSY due to the dirty pages, but we have always ignored
its return value.
From a quick glance, it seems to me that suggestion 2 should work.
Yes, that's the original workaround I went with, the problem is we're
still submitting metadata writes after a trans abort.
I don't feel that comfort writing back metadata in that situation.
Maybe the trans abort is triggered because a corrupted extent/free space
tree which allows us to allocate new tree blocks where we shouldn't
(aka, metadata COW is already broken).
Metadata COW is broken why??
Consider this situation, no free space cache, and extent tree by somehow 
lacks the backref item for tree block A.
Then a new transaction is started, allocator choose the bytenr of tree 
block A for a new tree block B.
At this stage, tree block B will overwrite tree block A, but no 
writeback yet. And at this time transaction is not yet aborted.
Then by somehoe the tree block A got its reference dropped by one (e.g. 
subvolume deletion), but extent tree is corrupted and failed to find the 
backref item, and transaction is aborted.
...
Even after a transaction aborts, it's ok, but pointless, to allocate
extents and trigger writeback for them.
Writeback can be triggered by a lot of other reasons, e.g. memory 
pressure, and in that case if we try to writeback tree block B, it will 
overwrite tree block A even if it's after transaction abort.
Not to mention the final iput() in close_ctree().
Thanks,
Qu
...
As long as we don't allow the transaction to be committed and new
transactions to be started, we are safe - in fact that's the only
thing a transaction abort guarantees.
We may have many places that could check if a transaction was aborted
and avoid extent allocation and writeback, but that's ok, as long as
we don't allow transaction commits.
...
Thus I consider delaying btrfs_stop_all_workers() until iput() is only a
workaround, it still allows us to submit unnecessary writes.
I'd prefer the solution 1) in this case, still with the extra handling
in write_one_eb().
Thanks for the review and suggestion, will follow the advice of the
remaining part.
Thanks,
Qu

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] btrfs: make sure no dirty metadata write is submitted after btrfs_stop_all_workers()