[PATCH] btrfs: make sure no dirty metadata write is submitted after btrfs_stop_all_workers()

23 Oct 2025

[BUG]
During development of a minor feature (make sure all btrfs_bio::end_io()
is called in task context), I noticed a crash in generic/388, where
metadata writes triggered new works after btrfs_stop_all_workers().
It turns out that it can even happen without any code modification, just
using RAID5 for metadata and the same workload from generic/388 is going
to trigger the use-after-free.
[CAUSE]
If btrfs hits an error, the fs is marked as error, no new
transaction is allowed thus metadata is in a frozen state.
But there are some metadata modification before that error, and they are
still in the btree inode page cache.
Since there will be no real transaction commitment, all those dirty
folios are just kept as is in the page cache, and they can not be
invalidated by invalidate_inode_pages2() call inside close_ctree(),
because they are dirty.
And finally after btrfs_stop_all_workers(), we call iput() on btree
inode, which triggers writeback of those dirty metadata.
And if the fs is using RAID56 metadata, this will trigger RMW and queue
new works into rmw_workers, which is already stopped, causing warning
from queue_work() and use-after-free.
[FIX]
Add a special handling for write_one_eb(), that if the fs is already in
an error state immediately mark the bbio as failure, instead of really
submitting them into the device.
This means for test case like generic/388, at iput() those dirty folios
will just be discarded without triggering IO.
CC: stable@vger.kernel.org
Signed-off-by: Qu Wenruo wqu@suse.com
---
 fs/btrfs/disk-io.c   | 12 ++++++++++++
 fs/btrfs/extent_io.c | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0aa7e5d1b05f..8b0fc2df85f1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4402,11 +4402,23 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
btrfs_put_block_group_cache(fs_info);
+	/*
+	 * If the fs is already in trans aborted case, also trigger writeback of all
+	 * dirty metadata folios.
+	 * Those folios will not reach disk but dicarded directly.
+	 * This is to make sure no dirty folios before iput(), or iput() will
+	 * trigger writeback again, and may even cause extra works queued
+	 * into workqueue.
+	 */
+	if (unlikely(BTRFS_FS_ERROR(fs_info)))
+		filemap_write_and_wait(fs_info->btree_inode->i_mapping);
+
    /*
     * we must make sure there is not any read request to
     * submit after we stopping all workers.
     */
    invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
+
    btrfs_stop_all_workers(fs_info);
/* We shouldn't have any transaction open at this point */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 870584dde575..a8a53409bb3f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2246,6 +2246,17 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
    	wbc_account_cgroup_owner(wbc, folio, range_len);
    	folio_unlock(folio);
    }
+	/*
+	 * If the fs is already in error status, do not submit any writeback
+	 * but immediately finish it.
+	 * This is to avoid iput() triggering dirty folio writeback for
+	 * transaction aborted fses, which can cause extra works into
+	 * already stopped workqueues.
+	 */
+	if (unlikely(BTRFS_FS_ERROR(fs_info))) {
+		btrfs_bio_end_io(bbio, errno_to_blk_status(-EROFS));
+		return;
+	}
    btrfs_submit_bbio(bbio, 0);
 }
-- 
2.51.0



    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

[PATCH] btrfs: make sure no dirty metadata write is submitted after btrfs_stop_all_workers()