On Tue, Jan 6, 2026 at 1:34 AM Jan Kara jack@suse.cz wrote:
Hi Jan,
[Thanks to Andrew for CCing me on patch commit]
Sorry, I didn't mean to exclude you. I hadn't realized the fs-writeback.c file had maintainers/reviewers listed for it. I'll make sure to cc you next time.
On Sun 14-12-25 19:00:43, Joanne Koong wrote:
Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).
This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.
Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") Reported-by: Athul Krishna athul.krishna.kr@protonmail.com Reported-by: J. Neuschäfer j.neuschaefer@gmx.net Cc: stable@vger.kernel.org Signed-off-by: Joanne Koong joannelkoong@gmail.com
OK, but the difference 0c58a97f919c introduced goes much further than just wait_sb_inodes(). Before 0c58a97f919c also filemap_fdatawait() (and all the other variants waiting for folio_writeback() to clear) returned immediately because folio writeback was done as soon as we've copied the content into the temporary page. Now they will block waiting for the server to finish the IO. So e.g. fsync() will block waiting for the server in file_write_and_wait_range() now, instead of blocking in fuse_fsync_common() -> fuse_simple_request(). Similarly e.g. truncate(2) will now block waiting for the server so that folio_writeback can be cleared.
So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game catching all the places that previously hiddenly depended on folio_writeback getting cleared without any involvement of untrusted fuse server and now this changed. So do we have some higher-level idea what is / is not guaranteed with stuck fuse server?
The implications of 0c58a97f919c (eg clearing folio writeback only when the server has completed writeback instead of clearing writeback and returning immediately) had some analysis and discussion in this prior thread [1]. Copying/pasting a snippet from the cover letter:
"With removing the temp page, writeback state is now only cleared on the dirty page after the server has written it back to disk. This may take an indeterminate amount of time. As well, there is also the possibility of malicious or well-intentioned but buggy servers where writeback may in the worst case scenario, never complete. This means that any folio_wait_writeback() on a dirty page belonging to a FUSE filesystem needs to be carefully audited.
In particular, these are the cases that need to be accounted for: * potentially deadlocking in reclaim, as mentioned above * potentially stalling sync(2) * potentially stalling page migration / compaction
This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which filesystems may set on its inode mappings to indicate that writeback operations may take an indeterminate amount of time to complete. FUSE will set this flag on its mappings. This patchset adds checks to the critical parts of reclaim, sync, and page migration logic where writeback may be waited on.
Please note the following: * For sync(2), waiting on writeback will be skipped for FUSE, but this has no effect on existing behavior. Dirty FUSE pages are already not guaranteed to be written to disk by the time sync(2) returns (eg writeback is cleared on the dirty page but the server may not have written out the temp page to disk yet). If the caller wishes to ensure the data has actually been synced to disk, they should use fsync(2)/fdatasync(2) instead. * AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be waited on when in writeback. There are some cases where the wait is desirable. For example, for the sync_file_range() syscall, it is fine to wait on the writeback since the caller passes in a fd for the operation."
That was from v6 of the patchset and some things were changed between that and the final version landed in v8 [2] (most notably, changing AS_WRITEBACK_INDETERMINATE to AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM and dropping the sync + page migration skips), but I think that analysis of what cases need to be accounted for / audited remains the same. I don't think there are any places beyond those 3 listed above that have a core intrinsic dependency on folio writeback being cleared cleanly (eg without any involvement of an untrusted fuse server).
For the fsync() and truncate() examples you mentioned, I don't think it's an issue that these now wait for the server to finish the I/O and hang if the server doesn't. I think it's actually more correct behavior than what we had with temp pages, eg imo these actually ought to wait for the writeback to have been completed by the server. If the server is malicious / buggy and fsync/truncate hangs, I think that's fine given that fsync/truncate is initiated by the user on a specific file descriptor (as opposed to the generic sync()) (and imo it should hang if it can't actually be executed correctly because the server is malfunctioning).
As for why this sync user regression has surfaced and now needs to be addressed, I don't think it's because there's a whack-a-mole game where we're ad-hoc having to patch up places we didn't realize could be broken by folio writeback potentially hanging. The original patchset [1] contained patches that addressed the sync and compaction case (eg maintaining the original behavior that the temp pages had), so I don't think this is something that was missed. These patches were dropped because in the discussion in [1], they seemed pointless to mitigate / guard against when there already exists other ways migration/sync could be stalled by a malicious/buggy fuse server. What I missed was that it's more common than I had thought for well-intentioned servers to not correctly implement writeback handling, and that even if it's userspace's "fault", it's still considered a kernel regression if buggy code previously sufficed but now doesn't.
Thanks, Joanne
[1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@g... [2] https://lore.kernel.org/linux-fsdevel/CAJfpegveOFoL-XzDKQZZ4U6UF_AetNwTUDbfm...
Honza
fs/fs-writeback.c | 3 ++- fs/fuse/file.c | 4 +++- include/linux/pagemap.h | 11 +++++++++++ 3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 6800886c4d10..ab2e279ed3c2 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb) * do not have the mapping lock. Skip it here, wb completion * will remove it. */
if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||mapping_no_data_integrity(mapping)) continue; spin_unlock_irq(&sb->s_inode_wblist_lock);diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 01bc894e9c2b..3b2a171e652f 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
inode->i_fop = &fuse_file_operations; inode->i_data.a_ops = &fuse_file_aops;
if (fc->writeback_cache)
if (fc->writeback_cache) { mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);mapping_set_no_data_integrity(&inode->i_data);} INIT_LIST_HEAD(&fi->write_files); INIT_LIST_HEAD(&fi->queued_writes);diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 31a848485ad9..ec442af3f886 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -210,6 +210,7 @@ enum mapping_flags { AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9, AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't account usage to user cgroups */
AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */ /* Bits 16-25 are used for FOLIO_ORDER */ AS_FOLIO_ORDER_BITS = 5, AS_FOLIO_ORDER_MIN = 16,@@ -345,6 +346,16 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct addres return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags); }
+static inline void mapping_set_no_data_integrity(struct address_space *mapping) +{
set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);+}
+static inline bool mapping_no_data_integrity(const struct address_space *mapping) +{
return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);+}
static inline gfp_t mapping_gfp_mask(const struct address_space *mapping) { return mapping->gfp_mask; -- 2.47.3
-- Jan Kara jack@suse.com SUSE Labs, CR