Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()

6 Jan 2026

On Tue, Jan 6, 2026 at 1:34 AM Jan Kara jack@suse.cz wrote:
...
Hi Jan,
...
[Thanks to Andrew for CCing me on patch commit]
Sorry, I didn't mean to exclude you. I hadn't realized the
fs-writeback.c file had maintainers/reviewers listed for it. I'll make
sure to cc you next time.
...
On Sun 14-12-25 19:00:43, Joanne Koong wrote:
...
Skip waiting on writeback for inodes that belong to mappings that do not
have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
mapping flag).
This restores fuse back to prior behavior where syncs are no-ops. This
is needed because otherwise, if a system is running a faulty fuse
server that does not reply to issued write requests, this will cause
wait_sb_inodes() to wait forever.
Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna athul.krishna.kr@protonmail.com
Reported-by: J. Neuschäfer j.neuschaefer@gmx.net
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong joannelkoong@gmail.com
OK, but the difference 0c58a97f919c introduced goes much further than just
wait_sb_inodes(). Before 0c58a97f919c also filemap_fdatawait() (and all the
other variants waiting for folio_writeback() to clear) returned immediately
because folio writeback was done as soon as we've copied the content into
the temporary page. Now they will block waiting for the server to finish
the IO. So e.g. fsync() will block waiting for the server in
file_write_and_wait_range() now, instead of blocking in fuse_fsync_common()
-> fuse_simple_request(). Similarly e.g. truncate(2) will now block waiting
for the server so that folio_writeback can be cleared.
So I understand your patch fixes the regression with suspend blocking but I
don't have a high confidence we are not just starting a whack-a-mole game
catching all the places that previously hiddenly depended on
folio_writeback getting cleared without any involvement of untrusted fuse
server and now this changed. So do we have some higher-level idea what is /
is not guaranteed with stuck fuse server?
The implications of 0c58a97f919c (eg clearing folio writeback only
when the server has completed writeback instead of clearing writeback
and returning immediately) had some analysis and discussion in this
prior thread [1]. Copying/pasting a snippet from the cover letter:
"With removing the temp page, writeback state is now only cleared on the dirty
page after the server has written it back to disk. This may take an
indeterminate amount of time. As well, there is also the possibility of
malicious or well-intentioned but buggy servers where writeback may in the
worst case scenario, never complete. This means that any
folio_wait_writeback() on a dirty page belonging to a FUSE filesystem needs to
be carefully audited.
In particular, these are the cases that need to be accounted for:
* potentially deadlocking in reclaim, as mentioned above
* potentially stalling sync(2)
* potentially stalling page migration / compaction
This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
filesystems may set on its inode mappings to indicate that writeback
operations may take an indeterminate amount of time to complete. FUSE will set
this flag on its mappings. This patchset adds checks to the critical parts of
reclaim, sync, and page migration logic where writeback may be waited on.
Please note the following:
* For sync(2), waiting on writeback will be skipped for FUSE, but this has no
  effect on existing behavior. Dirty FUSE pages are already not guaranteed to
  be written to disk by the time sync(2) returns (eg writeback is cleared on
  the dirty page but the server may not have written out the temp page to disk
  yet). If the caller wishes to ensure the data has actually been synced to
  disk, they should use fsync(2)/fdatasync(2) instead.
* AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
  waited on when in writeback. There are some cases where the wait is
  desirable. For example, for the sync_file_range() syscall, it is fine to
  wait on the writeback since the caller passes in a fd for the operation."
That was from v6 of the patchset and some things were changed between
that and the final version landed in v8 [2] (most notably, changing
AS_WRITEBACK_INDETERMINATE to AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM and
dropping the sync + page migration skips), but I think that analysis
of what cases need to be accounted for / audited remains the same. I
don't think there are any places beyond those 3 listed above that have
a core intrinsic dependency on folio writeback being cleared cleanly
(eg without any involvement of an untrusted fuse server).
For the fsync() and truncate() examples you mentioned, I don't think
it's an issue that these now wait for the server to finish the I/O and
hang if the server doesn't. I think it's actually more correct
behavior than what we had with temp pages, eg imo these actually ought
to wait for the writeback to have been completed by the server. If the
server is malicious / buggy and fsync/truncate hangs, I think that's
fine given that fsync/truncate is initiated by the user on a specific
file descriptor (as opposed to the generic sync()) (and imo it should
hang if it can't actually be executed correctly because the server is
malfunctioning).
As for why this sync user regression has surfaced and now needs to be
addressed, I don't think it's because there's a whack-a-mole game
where we're ad-hoc having to patch up places we didn't realize could
be broken by folio writeback potentially hanging. The original
patchset [1] contained patches that addressed the sync and compaction
case (eg maintaining the original behavior that the temp pages had),
so I don't think this is something that was missed. These patches were
dropped because in the discussion in [1], they seemed pointless to
mitigate / guard against when there already exists other ways
migration/sync could be stalled by a malicious/buggy fuse server. What
I missed was that it's more common than I had thought for
well-intentioned servers to not correctly implement writeback
handling, and that even if it's userspace's "fault", it's still
considered a kernel regression if buggy code previously sufficed but
now doesn't.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@g...
[2] https://lore.kernel.org/linux-fsdevel/CAJfpegveOFoL-XzDKQZZ4U6UF_AetNwTUDbfm...
...
                                                            Honza


...

fs/fs-writeback.c       |  3 ++-
 fs/fuse/file.c          |  4 +++-
 include/linux/pagemap.h | 11 +++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..ab2e279ed3c2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
               * do not have the mapping lock. Skip it here, wb completion
               * will remove it.
               */

        if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))




        if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||


            mapping_no_data_integrity(mapping))
                continue;

        spin_unlock_irq(&sb->s_inode_wblist_lock);



diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..3b2a171e652f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
  inode->i_fop = &fuse_file_operations;
  inode->i_data.a_ops = &fuse_file_aops;


if (fc->writeback_cache)




if (fc->writeback_cache) {
        mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);


        mapping_set_no_data_integrity(&inode->i_data);


}

INIT_LIST_HEAD(&fi->write_files);
INIT_LIST_HEAD(&fi->queued_writes);



diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..ec442af3f886 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
      AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
      AS_KERNEL_FILE = 10,    /* mapping for a fake kernel file that shouldn't
                                 account usage to user cgroups */

AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
/* Bits 16-25 are used for FOLIO_ORDER */
AS_FOLIO_ORDER_BITS = 5,
AS_FOLIO_ORDER_MIN = 16,



@@ -345,6 +346,16 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct addres
      return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
 }
+static inline void mapping_set_no_data_integrity(struct address_space *mapping)
+{

set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);



+}



+static inline bool mapping_no_data_integrity(const struct address_space *mapping)
+{

return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);



+}



static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
 {
      return mapping->gfp_mask;
--
2.47.3
--
Jan Kara jack@suse.com
SUSE Labs, CR

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()