[PATCH v2 0/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()


Joanne Koong

15 Dec 2025

3 a.m.

This patch reverts fuse back to its original behavior of sync being a no-op.

This fixes the userspace regression reported upstream by Athul and J. in [1][2]: if a bug in a fuse server causes the server to never complete writeback, wait_sb_inodes() waits forever.

Thanks, Joanne

[1] https://lore.kernel.org/regressions/CAJnrk1ZjQ8W8NzojsvJPRXiv9TuYPNdj8Ye7=Cg...
[2] https://lore.kernel.org/linux-fsdevel/aT7JRqhUvZvfUQlV@eldamar.lan/

Changelog:
v1: https://lore.kernel.org/linux-mm/20251120184211.2379439-1-joannelkoong@gmail...
* Change AS_WRITEBACK_MAY_HANG to AS_NO_DATA_INTEGRITY and keep AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM as is.

Joanne Koong (1):
  fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()

 fs/fs-writeback.c       |  3 ++-
 fs/fuse/file.c          |  4 +++-
 include/linux/pagemap.h | 11 +++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

-- 2.47.3


Joanne Koong

15 Dec

3 a.m.

New subject: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()

Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fs-writeback.c       |  3 ++-
 fs/fuse/file.c          |  4 +++-
 include/linux/pagemap.h | 11 +++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..ab2e279ed3c2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;
 
 		spin_unlock_irq(&sb->s_inode_wblist_lock);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..3b2a171e652f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
-	if (fc->writeback_cache)
+	if (fc->writeback_cache) {
 		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
+		mapping_set_no_data_integrity(&inode->i_data);
+	}
 
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..ec442af3f886 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
+	AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -345,6 +346,16 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct addres
 	return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
 }
 
+static inline void mapping_set_no_data_integrity(struct address_space *mapping)
+{
+	set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
+}
+
+static inline bool mapping_no_data_integrity(const struct address_space *mapping)
+{
+	return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
 {
 	return mapping->gfp_mask;

-- 2.47.3
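The effect of the new check in wait_sb_inodes() can be modeled outside the kernel. This is a minimal userspace sketch under stated assumptions, not kernel code: struct model_mapping, its tagged_writeback field and the model_* helpers are hypothetical stand-ins for struct address_space, PAGECACHE_TAG_WRITEBACK and the mapping_*() helpers added by the patch, and a plain bitwise OR replaces the atomic set_bit(). Only the bit number 11 is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit number taken from the patch; everything else is a hypothetical model. */
enum { MODEL_AS_NO_DATA_INTEGRITY = 11 };

struct model_mapping {
	unsigned long flags;    /* stands in for address_space->flags */
	bool tagged_writeback;  /* stands in for PAGECACHE_TAG_WRITEBACK */
};

/* Model of mapping_set_no_data_integrity(); not atomic, unlike set_bit(). */
static void model_set_no_data_integrity(struct model_mapping *m)
{
	m->flags |= 1UL << MODEL_AS_NO_DATA_INTEGRITY;
}

/* Model of mapping_no_data_integrity(). */
static bool model_no_data_integrity(const struct model_mapping *m)
{
	return (m->flags & (1UL << MODEL_AS_NO_DATA_INTEGRITY)) != 0;
}

/*
 * Model of the patched condition in wait_sb_inodes(): count the mappings
 * a data-integrity sync would still block on.
 */
static size_t model_count_waits(const struct model_mapping *maps, size_t n)
{
	size_t waits = 0;

	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!maps[i].tagged_writeback ||
		    model_no_data_integrity(&maps[i]))
			continue;       /* nothing under writeback, or no integrity promised */
		waits++;                /* the kernel would wait on writeback here */
	}
	return waits;
}
```

With two writeback-tagged mappings, marking one of them with the no-data-integrity bit drops the number of waits from 2 to 1. In the kernel, the skipped mapping would be a fuse inode with writeback_cache enabled, whose stuck writeback would otherwise hang wait_sb_inodes().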

Bernd Schubert

5:09 p.m.


On 12/15/25 04:00, Joanne Koong wrote:

...

Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

 fs/fs-writeback.c       |  3 ++-
 fs/fuse/file.c          |  4 +++-
 include/linux/pagemap.h | 11 +++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..ab2e279ed3c2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;
 
 		spin_unlock_irq(&sb->s_inode_wblist_lock);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..3b2a171e652f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
-	if (fc->writeback_cache)
+	if (fc->writeback_cache) {
 		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
+		mapping_set_no_data_integrity(&inode->i_data);
+	}

For a future commit, maybe we could add a FUSE_INIT flag that allows privileged fuse server to not set this? Maybe even in combination with an enforced request timeout?

...


Reviewed-by: Bernd Schubert <bschubert@ddn.com>

Joanne Koong

16 Dec

7:07 a.m.


On Tue, Dec 16, 2025 at 1:09 AM Bernd Schubert <bernd@bsbernd.com> wrote:

...

On 12/15/25 04:00, Joanne Koong wrote:

...
Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

 fs/fs-writeback.c       |  3 ++-
 fs/fuse/file.c          |  4 +++-
 include/linux/pagemap.h | 11 +++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..ab2e279ed3c2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;
 
 		spin_unlock_irq(&sb->s_inode_wblist_lock);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..3b2a171e652f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
-	if (fc->writeback_cache)
+	if (fc->writeback_cache) {
 		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
+		mapping_set_no_data_integrity(&inode->i_data);
+	}
For a future commit, maybe we could add a FUSE_INIT flag that allows privileged fuse server to not set this? Maybe even in combination with an enforced request timeout?

That sounds good, thanks for reviewing this, Bernd!

...


J. Neuschäfer

6:13 p.m.


On Sun, Dec 14, 2025 at 07:00:43PM -0800, Joanne Koong wrote:

...

Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

I can confirm that this patch fixes the issue I reported. (Tested by applying it on top of v6.19-rc1)

Tested-by: J. Neuschäfer j.neuschaefer@gmx.net

Thank you very much!

...


Joanne Koong

2 Jan

5:42 p.m.


On Sun, Dec 14, 2025 at 7:05 PM Joanne Koong <joannelkoong@gmail.com> wrote:

...

Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

Hi Andrew,

This patch fixes a user regression that's been reported a few times upstream [1][2]. Bernd (who works on fuse) has given his Reviewed-by for the changes and J. has verified that it fixes the issues he saw. Is there anything else needed to move this patch forward?

Thanks, Joanne

[1] https://lore.kernel.org/regressions/mwBOip3XK77dn-UJtlk-uQ1N6i3nwsKticZyQdPY...
[2] https://lore.kernel.org/linux-fsdevel/aT7JRqhUvZvfUQlV@eldamar.lan/

...


Andrew Morton

3 Jan

6:03 p.m.


On Sun, 14 Dec 2025 19:00:43 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:

...

Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

..

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;

It's not obvious why a no-data-integrity mapping would want to skip writeback - what do these things have to do with each other?

So can we please have a v2 which has a comment here explaining this to the reader?

David Hildenbrand (Red Hat)

4 Jan

6:54 p.m.


On 1/3/26 19:03, Andrew Morton wrote:

...

On Sun, 14 Dec 2025 19:00:43 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:

...
Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

..

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;

It's not obvious why a no-data-integrity mapping would want to skip writeback - what do these things have to do with each other?

So can we please have a v2 which has a comment here explaining this to the reader?

Sorry for not replying earlier, I missed a couple of mails sent to my @redhat address due to @gmail being force-unsubscribed from linux-mm ...

Probably sufficient to add at the beginning of the commit:

"Above the while() loop in wait_sb_inodes(), we document that we must wait for all pages under writeback for data integrity. Consequently, if a mapping, like fuse, traditionally does not have data integrity semantics, there is no need to wait at all; we can simply skip these inodes.

So skip ..."

-- Cheers David

Joanne Koong

5 Jan

7:55 p.m.


On Sun, Jan 4, 2026 at 10:54 AM David Hildenbrand (Red Hat) <david@kernel.org> wrote:

...

On 1/3/26 19:03, Andrew Morton wrote:

...
On Sun, 14 Dec 2025 19:00:43 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:

...
Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

..

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;

It's not obvious why a no-data-integrity mapping would want to skip writeback - what do these things have to do with each other?

So can we please have a v2 which has a comment here explaining this to the reader?
Sorry for not replying earlier, I missed a couple of mails sent to my @redhat address due to @gmail being force-unsubscribed from linux-mm ...

Probably sufficient to add at the beginning of the commit:

"Above the while() loop in wait_sb_inodes(), we document that we must wait for all pages under writeback for data integrity. Consequently, if a mapping, like fuse, traditionally does not have data integrity semantics, there is no need to wait at all; we can simply skip these inodes.

So skip ..."

Sounds good, I'll send out v3 with these changes. Thanks for the feedback, Andrew and David.

...


Jan Kara

6 Jan

9:33 a.m.


[Thanks to Andrew for CCing me on patch commit]

On Sun 14-12-25 19:00:43, Joanne Koong wrote:

...

Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

OK, but the difference 0c58a97f919c introduced goes much further than just wait_sb_inodes(). Before 0c58a97f919c also filemap_fdatawait() (and all the other variants waiting for folio_writeback() to clear) returned immediately because folio writeback was done as soon as we've copied the content into the temporary page. Now they will block waiting for the server to finish the IO. So e.g. fsync() will block waiting for the server in file_write_and_wait_range() now, instead of blocking in fuse_fsync_common() -> fuse_simple_request(). Similarly e.g. truncate(2) will now block waiting for the server so that folio_writeback can be cleared.

So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game catching all the places that previously hiddenly depended on folio_writeback getting cleared without any involvement of untrusted fuse server and now this changed. So do we have some higher-level idea what is / is not guaranteed with stuck fuse server?

Honza

...


--
Jan Kara <jack@suse.com>
SUSE Labs, CR
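The behavioral difference described above can be illustrated with a small userspace model of the two writeback schemes. It is a sketch under assumptions, not fuse code: the model_* types and functions are hypothetical, a 16-byte array stands in for a folio, and model_wait_would_block() only reports whether a filemap_fdatawait()-style waiter would block rather than actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16      /* tiny "page" for the model */

struct model_folio {
	char data[MODEL_PAGE_SIZE];
	bool writeback;             /* models the folio writeback flag */
};

struct model_request {
	char tmp[MODEL_PAGE_SIZE];  /* temp copy (pre-0c58a97f919c scheme) */
	struct model_folio *folio;  /* folio still referenced by the request, if any */
	bool replied;
};

/* Old scheme: copy into a temp page and end folio writeback right away. */
static void model_wb_tmp_copy(struct model_folio *f, struct model_request *r)
{
	f->writeback = true;
	memcpy(r->tmp, f->data, MODEL_PAGE_SIZE);
	r->folio = NULL;            /* the request only references the copy */
	r->replied = false;
	f->writeback = false;       /* cleared before the server sees anything */
}

/* New scheme: send the folio itself; writeback ends on the server reply. */
static void model_wb_direct(struct model_folio *f, struct model_request *r)
{
	f->writeback = true;
	r->folio = f;
	r->replied = false;
}

/* The server completes the request; a stuck server never calls this. */
static void model_server_reply(struct model_request *r)
{
	assert(!r->replied);
	r->replied = true;
	if (r->folio)
		r->folio->writeback = false;
}

/* Would a filemap_fdatawait()-style caller block on this folio? */
static bool model_wait_would_block(const struct model_folio *f)
{
	return f->writeback;
}
```

In the temp-copy scheme the waiter never depends on the server; in the direct scheme the writeback flag is only cleared by model_server_reply(), so every path that waits for writeback (fsync, truncate, sync) inherits the server's behavior.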

David Hildenbrand (Red Hat)

10:05 a.m.


On 1/6/26 10:33, Jan Kara wrote:

...

[Thanks to Andrew for CCing me on patch commit]

On Sun 14-12-25 19:00:43, Joanne Koong wrote:

...
Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

OK, but the difference 0c58a97f919c introduced goes much further than just wait_sb_inodes(). Before 0c58a97f919c also filemap_fdatawait() (and all the other variants waiting for folio_writeback() to clear) returned immediately because folio writeback was done as soon as we've copied the content into the temporary page. Now they will block waiting for the server to finish the IO. So e.g. fsync() will block waiting for the server in file_write_and_wait_range() now, instead of blocking in fuse_fsync_common() -> fuse_simple_request(). Similarly e.g. truncate(2) will now block waiting for the server so that folio_writeback can be cleared.

So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game

Yes, I think so, and I think it is [1] not even only limited to writeback [2].

...

catching all the places that previously hiddenly depended on folio_writeback getting cleared without any involvement of untrusted fuse server and now this changed.

Even worse, it's not only untrusted fuse servers, but also trusted-but-buggy fuse servers, unfortunately. As Joanne wrote in v1:

" As reported by Athul upstream in [1], there is a userspace regression caused by commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") where if there is a bug in a fuse server that causes the server to never complete writeback, it will make wait_sb_inodes() wait forever, causing sync paths to hang. "

...

So do we have some higher-level idea what is / is not guaranteed with stuck fuse server?

Joanne first proposed AS_WRITEBACK_MAY_HANG, which I disliked [2] for various reasons because the semantics are weird. I am strongly against using such a flag to arbitrarily skip waiting for writeback on folios in the tree.

The patch here is at least logically the right thing to do when only looking at the wait_sb_inodes() writeback situation [3] and why it is even ok to skip waiting for writeback, and the fix Joanne originally proposed.

To handle the bigger picture (I raised another problematic instance in [4]): I don't know how to handle that without properly fixing fuse. Fuse folks should really invest some time to solve this problem for good.

As a big temporary kernel hack, we could add a AS_ANY_WAITING_UTTERLY_BROKEN and simply refuse to wait for writeback directly inside folio_wait_writeback() -- not arbitrarily skipping it in callers -- and possibly other places (readahead, not sure). That would restore the old behavior.

Well, not quite, because the semantics that folio_wait_writeback() promises -- writeback flag at least cleared once, like required here for data integrity -- are just not true anymore.

And it would still break migration of folios that are under writeback, even though waiting for writeback during migration does the right thing in 99.9999% of all cases with a trusted fuse server. Just nasty.

Of course, we could set AS_ANY_WAITING_UTTERLY_BROKEN in fuse only conditionally, but given that buggy trusted fuse servers are now a thing, it all stops making sense, because we would have to set that flag always.

There is no easy way to get back the old behavior without reverting to the old way of using buffer pages, I guess.

[1] https://lore.kernel.org/linux-mm/504d100d-b8f3-475b-b575-3adfd17627b5@kernel...
[2] https://lore.kernel.org/linux-mm/f8da9ee0-f136-4366-b63a-1812fda11304@kernel...
[3] https://lore.kernel.org/linux-mm/6d0948f5-e739-49f3-8e23-359ddbf3da8f@kernel...
[4] https://lore.kernel.org/linux-mm/504d100d-b8f3-475b-b575-3adfd17627b5@kernel...

-- Cheers David

Miklos Szeredi

1:13 p.m.

New subject: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()

On Tue, 6 Jan 2026 at 11:05, David Hildenbrand (Red Hat) david@kernel.org wrote:

...

...
So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game

Joanne did a thorough analysis, so I still have hope. Missing a case in such a complex thing is not unexpected.

...

Yes, I think so, and I think it is [1] not even only limited to writeback [2].

You are referring to DoS against compaction?

It is a much more benign issue, since compaction will just skip locked pages, AFAIU (wasn't always so: https://lore.kernel.org/all/1288817005.4235.11393.camel@nimitz/).

Not saying it shouldn't be fixed, but it should be a separate discussion.

...

To handle the bigger picture (I raised another problematic instance in [4]): I don't know how to handle that without properly fixing fuse. Fuse folks should really invest some time to solve this problem for good.

Fixing it generically in fuse would necessarily involve bringing back some sort of temp buffer. The performance penalty could be minimized, but complexity is what really hurts.

Maybe doing whack-a-mole results in less mess overall :-/

...

As a big temporary kernel hack, we could add an AS_ANY_WAITING_UTTERLY_BROKEN flag and simply refuse to wait for writeback directly inside folio_wait_writeback() -- not arbitrarily skipping it in callers -- and possibly other places (readahead, not sure). That would restore the old behavior.

No, it wouldn't, since the old code had surrogate methods for waiting on outstanding writes, which were called on fsync, etc.

Thanks, Miklos

Jan Kara

1:55 p.m.

On Tue 06-01-26 14:13:55, Miklos Szeredi wrote:

...

On Tue, 6 Jan 2026 at 11:05, David Hildenbrand (Red Hat) david@kernel.org wrote:

...
...
So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game

Joanne did a thorough analysis, so I still have hope. Missing a case in such a complex thing is not unexpected.

...
Yes, I think so, and I think it is [1] not even only limited to writeback [2].

You are referring to DoS against compaction?

It is a much more benign issue, since compaction will just skip locked pages, AFAIU (wasn't always so: https://lore.kernel.org/all/1288817005.4235.11393.camel@nimitz/).

Not saying it shouldn't be fixed, but it should be a separate discussion.

...
To handle the bigger picture (I raised another problematic instance in [4]): I don't know how to handle that without properly fixing fuse. Fuse folks should really invest some time to solve this problem for good.

Fixing it generically in fuse would necessarily involve bringing back some sort of temp buffer. The performance penalty could be minimized, but complexity is what really hurts.

Maybe doing whack-a-mole results in less mess overall :-/

OK, I was wondering about the bigger picture and now I see there's none :) I can live with this workaround for now as its blast radius is relatively small and we can see if some other practical issues appear in the future (in which case I'll probably push for a more systemic solution).

Honza

-- Jan Kara jack@suse.com SUSE Labs, CR

David Hildenbrand (Red Hat)

2:33 p.m.

On 1/6/26 14:13, Miklos Szeredi wrote:

...

On Tue, 6 Jan 2026 at 11:05, David Hildenbrand (Red Hat) david@kernel.org wrote:

...
...
So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game

Joanne did a thorough analysis, so I still have hope. Missing a case in such a complex thing is not unexpected.

...
Yes, I think so, and I think it is [1] not even only limited to writeback [2].

You are referring to DoS against compaction?

In previous discussions it was raised that readahead runs into similar problems.

I don't recall all the details, but I think that we might end up holding the folio lock forever while the fuse user space daemon is supposed to fill the page with data; anybody trying to lock the folio would similarly deadlock.

Maybe only compaction/migration is affected by that, hard to tell.

...

It is a much more benign issue, since compaction will just skip locked pages, AFAIU (wasn't always so: https://lore.kernel.org/all/1288817005.4235.11393.camel@nimitz/).

Not saying it shouldn't be fixed, but it should be a separate discussion.

Right. But as I pointed out in [4], there are other call paths where we might end up waiting for writeback unless I am missing something.

So it has whack-a-mole smell to it.

...

...
To handle the bigger picture (I raised another problematic instance in [4]): I don't know how to handle that without properly fixing fuse. Fuse folks should really invest some time to solve this problem for good.

Fixing it generically in fuse would necessarily involve bringing back some sort of temp buffer. The performance penalty could be minimized, but complexity is what really hurts.

I'm not sure about temp buffers. During early discussions there were ideas about canceling writeback and instead marking the folio dirty again. I assume there is a non-trivial solution space left unexplored for now.
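A minimal state-machine sketch of the "cancel writeback and mark the folio dirty again" idea, assuming only two flags per folio. model_cancel_writeback() is hypothetical; no such kernel helper is implied:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of "cancel writeback, redirty instead". The two fields
 * mirror the folio dirty/writeback flags; the cancel step is a hypothetical
 * illustration, not an existing kernel API.
 */
struct model_folio {
	bool dirty;
	bool writeback;
};

/* Writeback start: dirty data is handed to the server. */
static void model_start_writeback(struct model_folio *f)
{
	assert(f->dirty && !f->writeback);
	f->dirty = false;
	f->writeback = true;
}

/* Normal completion: the server replied. */
static void model_end_writeback(struct model_folio *f)
{
	assert(f->writeback);
	f->writeback = false;
}

/*
 * Give up on an unresponsive server: drop the writeback state so waiters can
 * proceed, but keep the data dirty so it is retried later rather than being
 * treated as written.
 */
static void model_cancel_writeback(struct model_folio *f)
{
	assert(f->writeback);
	f->writeback = false;
	f->dirty = true;
}
```

The sketch deliberately leaves out the hard part: what to do when the server replies after the cancel has already happened.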

...

Maybe doing whack-a-mole results in less mess overall :-/

Maybe :) I'm fine with the patch as is as well.

...

...
As a big temporary kernel hack, we could add an AS_ANY_WAITING_UTTERLY_BROKEN flag and simply refuse to wait for writeback directly inside folio_wait_writeback() -- not arbitrarily skipping it in callers -- and possibly other places (readahead, not sure). That would restore the old behavior.

No, it wouldn't, since the old code had surrogate methods for waiting on outstanding writes, which were called on fsync, etc.

Yeah, I raised some "except" below; I assume there are more. Not that I would want to go down that path :)

-- Cheers David

Miklos Szeredi

3:21 p.m.

On Tue, 6 Jan 2026 at 15:34, David Hildenbrand (Red Hat) david@kernel.org wrote:

...

I don't recall all the details, but I think that we might end up holding the folio lock forever while the fuse user space daemon is supposed to fill the page with data; anybody trying to lock the folio would similarly deadlock.

Right.

...

Maybe only compaction/migration is affected by that, hard to tell.

Can't imagine anything beyond actual I/O and folio logistics (reclaim/compaction) that would want to touch the page lock.

I/O has the right to wait forever on the folio if the server is stuck, that doesn't count as a deadlock.

The logistics functions are careful to use folio_trylock(), but they could give a hint to fuse via a callback that they'd like to have this particular folio. In that case fuse would be free to cancel the read and let the whole thing be retried with a new folio.

What we really need is a failing test case, the rest should be easy ;-)

...

I'm not sure about temp buffers. During early discussions there were ideas about canceling writeback and instead marking the folio dirty again. I assume there is a non-trivial solution space left unexplored for now.

That might work combined with the suggested callback to fix the compaction issue.

But I don't see how it would be a generic replacement for the tmp page code.

Thanks, Miklos

David Hildenbrand (Red Hat)

3:41 p.m.

On 1/6/26 16:21, Miklos Szeredi wrote:

...

On Tue, 6 Jan 2026 at 15:34, David Hildenbrand (Red Hat) david@kernel.org wrote:

...
I don't recall all the details, but I think that we might end up holding the folio lock forever while the fuse user space daemon is supposed to fill the page with data; anybody trying to lock the folio would similarly deadlock.

Right.

...
Maybe only compaction/migration is affected by that, hard to tell.

Can't imagine anything beyond actual I/O and folio logistics (reclaim/compaction) that would want to touch the page lock.

I assume the usual suspects, including mm/memory-failure.c.

memory_failure() not only contains a folio_wait_writeback() but also a folio_lock(), so twice the fun :)

-- Cheers David

Miklos Szeredi

4:05 p.m.

On Tue, 6 Jan 2026 at 16:41, David Hildenbrand (Red Hat) david@kernel.org wrote:

...

I assume the usual suspects, including mm/memory-failure.c.

memory_failure() not only contains a folio_wait_writeback() but also a folio_lock(), so twice the fun :)

As long as it's run from a workqueue it shouldn't affect the rest of the system, right? The wq thread will consume a nontrivial amount of resources, I suppose, so it would be better to implement those waits asynchronously.
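A minimal sketch of what an asynchronous wait could look like, assuming a fixed-size list of continuations per folio. model_wait_writeback_async() is a hypothetical name, and this is not how memory_failure() is structured today:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model: instead of parking a worker until writeback clears, the
 * caller registers a continuation that runs on writeback completion.
 * All names are hypothetical.
 */
#define MODEL_MAX_WAITERS 4

struct model_folio;
typedef void (*model_done_fn)(struct model_folio *folio, void *arg);

struct model_folio {
	bool writeback;
	int nr_waiters;
	model_done_fn waiters[MODEL_MAX_WAITERS];
	void *args[MODEL_MAX_WAITERS];
};

/* Run fn now if writeback is clear, else queue it. False if the queue is full. */
static bool model_wait_writeback_async(struct model_folio *f, model_done_fn fn,
				       void *arg)
{
	if (!f->writeback) {
		fn(f, arg);
		return true;
	}
	if (f->nr_waiters == MODEL_MAX_WAITERS)
		return false;
	f->waiters[f->nr_waiters] = fn;
	f->args[f->nr_waiters] = arg;
	f->nr_waiters++;
	return true;
}

/* Writeback completion: clear the flag and run every queued continuation. */
static void model_end_writeback(struct model_folio *f)
{
	assert(f->writeback);
	f->writeback = false;
	for (int i = 0; i < f->nr_waiters; i++)
		f->waiters[i](f, f->args[i]);
	f->nr_waiters = 0;
}

/* Example continuation: count how many times it ran. */
static void model_count_done(struct model_folio *f, void *arg)
{
	(void)f;
	++*(int *)arg;
}
```

If the server never replies, the queued continuation never runs, but no worker thread is tied up waiting for it.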

Thanks, Miklos

David Hildenbrand (Red Hat)

5:54 p.m.

On 1/6/26 17:05, Miklos Szeredi wrote:

...

On Tue, 6 Jan 2026 at 16:41, David Hildenbrand (Red Hat) david@kernel.org wrote:

...
I assume the usual suspects, including mm/memory-failure.c.

memory_failure() not only contains a folio_wait_writeback() but also a folio_lock(), so twice the fun :)

As long as it's run from a workqueue it shouldn't affect the rest of the system, right? The wq thread will consume a nontrivial amount of resources, I suppose, so it would be better to implement those waits asynchronously.

Good question. I know that memory_failure() can be triggered from various contexts, but I never traced it back to its origin.

-- Cheers David

Joanne Koong

11:30 p.m.

On Tue, Jan 6, 2026 at 1:34 AM Jan Kara jack@suse.cz wrote:

...

Hi Jan,

...

[Thanks to Andrew for CCing me on patch commit]

Sorry, I didn't mean to exclude you. I hadn't realized the fs-writeback.c file had maintainers/reviewers listed for it. I'll make sure to cc you next time.

...

On Sun 14-12-25 19:00:43, Joanne Koong wrote:

...
Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).

This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") Reported-by: Athul Krishna athul.krishna.kr@protonmail.com Reported-by: J. Neuschäfer j.neuschaefer@gmx.net Cc: stable@vger.kernel.org Signed-off-by: Joanne Koong joannelkoong@gmail.com

OK, but the difference 0c58a97f919c introduced goes much further than just wait_sb_inodes(). Before 0c58a97f919c also filemap_fdatawait() (and all the other variants waiting for folio_writeback() to clear) returned immediately because folio writeback was done as soon as we've copied the content into the temporary page. Now they will block waiting for the server to finish the IO. So e.g. fsync() will block waiting for the server in file_write_and_wait_range() now, instead of blocking in fuse_fsync_common() -> fuse_simple_request(). Similarly e.g. truncate(2) will now block waiting for the server so that folio_writeback can be cleared.
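To make the before/after difference concrete, here is a minimal userspace model of the two schemes, assuming one fixed-size folio and one request. None of these names are fuse or mm code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Userspace model. With a temporary copy, the folio's writeback flag clears
 * as soon as the data is copied out, so waiters never depend on the server.
 * Without the copy, the flag clears only when the server replies.
 */
#define MODEL_PAGE_SIZE 16

struct model_folio {
	char data[MODEL_PAGE_SIZE];
	bool writeback;
};

struct model_request {
	char tmp[MODEL_PAGE_SIZE]; /* temp buffer (old scheme only) */
	struct model_folio *folio; /* folio to complete on reply (new scheme) */
	bool replied;
};

/* Old scheme: copy into tmp and end writeback immediately. */
static void model_writeback_with_tmp(struct model_folio *f,
				     struct model_request *req)
{
	f->writeback = true;
	memcpy(req->tmp, f->data, sizeof(req->tmp));
	req->folio = NULL;
	req->replied = false;
	f->writeback = false; /* waiters no longer depend on the server */
}

/* New scheme: the request references the folio directly. */
static void model_writeback_direct(struct model_folio *f,
				   struct model_request *req)
{
	f->writeback = true;
	req->folio = f;
	req->replied = false;
}

/* Server reply: only the direct scheme still has a folio flag to clear. */
static void model_server_reply(struct model_request *req)
{
	assert(!req->replied);
	req->replied = true;
	if (req->folio)
		req->folio->writeback = false;
}
```

In the direct scheme anything that waits on the folio's writeback flag (filemap_fdatawait() and friends in the kernel) now waits for model_server_reply(), which a stuck server never sends.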

So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game catching all the places that previously hiddenly depended on folio_writeback getting cleared without any involvement of untrusted fuse server and now this changed. So do we have some higher-level idea what is / is not guaranteed with stuck fuse server?

The implications of 0c58a97f919c (eg clearing folio writeback only when the server has completed writeback instead of clearing writeback and returning immediately) had some analysis and discussion in this prior thread [1]. Copying/pasting a snippet from the cover letter:

"With removing the temp page, writeback state is now only cleared on the dirty page after the server has written it back to disk. This may take an indeterminate amount of time. As well, there is also the possibility of malicious or well-intentioned but buggy servers where writeback may in the worst case scenario, never complete. This means that any folio_wait_writeback() on a dirty page belonging to a FUSE filesystem needs to be carefully audited.

In particular, these are the cases that need to be accounted for:
* potentially deadlocking in reclaim, as mentioned above
* potentially stalling sync(2)
* potentially stalling page migration / compaction

This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which filesystems may set on their inode mappings to indicate that writeback operations may take an indeterminate amount of time to complete. FUSE will set this flag on its mappings. This patchset adds checks to the critical parts of reclaim, sync, and page migration logic where writeback may be waited on.

Please note the following:
* For sync(2), waiting on writeback will be skipped for FUSE, but this has no effect on existing behavior. Dirty FUSE pages are already not guaranteed to be written to disk by the time sync(2) returns (eg writeback is cleared on the dirty page but the server may not have written out the temp page to disk yet). If the caller wishes to ensure the data has actually been synced to disk, they should use fsync(2)/fdatasync(2) instead.
* AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be waited on when in writeback. There are some cases where the wait is desirable. For example, for the sync_file_range() syscall, it is fine to wait on the writeback since the caller passes in a fd for the operation."
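For callers that need the data on disk, the cover letter points to fsync(2)/fdatasync(2) on the file descriptor rather than sync(2). A minimal userspace sketch of that pattern, using only POSIX calls; write_durably() is an illustrative helper name, and on FUSE the result still depends on the server completing the request, so against a stuck server the call can block:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write buf to path and make the data durable with fdatasync(2) on that
 * descriptor, instead of relying on a global sync(2).
 * Returns 0 on success or -errno on failure.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;

	const char *p = buf;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted: retry the write */
			int err = -errno;
			close(fd);
			return err;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Flush the data plus the metadata needed to read it back. */
	if (fdatasync(fd) < 0) {
		int err = -errno;
		close(fd);
		return err;
	}
	return close(fd) < 0 ? -errno : 0;
}
```

fdatasync() is enough for the file contents and size; a fully durable create would also fsync() the parent directory, which this sketch omits.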

That was from v6 of the patchset and some things were changed between that and the final version landed in v8 [2] (most notably, changing AS_WRITEBACK_INDETERMINATE to AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM and dropping the sync + page migration skips), but I think that analysis of what cases need to be accounted for / audited remains the same. I don't think there are any places beyond those 3 listed above that have a core intrinsic dependency on folio writeback being cleared cleanly (eg without any involvement of an untrusted fuse server).

For the fsync() and truncate() examples you mentioned, I don't think it's an issue that these now wait for the server to finish the I/O and hang if the server doesn't. I think it's actually more correct behavior than what we had with temp pages, eg imo these actually ought to wait for the writeback to have been completed by the server. If the server is malicious / buggy and fsync/truncate hangs, I think that's fine given that fsync/truncate is initiated by the user on a specific file descriptor (as opposed to the generic sync()) (and imo it should hang if it can't actually be executed correctly because the server is malfunctioning).

As for why this sync user regression has surfaced and now needs to be addressed, I don't think it's because there's a whack-a-mole game where we're ad-hoc having to patch up places we didn't realize could be broken by folio writeback potentially hanging. The original patchset [1] contained patches that addressed the sync and compaction case (eg maintaining the original behavior that the temp pages had), so I don't think this is something that was missed. These patches were dropped because in the discussion in [1], they seemed pointless to mitigate / guard against when there already exists other ways migration/sync could be stalled by a malicious/buggy fuse server. What I missed was that it's more common than I had thought for well-intentioned servers to not correctly implement writeback handling, and that even if it's userspace's "fault", it's still considered a kernel regression if buggy code previously sufficed but now doesn't.

Thanks, Joanne

[1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@g...
[2] https://lore.kernel.org/linux-fsdevel/CAJfpegveOFoL-XzDKQZZ4U6UF_AetNwTUDbfm...

...

                                                            Honza
...
 fs/fs-writeback.c       |  3 ++-
 fs/fuse/file.c          |  4 +++-
 include/linux/pagemap.h | 11 +++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..ab2e279ed3c2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;
 
 		spin_unlock_irq(&sb->s_inode_wblist_lock);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..3b2a171e652f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
-	if (fc->writeback_cache)
+	if (fc->writeback_cache) {
 		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
+		mapping_set_no_data_integrity(&inode->i_data);
+	}
 
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..ec442af3f886 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
+	AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -345,6 +346,16 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct addres
 	return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
 }
 
+static inline void mapping_set_no_data_integrity(struct address_space *mapping)
+{
+	set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
+}
+
+static inline bool mapping_no_data_integrity(const struct address_space *mapping)
+{
+	return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
 {
 	return mapping->gfp_mask;
-- 
2.47.3
-- Jan Kara jack@suse.com SUSE Labs, CR
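For illustration, a userspace model of the check the patch adds to wait_sb_inodes(), assuming simplified stand-ins for struct address_space and the writeback tag. Only the bit number 11 and the skip condition come from the patch; the struct, the tag field and the "waited" counter are assumptions made for the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the patched loop: walk the mappings of a superblock's
 * inodes under writeback and wait only on those that are tagged for
 * writeback and do not carry the no-data-integrity flag.
 */
#define MODEL_AS_NO_DATA_INTEGRITY 11 /* same bit number as in the patch */

struct model_mapping {
	unsigned long flags;
	bool tagged_writeback; /* stands in for the PAGECACHE_TAG_WRITEBACK check */
};

static void model_set_no_data_integrity(struct model_mapping *m)
{
	m->flags |= 1UL << MODEL_AS_NO_DATA_INTEGRITY;
}

static bool model_no_data_integrity(const struct model_mapping *m)
{
	return m->flags & (1UL << MODEL_AS_NO_DATA_INTEGRITY);
}

/* Returns how many mappings the loop would wait on. */
static size_t model_wait_sb_inodes(const struct model_mapping *maps, size_t n)
{
	size_t waited = 0;

	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!maps[i].tagged_writeback ||
		    model_no_data_integrity(&maps[i]))
			continue; /* same condition as the patched check */
		waited++;         /* the kernel would wait on this mapping here */
	}
	return waited;
}
```

As in the patch, fuse sets the flag only when fc->writeback_cache is enabled, so other fuse mappings and all other filesystems are still waited on.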

linux-stable-mirror@lists.linaro.org

18 comments

participants (7)

Andrew Morton
Bernd Schubert
David Hildenbrand (Red Hat)
J. Neuschäfer
Jan Kara
Joanne Koong
Miklos Szeredi