On 1/6/26 10:33, Jan Kara wrote:
[Thanks to Andrew for CCing me on patch commit]
On Sun 14-12-25 19:00:43, Joanne Koong wrote:
Skip waiting on writeback for inodes that belong to mappings that do not have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY mapping flag).
This restores fuse back to prior behavior where syncs are no-ops. This is needed because otherwise, if a system is running a faulty fuse server that does not reply to issued write requests, this will cause wait_sb_inodes() to wait forever.
Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
OK, but the difference 0c58a97f919c introduced goes much further than just wait_sb_inodes(). Before 0c58a97f919c, filemap_fdatawait() (and all the other variants waiting for folio_writeback() to clear) also returned immediately, because folio writeback was done as soon as we had copied the content into the temporary page. Now they will block waiting for the server to finish the IO. So e.g. fsync() will now block waiting for the server in file_write_and_wait_range(), instead of blocking in fuse_fsync_common() -> fuse_simple_request(). Similarly, e.g. truncate(2) will now block waiting for the server so that folio_writeback can be cleared.
So I understand your patch fixes the regression with suspend blocking but I don't have a high confidence we are not just starting a whack-a-mole game
Yes, I think so, and I think it is not even limited to writeback [1] [2].
catching all the places that previously implicitly depended on folio_writeback getting cleared without any involvement of an untrusted fuse server, which has now changed.
Even worse, it's not only untrusted fuse servers, but also trusted-but-buggy fuse servers, unfortunately. As Joanne wrote in v1:
" As reported by Athul upstream in [1], there is a userspace regression caused by commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") where if there is a bug in a fuse server that causes the server to never complete writeback, it will make wait_sb_inodes() wait forever, causing sync paths to hang. "
So do we have some higher-level idea of what is / is not guaranteed with a stuck fuse server?
Joanne first proposed AS_WRITEBACK_MAY_HANG, which I disliked [2] for various reasons, as the semantics are weird. I am strongly against using such a flag to arbitrarily skip waiting for writeback on folios in the tree.
The patch here is at least logically the right thing to do when only looking at the wait_sb_inodes() writeback situation (see [3] for why it is even ok to skip waiting for writeback there, and for the fix Joanne originally proposed).
To handle the bigger picture (I raised another problematic instance in [4]): I don't know how to handle that without properly fixing fuse. Fuse folks should really invest some time to solve this problem for good.
As a big temporary kernel hack, we could add an AS_ANY_WAITING_UTTERLY_BROKEN flag and simply refuse to wait for writeback directly inside folio_wait_writeback() -- not arbitrarily skipping it in callers -- and possibly in other places (readahead, not sure). That would restore the old behavior.
Well, not quite, because the semantics that folio_wait_writeback() promises -- writeback flag cleared at least once, as required here for data integrity -- are just not true anymore.
And it would still break migration of folios that are under writeback, even though waiting for writeback during migration will do the right thing in 99.9999% of all cases with a trusted fuse server. Just nasty.
Of course, we could set AS_ANY_WAITING_UTTERLY_BROKEN in fuse only conditionally, but given that buggy trusted fuse servers are now a thing, that stops making any sense, because we would have to set the flag always.
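To make the hack concrete, here is a rough userspace model in C; it is a sketch, not kernel code. Only the names AS_ANY_WAITING_UTTERLY_BROKEN and folio_wait_writeback() come from the discussion above. The struct layouts, the model_ prefix, the bit number and the boolean return value are all made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's address_space and folio. */
struct model_mapping {
	unsigned long flags;
};

struct model_folio {
	struct model_mapping *mapping;
	bool writeback;		/* models the folio's writeback flag */
};

/* Arbitrary bit number; the flag itself is only a proposal in this thread. */
#define AS_ANY_WAITING_UTTERLY_BROKEN	0

bool model_mapping_waiting_broken(const struct model_mapping *mapping)
{
	return (mapping->flags & (1UL << AS_ANY_WAITING_UTTERLY_BROKEN)) != 0;
}

/*
 * Model of folio_wait_writeback() with the check placed inside the wait
 * primitive itself rather than in individual callers.
 *
 * Returns true if the folio is known to be out of writeback, false if
 * the wait was refused and the writeback flag may still be set.
 */
bool model_folio_wait_writeback(struct model_folio *folio)
{
	if (!folio->writeback)
		return true;

	if (model_mapping_waiting_broken(folio->mapping))
		return false;	/* refuse to wait on a possibly stuck server */

	/* Stands in for sleeping until the server completes the write. */
	folio->writeback = false;
	return true;
}
```

The false return is where the model shows the problem: a caller that needs the writeback flag to have been cleared at least once gets no such guarantee for a flagged mapping.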
There is no easy way to get back the old behavior without reverting to the old way of using buffer pages I guess.

[1] https://lore.kernel.org/linux-mm/504d100d-b8f3-475b-b575-3adfd17627b5@kernel...
[2] https://lore.kernel.org/linux-mm/f8da9ee0-f136-4366-b63a-1812fda11304@kernel...
[3] https://lore.kernel.org/linux-mm/6d0948f5-e739-49f3-8e23-359ddbf3da8f@kernel...
[4] https://lore.kernel.org/linux-mm/504d100d-b8f3-475b-b575-3adfd17627b5@kernel...