On 1/6/26 14:13, Miklos Szeredi wrote:
On Tue, 6 Jan 2026 at 11:05, David Hildenbrand (Red Hat) <david@kernel.org> wrote:
So I understand your patch fixes the regression with suspend blocking, but I don't have high confidence that we are not just starting a whack-a-mole game.
Joanne did a thorough analysis, so I still have hope. Missing a case in such a complex thing is not unexpected.
Yes, I think so, and I think it is [1] not even only limited to writeback [2].
You are referring to DoS against compaction?
In previous discussions it was raised that readahead runs into similar problems.
I don't recall all the details, but I think that we might end up holding the folio lock forever while the fuse user space daemon is supposed to fill the page with data; anybody trying to lock the folio would similarly deadlock.
Maybe only compaction/migration is affected by that, hard to tell.
It is a much more benign issue, since compaction will just skip locked pages, AFAIU (wasn't always so: https://lore.kernel.org/all/1288817005.4235.11393.camel@nimitz/).
Not saying it shouldn't be fixed, but it should be a separate discussion.
Right. But as I pointed out in [4], there are other call paths where we might end up waiting for writeback unless I am missing something.
So it has whack-a-mole smell to it.
To handle the bigger picture (I raised another problematic instance in [4]): I don't know how to handle that without properly fixing fuse. Fuse folks should really invest some time to solve this problem for good.
Fixing it generically in fuse would necessarily involve bringing back some sort of temp buffer. The performance penalty could be minimized, but complexity is what really hurts.
I'm not sure about temp buffers. During early discussions there were ideas about canceling writeback and instead marking the folio dirty again. I assume there is a non-trivial solution space left unexplored for now.
Maybe doing whack-a-mole results in less mess overall :-/
Maybe :) I'm fine with the patch as is as well.
As a big temporary kernel hack, we could add an AS_ANY_WAITING_UTTERLY_BROKEN and simply refuse to wait for writeback directly inside folio_wait_writeback() -- not arbitrarily skipping it in callers -- and possibly other places (readahead, not sure). That would restore the old behavior.
No it wouldn't, since the old code had surrogate methods for waiting on outstanding writes, which were called on fsync, etc.
Yeah, I raised some "except" below, I assume there are more. Not that I would want to go down that path :)
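
For illustration only, the "refuse to wait inside folio_wait_writeback()" idea could be modeled in user space roughly as below. This is a sketch under stated assumptions, not kernel code: struct mapping_model, struct folio_model and would_block_on_writeback() are hypothetical stand-ins, and the flag is the proposed name from this thread, not an existing mapping flag. It only shows the check living in the wait helper rather than in each caller; it does not model the surrogate waits on fsync etc. that the old code had.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these types are the kernel's. */
enum { AS_ANY_WAITING_UTTERLY_BROKEN = 1u << 0 }; /* proposed name from the thread */

struct mapping_model {
    unsigned int flags;            /* stands in for the mapping's flag word */
};

struct folio_model {
    struct mapping_model *mapping; /* NULL models a folio without a mapping */
    bool under_writeback;          /* stands in for the writeback flag */
};

/*
 * Models the decision a central wait helper would make.
 * Returns true if the caller would have to sleep until writeback ends,
 * false if the helper returns immediately.
 */
static bool would_block_on_writeback(const struct folio_model *folio)
{
    if (!folio->under_writeback)
        return false;              /* nothing to wait for */

    /* The hack: mappings that opted out are never waited on. */
    if (folio->mapping != NULL &&
        (folio->mapping->flags & AS_ANY_WAITING_UTTERLY_BROKEN))
        return false;

    return true;                   /* a real implementation would sleep here */
}
```

Because the check sits in one helper, every caller gets the same behavior without being patched individually, which is the difference from the per-caller whack-a-mole approach.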