On Sun, 5 May 2024 at 03:50, Christian Brauner <brauner@kernel.org> wrote:
> And I agree with you that for some instances it's valid to take another reference to a file from f_op->poll() but then they need to use get_file_active() imho and simply handle the case where f_count is zero.
I think this is
(a) practically impossible to find (since most f_count updates are in various random helpers)
(b) not tenable in the first place, since *EVERYBODY* does an f_count update as part of the bog-standard pollwait
So (b) means that the notion of "warn if somebody increments f_count from zero" is broken to begin with - but it's doubly broken because it wouldn't find anything *anyway*, since this never happens in any normal situation.
And (a) means that any non-automatic finding of this is practically impossible.
> And we need to document that in Documentation/filesystems/file.rst or locking.rst.
WHY?
Why cannot you and Al just admit that the problem is in epoll? Always has been, always will be.
The fact is, it's not dma-buf that is violating any rules. It's epoll. It's calling out to random driver functions with a file pointer that is no longer valid.
It really is that simple.
I don't see why you are arguing for "unknown number of drivers - we know at least *one* - have to be fixed for a bug that is in epoll".
If it was *easy* to fix, and if it was *easy* to validate, then sure. But that just isn't the case.
In contrast, in epoll it's *trivial* to fix the one case where it does a VFS call-out, and just say "you have to follow the rules".
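A minimal userspace sketch of what "follow the rules" at that one call-out could look like, assuming the fix is for epoll to take its own reference before calling into the driver and to skip the call when the count has already dropped to zero. This is a C11 model, not kernel code: struct model_file and every model_* function are hypothetical stand-ins, and only the reference count is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the refcount is modelled. */
struct model_file {
    atomic_long f_count;
    int poll_calls;             /* how often the driver's poll hook ran */
};

/* Take a reference only if the count is still nonzero. */
static bool model_get_file_active(struct model_file *f)
{
    long old = atomic_load(&f->f_count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&f->f_count, &old, old + 1))
            return true;
    }
    return false;               /* file is already on its way out */
}

/* Drop a reference; dropping below zero would be a bug in the caller. */
static void model_fput(struct model_file *f)
{
    long old = atomic_fetch_sub(&f->f_count, 1);

    assert(old > 0);
    (void)old;                  /* keeps NDEBUG builds warning-free */
}

/* Hypothetical driver ->poll(): it may assume the file is valid. */
static unsigned int model_driver_poll(struct model_file *f)
{
    f->poll_calls++;
    return 0x1;                 /* pretend "readable" */
}

/* The one call-out on the epoll side, guarded by its own reference. */
static unsigned int model_ep_item_poll(struct model_file *f)
{
    unsigned int events;

    if (!model_get_file_active(f))
        return 0;               /* dead file: never call the driver */

    events = model_driver_poll(f);
    model_fput(f);
    return events;
}
```

In this model the check lives in exactly one place, the caller, so the driver hook never sees a file whose count is zero and no '.poll()' implementation has to change.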
So explain to me again why you want to mess up the driver interface and everybody who has a '.poll()' function, and not just fix the ONE clearly buggy piece of code.
Because dammit, epoll is clearly buggy. It's not enough to say "the file allocation isn't going away", and claim that that means that it's not buggy - when the file IS NO LONGER VALID!
Linus