On Sun, 5 May 2024 at 12:46, Al Viro viro@zeniv.linux.org.uk wrote:
I've no problem with having epoll grab a reference, but if we make that a universal requirement ->poll() instances can rely upon,
Al, we're note "making that a requirement".
It always has been.
Otgherwise, the docs should have shouted out DAMN LOUDLY that you can't rely on all the normal refcounting of 'struct file' THAT EVERY SINGLE NORMAL VFS FUNCTION CAN.
Lookie herte: epoll is unimportant and irrelevant garbage compared to something fundamental like "read()", and what does read() do?
It does this:
struct fd f = fdget_pos(fd);
if (f.file) { ...
which is being DAMN CAREFUL to make sure that the file has the proper refcounts before it then calls "vfs_read()". There's a lot of very careful and subtle code in fdget_pos() to make this all proper, and that even if the file is closed by another thread concurrently, we *always* have a refcount to it, and it's always live over the whole 'vfs_read()' sequence.
'vfs_poll()' is NOT DIFFERENT in this regard. Not at all.
Now, you have two choices that are intellectually honest:
- admit that epoll() - which is a hell of a lot less important - should spend a small fraction of that effort on making its vfs_poll() use sane
- say that all this fdget_pos() care is uncalled for in the read() path, and we should make all the filesystem .read() functions be aware that the file pointer they get may be garbage, and they should use get_file_active() to make sure every 'struct file *' use they have is safe?
because if your choice is that "epoll can do whatever the f*&k it wants", then it's in clear violation of all the effort we go to in a MUCH MORE IMPORTANT code path, and is clearly not consistent or logical.
Neither you nor Christian have explained why you think it's ok for that epoll() garbage to magically violate all our regular rules.
Your claim that those regular rules are some new conditional requirement that we'd be imposing. NO. They are the rules that *anybody* who gets a 'struct file *' pointer should always be able to rely on by default: it's damn well a ref-counted thing, and the caller holds the refcount.
The exceptional case is exactly the other way around: if you do random crap with unrefcounted poitners, it's *your* problem, and *you* are the one who has to be careful. Not some unrelated poor driver that didn't know about your f*&k-up.
Dammit, epoll is CLEARLY BUGGY. It's passing off random kernel pointers without holding a refcount to them. THAT'S A BUG.
And fixing that bug is *not* somehow changing existing rules as you are trying to claim. No. It's just fixing a bug.
So stop claiming that this is some "new requirement". It is absolutely nothing of the sort. epoll() actively MISUSED file pointer, because file pointers are fundamentally refcounted (as are pretty much all sane kernel interfaces).
Linus