On Mon, May 06, 2024 at 04:29:44PM +0200, Christian König wrote:
Am 04.05.24 um 20:20 schrieb Linus Torvalds:
On Sat, 4 May 2024 at 08:32, Linus Torvalds torvalds@linux-foundation.org wrote:
Lookie here, the fundamental issue is that epoll can call '->poll()' on a file descriptor that is being closed concurrently.
Thinking some more about this, and replying to myself...
Actually, I wonder if we could *really* fix this by simply moving the eventpoll_release() to where it really belongs.
If we did it in file_close_fd_locked(), it would actually make a *lot* more sense. Particularly since eventpoll actually uses this:
struct epoll_filefd { struct file *file; int fd; } __packed;
ie it doesn't just use the 'struct file *', it uses the 'fd' itself (for ep_find()).
(Strictly speaking, it should also have a pointer to the 'struct files_struct' to make the 'int fd' be meaningful).
While I completely agree on this I unfortunately have to ruin the idea.
Before we had KCMP some people relied on the strange behavior of eventpoll to compare struct files when the fd is the same.
I just recently suggested that solution to somebody at AMD as a workaround when KCMP is disabled because of security hardening and I'm pretty sure I've seen it somewhere else as well.
So when we change that it would break (undocumented?) UAPI behavior.
Uh extremely aside, but doesn't this mean we should just enable kcmp on files unconditionally, since there's an alternative? Or a least everywhere CONFIG_EPOLL is enabled?
It's really annoying that on some distros/builds we don't have that, and for gpu driver stack reasons we _really_ need to know whether a fd is the same as another, due to some messy uniqueness requirements on buffer objects various drivers have. -Sima
Regards, Christian.
IOW, eventpoll already considers the file _descriptor_ relevant, not just the file pointer, and that's destroyed at *close* time, not at 'fput()' time.
Yeah, yeah, the locking situation in file_close_fd_locked() is a bit inconvenient, but if we can solve that, it would solve the problem in a fundamentally different way: remove the ep iterm before the file->f_count has actually been decremented, so the whole "race with fput()" would just go away entirely.
I dunno. I think that would be the right thing to do, but I wouldn't be surprised if some disgusting eventpoll user then might depend on the current situation where the eventpoll thing stays around even after the close() if you have another copy of the file open.
Linus
Linaro-mm-sig mailing list -- linaro-mm-sig@lists.linaro.org To unsubscribe send an email to linaro-mm-sig-leave@lists.linaro.org