On Fri, May 03, 2024 at 06:54:22PM +0700, Bui Quang Minh wrote:
[...] Root cause: AFAIK, eventpoll (epoll) does not increase the registered file's reference. To ensure the safety, when the registered file is deallocated in __fput, eventpoll_release is called to unregister the file from epoll. When calling poll on epoll, epoll will loop through registered files and call vfs_poll on these files. In the file's poll, file is guaranteed to be alive, however, as epoll does not increase the registered file's reference, the file may be dying and it's not safe the get the file for later use. And dma_buf_poll violates this. In the dma_buf_poll, it tries to get_file to use in the callback. This leads to a race where the dmabuf->file can be fput twice.
Here is the race occurs in the above proof-of-concept
close(dmabuf->file) __fput_sync (f_count == 1, last ref) f_count-- (f_count == 0 now) __fput epoll_wait vfs_poll(dmabuf->file) get_file(dmabuf->file)(f_count == 1) eventpoll_release dmabuf->file deallocation fput(dmabuf->file) (f_count == 1) f_count-- dmabuf->file deallocation
I am not familiar with the dma_buf so I don't know the proper fix for the issue. About the rule that don't get the file for later use in poll callback of file, I wonder if it is there when only select/poll exist or just after epoll appears.
I hope the analysis helps us to fix the issue.
Thanks for doing this analysis! I suspect at least a start of a fix would be this:
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 8fe5aa67b167..15e8f74ee0f2 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (events & EPOLLOUT) { /* Paired with fput in dma_buf_poll_cb */ - get_file(dmabuf->file); - - if (!dma_buf_poll_add_cb(resv, true, dcb)) + if (!atomic_long_inc_not_zero(&dmabuf->file) && + !dma_buf_poll_add_cb(resv, true, dcb)) /* No callback queued, wake up any other waiters */ dma_buf_poll_cb(NULL, &dcb->cb); else @@ -290,9 +289,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (events & EPOLLIN) { /* Paired with fput in dma_buf_poll_cb */ - get_file(dmabuf->file); - - if (!dma_buf_poll_add_cb(resv, false, dcb)) + if (!atomic_long_inc_not_zero(&dmabuf->file) && + !dma_buf_poll_add_cb(resv, false, dcb)) /* No callback queued, wake up any other waiters */ dma_buf_poll_cb(NULL, &dcb->cb); else
But this ends up leaving "active" non-zero, and at close time it runs into:
BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
But the bottom line is that get_file() is unsafe to use in some places, one of which appears to be in the poll handler. There are maybe some other fragile places too, like in drivers/gpu/drm/vmwgfx/ttm_object.c:
static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf) { return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L; }
Which I also note involves a dmabuf...
Due to this issue I've proposed fixing get_file() to detect pathological states: https://lore.kernel.org/lkml/20240502222252.work.690-kees@kernel.org/
But that has run into some push-back. I'm hoping that seeing this epoll example will help illustrate what needs fixing a little better.
I think the best current proposal is to just WARN sooner instead of a full refcount_t implementation:
diff --git a/include/linux/fs.h b/include/linux/fs.h index 8dfd53b52744..e09107d0a3d6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1040,7 +1040,8 @@ struct file_handle {
static inline struct file *get_file(struct file *f) { - atomic_long_inc(&f->f_count); + long prior = atomic_long_fetch_inc_relaxed(&f->f_count); + WARN_ONCE(!prior, "struct file::f_count incremented from zero; use-after-free condition present!\n"); return f; }
What's the right way to deal with the dmabuf situation? (And I suspect it applies to get_dma_buf_unless_doomed() as well...)
-Kees