New subject: [PATCH] epoll: try to be a _bit_ better about file lifetimes

3 May 2024

On Fri, May 03, 2024 at 06:54:22PM +0700, Bui Quang Minh wrote:
...
[...]
Root cause:
AFAIK, eventpoll (epoll) does not increase the registered file's reference.
To ensure the safety, when the registered file is deallocated in __fput,
eventpoll_release is called to unregister the file from epoll. When calling
poll on epoll, epoll will loop through registered files and call vfs_poll on
these files. In the file's poll, file is guaranteed to be alive, however, as
epoll does not increase the registered file's reference, the file may be
dying
and it's not safe the get the file for later use. And dma_buf_poll violates
this. In the dma_buf_poll, it tries to get_file to use in the callback. This
leads to a race where the dmabuf->file can be fput twice.
Here is the race occurs in the above proof-of-concept
close(dmabuf->file)
__fput_sync (f_count == 1, last ref)
f_count-- (f_count == 0 now)
__fput
                                    epoll_wait
                                    vfs_poll(dmabuf->file)
                                    get_file(dmabuf->file)(f_count == 1)
eventpoll_release
dmabuf->file deallocation
                                    fput(dmabuf->file) (f_count == 1)
                                    f_count--
                                    dmabuf->file deallocation
I am not familiar with the dma_buf so I don't know the proper fix for the
issue. About the rule that don't get the file for later use in poll callback
of
file, I wonder if it is there when only select/poll exist or just after
epoll
appears.
I hope the analysis helps us to fix the issue.
Thanks for doing this analysis! I suspect at least a start of a fix
would be this:

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 8fe5aa67b167..15e8f74ee0f2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (events & EPOLLOUT) {
    		/* Paired with fput in dma_buf_poll_cb */
-			get_file(dmabuf->file);
-
-			if (!dma_buf_poll_add_cb(resv, true, dcb))
+			if (!atomic_long_inc_not_zero(&dmabuf->file) &&
+			    !dma_buf_poll_add_cb(resv, true, dcb))
    			/* No callback queued, wake up any other waiters */
    			dma_buf_poll_cb(NULL, &dcb->cb);
    		else
@@ -290,9 +289,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (events & EPOLLIN) {
    		/* Paired with fput in dma_buf_poll_cb */
-			get_file(dmabuf->file);
-
-			if (!dma_buf_poll_add_cb(resv, false, dcb))
+			if (!atomic_long_inc_not_zero(&dmabuf->file) &&
+			    !dma_buf_poll_add_cb(resv, false, dcb))
    			/* No callback queued, wake up any other waiters */
    			dma_buf_poll_cb(NULL, &dcb->cb);
    		else
But this ends up leaving "active" non-zero, and at close time it runs
into:
BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
But the bottom line is that get_file() is unsafe to use in some places,
one of which appears to be in the poll handler. There are maybe some
other fragile places too, like in drivers/gpu/drm/vmwgfx/ttm_object.c:
static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf)
{
    return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L;
}
Which I also note involves a dmabuf...
Due to this issue I've proposed fixing get_file() to detect pathological states:
https://lore.kernel.org/lkml/20240502222252.work.690-kees@kernel.org/
But that has run into some push-back. I'm hoping that seeing this epoll
example will help illustrate what needs fixing a little better.
I think the best current proposal is to just WARN sooner instead of a
full refcount_t implementation:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8dfd53b52744..e09107d0a3d6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1040,7 +1040,8 @@ struct file_handle {
static inline struct file *get_file(struct file *f)
 {
-	atomic_long_inc(&f->f_count);
+	long prior = atomic_long_fetch_inc_relaxed(&f->f_count);
+	WARN_ONCE(!prior, "struct file::f_count incremented from zero; use-after-free condition present!\n");
    return f;
 }
What's the right way to deal with the dmabuf situation? (And I suspect
it applies to get_dma_buf_unless_doomed() as well...)
-Kees
-- 
Kees Cook

    

get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)