On Fri, May 3, 2024 at 6:40 AM Charan Teja Kalla quic_charante@quicinc.com wrote:
Thanks Christian/TJ for the inputs!!
On 4/18/2024 12:16 PM, Christian König wrote:
As far as I can see the EPOLL holds a reference to the files it contains. So it is perfectly valid to add the file descriptor to EPOLL and then close it.
In this case the file won't be closed, but be kept alive by it's reference in the EPOLL file descriptor.
I am not seeing that adding a 'fd' into the epoll monitoring list will increase its refcount. Confirmed by writing a testcase that just do open + EPOLL_CTL_ADD and see the /proc/../fdinfo of epoll fd(Added file_count() info to the output) # cat /proc/136/fdinfo/3 pos: 0 flags: 02 mnt_id: 14 ino: 1041 tfd: 4 events: 19 data: 4 pos:0 ino:81 sdev:5 refcount: 1<-- The fd added to the epoll monitoring is still 1(same as open file refcount)
From the code too, I don't see a file added in the epoll monitoring list will have an extra refcount but momentarily (where it increases the refcount of target file, add it to the rbtree of the epoll context and then decreasing the file refcount): do_epoll_ctl(): f = fdget(epfd); tf = fdget(fd); epoll_mutex_lock(&ep->mtx) EPOLL_CTL_ADD: ep_insert(ep, epds, tf.file, fd, full_check); // Added to the epoll monitoring rb tree list as epitem. mutex_unlock(&ep->mtx); fdput(tf); fdput(f);
Not sure If i am completely mistaken what you're saying here.
The fs layer which calls dma_buf_poll() should make sure that the file can't go away until the function returns.
Then inside dma_buf_poll() we add another reference to the file while installing the callback:
/* Paired with fput in dma_buf_poll_cb */ get_file(dmabuf->file); No, exactly that can't
happen either.
I am not quite comfortable with epoll internals but I think below race is possible where "The 'file' passed to dma_buf_poll() is proper but ->f_count == 0, which is indicating that a parallel freeing is happening". So, we should check the file->f_count value before taking the refcount.
(A 'fd' registered for the epoll monitoring list is maintained as 'epitem(epi)' in the rbtree of 'eventpoll(ep, called as epoll context).
epoll_wait() __fput()(as file->f_count=0)
a) ep_poll_callback(): Is the waitqueue function called when signaled on the wait_queue_head_t of the registered poll() function.
It links the 'epi' to the ready list of 'ep': if (!ep_is_linked(epi)) list_add_tail_lockless(&epi->rdllink, &ep->rdllist)
wake_up(&ep->wq); Wake up the process waiting on epoll_wait() that endup in ep_poll.
b) ep_poll(): 1) while (1) { eavail = ep_events_available(ep); (checks ep->rdlist) ep_send_events(ep); (notify the events to user) } (epoll_wait() calling process gets woken up from a.2 and process the events raised added to rdlist in a.1)
ep_send_events(): mutex_lock(&ep->mtx); (** The sync lock is taken **) list_for_each_entry_safe(epi, tmp, &txlist, rdllink) { list_del_init(&epi->rdllink); revents = ep_item_poll(epi, &pt, 1) (vfs_poll()-->...f_op->poll(=dma_buf_poll) } mutex_unlock(&ep->mtx) (**release the lock.**)
(As part of procession of events, each epitem is removed from rdlist and call the f_op->poll() of a file which will endup in dma_buf_poll())
dma_buf_poll(): get_file(dmabuf->file); (** f_count changed from 0->1 **) dma_buf_poll_add_cb(resv, true, dcb): (registers dma_buf_poll_cb() against fence)
c) eventpoll_release_file(): mutex_lock(&ep->mtx); __ep_remove(ep, epi, true): mutex_unlock(&ep->mtx); (__ep_remove() will remove the 'epi' from rbtree and if present from rdlist as well) d) file_free(file), free the 'file'.
e) dma_buf_poll_cb: /* Paired with get_file in dma_buf_poll */ fput(dmabuf->file); (f_count changed from 1->0, where we try to free the 'file' again which is UAF/double free).
In the above race, If c) gets called first, then the 'epi' is removed from both rbtree and 'rdlink' under ep->mtx lock thus b.2 don't end up in calling the ->poll() as it don't see this event in the rdlist.
Race only exist If b.2 executes first, where it will call dma_buf_poll with __valid 'struct file' under ep->mtx but its refcount is already could have been zero__. Later When e) is executed, it turns into double free of the 'file' structure.
If you're convinced with the above race, should the fix here will be this simple check: diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 8fe5aa67b167..e469dd9288cc --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -240,6 +240,10 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll) struct dma_resv *resv; __poll_t events;
/* Check parallel freeing of file */
if (!file_count(file))
return 0;
Thanks, Charan
Hi Charan,
It looks like a similar conclusion about epoll was reached at: https://lore.kernel.org/all/a87d7ef8-2c59-4dc5-ba0a-b821d1effc72@amd.com/
I agree with Christian that it should not be possible for the file to be freed while inside dma_buf_poll. Aside from causing problems in dma_buf_poll, ep_item_poll itself would have issues dereferencing the freed file pointer.
Christian's patch there makes me wonder about what the epoll man page says about closing files. "Will closing a file descriptor cause it to be removed from all epoll interest lists?" Answer: Yes https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man7/ep...
It looks like eventpoll_release_file is responsible for that, but if the epitem is changed to hold a reference then that can't be true anymore because __fput will never call eventpoll_release_file (until EPOLL_CTL_DEL). But how do you call EPOLL_CTL_DEL if you've closed all your references to the file in userspace? So I think epoll needs a slightly more complicated fix.