On 12.04.24 at 08:19, zhiguojiang wrote:
[SNIP] -> Here task 2220 does epoll again, which internally does a get/put on the file, so the file is freed twice, leading to the final crash.
Here is the basic flow:
- Thread A installs the dma_buf_fd via dma_buf_export; the fd refcount is 1.
- Thread A adds the fd to the epoll list via epoll_ctl(EPOLL_CTL_ADD).
- After using the dma-buf, Thread A closes the fd; the fd refcount is then 0, and __fput finally runs to release the file.
Stop, that isn't correct.
The fs layer which calls dma_buf_poll() should make sure that the file can't go away until the function returns.
Then inside dma_buf_poll() we add another reference to the file while installing the callback:
	/* Paired with fput in dma_buf_poll_cb */
	get_file(dmabuf->file);
This reference is only dropped after the callback is completed in dma_buf_poll_cb():
	/* Paired with get_file in dma_buf_poll */
	fput(dmabuf->file);
So your explanation for the issue just seems to be incorrect.
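To make the pairing concrete, here is a minimal userspace sketch of the reference flow described above. It is only a model under stated assumptions: `struct model_file` and every `model_*` function are hypothetical stand-ins written with C11 atomics, not the kernel's `struct file`, `get_file()` or `fput()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct file; not the kernel type. */
struct model_file {
	atomic_long count;	/* reference count */
	bool released;		/* set once the last reference is dropped */
};

/* Models get_file(): take an additional reference on a live file. */
static void model_get_file(struct model_file *f)
{
	atomic_fetch_add(&f->count, 1);
}

/* Models fput(): drop a reference and release on the final put. */
static void model_fput(struct model_file *f)
{
	if (atomic_fetch_sub(&f->count, 1) == 1)
		f->released = true;
}

/* Models dma_buf_poll() installing a callback: pin the file first. */
static void model_poll_install_cb(struct model_file *f)
{
	/* Paired with model_fput() in model_poll_cb() */
	model_get_file(f);
}

/* Models dma_buf_poll_cb(): the callback has completed, drop the pin. */
static void model_poll_cb(struct model_file *f)
{
	/* Paired with model_get_file() in model_poll_install_cb() */
	model_fput(f);
}

/* Walks through the sequence; returns true if the file outlived close(). */
static bool model_close_while_cb_pending(void)
{
	struct model_file f = { .released = false };
	bool alive_after_close;

	atomic_init(&f.count, 1);	/* reference held by the fd */
	model_poll_install_cb(&f);	/* count: 2 */
	model_fput(&f);			/* close(fd), count: 1 */
	alive_after_close = !f.released;
	model_poll_cb(&f);		/* count: 0, file released */
	return alive_after_close && f.released;
}
```

In this model, closing the fd while a callback is still pending does not release the file; the release only happens when the callback drops its own reference.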
- Here Thread A does not do epoll_ctl(EPOLL_CTL_DEL) manually, so the file still resides in one epoll list. Although __fput will call eventpoll_release to remove the file from the epoll list it is bound to, there is a small time window where Thread B can jump in.
Well, if that is really the case, then that would be a bug in epoll_ctl().
- During that small window, Thread B does the poll action for dma_buf_fd, where it will fget/fput the file. This means the fd refcount goes 0 -> 1 -> 0, and __fput is called again. This leads to __fput running twice for the same file.
- So the potential fix is to use get_file_rcu, which checks whether the file refcount is already zero, meaning the file is being freed. If so, we just return and there is no need to do the dma_buf_poll.
Well, to say it bluntly, as far as I can see this suggestion is complete nonsense.
When the reference to the file goes away while dma_buf_poll() is executing, then that's a massive bug in the caller of that function.
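For readers following the refcount argument, the difference between an unconditional get and the increment-unless-zero check that the proposal refers to can be sketched in userspace. This is an illustration only: `struct model_ref` and the `model_*` helpers are hypothetical and built on C11 atomics; they are not the kernel's get_file_rcu or fput. The sketch shows the mechanism, not that the check is the right fix; as said above, a caller that reaches this point without a valid reference is the actual bug.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace refcount; not the kernel's struct file. */
struct model_ref {
	atomic_long count;
	int releases;	/* how many times the release path ran */
};

/* Drop a reference; run the release path when the count hits zero. */
static void model_put(struct model_ref *r)
{
	if (atomic_fetch_sub(&r->count, 1) == 1)
		r->releases++;
}

/* Unconditional get: only valid while the caller already holds a reference. */
static void model_get(struct model_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Increment-unless-zero: fails once the count has dropped to zero. */
static bool model_get_not_zero(struct model_ref *r)
{
	long old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* The claimed 0 -> 1 -> 0 sequence with an unconditional get. */
static int model_releases_with_plain_get(void)
{
	struct model_ref r = { .releases = 0 };

	atomic_init(&r.count, 1);
	model_put(&r);	/* close(): 1 -> 0, first release */
	model_get(&r);	/* late poller: 0 -> 1 */
	model_put(&r);	/* 1 -> 0, second release */
	return r.releases;
}

/* The same sequence where the late poller checks for zero first. */
static int model_releases_with_checked_get(void)
{
	struct model_ref r = { .releases = 0 };

	atomic_init(&r.count, 1);
	model_put(&r);			/* close(): 1 -> 0, first release */
	if (model_get_not_zero(&r))	/* fails, so the poller backs off */
		model_put(&r);
	return r.releases;
}
```

In the model, the unconditional get on a zero count makes the release path run twice, while the checked get refuses and the release path runs once. It says nothing about whether the real epoll path can actually reach dma_buf_poll() without holding a reference.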
Regards, Christian.
Here is the race condition:
Thread A                                Thread B
dma_buf_export
fd_refcount is 1
epoll_ctl(EPOLL_ADD)
add dma_buf_fd to epoll list
close(dma_buf_fd)
fd_refcount is 0
__fput
                                        dma_buf_poll
                                        fget
                                        fput
                                        fd_refcount is zero again
Thanks