Am 15.04.24 um 12:35 schrieb zhiguojiang:
在 2024/4/12 14:39, Christian König 写道:
[Some people who received this message don't often get email from christian.koenig@amd.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
Am 12.04.24 um 08:19 schrieb zhiguojiang:
[SNIP] -> Here task 2220 do epoll again where internally it will get/put then start to free twice and lead to final crash.
Here is the basic flow:
- Thread A install the dma_buf_fd via dma_buf_export, the fd refcount
is 1
Thread A add the fd to epoll list via epoll_ctl(EPOLL_CTL_ADD)
After use the dma buf, Thread A close the fd, then here fd refcount
is 0, and it will run __fput finally to release the file
Stop, that isn't correct.
The fs layer which calls dma_buf_poll() should make sure that the file can't go away until the function returns.
Then inside dma_buf_poll() we add another reference to the file while installing the callback:
/* Paired with fput in dma_buf_poll_cb */ get_file(dmabuf->file);
Hi,
The problem may just occurred here.
Is it possible file reference count already decreased to 0 and fput already being in progressing just before calling get_file(dmabuf->file) in dma_buf_poll()?
No, exactly that isn't possible.
If a function gets a dma_buf pointer or even more general any reference counted pointer which has already decreased to 0 then that is a major bug in the caller of that function.
BTW: It's completely illegal to work around such issues by using file_count() or RCU functions. So when you suggest stuff like that it will immediately face rejection.
Regards, Christian.
This reference is only dropped after the callback is completed in dma_buf_poll_cb():
/* Paired with get_file in dma_buf_poll */ fput(dmabuf->file);
So your explanation for the issue just seems to be incorrect.
- Here Thread A not do epoll_ctl(EPOLL_CTL_DEL) manunally, so it
still resides in one epoll_list. Although __fput will call eventpoll_release to remove the file from binded epoll list, but it has small time window where Thread B jumps in.
Well if that is really the case then that would then be a bug in epoll_ctl().
- During the small window, Thread B do the poll action for
dma_buf_fd, where it will fget/fput for the file, this means the fd refcount will be 0 -> 1 -> 0, and it will call __fput again. This will lead to __fput twice for the same file.
- So the potenial fix is use get_file_rcu which to check if file
refcount already zero which means under free. If so, we just return and no need to do the dma_buf_poll.
Well to say it bluntly as far as I can see this suggestion is completely nonsense.
When the reference to the file goes away while dma_buf_poll() is executed then that's a massive bug in the caller of that function.
Regards, Christian.
Here is the race condition:
Thread A Thread B dma_buf_export fd_refcount is 1 epoll_ctl(EPOLL_ADD) add dma_buf_fd to epoll list close(dma_buf_fd) fd_refcount is 0 __fput dma_buf_poll fget fput fd_refcount is zero again
Thanks
linaro-mm-sig@lists.linaro.org