On Fri, Jul 18, 2025 at 8:52 AM Nam Cao <namcao@linutronix.de> wrote:
ep_events_available() checks for available events by looking at ep->rdllist and ep->ovflist. However, this check is done without holding a lock, so the returned value is not reliable: it is possible for both checks on ep->rdllist and ep->ovflist to come up false while ep_start_scan() or ep_done_scan() is executing on another CPU, even though events are available.
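For context, the lockless check and the problematic interleaving look roughly like this (paraphrased and simplified from fs/eventpoll.c, not quoted verbatim):

static inline int ep_events_available(struct eventpoll *ep)
{
        /* Lockless: both loads can race with a concurrent scan. */
        return !list_empty_careful(&ep->rdllist) ||
                READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
}

/*
 * Under ep->lock, ep_start_scan() does roughly:
 *
 *         list_splice_init(&ep->rdllist, txlist);   <-- rdllist now empty,
 *                                                       events sit on txlist
 *         WRITE_ONCE(ep->ovflist, NULL);            <-- marks the scan active
 *
 * A lockless ep_events_available() running between the two statements sees
 * an empty rdllist and ovflist still equal to EP_UNACTIVE_PTR, and reports
 * "no events" although events exist. ep_done_scan() has the mirror-image
 * window: it resets ovflist to EP_UNACTIVE_PTR before splicing txlist back
 * onto rdllist.
 */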
This bug can be observed by:
  - Create an eventpoll with at least one ready level-triggered event.

  - Create multiple threads that call epoll_wait() with a zero timeout. The threads do not consume the events; therefore every epoll_wait() call should return at least one event.
If one thread is executing ep_events_available() while another thread is executing ep_start_scan() or ep_done_scan(), epoll_wait() may wrongly return no event for the former thread.
That is the whole point of epoll_wait() with a zero timeout: we want to poll opportunistically with minimal overhead, which will produce more false negatives. A caller that polls with a zero timeout should retry later, and will at some point observe the event.
I'm not sure we would want to add much more overhead for higher precision.
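The caller pattern described here is roughly the following (a sketch; epfd, events, maxevents, and do_other_work() are hypothetical stand-ins for the caller's own state and work loop):

for (;;) {
        int n = epoll_wait(epfd, events, maxevents, 0); /* opportunistic poll */

        if (n > 0)
                break;           /* events observed; go consume them */

        do_other_work();         /* poll again on the next pass */
}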
Thanks,
Soheil
This reproducer is implemented as TEST(epoll65) in tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
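A condensed userspace sketch of the same reproducer (illustrative only: thread and iteration counts are arbitrary, and this is not the selftest code):

#include <pthread.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define NR_THREADS    4
#define NR_ITERATIONS 100000

static int epfd;

static void *poller(void *arg)
{
        struct epoll_event ev;
        int i;

        for (i = 0; i < NR_ITERATIONS; i++) {
                /*
                 * The eventfd is level-triggered and never drained, so
                 * every call should report one ready event; a return of
                 * 0 is the false negative described above.
                 */
                if (epoll_wait(epfd, &ev, 1, 0) == 0)
                        fprintf(stderr, "false negative at iteration %d\n", i);
        }
        return NULL;
}

int main(void)
{
        struct epoll_event ev = { .events = EPOLLIN };
        pthread_t threads[NR_THREADS];
        int efd, i;

        epfd = epoll_create1(0);
        efd = eventfd(1, 0);    /* counter > 0: always readable */
        if (epfd < 0 || efd < 0 ||
            epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
                perror("setup");
                return 1;
        }

        for (i = 0; i < NR_THREADS; i++)
                pthread_create(&threads[i], NULL, poller, NULL);
        for (i = 0; i < NR_THREADS; i++)
                pthread_join(threads[i], NULL);

        close(efd);
        close(epfd);
        return 0;
}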
Fix it by skipping ep_events_available() and calling ep_try_send_events() directly.
epoll_sendevents() (io_uring) suffers from the same problem; fix that as well.
There is still ep_busy_loop(), which uses ep_events_available() without a lock, but that is probably okay (?) for busy-polling.
Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
Fixes: ae3a4f1fdc2c ("eventpoll: add epoll_sendevents() helper")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
---
 fs/eventpoll.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0fbf5dfedb24..541481eafc20 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2022,7 +2022,7 @@ static int ep_schedule_timeout(ktime_t *to)
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
                    int maxevents, struct timespec64 *timeout)
 {
-        int res, eavail, timed_out = 0;
+        int res, eavail = 1, timed_out = 0;
         u64 slack = 0;
         wait_queue_entry_t wait;
         ktime_t expires, *to = NULL;
@@ -2041,16 +2041,6 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
                 timed_out = 1;
         }
 
-        /*
-         * This call is racy: We may or may not see events that are being added
-         * to the ready list under the lock (e.g., in IRQ callbacks). For cases
-         * with a non-zero timeout, this thread will check the ready list under
-         * lock and will add to the wait queue. For cases with a zero
-         * timeout, the user by definition should not care and will have to
-         * recheck again.
-         */
-        eavail = ep_events_available(ep);
-
         while (1) {
                 if (eavail) {
                         res = ep_try_send_events(ep, events, maxevents);
@@ -2496,9 +2486,7 @@ int epoll_sendevents(struct file *file, struct epoll_event __user *events,
          * Racy call, but that's ok - it should get retried based on
          * poll readiness anyway.
          */
-        if (ep_events_available(ep))
-                return ep_try_send_events(ep, events, maxevents);
-        return 0;
+        return ep_try_send_events(ep, events, maxevents);
 }
 
 /*
--
2.39.5