Re: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]

3 Mar 2020

* Peter Zijlstra:
...
> So how about we introduce new syscalls:
>
>   sys_futex_wait(void *uaddr, unsigned long val, unsigned long flags, ktime_t *timo);
>
>   struct futex_wait {
>           void *uaddr;
>           unsigned long val;
>           unsigned long flags;
>   };
>
>   sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
>                   unsigned long flags, ktime_t *timo);
>
>   sys_futex_wake(void *uaddr, unsigned int nr, unsigned long flags);
>
>   sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
>                         unsigned int nr_requeue, unsigned long cmpval, unsigned long flags);
>
> Where flags:
>
>   - has 2 bits for size: 8,16,32,64
>   - has 2 more bits for size (requeue) ??
>   - has ... bits for clocks
>   - has private/shared
>   - has numa
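
To make the flags word above concrete, here is a minimal sketch of one possible encoding. The bit positions and all FUTEX_FLAG_* names are assumptions made up for illustration; the proposal does not fix a layout, and the clock bits are left out because their count is not specified. It also assumes the sizes are widths in bits.

```c
#include <assert.h>

/* Hypothetical bit assignments; the proposal leaves the layout open. */
#define FUTEX_FLAG_SIZE_MASK   0x03ul       /* 2 bits: operand size        */
#define FUTEX_FLAG_SIZE2_SHIFT 2            /* 2 more bits: requeue target */
#define FUTEX_FLAG_SIZE2_MASK  (0x03ul << FUTEX_FLAG_SIZE2_SHIFT)
#define FUTEX_FLAG_SHARED      (1ul << 4)   /* clear means process-private */
#define FUTEX_FLAG_NUMA        (1ul << 5)

static_assert((FUTEX_FLAG_SIZE_MASK & FUTEX_FLAG_SIZE2_MASK) == 0,
              "size fields must not overlap");

/* Encode a width in bits (8, 16, 32, 64) as a 2-bit field; -1 if unsupported. */
static int futex_size_encode(unsigned int bits)
{
    switch (bits) {
    case 8:  return 0;
    case 16: return 1;
    case 32: return 2;
    case 64: return 3;
    default: return -1;
    }
}

/* Decode the 2-bit size field of a flags word back to a width in bits. */
static unsigned int futex_size_decode(unsigned long flags)
{
    return 8u << (flags & FUTEX_FLAG_SIZE_MASK);
}
```

With this encoding the size is log2(bits / 8), so two bits cover exactly the four widths listed.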

What's the actual type of *uaddr?  Does it vary by size (which I assume
is in bits)?  Are there alignment constraints?

These system calls still seem to be type-polymorphic, which is
problematic for defining a really nice C interface.  I would really like
to have a strongly typed interface for this, with a nice struct futex
wrapper type (even if it means that we need four of them).

Will all architectures support all sizes?  If not, how do we probe which
size/flags combinations are supported?
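
As a rough sketch of what such a strongly typed layer might look like, the following wraps a type-polymorphic entry point in per-size wrapper types. Everything here is hypothetical: struct futex32, struct futex64, futex32_wait(), futex64_wait() and the size encoding are invented for illustration, and futex_wait_raw() is a userspace stand-in that only performs the compare step instead of sleeping. Only two of the four sizes are shown; the 8-bit and 16-bit variants would follow the same pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical wrapper types: the size and alignment live in the type. */
struct futex32 { alignas(4) uint32_t value; };
struct futex64 { alignas(8) uint64_t value; };

/* Hypothetical 2-bit size encoding (log2 of the width in bytes). */
#define FUTEX_SIZE_32 2ul
#define FUTEX_SIZE_64 3ul

/*
 * Stand-in for the polymorphic system call.  It only does the compare:
 * returns -EAGAIN if *uaddr != val, and 0 where a real call would sleep.
 */
static long futex_wait_raw(const void *uaddr, uint64_t val, unsigned long flags)
{
    uint64_t cur;

    switch (flags & 3ul) {
    case FUTEX_SIZE_32: cur = *(const uint32_t *)uaddr; break;
    case FUTEX_SIZE_64: cur = *(const uint64_t *)uaddr; break;
    default:            return -EINVAL;  /* 8/16-bit omitted in this sketch */
    }
    return cur == val ? 0 : -EAGAIN;
}

/* Typed front ends: the caller can no longer mismatch size and pointer. */
static inline long futex32_wait(struct futex32 *f, uint32_t expected)
{
    return futex_wait_raw(&f->value, expected, FUTEX_SIZE_32);
}

static inline long futex64_wait(struct futex64 *f, uint64_t expected)
{
    return futex_wait_raw(&f->value, expected, FUTEX_SIZE_64);
}
```

Note that the raw stand-in takes a uint64_t for val; with unsigned long, as in the proposed prototypes, a 64-bit compare value would not fit on 32-bit architectures.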
...
> For NUMA I propose that when NUMA_FLAG is set, uaddr-4 will be 'int
> node_id', with the following semantics:
>
>  - on WAIT, node_id is read and when 0 <= node_id <= nr_nodes, is
>    directly used to index into per-node hash-tables. When -1, it is
>    replaced by the current node_id and an smp_mb() is issued before we
>    load and compare the @uaddr.
>
>  - on WAKE/REQUEUE, it is an immediate index.
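
The node_id rules above can be sketched in userspace as follows. This is only a model under stated assumptions: NR_NODES and both function names are hypothetical, the memory barrier and any atomicity are left out, invalid values are mapped to -EINVAL, and the upper bound is taken as exclusive (node_id < NR_NODES) because the value indexes an array of NR_NODES tables, whereas the quoted text writes an inclusive bound.

```c
#include <assert.h>
#include <errno.h>

#define NR_NODES 4  /* hypothetical number of NUMA nodes */

static_assert(NR_NODES > 0, "need at least one node");

/*
 * WAIT: a valid node_id is used directly; -1 is replaced in place by the
 * caller's current node (the kernel would follow this with smp_mb()).
 * Returns the per-node table index, or -EINVAL.
 */
static int futex_numa_resolve_wait(int *node_id, int current_node)
{
    int n = *node_id;

    if (n == -1) {
        *node_id = current_node;
        return current_node;
    }
    if (n >= 0 && n < NR_NODES)
        return n;
    return -EINVAL;
}

/* WAKE/REQUEUE: node_id is an immediate index, never rewritten. */
static int futex_numa_resolve_wake(int node_id)
{
    if (node_id >= 0 && node_id < NR_NODES)
        return node_id;
    return -EINVAL;
}
```

In this model, once a waiter on node 2 has replaced -1 with 2, a later waiter running on node 0 still resolves to index 2.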

Does this mean the first waiter determines the NUMA index, and all
future waiters use the same chain even if they are on different nodes?

I think documenting this as a node index would be a mistake.  It could
be an arbitrary hint for locating the corresponding kernel data
structures.
...
> Any invalid value will result in EINVAL.

Using uaddr-4 is slightly tricky with a 64-bit futex value, due to the
need to maintain alignment and avoid padding.
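
To illustrate the layout problem, here is a small sketch with hypothetical struct names. With a 32-bit value, node_id naturally sits at uaddr-4.  With an 8-byte-aligned 64-bit value, the compiler inserts 4 bytes of padding after node_id, so node_id ends up at uaddr-8; placing it at uaddr-4 requires moving the padding in front of it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit futex: node_id is directly in front of the value. */
struct numa_futex32 {
    int32_t  node_id;
    uint32_t value;
};

/* 64-bit futex, naive layout: padding lands between node_id and value. */
struct numa_futex64_naive {
    int32_t  node_id;
    alignas(8) uint64_t value;
};

/* 64-bit futex with explicit leading padding so node_id is at value-4. */
struct numa_futex64 {
    uint32_t pad;
    int32_t  node_id;
    alignas(8) uint64_t value;
};

static_assert(offsetof(struct numa_futex32, value) -
              offsetof(struct numa_futex32, node_id) == 4,
              "node_id at uaddr-4");
static_assert(offsetof(struct numa_futex64_naive, value) -
              offsetof(struct numa_futex64_naive, node_id) == 8,
              "naive layout puts node_id at uaddr-8");
static_assert(offsetof(struct numa_futex64, value) -
              offsetof(struct numa_futex64, node_id) == 4,
              "explicit padding restores uaddr-4");
```

Either way the 64-bit variant occupies 16 bytes, so the padding is only relocated, not avoided.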
Thanks,
Florian
