On 2021-06-08 16:23:45 [+0200], Peter Zijlstra wrote:
There's more futex users than glibc, and some of them are really hurting because of the NUMA issue. Oracle used to (I've no idea what they do or do not do these days) use sysvsem because the futex hash table was a massive bottleneck for them.
And as Nick said, other vendors are having the same problems.
I just wanted to give a brief summary of recent events. The implementation tglx did, with the cookie resulting in a quick lookup, did not have any downsides except that the user API had to change, which glibc couldn't do. So if we are back to square one, why not start with that?
And if you don't extend the futex to store the nid you put the waiter in (see all the problems above) you will have to do wakeups on all nodes, which is both slower than it is today, and scales possibly even worse.
The whole numa-aware qspinlock saga is in part because of futex.
Sure.
That said; if we're going to do the whole futex-vector thing, we really do need a new interface, because the futex multiplex monster is about to crumble (see the fun wrt timeouts for example).
This might have been a series of unfortunate events leading to this. The sad part is that glibc has a comment saying that the kernel does not support this, and nobody bothered to change it (until recently).
And if we're going to do a new interface, we ought to make one that can solve all these problems. Now, ideally glibc will bring forth some opinions, but if they don't want to play, we'll go back to the good old days of non-standard locking libraries.. we're halfway there already due to glibc not wanting to break with POSIX where we know POSIX was just dead wrong broken.
I'm aware of that, I hacked on it, too :) This was the unfortunate result of a ~8-year-old bug which was not fixed; instead, part of the code was rewritten and a bit-spinlock was added in userland. You may remember the discussion regarding spins in userland… That said, REQUEUE_PI is no longer used by glibc.
Sebastian