On Mon, Feb 18, 2019 at 8:26 AM Stefan Liebler <stli@linux.ibm.com> wrote:
Hi Sudip,
On 02/17/2019 06:59 PM, Thomas Gleixner wrote:
On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
Hi Thomas,
On Sun, Feb 17, 2019 at 11:53 AM Thomas Gleixner <tglx@linutronix.de> wrote:
On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
Hi Greg,
On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
<snip>
I think we have a real use case which is triggering this error, and I was still in the middle of debugging that. But my initial analysis showed that the userspace thread was stuck in the indefinite loop.
=> This behaviour depends on the configuration of assert. See the glibc code in nptl/pthread_mutex_lock.c (you will encounter either an abort due to the assert or an indefinite loop):

    /* ESRCH can happen only for non-robust PI mutexes where
       the owner of the lock died.  */
    assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);

    /* Delay the thread indefinitely.  */
    while (1)
      __pause_nocancel ();
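To make the two outcomes explicit, here is a minimal sketch that models only the two statements quoted above. It is not glibc code: the enum, the function name classify_reached_path and the asserts_enabled parameter are hypothetical and exist purely for illustration. It assumes assert follows the standard rule that building with NDEBUG defined compiles the check out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for illustration; these are not glibc identifiers. */
enum path_outcome { OUTCOME_ABORT, OUTCOME_HANG };

/*
 * Models the quoted snippet once that code path has been reached:
 *   assert (err != ESRCH || !robust);
 *   while (1) __pause_nocancel ();
 * asserts_enabled stands in for "built without NDEBUG".
 */
enum path_outcome classify_reached_path(int err, bool robust, bool asserts_enabled)
{
    /* The assertion fails only for ESRCH on a robust mutex. */
    bool assertion_holds = (err != ESRCH) || !robust;

    if (asserts_enabled && !assertion_holds)
        return OUTCOME_ABORT;   /* assert() calls abort() */

    /* Assertion passed or was compiled out: fall into the pause loop. */
    return OUTCOME_HANG;
}
```

Under this model, a non-robust PI mutex that gets ESRCH never trips the assertion, so the thread ends up in the pause loop whether or not assertions are compiled in; the abort is only possible for a robust mutex with assertions enabled.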
I have a reliable reproducer of the problem and will set up a test tomorrow and confirm.
There are more patches in that area and you also need a fixed glibc.
I can see 1a1fb985f2e2 ("futex: Handle early deadlock return correctly") is already there in 4.14-stable. Is anything else missing, other than this one?
glibc might be a problem, but let's see what can be done.
Those two are the kernel side of affairs I think.
The relevant glibc commits are:
8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
=> Needed for pthread_mutex_lock / pthread_mutex_timedlock (included in glibc release 2.25)
823624bdc47f1f80109c9c52dee7939b9386d708
=> Needed for pthread_mutex_trylock (will be included in the next glibc release, 2.30, and is backported to the glibc release branches 2.25 ... 2.29)
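As a rough first check against the release numbers above, a sketch like the following could compare the running C library's version string with 2.25 and 2.30. The helper names glibc_at_least and report_running_glibc are hypothetical. It assumes a glibc-compatible libc that provides gnu_get_libc_version() in <gnu/libc-version.h>. Note that a version number alone cannot show whether a 2.25 ... 2.29 release branch or a distribution build actually carries the backported trylock fix.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the "major.minor" prefix of version is
 * at least want_major.want_minor, 0 if it is lower, -1 if it cannot be parsed.
 */
int glibc_at_least(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}

/* Prints what the release number of the running libc suggests. */
void report_running_glibc(void)
{
    const char *v = gnu_get_libc_version();

    printf("running libc version: %s\n", v);
    printf("release >= 2.25 (lock/timedlock fix): %d\n", glibc_at_least(v, 2, 25));
    printf("release >= 2.30 (trylock fix in release): %d\n", glibc_at_least(v, 2, 30));
}
```

Calling report_running_glibc() from a small test program on the target system gives a quick hint; confirming that a specific commit is present still requires checking the source of the library build in use.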
Thanks. I tried with only the kernel changes and the problem was not resolved. Then I tried with both the kernel changes and the glibc changes, and the problem improved significantly. But since we are using an ancient version of eglibc, I am not expecting it to get better than this.