On Tue, 2018-09-11 at 13:40 +0200, Peter Zijlstra wrote:
On Tue, Sep 11, 2018 at 02:48:17AM +0100, Dmitry Safonov wrote:
There are a couple of reports about lockups in ldsem_down_read() without anyone holding the write end of the ldisc semaphore:
lkml.kernel.org/r/<20171121132855.ajdv4k6swzhvktl6@wfg-t540p.sh.intel.com>
lkml.kernel.org/r/20180907045041.GF1110@shao2-debian
They all looked like a missed wakeup. I wasn't lucky enough to reproduce it, but it seems that a reader on another CPU can miss the waiter->task update and schedule again, resulting in an indefinite (MAX_SCHEDULE_TIMEOUT) sleep.
Make sure a woken-up reader will see waiter->task == NULL.

--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -118,6 +118,8 @@ static void __ldsem_wake_readers(struct ld_semaphore *sem)
 		tsk = waiter->task;
 		smp_mb();
 		waiter->task = NULL;
+		/* Make sure down_read_failed() will see !waiter->task update */
+		smp_wmb();
 		wake_up_process(tsk);

This is 'wrong'; wake_up_process() should already imply a sufficient barrier for this to be true.
Yeah, thanks. It was stupid of me not to check that. I saw smoke that would explain the reports and made conjectures that went too far. Need more covfefe and more staring into that code.
put_task_struct(tsk);
}
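
For readers following along, here is a rough userspace sketch of the handshake discussed above: the waker clears waiter->task and then wakes the sleeper, and the sleeper re-checks waiter->task after every wakeup. It is only an analogy, not the tty_ldsem.c code. The struct layout and the names reader_wait(), wake_reader() and run_handshake() are made up for illustration, and a pthread mutex plus condition variable is assumed as a stand-in for the ordering that wake_up_process() provides in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the ldsem waiter; not kernel code. */
struct waiter {
	_Atomic(void *) task;   /* non-NULL while the reader must keep waiting */
	pthread_mutex_t lock;   /* lock + cond play the role of the wakeup path */
	pthread_cond_t cond;
};

/* Reader side: sleep until the waker has cleared w->task. */
static void *reader_wait(void *arg)
{
	struct waiter *w = arg;

	pthread_mutex_lock(&w->lock);
	/* Re-check after every wakeup; a stale non-NULL value means sleep again. */
	while (atomic_load_explicit(&w->task, memory_order_acquire) != NULL)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Waker side: publish task == NULL first, then wake the sleeper. */
static void wake_reader(struct waiter *w)
{
	pthread_mutex_lock(&w->lock);
	atomic_store_explicit(&w->task, NULL, memory_order_release);
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Runs one handshake; returns 0 if the reader woke up and task is NULL. */
static int run_handshake(void)
{
	struct waiter w;
	pthread_t reader;
	int dummy_task;         /* placeholder for a task pointer */
	int ok;

	atomic_init(&w.task, (void *)&dummy_task);
	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.cond, NULL);

	if (pthread_create(&reader, NULL, reader_wait, &w) != 0)
		return -1;
	wake_reader(&w);
	pthread_join(reader, NULL);

	ok = atomic_load(&w.task) == NULL;
	pthread_cond_destroy(&w.cond);
	pthread_mutex_destroy(&w.lock);
	return ok ? 0 : -1;
}
```

Build with something like `cc -std=c11 -pthread`. In this sketch the mutex is what guarantees the reader cannot observe the wakeup without also observing task == NULL, so no extra barrier between the store and the signal is needed, which mirrors the objection raised against the added smp_wmb().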