On Mon 2018-10-01 13:37:30, Daniel Wang wrote:
On Mon, Oct 1, 2018 at 12:23 PM Steven Rostedt <rostedt@goodmis.org> wrote:
Serial console logs leading up to the deadlock. As can be seen, the stack trace was incomplete because the printing path hit a timeout.
I'm fine with having this backported.
Thanks. I can send the cherrypicks your way. Do you recommend that I include the three follow-up fixes though?
c14376de3a1b printk: Wake klogd when passing console_lock owner
fd5f7cde1b85 printk: Never set console_may_schedule in console_trylock()
c162d5b4338d printk: Hide console waiter logic into helpers
dbdda842fe96 printk: Add console owner and waiter logic to load balance console writes
This list looks complete and I am fine with backporting it to 4.14.
Well, I still wonder why it helped and why you do not see it with 4.4. I have a feeling that the console owner switch helped only by chance. In fact, you might be affected by a race in printk_safe_flush_on_panic() that was fixed by the commit:
554755be08fba31c7 printk: drop in_nmi check from printk_safe_flush_on_panic()
That one commit alone might be enough. Well, there was one more NMI-related race that was fixed by:
ba552399954dde1b printk: Split the code for storing a message into the log buffer
a338f84dc196f44b printk: Create helper function to queue deferred console handling
03fc7f9c99c1e7ae printk/nmi: Prevent deadlock when accessing the main log buffer in NMI
Best Regards,
Petr