Re: [PATCH] printk/nmi: Prevent deadlock when serializing NMI backtraces

5 Jun 2018

On Thu 2018-05-17 16:39:03, Petr Mladek wrote:
...
The commit 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when
logbuf_lock is available") tried to detect when logbuf_lock was taken
on another CPU. Then it looked safe to wait for the lock even in NMI.
It would be safe if other locks were not involved. Ironically the same
commit introduced an ABBA deadlock scenario. It added a spin lock into
nmi_cpu_backtrace() to serialize logs from different CPUs. The effect
is that also the NMI handlers are serialized. As a result, logbuf_lock
might be blocked by NMI on another CPU:
CPU0			CPU1			CPU2
printk()
  vprintk_emit()
    spin_lock(&logbuf_lock)
				trigger_all_cpu_backtrace()
				  raise()

	nmi_enter()
	  printk_nmi_enter()
	    if (this_cpu_read(printk_context)
	      & PRINTK_SAFE_CONTEXT_MASK)
	      // false
	    else
	      // looks safe to use printk_deferred()
	      this_cpu_or(printk_context,
		PRINTK_NMI_DEFERRED_CONTEXT_MASK);

	  nmi_cpu_backtrace()
	    arch_spin_lock(&lock);
	      show_regs()


nmi_enter()
  nmi_cpu_backtrace()
    arch_spin_lock(&lock);
	      printk()
		vprintk_func()
		  vprintk_deferred()
		    vprintk_emit()
		      spin_lock(&logbuf_lock)


DEADLOCK: between &logbuf_lock from vprintk_emit() and
   	  &lock from nmi_cpu_backtrace().
CPU0			CPU1
lock(logbuf_lock)	lock(lock)
  lock(lock)		  lock(logbuf_lock)
I have found this problem when stress testing trigger_all_cpu_backtrace()
and the system frozen.
Note that lockdep is not able to detect these dependencies because
there is no support for NMI context. Let's stay on the safe side
and always use printk_safe buffers when logbuf_lock is taken
when entering NMI.
Fixes: 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when logbuf_lock is available")
Cc: 4.13+ stable@vger.kernel.org # v4.13+
Signed-off-by: Petr Mladek pmladek@suse.com

kernel/printk/printk_safe.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 449d67edfa4b..a2ebd749c053 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -310,15 +310,12 @@ void printk_nmi_enter(void)
 {
   /*
    * The size of the extra per-CPU buffer is limited. Use it only when

* the main one is locked. If this CPU is not in the safe context,


* the lock must be taken on another CPU and we could wait for it.




* the main one is locked.

*/


if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
   raw_spin_is_locked(&logbuf_lock)) {




if (raw_spin_is_locked(&logbuf_lock))
this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);

Grr, the ABBA deadlock is still there. NMIs are not sent to the other
CPUs atomically. Even if we detect that logbuf_lock is available
in printk_nmi_enter() on some CPUs, it might still get locked on
another CPU before the other CPU gets NMI.
By other words, any check in printk_safe_enter() is racy and not
sufficient
=> I suggest to revert the commit 719f6a7040f1bdaf96fcc70
     "printk: Use the main logbuf in NMI when logbuf_lock is available"
     for-4.18 and stable until we get a better solution.
The only safe solution seems to be a trylock() in NMI in
vprintk_emit() and fallback to vprintk_safe() when the lock
is not taken. I mean something like:
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..4a5a0bf221b3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
    printk_delay();
/* This stops the holder of console_sem just where we want him */
-	logbuf_lock_irqsave(flags);
+	printk_safe_enter_irqsave(flags);
+	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
+		vprintk_nmi(fmt, args);
+		printk_safe_exit_irqrestore(flags);
+		return;
+	} else
+		raw_spin_lock(&logbuf_lock);
    /*
     * The printf needs to come first; we need the syslog
     * prefix which might be passed-in as a parameter.
Sigh, this looks like a material for-4.19. We might need to
revisit if printk_context still makes sense, ...
Best Regards,
Petr
PS: I realized this when writing the pull request for-4.18.
I removed this patch from the pull request.

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] printk/nmi: Prevent deadlock when serializing NMI backtraces