The commit 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when logbuf_lock is available") tried to detect when logbuf_lock was taken on another CPU. Then it looked safe to wait for the lock even in NMI.
It would be safe if no other locks were involved. Ironically, the same commit introduced an ABBA deadlock scenario: it added a spin lock into nmi_cpu_backtrace() to serialize logs from different CPUs. As a side effect, the NMI handlers themselves are serialized. As a result, a CPU holding logbuf_lock might get blocked in an NMI handler waiting for another CPU:
CPU0              CPU1                         CPU2

printk()
 vprintk_emit()
  spin_lock(&logbuf_lock)

                  trigger_all_cpu_backtrace()
                   raise()

                                               nmi_enter()
                                                printk_nmi_enter()
                                                 if (this_cpu_read(printk_context)
                                                     & PRINTK_SAFE_CONTEXT_MASK)
                                                        // false
                                                 else
                                                        // looks safe to use printk_deferred()
                                                        this_cpu_or(printk_context,
                                                            PRINTK_NMI_DEFERRED_CONTEXT_MASK);
                                                nmi_cpu_backtrace()
                                                 arch_spin_lock(&lock);
                                                 show_regs()

nmi_enter()
 nmi_cpu_backtrace()
  arch_spin_lock(&lock);

                                               printk()
                                                vprintk_func()
                                                 vprintk_deferred()
                                                  vprintk_emit()
                                                   spin_lock(&logbuf_lock)

DEADLOCK: between &logbuf_lock from vprintk_emit() and
&lock from nmi_cpu_backtrace().

CPU0               CPU1
lock(logbuf_lock)  lock(lock)
lock(lock)         lock(logbuf_lock)
I found this problem when stress testing trigger_all_cpu_backtrace(): the system froze.
Note that lockdep is not able to detect these dependencies because there is no support for NMI context. Let's stay on the safe side and always use the printk_safe per-CPU buffers when logbuf_lock is taken when entering NMI.
Fixes: 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when logbuf_lock is available")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk_safe.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 449d67edfa4b..a2ebd749c053 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -310,15 +310,12 @@ void printk_nmi_enter(void)
 {
 	/*
 	 * The size of the extra per-CPU buffer is limited. Use it only when
-	 * the main one is locked. If this CPU is not in the safe context,
-	 * the lock must be taken on another CPU and we could wait for it.
+	 * the main one is locked.
 	 */
-	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
-	    raw_spin_is_locked(&logbuf_lock)) {
+	if (raw_spin_is_locked(&logbuf_lock))
 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
-	} else {
+	else
 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
-	}
 }

 void printk_nmi_exit(void)
On (05/17/18 16:39), Petr Mladek wrote:
> CPU0              CPU1                         CPU2
>
> printk()
>  vprintk_emit()
>   spin_lock(&logbuf_lock)
>
>                   trigger_all_cpu_backtrace()
>                    raise()
>
>                                                nmi_enter()
>                                                 printk_nmi_enter()
>                                                  if (this_cpu_read(printk_context)
>                                                      & PRINTK_SAFE_CONTEXT_MASK)
>                                                         // false
>                                                  else
>                                                         // looks safe to use printk_deferred()
>                                                         this_cpu_or(printk_context,
>                                                             PRINTK_NMI_DEFERRED_CONTEXT_MASK);
>                                                 nmi_cpu_backtrace()
>                                                  arch_spin_lock(&lock);
>                                                  show_regs()
>
> nmi_enter()
>  nmi_cpu_backtrace()
>   arch_spin_lock(&lock);
>
>                                                printk()
>                                                 vprintk_func()
>                                                  vprintk_deferred()
>                                                   vprintk_emit()
>                                                    spin_lock(&logbuf_lock)
>
> DEADLOCK: between &logbuf_lock from vprintk_emit() and
> &lock from nmi_cpu_backtrace().
>
> CPU0               CPU1
> lock(logbuf_lock)  lock(lock)
> lock(lock)         lock(logbuf_lock)
[..]

> Signed-off-by: Petr Mladek <pmladek@suse.com>
This is a pretty cool find!
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> -	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
> -	    raw_spin_is_locked(&logbuf_lock)) {
> +	if (raw_spin_is_locked(&logbuf_lock))
>  		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> -	} else {
> +	else
>  		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> -	}
A question - can we switch to a bitwise OR?
	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) ||
	    raw_spin_is_locked(&logbuf_lock))
just to check per-CPU `printk_context' first and only afterwards access the global `logbuf_lock'. printk_nmi_enter() happens on every CPU, so maybe we can avoid some overhead by checking the local per-CPU data first.
-ss
On (05/18/18 11:07), Sergey Senozhatsky wrote:
> 	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) ||
> 	    raw_spin_is_locked(&logbuf_lock))
>
> just to check per-CPU `printk_context' first and only afterwards access
> the global `logbuf_lock'. printk_nmi_enter() happens on every CPU, so
> maybe we can avoid some overhead by checking the local per-CPU data
> first.
Nah, maybe it won't. This probably would have been the case if we had continued to call console drivers from the printk_safe section [at least]. CPUs don't spend that much time in printk_safe sections.
-ss
On Fri 2018-05-18 15:38:20, Sergey Senozhatsky wrote:
> On (05/18/18 11:07), Sergey Senozhatsky wrote:
> > 	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) ||
> > 	    raw_spin_is_locked(&logbuf_lock))
> >
> > just to check per-CPU `printk_context' first and only afterwards access
> > the global `logbuf_lock'. printk_nmi_enter() happens on every CPU, so
> > maybe we can avoid some overhead by checking the local per-CPU data first.
>
> Nah, maybe it won't. This probably would have been the case if we had
> continued to call console drivers from the printk_safe section [at least].
> CPUs don't spend that much time in printk_safe sections.
Yeah, I do not think that the more complicated code is worth it.
There is really only a minimal chance of hitting the printk_safe context on the given CPU. The eventual win would be negligible compared to the cost of printk(). In the case of trigger_all_cpu_backtrace(), most CPUs spend a lot of time waiting for the spinlock in nmi_cpu_backtrace() anyway.
Thanks for the ack.
Best Regards, Petr
On Thu, 17 May 2018 16:39:03 +0200 Petr Mladek <pmladek@suse.com> wrote:
> The commit 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI
> when logbuf_lock is available") tried to detect when logbuf_lock was
> taken on another CPU. Then it looked safe to wait for the lock even in
> NMI.
>
> It would be safe if other locks were not involved. Ironically the same
> commit introduced an ABBA deadlock scenario. It added a spin lock into
> nmi_cpu_backtrace() to serialize logs from different CPUs. The effect
> is that also the NMI handlers are serialized. As a result, logbuf_lock
> might be blocked by NMI on another CPU:
>
> CPU0              CPU1                         CPU2
>
> printk()
>  vprintk_emit()
>   spin_lock(&logbuf_lock)
>
>                   trigger_all_cpu_backtrace()
>                    raise()
>
>                                                nmi_enter()
>                                                 printk_nmi_enter()
>                                                  if (this_cpu_read(printk_context)
>                                                      & PRINTK_SAFE_CONTEXT_MASK)
>                                                         // false
>                                                  else
>                                                         // looks safe to use printk_deferred()
>                                                         this_cpu_or(printk_context,
>                                                             PRINTK_NMI_DEFERRED_CONTEXT_MASK);
>                                                 nmi_cpu_backtrace()
>                                                  arch_spin_lock(&lock);
What branch is this based on, because I can't find the "arch_spin_lock()" you are talking about here.
-- Steve
>                                                  show_regs()
>
> nmi_enter()
>  nmi_cpu_backtrace()
>   arch_spin_lock(&lock);
>
>                                                printk()
>                                                 vprintk_func()
>                                                  vprintk_deferred()
>                                                   vprintk_emit()
>                                                    spin_lock(&logbuf_lock)
>
> DEADLOCK: between &logbuf_lock from vprintk_emit() and
> &lock from nmi_cpu_backtrace().
>
> CPU0               CPU1
> lock(logbuf_lock)  lock(lock)
> lock(lock)         lock(logbuf_lock)
> I found this problem when stress testing trigger_all_cpu_backtrace():
> the system froze.
>
> Note that lockdep is not able to detect these dependencies because
> there is no support for NMI context. Let's stay on the safe side and
> always use printk_safe buffers when logbuf_lock is taken when entering
> NMI.
>
> Fixes: 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when logbuf_lock is available")
> Cc: stable@vger.kernel.org # v4.13+
> Signed-off-by: Petr Mladek <pmladek@suse.com>
>
>  kernel/printk/printk_safe.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 449d67edfa4b..a2ebd749c053 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -310,15 +310,12 @@ void printk_nmi_enter(void)
>  {
>  	/*
>  	 * The size of the extra per-CPU buffer is limited. Use it only when
> -	 * the main one is locked. If this CPU is not in the safe context,
> -	 * the lock must be taken on another CPU and we could wait for it.
> +	 * the main one is locked.
>  	 */
> -	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
> -	    raw_spin_is_locked(&logbuf_lock)) {
> +	if (raw_spin_is_locked(&logbuf_lock))
>  		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> -	} else {
> +	else
>  		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> -	}
>  }
>
>  void printk_nmi_exit(void)
On (05/22/18 17:43), Steven Rostedt wrote:
> > CPU0              CPU1                         CPU2
> >
> > printk()
> >  vprintk_emit()
> >   spin_lock(&logbuf_lock)
> >
> >                   trigger_all_cpu_backtrace()
> >                    raise()
> >
> >                                                nmi_enter()
> >                                                 printk_nmi_enter()
> >                                                  if (this_cpu_read(printk_context)
> >                                                      & PRINTK_SAFE_CONTEXT_MASK)
> >                                                         // false
> >                                                  else
> >                                                         // looks safe to use printk_deferred()
> >                                                         this_cpu_or(printk_context,
> >                                                             PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> >                                                 nmi_cpu_backtrace()
> >                                                  arch_spin_lock(&lock);
> What branch is this based on, because I can't find the "arch_spin_lock()"
> you are talking about here.
This arch_spin_lock() is in lib/nmi_backtrace.c
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/...
-ss
On Wed 2018-05-23 11:01:07, Sergey Senozhatsky wrote:
> On (05/22/18 17:43), Steven Rostedt wrote:
> > > CPU0              CPU1                         CPU2
> > >
> > > printk()
> > >  vprintk_emit()
> > >   spin_lock(&logbuf_lock)
> > >
> > >                   trigger_all_cpu_backtrace()
> > >                    raise()
> > >
> > >                                                nmi_enter()
> > >                                                 printk_nmi_enter()
> > >                                                  if (this_cpu_read(printk_context)
> > >                                                      & PRINTK_SAFE_CONTEXT_MASK)
> > >                                                         // false
> > >                                                  else
> > >                                                         // looks safe to use printk_deferred()
> > >                                                         this_cpu_or(printk_context,
> > >                                                             PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> > >                                                 nmi_cpu_backtrace()
> > >                                                  arch_spin_lock(&lock);
> >
> > What branch is this based on, because I can't find the "arch_spin_lock()"
> > you are talking about here.
>
> This arch_spin_lock() is in lib/nmi_backtrace.c
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/...
JFYI, I have pushed the patch into printk.git, for-4.18 branch, see https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/commit/?h...
Best Regards, Petr
On Thu 2018-05-17 16:39:03, Petr Mladek wrote:
> The commit 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI
> when logbuf_lock is available") tried to detect when logbuf_lock was
> taken on another CPU. Then it looked safe to wait for the lock even in
> NMI.
>
> It would be safe if other locks were not involved. Ironically the same
> commit introduced an ABBA deadlock scenario. It added a spin lock into
> nmi_cpu_backtrace() to serialize logs from different CPUs. The effect
> is that also the NMI handlers are serialized. As a result, logbuf_lock
> might be blocked by NMI on another CPU:
>
> CPU0              CPU1                         CPU2
>
> printk()
>  vprintk_emit()
>   spin_lock(&logbuf_lock)
>
>                   trigger_all_cpu_backtrace()
>                    raise()
>
>                                                nmi_enter()
>                                                 printk_nmi_enter()
>                                                  if (this_cpu_read(printk_context)
>                                                      & PRINTK_SAFE_CONTEXT_MASK)
>                                                         // false
>                                                  else
>                                                         // looks safe to use printk_deferred()
>                                                         this_cpu_or(printk_context,
>                                                             PRINTK_NMI_DEFERRED_CONTEXT_MASK);
>                                                 nmi_cpu_backtrace()
>                                                  arch_spin_lock(&lock);
>                                                  show_regs()
>
> nmi_enter()
>  nmi_cpu_backtrace()
>   arch_spin_lock(&lock);
>
>                                                printk()
>                                                 vprintk_func()
>                                                  vprintk_deferred()
>                                                   vprintk_emit()
>                                                    spin_lock(&logbuf_lock)
>
> DEADLOCK: between &logbuf_lock from vprintk_emit() and
> &lock from nmi_cpu_backtrace().
>
> CPU0               CPU1
> lock(logbuf_lock)  lock(lock)
> lock(lock)         lock(logbuf_lock)
> I found this problem when stress testing trigger_all_cpu_backtrace():
> the system froze.
>
> Note that lockdep is not able to detect these dependencies because
> there is no support for NMI context. Let's stay on the safe side and
> always use printk_safe buffers when logbuf_lock is taken when entering
> NMI.
>
> Fixes: 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when logbuf_lock is available")
> Cc: stable@vger.kernel.org # v4.13+
> Signed-off-by: Petr Mladek <pmladek@suse.com>
>
>  kernel/printk/printk_safe.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 449d67edfa4b..a2ebd749c053 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -310,15 +310,12 @@ void printk_nmi_enter(void)
>  {
>  	/*
>  	 * The size of the extra per-CPU buffer is limited. Use it only when
> -	 * the main one is locked. If this CPU is not in the safe context,
> -	 * the lock must be taken on another CPU and we could wait for it.
> +	 * the main one is locked.
>  	 */
> -	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
> -	    raw_spin_is_locked(&logbuf_lock)) {
> +	if (raw_spin_is_locked(&logbuf_lock))
>  		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
Grr, the ABBA deadlock is still there. NMIs are not sent to the other CPUs atomically. Even if we detect that logbuf_lock is available in printk_nmi_enter() on some CPUs, it might still get locked on another CPU before the other CPU gets NMI.
In other words, any check in printk_safe_enter() is racy and not sufficient
=> I suggest to revert the commit 719f6a7040f1bdaf96fcc70 "printk: Use the main logbuf in NMI when logbuf_lock is available" for-4.18 and stable until we get a better solution.
The only safe solution seems to be a trylock() in NMI in vprintk_emit() and fallback to vprintk_safe() when the lock is not taken. I mean something like:
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..4a5a0bf221b3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 		printk_delay();

 	/* This stops the holder of console_sem just where we want him */
-	logbuf_lock_irqsave(flags);
+	printk_safe_enter_irqsave(flags);
+	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
+		vprintk_nmi(fmt, args);
+		printk_safe_exit_irqrestore(flags);
+		return;
+	} else
+		raw_spin_lock(&logbuf_lock);
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
Sigh, this looks like a material for-4.19. We might need to revisit if printk_context still makes sense, ...
Best Regards, Petr
PS: I realized this when writing the pull request for-4.18. I removed this patch from the pull request.
On (06/05/18 14:47), Petr Mladek wrote: [..]
> Grr, the ABBA deadlock is still there. NMIs are not sent to the other
> CPUs atomically. Even if we detect that logbuf_lock is available in
> printk_nmi_enter() on some CPUs, it might still get locked on another
> CPU before the other CPU gets NMI.
Can we do something about "B"? :) I mean - any chance we can rework locking in nmi_cpu_backtrace()?
> In other words, any check in printk_safe_enter() is racy and not sufficient
I suppose you meant printk_nmi_enter().
> => I suggest to revert the commit 719f6a7040f1bdaf96fcc70 "printk: Use
> the main logbuf in NMI when logbuf_lock is available" for-4.18 and
> stable until we get a better solution.
Just random thoughts.
Maybe we need to revert it, but let's not "panic". I think [but don't insist on it] that the patch in question is *probably* "good enough". It addresses a bug report, after all. How often do we have arch_trigger_cpumask_backtrace() on all CPUs these days? I tend to think that it used to be much more popular in the past, because we had a loops_per_jiffy based spin_lock lockup detection which would trigger NMI backtraces, but this functionality is gone, see bc88c10d7e6900916f5e1ba3829d66a9de92b633 for details. I'm not saying that the race condition that you found is unrealistic; I'm just saying that _it seems_ that nmi_panic()->printk() on a single CPU is more common now, so having that nmi_printk()->printk_deferred() might be quite valuable at the end of the day.
May be I'm wrong!
> The only safe solution seems to be a trylock() in NMI in vprintk_emit()
> and fallback to vprintk_safe() when the lock is not taken. I mean
> something like:
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 247808333ba4..4a5a0bf221b3 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		printk_delay();
>
>  	/* This stops the holder of console_sem just where we want him */
> -	logbuf_lock_irqsave(flags);
> +	printk_safe_enter_irqsave(flags);
> +	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
> +		vprintk_nmi(fmt, args);
> +		printk_safe_exit_irqrestore(flags);
> +		return;
> +	} else
> +		raw_spin_lock(&logbuf_lock);
>  	/*
>  	 * The printf needs to come first; we need the syslog
>  	 * prefix which might be passed-in as a parameter.
I need some time to think about it.

> Sigh, this looks like a material for-4.19.

Agreed.

> We might need to revisit if printk_context still makes sense, ...

What do you mean by this?

> PS: I realized this when writing the pull request for-4.18. I removed
> this patch from the pull request.

Yep. Good job!

	-ss
On (06/06/18 14:10), Sergey Senozhatsky wrote:
> On (06/05/18 14:47), Petr Mladek wrote:
> [..]
> > Grr, the ABBA deadlock is still there. NMIs are not sent to the other
> > CPUs atomically. Even if we detect that logbuf_lock is available in
> > printk_nmi_enter() on some CPUs, it might still get locked on another
> > CPU before the other CPU gets NMI.
>
> Can we do something about "B"? :) I mean - any chance we can rework
> locking in nmi_cpu_backtrace()?
Sorry, I don't have that much free time at the moment, so can't fully concentrate on this issue.
Here is a quick-n-dirty thought.
The whole idea of printk_nmi() was to reduce the number of locks we take when performing printk() from NMI context - ideally down to 0. We added the logbuf spin_lock later on. One lock was fine. But then we added another one - the nmi_cpu_backtrace() lock. And two locks are too many. So can we drop the nmi_cpu_backtrace() lock, and in exchange extend the printk-safe API with functions that will disable/enable PRINTK_NMI_DEFERRED_CONTEXT_MASK on a particular CPU?
I refer to it as HARD and SOFT printk_nmi :) Just for fun. Hard one has no right to use logbuf and will use only per-CPU buffer, while soft one can use either per-CPU buffer or logbuf.
---
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 0ace3c907290..b57d5daa90b5 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -90,11 +90,10 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,

 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
-	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
 	int cpu = smp_processor_id();

 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		arch_spin_lock(&lock);
+		printk_nmi_hard();
 		if (regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
 				cpu, (void *)instruction_pointer(regs));
@@ -105,7 +104,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 		else
 			dump_stack();
 	}
-	arch_spin_unlock(&lock);
+	printk_nmi_soft();
 	cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 	return true;
 }
On Wed 2018-06-06 19:33:56, Sergey Senozhatsky wrote:
> On (06/06/18 14:10), Sergey Senozhatsky wrote:
> > On (06/05/18 14:47), Petr Mladek wrote:
> > [..]
> > > Grr, the ABBA deadlock is still there. NMIs are not sent to the other
> > > CPUs atomically. Even if we detect that logbuf_lock is available in
> > > printk_nmi_enter() on some CPUs, it might still get locked on another
> > > CPU before the other CPU gets NMI.
> >
> > Can we do something about "B"? :) I mean - any chance we can rework
> > locking in nmi_cpu_backtrace()?
>
> Sorry, I don't have that much free time at the moment, so can't fully
> concentrate on this issue.
>
> Here is a quick-n-dirty thought.
>
> The whole idea of printk_nmi() was to reduce the number of locks we
> take performing printk() from NMI context - ideally down to 0. We added
> logbuf spin_lock later on. One lock was fine. But then we added another
> one - nmi_cpu_backtrace() lock. And two locks is too many. So can we
> drop the nmi_cpu_backtrace() lock, and in exchange extend printk-safe
> API with functions that will disable/enable
> PRINTK_NMI_DEFERRED_CONTEXT_MASK on a particular CPU?
I ended up with a similar conclusion. I am just nervous about the fact that the check in printk_nmi_enter() will always be unreliable. We already deal with situations where we want to check the actual lock state in panic(). Also, the lock in nmi_cpu_backtrace() need not be the only lock serializing NMIs. I am always surprised what code can be called in NMI.
I played with this a bit and came up with the following:
From 25c904dbe8eb95d823b311a343f39ed926b7a7b2 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 8 Jun 2018 11:21:07 +0200
Subject: [PATCH] printk/nmi: More reliable approach to use the main log
 buffer in NMI
The check whether logbuf_lock is available is not safe in printk_nmi_enter(). It might get blocked by another CPU that takes the lock and enters NMI later.
The race would still be acceptable if the other CPU could finish its NMI handler independently. But this is not the case with nmi_trigger_cpumask_backtrace(), which has to serialize all NMI handlers to prevent mixed output.
It would be possible to make the decision later in vprintk_emit(). But it would cause mixed output and make nmi_trigger_cpumask_backtrace() unusable.
Also it would be possible to avoid the serialization and force using the per-CPU buffers in nmi_cpu_backtrace(). But this problem is generic and we do not know how many other such deadlocks are possible.
But wait, the motivation for using the main log buffer in NMI was flushing the ftrace log in panic(). It can produce hundreds of lines that do not fit into the per-CPU buffers. The good thing is that these lines have their own time stamps and can be sorted later. Therefore it might be acceptable to risk the mixed output in this case.
This patch introduces new printk_nmi_direct() context. It tries to use the main log buffer when possible but there is a risk of mixed messages. It will be used only when flushing the ftrace buffer.
The special printk_nmi per-CPU buffers will always be used in the other situations again. As a result, nmi_cpu_backtrace() no longer needs to be serialized.
Fixes: 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h      |  4 ++++
 kernel/printk/internal.h    |  8 ++++++-
 kernel/printk/printk.c      |  8 ++++++-
 kernel/printk/printk_safe.c | 57 ++++++++++++++++++++++++---------------------
 kernel/trace/trace.c        |  4 +++-
 lib/nmi_backtrace.c         |  3 ---
 6 files changed, 51 insertions(+), 33 deletions(-)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 6d7e800affd8..872fbdf8df26 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -148,9 +148,13 @@ void early_printk(const char *s, ...) { }
 #ifdef CONFIG_PRINTK_NMI
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
+extern void printk_nmi_direct_enter(void);
+extern void printk_nmi_direct_exit(void);
 #else
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
+static void printk_nmi_direct_enter(void) { }
+static void printk_nmi_direct_exit(void) { }
 #endif /* PRINTK_NMI */

 #ifdef CONFIG_PRINTK
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a7d04049af4..47db66e41ab4 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -19,11 +19,17 @@
 #ifdef CONFIG_PRINTK

 #define PRINTK_SAFE_CONTEXT_MASK	 0x3fffffff
-#define PRINTK_NMI_DEFERRED_CONTEXT_MASK 0x40000000
+#define PRINTK_NMI_DIRECT_CONTEXT_MASK	 0x40000000
 #define PRINTK_NMI_CONTEXT_MASK		 0x80000000

 extern raw_spinlock_t logbuf_lock;

+#ifdef CONFIG_PRINTK_NMI
+__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args);
+#else
+__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { return 0; }
+#endif
+
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..cf55bece43d9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 		printk_delay();

 	/* This stops the holder of console_sem just where we want him */
-	logbuf_lock_irqsave(flags);
+	printk_safe_enter_irqsave(flags);
+	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
+		printed_len = vprintk_nmi(fmt, args);
+		printk_safe_exit_irqrestore(flags);
+		return printed_len;
+	} else
+		raw_spin_lock(&logbuf_lock);
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d7d091309054..bf8e3ebfca8e 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -299,40 +299,43 @@ void printk_safe_flush_on_panic(void)
  * one writer running. But the buffer might get flushed from another
  * CPU, so we need to be careful.
  */
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
+__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
 {
 	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);

 	return printk_safe_log_store(s, fmt, args);
 }

-void printk_nmi_enter(void)
+/*
+ * Try to use the main log buffer even in NMI context. It falls
+ * back to vprintk_nmi() when the lock is not available. The decision
+ * is made in vprintk_emit().
+ *
+ * Therefore there is not a risk of a deadlock. But the messages might
+ * get reordered when anyone uses the lock in parallel and only some
+ * messages are stored directly.
+ */
+void printk_nmi_direct_enter(void)
 {
-	/*
-	 * The size of the extra per-CPU buffer is limited. Use it only when
-	 * the main one is locked. If this CPU is not in the safe context,
-	 * the lock must be taken on another CPU and we could wait for it.
-	 */
-	if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
-	    raw_spin_is_locked(&logbuf_lock)) {
-		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
-	} else {
-		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
-	}
+	if (!(this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK))
+		return;
+
+	this_cpu_or(printk_context, PRINTK_NMI_DIRECT_CONTEXT_MASK);
 }

-void printk_nmi_exit(void)
+void printk_nmi_direct_exit(void)
 {
-	this_cpu_and(printk_context,
-		     ~(PRINTK_NMI_CONTEXT_MASK |
-		       PRINTK_NMI_DEFERRED_CONTEXT_MASK));
+	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
 }

-#else
+void printk_nmi_enter(void)
+{
+	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+}

-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
+void printk_nmi_exit(void)
 {
-	return 0;
+	this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
 }

 #endif /* CONFIG_PRINTK_NMI */
@@ -363,6 +366,13 @@ void __printk_safe_exit(void)

 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+	/*
+	 * Try to use the main logbuf even in NMI. But avoid calling console
+	 * drivers that might have their own locks.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)
+		return vprintk_deferred(fmt, args);
+
 	/* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
 		return vprintk_nmi(fmt, args);
@@ -371,13 +381,6 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
 		return vprintk_safe(fmt, args);

-	/*
-	 * Use the main logbuf when logbuf_lock is available in NMI.
-	 * But avoid calling console drivers that might have their own locks.
-	 */
-	if (this_cpu_read(printk_context) & PRINTK_NMI_DEFERRED_CONTEXT_MASK)
-		return vprintk_deferred(fmt, args);
-
 	/* No obstacles. */
 	return vprintk_default(fmt, args);
 }
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 108ce3e1dc13..38b6dbe1709b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8279,6 +8279,7 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	tracing_off();

 	local_irq_save(flags);
+	printk_nmi_direct_enter();

 	/* Simulate the iterator */
 	trace_init_global_iter(&iter);
@@ -8358,7 +8359,8 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	for_each_tracing_cpu(cpu) {
 		atomic_dec(&per_cpu_ptr(iter.trace_buffer->data, cpu)->disabled);
 	}
-	atomic_dec(&dump_running);
+	atomic_dec(&dump_running);
+	printk_nmi_direct_exit();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(ftrace_dump);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 61a6b5aab07e..15ca78e1c7d4 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -87,11 +87,9 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,

 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
-	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
 	int cpu = smp_processor_id();

 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		arch_spin_lock(&lock);
 		if (regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
 				cpu, (void *)instruction_pointer(regs));
@@ -102,7 +100,6 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 		else
 			dump_stack();
 	}
-	arch_spin_unlock(&lock);
 	cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 	return true;
 }
On (06/08/18 12:48), Petr Mladek wrote:
[..]
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 6d7e800affd8..872fbdf8df26 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -148,9 +148,13 @@ void early_printk(const char *s, ...) { }
>  #ifdef CONFIG_PRINTK_NMI
>  extern void printk_nmi_enter(void);
>  extern void printk_nmi_exit(void);
> +extern void printk_nmi_direct_enter(void);
> +extern void printk_nmi_direct_exit(void);
>  #else
>  static inline void printk_nmi_enter(void) { }
>  static inline void printk_nmi_exit(void) { }
> +static void printk_nmi_direct_enter(void) { }
> +static void printk_nmi_direct_exit(void) { }
Can we have better names, maybe? Since "direct" printk_nmi is not in fact always direct.
> +#ifdef CONFIG_PRINTK_NMI
> +__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args);
> +#else
> +__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { return 0; }
> +#endif
Hmm, printk_safe.c knows about printk.c, printk.c knows about printk_safe.c.
>  __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
>  __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
>  __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 247808333ba4..cf55bece43d9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		printk_delay();
>
>  	/* This stops the holder of console_sem just where we want him */
> -	logbuf_lock_irqsave(flags);
> +	printk_safe_enter_irqsave(flags);
> +	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
> +		printed_len = vprintk_nmi(fmt, args);
> +		printk_safe_exit_irqrestore(flags);
> +		return printed_len;
> +	} else
> +		raw_spin_lock(&logbuf_lock);
OK... Can we do this in vprintk_func()? The race window should be super tiny [if it matters at all], but in exchange we don't have to mix nmi, printk, printk_nmi, etc.
So overall I understand why you did it this way. Maybe I'd prefer a less universal but shorter solution (e.g. modify only the nmi_backtrace function and put there "printk_nmi_restricted_buffer"), but I won't really object to your patch [unless I see some real issues with it].
-ss
On Mon 2018-06-18 15:37:38, Sergey Senozhatsky wrote:
> On (06/08/18 12:48), Petr Mladek wrote:
> [..]
> > diff --git a/include/linux/printk.h b/include/linux/printk.h
> > index 6d7e800affd8..872fbdf8df26 100644
> > --- a/include/linux/printk.h
> > +++ b/include/linux/printk.h
> > @@ -148,9 +148,13 @@ void early_printk(const char *s, ...) { }
> >  #ifdef CONFIG_PRINTK_NMI
> >  extern void printk_nmi_enter(void);
> >  extern void printk_nmi_exit(void);
> > +extern void printk_nmi_direct_enter(void);
> > +extern void printk_nmi_direct_exit(void);
> >  #else
> >  static inline void printk_nmi_enter(void) { }
> >  static inline void printk_nmi_exit(void) { }
> > +static void printk_nmi_direct_enter(void) { }
> > +static void printk_nmi_direct_exit(void) { }
>
> Can we have better names may be? Since direct printk_nmi is not in fact
> always `direct'.
What about printk_chatty_nmi_enter(), printk_large_nmi_enter() or something similar?
> > +#ifdef CONFIG_PRINTK_NMI
> > +__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args);
> > +#else
> > +__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { return 0; }
> > +#endif
>
> Hmm, printk_safe.c knows about printk.c, printk.c knows about
> printk_safe.c.
I am sorry but I do not understand the problem. The function is defined in printk_safe.c and we need to call it also from printk.c. It seems reasonable to declare it in kernel/printk/internal.h.
__printf(1, 0) int vprintk_default(const char *fmt, va_list args);
__printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
__printf(1, 0) int vprintk_func(const char *fmt, va_list args);

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..cf55bece43d9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 	printk_delay();
 
 	/* This stops the holder of console_sem just where we want him */
-	logbuf_lock_irqsave(flags);
+	printk_safe_enter_irqsave(flags);
+	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
+		printed_len = vprintk_nmi(fmt, args);
+		printk_safe_exit_irqrestore(flags);
+		return printed_len;
+	} else
+		raw_spin_lock(&logbuf_lock);
OK... Can we do this in vprintk_func()? The race window should be super tiny [if it matters at all], but in exchange we don't have to mix nmi, printk, printk_nmi, etc.
You are right that it would still solve the main risk (NMI comes inside logbuf_lock critical section).
In fact, the only real risk would be another lock serializing NMIs and printk() called with that lock. This patch removes one in nmi_backtrace() and I am not aware of any other.
The less hairy code really might be worth the rather theoretical risk.
So overall I understand why you did it this way. Maybe I'd prefer a less universal but shorter solution (e.g. modify only the nmi_backtrace function and put there "printk_nmi_restricted_buffer"), but I won't really object to your patch [unless I see some real issues with it].
Thanks in advance. I'll send v2 once we have a conclusion on the function names and includes.
Best Regards, Petr
On (06/18/18 11:39), Petr Mladek wrote: [..]
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
+extern void printk_nmi_direct_enter(void);
+extern void printk_nmi_direct_exit(void);
 #else
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
+static void printk_nmi_direct_enter(void) { }
+static void printk_nmi_direct_exit(void) { }
Can we have better names, maybe? Since direct printk_nmi is not in fact always `direct'.
What about printk_chatty_nmi_enter(), printk_large_nmi_enter() or something similar?
Hmm. Can't answer right now :)
+#ifdef CONFIG_PRINTK_NMI
+__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args);
+#else
+__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { return 0; }
+#endif
Hmm, printk_safe.c knows about printk.c, printk.c knows about printk_safe.c.
I am sorry but I do not understand the problem. The function is defined in printk_safe.c and we need to call it also from printk.c. It seems reasonable to declare it in kernel/printk/internal.h.
Just wanted to suggest keeping the printk_safe/printk_nmi stuff in printk_safe.c. We already have everything we need there, so let's just add the vprintk_nmi() fallback, avoiding spreading printk_safe/printk_nmi logic and details across printk.c and printk_safe.c.
OK... Can we do this in vprintk_func()? The race window should be super tiny [if it matters at all], but in exchange we don't have to mix nmi, printk, printk_nmi, etc.
You are right that it would still solve the main risk (NMI comes inside logbuf_lock critical section).
In fact, the only real risk would be another lock serializing NMIs and printk() called with that lock. This patch removes one in nmi_backtrace() and I am not aware of any other.
The less hairy code really might be worth the rather theoretical risk.
So overall I understand why you did it this way. Maybe I'd prefer a less universal but shorter solution (e.g. modify only the nmi_backtrace function and put there "printk_nmi_restricted_buffer"), but I won't really object to your patch [unless I see some real issues with it].
Thanks in advance. I'll send v2 once we have a conclusion on the function names and includes.
Does this mean that we agreed to handle the printk_nmi per-CPU buffer fallback in printk_safe.c?
-ss
On Mon 2018-06-18 19:07:18, Sergey Senozhatsky wrote:
On (06/18/18 11:39), Petr Mladek wrote: [..]
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
+extern void printk_nmi_direct_enter(void);
+extern void printk_nmi_direct_exit(void);
 #else
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
+static void printk_nmi_direct_enter(void) { }
+static void printk_nmi_direct_exit(void) { }
Can we have better names, maybe? Since direct printk_nmi is not in fact always `direct'.
What about printk_chatty_nmi_enter(), printk_large_nmi_enter() or something similar?
Hmm. Can't answer right now :)
Please, let me know what name you would like ;-)
+#ifdef CONFIG_PRINTK_NMI
+__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args);
+#else
+__printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { return 0; }
+#endif
Hmm, printk_safe.c knows about printk.c, printk.c knows about printk_safe.c.
Just wanted to suggest keeping the printk_safe/printk_nmi stuff in printk_safe.c. We already have everything we need there, so let's just add the vprintk_nmi() fallback, avoiding spreading printk_safe/printk_nmi logic and details across printk.c and printk_safe.c.
This won't be needed in v2 where the decision will be made in vprintk_func().
OK... Can we do this in vprintk_func()? The race window should be super tiny [if it matters at all], but in exchange we don't have to mix nmi, printk, printk_nmi, etc.
You are right that it would still solve the main risk (NMI comes inside logbuf_lock critical section).
In fact, the only real risk would be another lock serializing NMIs and printk() called with that lock. This patch removes one in nmi_backtrace() and I am not aware of any other.
The less hairy code really might be worth the rather theoretical risk.
Does this mean that we agreed to handle the printk_nmi per-CPU buffer fallback in printk_safe.c?
Yes, I feel persuaded.
Best Regards, Petr
On (06/19/18 09:52), Petr Mladek wrote:
On Mon 2018-06-18 19:07:18, Sergey Senozhatsky wrote:
On (06/18/18 11:39), Petr Mladek wrote: [..]
Hmm. Can't answer right now :)
Please, let me know what name you would like ;-)
:) Wow, it's hard. Maybe we can derive some bits from the ftrace_dump() function name? Dunno... Does printk_dump_nmi_enter() sound terrible?
Maybe Steven has some opinions on this?
-ss
On Tue, 19 Jun 2018 17:27:16 +0900 Sergey Senozhatsky sergey.senozhatsky.work@gmail.com wrote:
On (06/19/18 09:52), Petr Mladek wrote:
On Mon 2018-06-18 19:07:18, Sergey Senozhatsky wrote:
On (06/18/18 11:39), Petr Mladek wrote: [..]
Hmm. Can't answer right now :)
Please, let me know what name you would like ;-)
:) Wow, it's hard. Maybe we can derive some bits from the ftrace_dump() function name? Dunno... Does printk_dump_nmi_enter() sound terrible?
Maybe Steven has some opinions on this?
What exactly is the question?
Also, from a previous email in this thread, if you have to risk interleaved output to solve a deadlock, then just do that.
-- Steve
On (06/19/18 09:23), Steven Rostedt wrote:
On (06/19/18 09:52), Petr Mladek wrote:
On Mon 2018-06-18 19:07:18, Sergey Senozhatsky wrote:
On (06/18/18 11:39), Petr Mladek wrote: [..]
Hmm. Can't answer right now :)
Please, let me know what name you would like ;-)
:) Wow, it's hard. Maybe we can derive some bits from the ftrace_dump() function name? Dunno... Does printk_dump_nmi_enter() sound terrible?
Maybe Steven has some opinions on this?
What exactly is the question?
Which one of these you'd prefer to see in ftrace_dump():
- printk_nmi_direct_enter() / printk_nmi_direct_exit()
- printk_chatty_nmi_enter() / printk_chatty_nmi_exit()
- printk_large_nmi_enter() / printk_large_nmi_exit()
- printk_dump_nmi_enter() / printk_dump_nmi_exit()
Also, from a previous email in this thread, if you have to risk interleaved output to solve a deadlock, then just do that.
Yep.
-ss
On Wed, 20 Jun 2018 10:58:34 +0900 Sergey Senozhatsky sergey.senozhatsky.work@gmail.com wrote:
Which one of these you'd prefer to see in ftrace_dump():
- printk_nmi_direct_enter() / printk_nmi_direct_exit()
The above appears to be the most sane.
-- Steve
- printk_chatty_nmi_enter() / printk_chatty_nmi_exit()
- printk_large_nmi_enter() / printk_large_nmi_exit()
- printk_dump_nmi_enter() / printk_dump_nmi_exit()
On (06/19/18 22:32), Steven Rostedt wrote:
On Wed, 20 Jun 2018 10:58:34 +0900 Sergey Senozhatsky sergey.senozhatsky.work@gmail.com wrote:
Which one of these you'd prefer to see in ftrace_dump():
- printk_nmi_direct_enter() / printk_nmi_direct_exit()
The above appears to be the most sane.
OK. The original name suggested by Petr.
-ss
On Wed 2018-06-06 14:10:29, Sergey Senozhatsky wrote:
On (06/05/18 14:47), Petr Mladek wrote: [..]
Grr, the ABBA deadlock is still there. NMIs are not sent to the other CPUs atomically. Even if we detect that logbuf_lock is available in printk_nmi_enter() on some CPUs, it might still get locked on another CPU before that CPU gets the NMI.
Can we do something about "B"? :) I mean - any chance we can rework locking in nmi_cpu_backtrace()?
I can't think of any possibility at the moment. Well, it does not mean that it does not exist.
The irony is that we need the extra lock in nmi_cpu_backtrace() only because we try to store the messages into the common log buffer. If we always use the per-CPU buffers in NMI then the lock would not cause any problems but it also won't be necessary.
In other words, any check in printk_safe_enter() is racy and not sufficient
I suppose you meant printk_nmi_enter().
Yup, I am sorry for confusion.
=> I suggest to revert the commit 719f6a7040f1bdaf96fcc70 "printk: Use the main logbuf in NMI when logbuf_lock is available" for-4.18 and stable until we get a better solution.
Just random thoughts.
Maybe we need to revert it, but let's not "panic". I think [but don't insist on it] that the patch in question is *probably* "good enough". It addresses a bug report after all.
It was a problem reported by me. I found it when testing other changes. The patch definitely improved the situation. The question is whether it is enough in practice.
How often do we have arch_trigger_cpumask_backtrace() on all CPUs these days? I tend to think that it used to be much more popular in the past, because we had a loops_per_jiffy-based spin_lock lockup detection which would trigger NMI backtraces, but this functionality has gone; see bc88c10d7e6900916f5e1ba3829d66a9de92b633 for details.
AFAIK, we (SUSE) started to work on the printk/NMI deadlocks because it was too risky to use sysrq-l on systems with many CPUs. It means that the use case is still there.
Hmm, the original problem was that any interrupted logbuf_lock owner caused the deadlock. It is _different now_ because the interrupted logbuf_lock owner must block another CPU via the lock in nmi_cpu_backtrace().
I actually wonder why this patch helped so much in my test (while true; do echo l >/proc/sysrq-trigger; done). My guess is that it greatly increased the number of CPUs using PRINTK_NMI_CONTEXT (per-CPU buffers). Therefore the deadlock was less likely. On the other hand, the original problem was basically back without this patch.
I'm not saying that the race condition that you found is unrealistic, I'm just saying that _it seems_ that nmi_panic()->printk() on a single CPU is more common now, so having that nmi_printk()->printk_deferred() might be quite valuable at the end of the day.
I see the point. I need to think about it.
Maybe I'm wrong!
The only safe solution seems to be a trylock() in NMI in vprintk_emit() and a fallback to vprintk_safe() when the lock is not taken. I mean something like:
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..4a5a0bf221b3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 	printk_delay();
 
 	/* This stops the holder of console_sem just where we want him */
-	logbuf_lock_irqsave(flags);
+	printk_safe_enter_irqsave(flags);
+	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
+		vprintk_nmi(fmt, args);
+		printk_safe_exit_irqrestore(flags);
+		return;
+	} else
+		raw_spin_lock(&logbuf_lock);
 
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
I need some time to think about it.
The bad thing about this solution is that different messages might be handled a different way during a single nmi_cpu_backtrace(). The result might be mixed backtraces in the log.
Sigh, this looks like material for 4.19.
Agreed.
We might need to revisit if printk_context still makes sense, ...
What do you mean by this?
My dream was to select the right vprintk implementation according to printk_context in a single place (vprintk_func()). If this is not usable for NMI, we might need to think about another solution. But this is premature. I believe printk_context will most likely still make sense.
I need to think more about it.
Best Regards, Petr
On (06/06/18 13:15), Petr Mladek wrote:
On Wed 2018-06-06 14:10:29, Sergey Senozhatsky wrote:
On (06/05/18 14:47), Petr Mladek wrote: [..]
Grr, the ABBA deadlock is still there. NMIs are not sent to the other CPUs atomically. Even if we detect that logbuf_lock is available in printk_nmi_enter() on some CPUs, it might still get locked on another CPU before that CPU gets the NMI.
Can we do something about "B"? :) I mean - any chance we can rework locking in nmi_cpu_backtrace()?
I can't think of any possibility at the moment. Well, it does not mean that it does not exist.
The irony is that we need the extra lock in nmi_cpu_backtrace() only because we try to store the messages into the common log buffer. If we always use the per-CPU buffers in NMI then the lock would not cause any problems but it also won't be necessary.
Yep. I think we can come up with something. It seems that the only problem is that, in this particular case, we want to avoid printk_nmi()->logbuf and instead enforce the per-CPU printk_nmi buffers. Then we can drop the nmi_cpu_backtrace() spinlock, because the messages will be serialized by the printk_safe flush spin_lock. That doesn't sound like an impossible thing to do. What am I missing?
Could you please check my follow-up email?
=> I suggest to revert the commit 719f6a7040f1bdaf96fcc70 "printk: Use the main logbuf in NMI when logbuf_lock is available" for-4.18 and stable until we get a better solution.
Just random thoughts.
Maybe we need to revert it, but let's not "panic". I think [but don't insist on it] that the patch in question is *probably* "good enough". It addresses a bug report after all.
It was a problem reported by me. I found it when testing other changes. The patch definitely improved the situation. The question is whether it is enough in practice.
Oh, certainly. But I was talking about 719f6a7040f1bdaf96fcc70. We introduced that change in response to a bug report from Steven. He would not have been able to debug his kernel otherwise, because the per-CPU printk_nmi buffer was too limited in size. So on one hand we have the problem that you reported, which you found while you were hammering/testing NMI printk-s [a valid report on its own]; on the other hand we have the problem that Steven reported, which he triggered while he was debugging the kernel. It might be the case that Steven's problem is more likely to happen in the real world. So that's why I proposed to keep 719f6a7040f1bda for the time being [until we come up with another fix]. I may be wrong.
-ss