SysRq-L and the RCU stall detector call arch_trigger_cpumask_backtrace() to trigger backtraces on other CPUs, but its behavior is totally broken. The root cause is that arch_trigger_cpumask_backtrace() uses the call-function IPI in irq context, which triggers deadlocks in smp_call_function_single() and smp_call_function_many().

This patch fixes arch_trigger_cpumask_backtrace() by:
1, Using a dedicated IPI (SMP_CPU_BACKTRACE) to trigger backtraces;
2, If the calling CPU is in the target cpumask, doing its backtrace directly and clearing it from the mask;
3, Using a spinlock to avoid parallel backtrace output;
4, Handling the SMP_CPU_BACKTRACE IPI for Loongson-3.
I have attempted to implement SMP_CPU_BACKTRACE for all MIPS CPUs, but I failed because some of their IPIs are not extensible. :(
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/smp.h           |  3 +++
 arch/mips/kernel/process.c            | 23 ++++++++++++++++++-----
 arch/mips/loongson64/loongson-3/smp.c |  6 ++++++
 3 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/arch/mips/include/asm/smp.h b/arch/mips/include/asm/smp.h
index 88ebd83..b0521f4 100644
--- a/arch/mips/include/asm/smp.h
+++ b/arch/mips/include/asm/smp.h
@@ -43,6 +43,7 @@ extern int __cpu_logical_map[NR_CPUS];
 /* Octeon - Tell another core to flush its icache */
 #define SMP_ICACHE_FLUSH	0x4
 #define SMP_ASK_C0COUNT		0x8
+#define SMP_CPU_BACKTRACE	0x10
 
 /* Mask of CPUs which are currently definitely operating coherently */
 extern cpumask_t cpu_coherent_mask;
@@ -81,6 +82,8 @@ static inline void __cpu_die(unsigned int cpu)
 extern void play_dead(void);
 #endif
 
+void arch_dump_stack(void);
+
 /*
  * This function will set up the necessary IPIs for Linux to communicate
  * with the CPUs in mask.
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 57028d4..647e15d 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -655,26 +655,39 @@ unsigned long arch_align_stack(unsigned long sp)
 	return sp & ALMASK;
 }
 
-static void arch_dump_stack(void *info)
+void arch_dump_stack(void)
 {
 	struct pt_regs *regs;
+	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
 
+	arch_spin_lock(&lock);
 	regs = get_irq_regs();
 
 	if (regs)
 		show_regs(regs);
 
 	dump_stack();
+	arch_spin_unlock(&lock);
 }
 
 void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	long this_cpu = get_cpu();
+	struct cpumask backtrace_mask;
+	extern const struct plat_smp_ops *mp_ops;
+
+	cpumask_copy(&backtrace_mask, mask);
+	if (cpumask_test_cpu(this_cpu, mask)) {
+		if (!exclude_self) {
+			struct pt_regs *regs = get_irq_regs();
+			if (regs)
+				show_regs(regs);
+			dump_stack();
+		}
+		cpumask_clear_cpu(this_cpu, &backtrace_mask);
+	}
 
-	if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
-		dump_stack();
-
-	smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+	mp_ops->send_ipi_mask(&backtrace_mask, SMP_CPU_BACKTRACE);
 
 	put_cpu();
 }
diff --git a/arch/mips/loongson64/loongson-3/smp.c b/arch/mips/loongson64/loongson-3/smp.c
index 8501109..0655114 100644
--- a/arch/mips/loongson64/loongson-3/smp.c
+++ b/arch/mips/loongson64/loongson-3/smp.c
@@ -291,6 +291,12 @@ void loongson3_ipi_interrupt(struct pt_regs *regs)
 		__wbflush(); /* Let others see the result ASAP */
 	}
 
+	if (action & SMP_CPU_BACKTRACE) {
+		irq_enter();
+		arch_dump_stack();
+		irq_exit();
+	}
+
 	if (irqs) {
 		int irq;
 		while ((irq = ffs(irqs))) {
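For readers who want to check the mask handling in arch_trigger_cpumask_backtrace() without a MIPS board, the logic can be modelled in userspace. This is only a sketch under stated assumptions: cpu_mask_t and split_backtrace_mask() are hypothetical names that do not exist in the kernel, and a 32-bit integer stands in for struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is targeted. */
typedef uint32_t cpu_mask_t;

/*
 * Model of the patched mask handling (not kernel code).
 * Returns the mask of remote CPUs that should receive the backtrace IPI.
 * *dump_self is set when the calling CPU should dump its own stack
 * directly instead of interrupting itself.
 */
static cpu_mask_t split_backtrace_mask(cpu_mask_t mask, unsigned int this_cpu,
				       bool exclude_self, bool *dump_self)
{
	cpu_mask_t self_bit;

	assert(this_cpu < 32);	/* the model only covers 32 CPUs */
	self_bit = (cpu_mask_t)1 << this_cpu;

	/* Dump locally only if we are targeted and not excluded. */
	*dump_self = (mask & self_bit) != 0 && !exclude_self;

	/* The calling CPU is always cleared, so it never IPIs itself. */
	return mask & ~self_bit;
}
```

With CPUs 0-3 targeted and CPU 1 calling, the remote mask is 0xD whether or not exclude_self is set; exclude_self only decides whether CPU 1 dumps its own stack.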
Hi Huacai,
On Mon, Feb 05, 2018 at 11:42:47AM +0800, Huacai Chen wrote:
> SysRq-L and the RCU stall detector call arch_trigger_cpumask_backtrace() to trigger backtraces on other CPUs, but its behavior is totally broken. The root cause is that arch_trigger_cpumask_backtrace() uses the call-function IPI in irq context, which triggers deadlocks in smp_call_function_single() and smp_call_function_many().
>
> This patch fixes arch_trigger_cpumask_backtrace() by:
> 1, Using a dedicated IPI (SMP_CPU_BACKTRACE) to trigger backtraces;
> 2, If the calling CPU is in the target cpumask, doing its backtrace directly and clearing it from the mask;
> 3, Using a spinlock to avoid parallel backtrace output;
> 4, Handling the SMP_CPU_BACKTRACE IPI for Loongson-3.
>
> I have attempted to implement SMP_CPU_BACKTRACE for all MIPS CPUs, but I failed because some of their IPIs are not extensible. :(
Interesting - I've been using a similar patch internally for a little while which can be seen here:
https://git.linux-mips.org/cgit/linux-mti.git/commit/?h=eng-v4.15&id=f46...
Mine uses the generic nmi_trigger_cpumask_backtrace() infrastructure to handle most of the work, and just has to deal with sending the IPIs. It relies upon some changes from Matt to do that for the generic platform.
If you have a chance could you test the branch below & let me know whether it works for you?
git://git.kernel.org/pub/scm/linux/kernel/git/paulburton/linux.git
Branch "wip-cpumask-backtrace".
Hopefully with a little more work we can fix this up generically.
Thanks, Paul
I can't test your branch, because the mainline kernel currently lacks too many features needed by Loongson-3. By the way, your approach is based on NMI, but I don't think NMI is available on every MIPS board.
Huacai
Hi Huacai,
On Fri, Jun 15, 2018 at 12:30:35PM +0800, 陈华才 wrote:
> I can't test your branch, because the mainline kernel currently lacks too many features needed by Loongson-3.
Interesting - so the mainline Loongson-3 code doesn't actually work? How much is missing for it to be functional?
> By the way, your approach is based on NMI, but I don't think NMI is available on every MIPS board.
It isn't using NMIs at all - the nmi_trigger_cpumask_backtrace() function has NMI in its name, sure, but it just invokes a callback to interrupt other CPUs & we can implement that using regular old IPIs.
This is the same way arch/arm does it, so it's not unprecedented & allows us to share the common code.
It would be ideal to use NMIs where possible in future, but that can come later for platforms where they're available.
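To illustrate the callback arrangement described above, here is a simplified userspace model. It is not the kernel's implementation: cpu_mask_t, raise_via_ipi() and trigger_backtrace() are hypothetical names, and the model only shows that the shared helper filters the mask while the architecture decides how the interrupt is actually delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t cpu_mask_t;			/* hypothetical cpumask stand-in */
typedef void (*raise_fn)(cpu_mask_t mask);	/* arch hook: interrupt these CPUs */

static cpu_mask_t last_ipi_mask;	/* records what the model "sent" */

/* Model of an arch callback that delivers an ordinary IPI, not an NMI. */
static void raise_via_ipi(cpu_mask_t mask)
{
	/* A real port would call its platform send-IPI hook here. */
	last_ipi_mask = mask;
}

/*
 * Model of the shared helper: it owns the mask bookkeeping and leaves
 * delivery entirely to the callback, so nothing here depends on NMIs.
 */
static void trigger_backtrace(cpu_mask_t mask, unsigned int this_cpu,
			      bool exclude_self, raise_fn raise_cb)
{
	assert(raise_cb != NULL);
	assert(this_cpu < 32);

	if (exclude_self)
		mask &= ~((cpu_mask_t)1 << this_cpu);

	if (mask)
		raise_cb(mask);
}
```

Swapping raise_via_ipi() for an NMI-based callback later would not require touching the shared helper, which is the attraction of this split.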
Thanks, Paul
The mainline kernel works on Loongson-3A1000/3B1500, but it is unstable on Loongson-3A2000/3A3000 and the display does not work correctly. I only have Loongson-3A2000/3A3000 machines now, and some of the needed patches are already available on patchwork.
Huacai