Hi All,
LKFT occasionally found a kernel bug in x15 platform, which is a armv7 board. The bug caught on kernel commit f82786d v4.9.55, but panic could happens in upstream, since there is no much change on the function call chain.
The function call chain is vector___pabt_svc -> do_PrefetchAbort -> do_page_fault -> might_sleep()
The trick thing is LKFT team can not reproduce the bug. But from the kernel panic info, we know the irq_disabled() is 128, that would be the only reason, we got the panic -- the code can not return since irqs_disabled() = 128. The preempt_offset and preempt_count are both 0 here.
line 7726 in kernel/sched/core.c: in function ___might_sleep(): if ((preempt_count_equals(preempt_offset) && !irqs_disabled() && !is_idle_task(current)) || system_state != SYSTEM_RUNNING || oops_in_progress) return;
I have no more idea on this issue. Any hints are appreciated!
Regards Alex
BUG: sleeping function called from invalid context at /srv/oe/build/tmp-rpb-glibc/work-shared/am57xx-evm/kernel-source/arch/arm/mm/fault.c:303 [ 53.264908] in_atomic(): 0, irqs_disabled(): 128, pid: 1691, name: ftracetest [ 53.272074] 1 lock held by ftracetest/1691: [ 53.276273] #0: (&mm->mmap_sem){++++++}, at: [<c0d60cfc>] do_page_fault+0x90/0x428 [ 53.284095] irq event stamp: 12924 [ 53.287514] hardirqs last enabled at (12923): [<c0307f10>] no_work_pending+0x4/0x30 [ 53.295289] hardirqs last disabled at (12924): [<c0d605a0>] __pabt_svc+0x60/0xa0 [ 53.302718] softirqs last enabled at (11474): [<c034c5d0>] __do_softirq+0x280/0x5ac [ 53.310494] softirqs last disabled at (11433): [<c034cc98>] irq_exit+0xf4/0x158 [ 53.317837] CPU: 0 PID: 1691 Comm: ftracetest Not tainted 4.9.55-dirty #1 [ 53.324652] Hardware name: Generic DRA74X (Flattened Device Tree) [ 53.330857] [<c03114d8>] (unwind_backtrace) from [<c030cb18>] (show_stack+0x10/0x14) [ 53.338644] [<c030cb18>] (show_stack) from [<c067e604>] (dump_stack+0xa4/0xd0) [ 53.345908] [<c067e604>] (dump_stack) from [<c0373808>] (___might_sleep+0x1ac/0x2a0) [ 53.353694] [<c0373808>] (___might_sleep) from [<c0d60ec8>] (do_page_fault+0x25c/0x428) [ 53.361739] [<c0d60ec8>] (do_page_fault) from [<c03013e8>] (do_PrefetchAbort+0x38/0x9c) [ 53.369780] [<c03013e8>] (do_PrefetchAbort) from [<c0d605a8>] (__pabt_svc+0x68/0xa0) [ 53.377557] Exception stack(0xec6fbfa8 to 0xec6fbff0) [ 53.382629] bfa0: 00000001 00000001 ffffffff 00000000 0010ac68 00000007 [ 53.390845] bfc0: 00000001 0000003f 00000009 0000000c fffffffa be9d27a4 000e31fc ec6fbff8 [ 53.399055] bfe0: b6e6d49c b6e6d49c 40070093 ffffffff [ 53.404137] [<c0d605a8>] (__pabt_svc) from [<b6e6d49c>] (0xb6e6d49c)
On Tue, Nov 21, 2017 at 09:06:27PM +0800, Alex Shi wrote:
Unfortunately, this doesn't help, because on entry to __pabt_svc, we tell the IRQ context tracker that IRQs are now disabled, wiping out the previous recording of where IRQs were disabled...
It also doesn't help that the backtrace stops at this point, and it looks very strange:
1. the value of PC looks like it's outside of the module space. 2. the CPSR indicates that the CPU was in SVC mode in the parent context with IRQs disabled. 3. We're right at the top of the kernel stack, which suggests no further stack frames above this.
We should never be in SVC mode without further stack frames on the kernel stack.
We don't seem to have overflowed the kernel stack, as the thread info seems correct - and it would also be unlikely that the saved SP value would end in ff8 in the exception stack frame.
I suspect something nasty is going on in the ftrace code, causing some stacked state corruption, which then leads to us returning from a kernel exception with state that leaves the CPU in SVC mode with IRQs disabled, and with a LR & PC value of 0xb6e6d49c - a page that doesn't exist. That the leads to a prefetch abort, and this error.
In other words, the real problem is that something has gone wrong in the ftrace code... what that is, I've no idea.
Hi Russell,
Sorry for response late! Is this SP was stained by sth? As my understand, SP should be times of 32bits. But why stack print out correct with a incorrect SP?
Full agree with your analysis. Is it possible to stain PC value with heavy stress on thermal or sth else? the ARM64 board run well with ftracetest of LTP.
Thanks a lot! Alex
On Fri, Nov 24, 2017 at 11:09:30PM +0800, Alex Shi wrote:
There's nothing wrong with SP.
Looking deeper at this, I think that the kernel stack somehow got corrupted earlier:
irq event stamp: 12924 hardirqs last enabled at (12923): [<c0307f10>] no_work_pending+0x4/0x30 hardirqs last disabled at (12924): [<c0d605a0>] __pabt_svc+0x60/0xa0
The hard IRQ disable is as a result of taking a prefetch abort in SVC mode. The saved context agrees with that:
R0 R1 R2 R3 R4 R5 bfa0: 00000001 00000001 ffffffff 00000000 0010ac68 00000007 R6 R7 R8 R9 R10 FP IP SP bfc0: 00000001 0000003f 00000009 0000000c fffffffa be9d27a4 000e31fc ec6fbff8 LR PC PSR bfe0: b6e6d49c b6e6d49c 40070093 ffffffff
The PSR lower 5 bits are 0x13, which is SVC mode. Bit 7 set means IRQs disabled. The PC address was 0xb6e6d49c.
The last record we have of interrupts being enabled was in no_work_pending, which is the exit path to usermode - but if we were returning to usermode:
(a) how did we get into SVC mode instead (b) why are interrupts disabled (c) why was mm->mmap_sem still held
Can you try the following patch to try and catch the problem earlier? I haven't tested it myself, and adding code may move things around in the kernel and make this bug disappear.
diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S index d523cd8439a3..ff577177b286 100644 --- a/arch/arm/kernel/entry-header.S +++ b/arch/arm/kernel/entry-header.S @@ -299,6 +299,8 @@ @ ARM mode restore mov r2, sp ldr r1, [r2, #\offset + S_PSR] @ get calling cpsr + tst r1, #0xcf + bne oops ldr lr, [r2, #\offset + S_PC]! @ get pc msr spsr_cxsf, r1 @ save in spsr_svc #if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_32v6K) @@ -314,6 +316,15 @@ @ after ldm {}^ add sp, sp, #\offset + PT_REGS_SIZE movs pc, lr @ return & move spsr_svc into cpsr +oops: .word 0xe7f001f2 + .pushsection .rodata.str, "aMS", %progbits, 1 +2: .asciz "Returning to usermode but unexpected PSR bits set?" + .popsection + .pushsection __bug_table, "aw" + .align 2 + .word oops, 2b + .hword @ + .popsection #elif defined(CONFIG_CPU_V7M) @ V7M restore. @ Note that we don't need to do clrex here as clearing the local
On 11/24/2017 11:56 PM, Russell King - ARM Linux wrote:
Got it. Thanks a lot!
Thank you very much for detailed explanations! :)
We return to user mode correctly in no_work_pending, the irq enabled then we get a pabt_svc?
(b) why are interrupts disabled
it was disabled in __pabt_svc, objdump show a extra irq disable with arm defconfig. ... 2a0: ebfffffe bl 0 <v7_pabort> 2a4: f10c0080 cpsid i ...
(c) why was mm->mmap_sem still held
the bug was caught here, just with mmap_sem, the pabt_svc happens.
Do we still need to try this patch? I saw you tried it and do further more.
This oops cause allnoconfig with arm failed in build. but it's fine for a multi_v7_defconfig arch/arm/kernel/entry-common.S:106: Error: symbol `oops' is already defined
On Fri, Nov 24, 2017 at 11:09:30PM +0800, Alex Shi wrote:
In your first email, you said "x15 platform, which is a armv7 board." Here you say "ARM64 board" which isn't armv7. There's x15 DTS under arch/arm/boot/dts, so I guess you mean 32-bit ARM, but who knows...
Anyway, I've tried running ftracetest on an OMAP4430 SDP board, and after a while with the patch I sent you, I get:
Internal error: Oops - BUG: 0 [#1] SMP ARM Modules linked in: CPU: 1 PID: 2948 Comm: ftracetest Not tainted 4.14.0+ #557 Hardware name: Generic OMAP4 (Flattened Device Tree) task: ce41c100 task.stack: cc7b8000 PC is at oops+0x0/0x4 LR is at trace_hardirqs_on_caller+0x154/0x1e0 pc : [<c0015adc>] lr : [<c0086840>] psr: 20000193 sp : cc7b9fb0 ip : cc7b9f80 fp : 00000000 r10: 00000000 r9 : cc7b8000 r8 : c0015c28 r7 : 00000006 r6 : 00000004 r5 : 0009fed4 r4 : 00000001 r3 : 00000000 r2 : cc7b9fb0 r1 : 60000193 r0 : 00000001 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 8e7b804a DAC: 00000055 Process ftracetest (pid: 2948, stack limit = 0xcc7b8210) Stack: (0xcc7b9fb0 to 0xcc7ba000) 9fa0: 00000000 00000000 0009b008 0000000d 9fc0: 00000001 0009fed4 00000004 00000006 0009b3e0 000a17d4 00000000 beaf6e3c 9fe0: 0009fed0 beaf6e20 000319e0 b6e7199c 60000193 00000001 6b6b6b6b a56b6b6b Backtrace: no frame pointer Code: e9527fff e1a00000 e28dd048 e1b0f00e (e7f001f2) ---[ end trace 390efe5843605357 ]---
The other CPU also oopses:
Internal error: Oops - BUG: 0 [#3] SMP ARM Modules linked in: CPU: 1 PID: 1 Comm: init Tainted: G D 4.14.0+ #557 Hardware name: Generic OMAP4 (Flattened Device Tree) task: ced04c00 task.stack: ced06000 PC is at oops+0x0/0x4 LR is at trace_hardirqs_on+0x14/0x18 pc : [<c0015adc>] lr : [<c00868e0>] psr: 20000193 sp : ced07fb0 ip : ced07fa0 fp : 00000000 r10: 00000000 r9 : ced06000 r8 : c0015c28 r7 : 0000004e r6 : bec0acd4 r5 : 000176b4 r4 : bec0ac3c r3 : 00000000 r2 : ced07fb0 r1 : 60000193 r0 : c0015aa8 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 8e2d804a DAC: 00000055 Process init (pid: 1, stack limit = 0xced06210) Stack: (0xced07fb0 to 0xced08000) 7fa0: 00000000 00000000 00000000 00000000 7fc0: bec0ac3c 000176b4 bec0acd4 0000004e 10000000 00000000 0000a1d0 bec0ac44 7fe0: bec0acd8 bec0ac28 b6e54544 b6e5456c 60000193 bec0ac28 005d5555 00020201 Backtrace: no frame pointer Code: e9527fff e1a00000 e28dd048 e1b0f00e (e7f001f2) ---[ end trace 390efe5843605358 ]---
which is exactly your bug, but caught a bit earlier.
This happens while executing this ftrace test:
[28] Register/unregister many kprobe events
and needs a kernel with ftrace and kprobes enabled.
Unfortunately, the debug is immediately after a call to trace_hardirqs_on() in no_work_pending, so the LR value is meaningless.
So, now that we know it's tracing kprobes triggering it - it's trying to set tracepoints on the first 256 symbols in the kernel's kallsyms, which includes all sorts of things.
With some extra debug, this doesn't look clever:
trace_kprobe: Inserting kprobe at ret_fast_syscall+0 trace_kprobe: Inserting kprobe at slow_work_pending+0 trace_kprobe: Inserting kprobe at ret_slow_syscall+0 trace_kprobe: Could not insert probe at ret_slow_syscall+0: -22 trace_kprobe: Inserting kprobe at ret_to_user+0 trace_kprobe: Could not insert probe at ret_to_user+0: -22 trace_kprobe: Inserting kprobe at ret_to_user_from_irq+0 trace_kprobe: Inserting kprobe at no_work_pending+0 trace_kprobe: Inserting kprobe at oops+0 trace_kprobe: Could not insert probe at oops+0: -22 trace_kprobe: Inserting kprobe at ret_from_fork+0 trace_kprobe: Inserting kprobe at vector_swi+0 trace_kprobe: Inserting kprobe at local_restart+0 trace_kprobe: Inserting kprobe at __sys_trace+0 trace_kprobe: Inserting kprobe at __sys_trace_return+0 trace_kprobe: Inserting kprobe at __sys_trace_return_nosave+0 trace_kprobe: Could not insert probe at __sys_trace_return_nosave+0: -22 trace_kprobe: Inserting kprobe at __cr_alignment+0 trace_kprobe: Could not insert probe at __cr_alignment+0: -22 trace_kprobe: Inserting kprobe at sys_call_table+0 trace_kprobe: Inserting kprobe at sys_syscall+0 trace_kprobe: Inserting kprobe at sys_sigreturn_wrapper+0 trace_kprobe: Inserting kprobe at sys_rt_sigreturn_wrapper+0 trace_kprobe: Inserting kprobe at sys_statfs64_wrapper+0 trace_kprobe: Inserting kprobe at sys_fstatfs64_wrapper+0
I wouldn't be surprised if some of those were the cause of it - for example, what guarantee do we have that a trace kprobe inserted at ret_fast_syscall which starts with this:
c0015a40: e5ad0008 str r0, [sp, #8]!
will be handled correctly? I can't say, I've virtually no knowledge about kprobes, but I guess it isn't - especially as there's this comment in the ARM kprobes code:
* Never instrument insn like 'str r0, [sp, +/-r1]'. Also, insn likes * 'str r0, [sp, #-68]' should also be prohibited.
Clearly, that's not the case as the kprobes insert on ret_fast_syscall succeeded.
Adding Tixy, as he's more knowledgable in this area - I suggest someone knowledgable in this area runs
ftracetest test.d/kprobe/multiple_kprobes.tc
and fixes these bugs... also running the entire ftracetest suite would probably also be a very good idea.
On Fri, Nov 24, 2017 at 07:27:00PM +0000, Russell King - ARM Linux wrote:
There's some other stupidities as well:
trace_kprobe: Inserting kprobe at __error_lpae+0 trace_kprobe: Inserting kprobe at str_lpae+0 trace_kprobe: Inserting kprobe at __error_p+0 trace_kprobe: Inserting kprobe at str_p1+0 trace_kprobe: Inserting kprobe at str_p2+0 trace_kprobe: Could not insert probe at str_p2+0: -22
str_lpae: .asciz "\nError: Kernel with LPAE support, but CPU does not support LPAE.\n" str_p1: .asciz "\nError: unrecognized/unsupported processor variant (0x" str_p2: .asciz ").\n"
So we successfully placed a kprobe on some ASCII strings, which are used prior to the kernel being properly up and running, which means we have to use relative references to these strings, and relative references to strings in other sections is not simple.
More worrying:
trace_kprobe: Inserting kprobe at __turn_mmu_on+0 trace_kprobe: Inserting kprobe at __idmap_text_start+0 trace_kprobe: Inserting kprobe at __turn_mmu_on_end+0 ... trace_kprobe: Inserting kprobe at __idmap_text_end+0 ... trace_kprobe: Inserting kprobe at secondary_startup+0 trace_kprobe: Inserting kprobe at secondary_startup_arm+0 trace_kprobe: Inserting kprobe at __secondary_switched+0 trace_kprobe: Inserting kprobe at __secondary_data+0 trace_kprobe: Inserting kprobe at __enable_mmu+0 trace_kprobe: Inserting kprobe at __do_fixup_smp_on_up+0 trace_kprobe: Inserting kprobe at __fixup_a_pv_table+0 trace_kprobe: Inserting kprobe at fixup_pv_table+0 trace_kprobe: Inserting kprobe at __lookup_processor_type+0 trace_kprobe: Inserting kprobe at __lookup_processor_type_data+0
These are definitely a no-no for tracing, because they're part of the early startup code for the kernel when no stacks are available.
Some of these can't live in the special "do not use kprobes here" section as they need to be in other sections (like .idmap) because they need to be part of the identity-mapped code.
On Fri, Nov 24, 2017 at 08:25:53PM +0000, Russell King - ARM Linux wrote:
Okay, this is what I came up with, and seems to solve the problem here. It's quite a large patch though, as it shuffles around how we deal with exception entry, and knowing whether we should dump the stacked registers. This particular patch also contains a few debugging bits too.
arch/arm/include/asm/exception.h | 3 +-- arch/arm/include/asm/sections.h | 21 +++++++++++++++++++++ arch/arm/include/asm/traps.h | 12 ------------ arch/arm/kernel/entry-armv.S | 6 +----- arch/arm/kernel/entry-common.S | 1 + arch/arm/kernel/entry-header.S | 13 +++++++++++++ arch/arm/kernel/head-common.S | 12 ++++++------ arch/arm/kernel/head.S | 2 +- arch/arm/kernel/stacktrace.c | 14 ++------------ arch/arm/kernel/traps.c | 4 ++-- arch/arm/kernel/vmlinux.lds.S | 6 +++--- arch/arm/mm/fault.c | 5 ++--- arch/arm/probes/kprobes/core.c | 14 +++++++++++--- kernel/trace/trace_kprobe.c | 3 +++ 14 files changed, 67 insertions(+), 49 deletions(-)
diff --git a/arch/arm/include/asm/exception.h b/arch/arm/include/asm/exception.h index a7273ad9587a..58e039a851af 100644 --- a/arch/arm/include/asm/exception.h +++ b/arch/arm/include/asm/exception.h @@ -10,11 +10,10 @@
#include <linux/interrupt.h>
-#define __exception __attribute__((section(".exception.text"))) #ifdef CONFIG_FUNCTION_GRAPH_TRACER #define __exception_irq_entry __irq_entry #else -#define __exception_irq_entry __exception +#define __exception_irq_entry #endif
#endif /* __ASM_ARM_EXCEPTION_H */ diff --git a/arch/arm/include/asm/sections.h b/arch/arm/include/asm/sections.h index 63dfe1f10335..4ceb4f757d4d 100644 --- a/arch/arm/include/asm/sections.h +++ b/arch/arm/include/asm/sections.h @@ -6,4 +6,25 @@
extern char _exiprom[];
+extern char __idmap_text_start[]; +extern char __idmap_text_end[]; +extern char __entry_text_start[]; +extern char __entry_text_end[]; +extern char __hyp_idmap_text_start[]; +extern char __hyp_idmap_text_end[]; + +static inline bool in_entry_text(unsigned long addr) +{ + return memory_contains(__entry_text_start, __entry_text_end, + (void *)addr, 1); +} + +static inline bool in_idmap_text(unsigned long addr) +{ + void *a = (void *)addr; + return memory_contains(__idmap_text_start, __idmap_text_end, a, 1) || + memory_contains(__hyp_idmap_text_start, __hyp_idmap_text_end, + a, 1); +} + #endif /* _ASM_ARM_SECTIONS_H */ diff --git a/arch/arm/include/asm/traps.h b/arch/arm/include/asm/traps.h index f9a6c5fc3fd1..a00288d75ee6 100644 --- a/arch/arm/include/asm/traps.h +++ b/arch/arm/include/asm/traps.h @@ -28,18 +28,6 @@ static inline int __in_irqentry_text(unsigned long ptr) ptr < (unsigned long)&__irqentry_text_end; }
-static inline int in_exception_text(unsigned long ptr) -{ - extern char __exception_text_start[]; - extern char __exception_text_end[]; - int in; - - in = ptr >= (unsigned long)&__exception_text_start && - ptr < (unsigned long)&__exception_text_end; - - return in ? : __in_irqentry_text(ptr); -} - extern void __init early_trap_init(void *); extern void dump_backtrace_entry(unsigned long where, unsigned long from, unsigned long frame); extern void ptrace_break(struct task_struct *tsk, struct pt_regs *regs); diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index 10ad44f3886a..b8ab97dfdd17 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -82,11 +82,7 @@ #endif .endm
-#ifdef CONFIG_KPROBES - .section .kprobes.text,"ax",%progbits -#else - .text -#endif + .section .entry.text,"ax",%progbits
/* * Invalid mode handlers diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index e655dcd0a933..3c4f88701f22 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -37,6 +37,7 @@ saved_pc .req lr #define TRACE(x...) #endif
+ .section .entry.text,"ax",%progbits .align 5 #if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING)) /* diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S index d523cd8439a3..e2d216ad70f0 100644 --- a/arch/arm/kernel/entry-header.S +++ b/arch/arm/kernel/entry-header.S @@ -300,6 +300,8 @@ mov r2, sp ldr r1, [r2, #\offset + S_PSR] @ get calling cpsr ldr lr, [r2, #\offset + S_PC]! @ get pc + tst r1, #0xcf + bne oops msr spsr_cxsf, r1 @ save in spsr_svc #if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_32v6K) @ We must avoid clrex due to Cortex-A15 erratum #830321 @@ -314,6 +316,17 @@ @ after ldm {}^ add sp, sp, #\offset + PT_REGS_SIZE movs pc, lr @ return & move spsr_svc into cpsr +oops: .word 0xe7f001f2 +#ifdef CONFIG_DEBUG_BUGVERBOSE + .pushsection .rodata.str, "aMS", %progbits, 1 +2: .asciz "Returning to usermode but unexpected PSR bits set?" + .popsection + .pushsection __bug_table, "aw" + .align 2 + .word oops, 2b + .hword @ + .popsection +#endif #elif defined(CONFIG_CPU_V7M) @ V7M restore. @ Note that we don't need to do clrex here as clearing the local diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S index 21dde771a7dd..a989d84c4c38 100644 --- a/arch/arm/kernel/head-common.S +++ b/arch/arm/kernel/head-common.S @@ -201,10 +201,10 @@ ENDPROC(__lookup_processor_type)
__error_lpae: #ifdef CONFIG_DEBUG_LL - adr r0, str_lpae + adr r0, 1f bl printascii b __error -str_lpae: .asciz "\nError: Kernel with LPAE support, but CPU does not support LPAE.\n" +1: .asciz "\nError: Kernel with LPAE support, but CPU does not support LPAE.\n" #else b __error #endif @@ -213,15 +213,15 @@ ENDPROC(__error_lpae)
__error_p: #ifdef CONFIG_DEBUG_LL - adr r0, str_p1 + adr r0, 1f bl printascii mov r0, r9 bl printhex8 - adr r0, str_p2 + adr r0, 2f bl printascii b __error -str_p1: .asciz "\nError: unrecognized/unsupported processor variant (0x" -str_p2: .asciz ").\n" +1: .asciz "\nError: unrecognized/unsupported processor variant (0x" +2: .asciz ").\n" .align #endif ENDPROC(__error_p) diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 6b1148cafffd..7ea72ce41329 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -415,7 +415,7 @@ ENDPROC(secondary_startup_arm) /* * r6 = &secondary_data */ -ENTRY(__secondary_switched) +__secondary_switched: ldr sp, [r7, #12] @ get secondary_data.stack mov fp, #0 b secondary_start_kernel diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c index 65228bf4c6df..a56e7c856ab5 100644 --- a/arch/arm/kernel/stacktrace.c +++ b/arch/arm/kernel/stacktrace.c @@ -3,6 +3,7 @@ #include <linux/sched/debug.h> #include <linux/stacktrace.h>
+#include <asm/sections.h> #include <asm/stacktrace.h> #include <asm/traps.h>
@@ -63,7 +64,6 @@ EXPORT_SYMBOL(walk_stackframe); #ifdef CONFIG_STACKTRACE struct stack_trace_data { struct stack_trace *trace; - unsigned long last_pc; unsigned int no_sched_functions; unsigned int skip; }; @@ -87,16 +87,7 @@ static int save_trace(struct stackframe *frame, void *d) if (trace->nr_entries >= trace->max_entries) return 1;
- /* - * in_exception_text() is designed to test if the PC is one of - * the functions which has an exception stack above it, but - * unfortunately what is in frame->pc is the return LR value, - * not the saved PC value. So, we need to track the previous - * frame PC value when doing this. - */ - addr = data->last_pc; - data->last_pc = frame->pc; - if (!in_exception_text(addr)) + if (!in_entry_text(frame->pc)) return 0;
regs = (struct pt_regs *)frame->sp; @@ -114,7 +105,6 @@ static noinline void __save_stack_trace(struct task_struct *tsk, struct stackframe frame;
data.trace = trace; - data.last_pc = ULONG_MAX; data.skip = trace->skip; data.no_sched_functions = nosched;
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 5de2bc46153f..95978073db53 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -73,7 +73,7 @@ void dump_backtrace_entry(unsigned long where, unsigned long from, unsigned long printk("Function entered at [<%08lx>] from [<%08lx>]\n", where, from); #endif
- if (in_exception_text(where)) + if (in_entry_text(from)) dump_mem("", "Exception stack", frame + 4, frame + 4 + sizeof(struct pt_regs)); }
@@ -434,7 +434,7 @@ static int call_undef_hook(struct pt_regs *regs, unsigned int instr) return fn ? fn(regs, instr) : 1; }
-asmlinkage void __exception do_undefinstr(struct pt_regs *regs) +asmlinkage void do_undefinstr(struct pt_regs *regs) { unsigned int instr; siginfo_t info; diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index ee53f6518872..84a1ae3ce46e 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -105,9 +105,9 @@ SECTIONS .text : { /* Real text segment */ _stext = .; /* Text and read-only data */ IDMAP_TEXT - __exception_text_start = .; - *(.exception.text) - __exception_text_end = .; + __entry_text_start = .; + *(.entry.text) + __entry_text_end = .; IRQENTRY_TEXT SOFTIRQENTRY_TEXT TEXT_TEXT diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 42f585379e19..b75eada23d0a 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -21,7 +21,6 @@ #include <linux/highmem.h> #include <linux/perf_event.h>
-#include <asm/exception.h> #include <asm/pgtable.h> #include <asm/system_misc.h> #include <asm/system_info.h> @@ -545,7 +544,7 @@ hook_fault_code(int nr, int (*fn)(unsigned long, unsigned int, struct pt_regs *) /* * Dispatch a data abort to the relevant handler. */ -asmlinkage void __exception +asmlinkage void do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs) { const struct fsr_info *inf = fsr_info + fsr_fs(fsr); @@ -578,7 +577,7 @@ hook_ifault_code(int nr, int (*fn)(unsigned long, unsigned int, struct pt_regs * ifsr_info[nr].name = name; }
-asmlinkage void __exception +asmlinkage void do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs) { const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr); diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c index 52d1cd14fda4..e90cc8a08186 100644 --- a/arch/arm/probes/kprobes/core.c +++ b/arch/arm/probes/kprobes/core.c @@ -32,6 +32,7 @@ #include <linux/percpu.h> #include <linux/bug.h> #include <asm/patch.h> +#include <asm/sections.h>
#include "../decode-arm.h" #include "../decode-thumb.h" @@ -64,9 +65,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) int is; const struct decode_checker **checkers;
- if (in_exception_text(addr)) - return -EINVAL; - #ifdef CONFIG_THUMB2_KERNEL thumb = true; addr &= ~1; /* Bit 0 would normally be set to indicate Thumb code */ @@ -680,3 +678,13 @@ int __init arch_init_kprobes() #endif return 0; } + +bool arch_within_kprobe_blacklist(unsigned long addr) +{ + void *a = (void *)addr; + + return __in_irqentry_text(addr) || + in_entry_text(addr) || + in_idmap_text(addr) || + memory_contains(__kprobes_text_start, __kprobes_text_end, a, 1); +} diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 8a907e12b6b9..b0a7068afa2d 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -472,6 +472,9 @@ static int __register_trace_kprobe(struct trace_kprobe *tk) else tk->rp.kp.flags |= KPROBE_FLAG_DISABLED;
+ pr_info("Inserting kprobe at %s+%lu\n", + trace_kprobe_symbol(tk), trace_kprobe_offset(tk)); + if (trace_kprobe_is_return(tk)) ret = register_kretprobe(&tk->rp); else
On Sun, 26 Nov 2017 22:59:58 +0800 Alex Shi alex.shi@linaro.org wrote:
Oops, that's my mistake. It should pick only text symbols.
It should be rejected by kprobes arch dependent code.
kprobes already has a blacklist on x86, so it is a good time to start making it on arm/arm64 too.
Thank you,
On Mon, Nov 27, 2017 at 10:40:34AM +0900, Masami Hiramatsu wrote:
Hi Masami, Russell
Does the fact you are allowed to put a kprobe on an ASCII string from userspace indicate a real problem? I would of thought the kernel core kprobe could would be looking at the type of a symbol and rejecting such requests. So fix the core, keep this test and make sure you get an EINVAL back?
Andrew
On Mon, Nov 27, 2017 at 02:36:31PM +0100, Andrew Lunn wrote:
As far as I'm aware, the kernel doesn't know whether the address that userspace wants to set a kprobe is code or any kind of data. All that it can do is:
1. Translate the address to a ksym and offset, looking it up in blacklists/blacklisted sections. 2. Look at the "instruction" and reject if it thinks the instruction is unsuitable.
Making the kernel reject placing a kprobe on a local function ("t" in nm's or /proc/kallsyms output) would severely restrict the usefulness of kprobes - that would mean you couldn't ever set a kprobe on a static function.
So no, I don't agree with you.
Normally, there won't be strings in the .text section, but we have some exceptions in ARM code where we have to have them to make early kernel entry code sane without having to jump through excessive hoops.
As I've already said, we should not be placing kprobes even on this code. Think about a kprobe placed on the secondary CPU entry point, where the CPU is running with the MMU off and there's no exception table in place. The kprobe instruction is a CPU undefined instruction, so the CPU will try and take the undefined instruction vector, which won't be present - the result will be an instant crash.
The same is true of the identity-mapped code - which again can run with the MMU off, and suffer from exactly the same issue.
Then we have the exception code itself. Consider the implications of placing the kprobe undefined instruction somewhere in the undefined instruction exception handling code - that would result in recursive faulting if placed on the SVC paths.
So no, this problem has nothing to do with symbol types - it's about what code is safe for kprobes to be placed.
On Mon, 27 Nov 2017 13:55:23 +0000 Russell King - ARM Linux linux@armlinux.org.uk wrote:
Right, and anyway kprobes can put on a specified address via userspace.
In general, kprobe itself checks probe point is in the .text section, so most of cases are safe as Russell explained.
It's arch-specific reason, so it should be filtered out in arch specific kprobe code.
OK.
So there is already NOKPROBE_SYMBOL() macro to make a blacklist so that we can avoid such recursive faults. We need to identify such place and put the symbols in it as I did on x86.
Thank you,
On Tue, Nov 28, 2017 at 02:52:21PM +0900, Masami Hiramatsu wrote:
Only if they are C functions. If they're assembly, NOKPROBE_SYMBOL() doesn't work. The answer there is to move them into a section that is excluded from kprobing. That's what the second patch I've already sent does.
On Tue, 28 Nov 2017 09:52:05 +0000 Russell King - ARM Linux linux@armlinux.org.uk wrote:
Thanks!
And if you need to put some symbols in assembly which already in another section, you can use _ASM_NOKPROBE(symbol) macro :)
On Sun, 26 Nov 2017 20:07:43 +0800 Alex Shi alex.shi@linaro.org wrote:
This test case is to test whether we can safely put multiple kprobes on different functions or not. If it causes kernel crashes, it means kprobes has a bug on that function (the last "Inserting kprobe at ...").
Since the ftrace kprobe event interface is opened to userspace (but also requires root priv), it must be fixed.
BTW, the test is not care about failure or blacklisted functions. I'll update it to check blacklist functions if available.
Thank you,