On Tue, Oct 17, 2023 at 01:22:14PM +0100, Mark Brown wrote:
On Tue, Oct 17, 2023 at 01:34:18PM +0530, Naresh Kamboju wrote:
Following kernel crash noticed while running selftests: ftrace: ftracetest-ktap on FVP models running stable-rc 6.5.8-rc2.
This is not an easy to reproduce issue and not seen on mainline and next. We are investigating this report.
To confirm have you seen this on other stables as well or is this only v6.5? For how long have you been seeing this?
[ 764.987161] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 765.074221] Call trace:
[ 765.075045] sve_save_state+0x4/0xf0
[ 765.076138] fpsimd_thread_switch+0x2c/0xe8
[ 765.077305] __switch_to+0x20/0x158
[ 765.078384] __schedule+0x2cc/0xb38
[ 765.079464] preempt_schedule_irq+0x44/0xa8
[ 765.080633] el1_interrupt+0x4c/0x68
[ 765.081691] el1h_64_irq_handler+0x18/0x28
[ 765.082829] el1h_64_irq+0x64/0x68
[ 765.083874] ftrace_return_to_handler+0x98/0x158
[ 765.085090] return_to_handler+0x20/0x48
[ 765.086205] do_sve_acc+0x64/0x128
[ 765.087272] el0_sve_acc+0x3c/0xa0
[ 765.088356] el0t_64_sync_handler+0x114/0x130
[ 765.089524] el0t_64_sync+0x190/0x198
So something managed to get flagged as having SVE state without having the backing storage allocated. We *were* preempted in the SVE access handler which does the allocation but I can't see the path that would trigger that since we allocate the state before setting TIF_SVE. It's possible the compiler did something funky, a decode of the backtrace might help show that?
Having a vmlinux would be *really* helpful...
I tried generating fpsimd.o using the same config and the kernel.org crosstool GCC 13.2.0, code dump below. Assuming the code generation is the same as for Naresh, do_sve_acc+0x64 is at 0x191c, and is just after the call to sve_alloc().
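For anyone wanting to repeat the offset arithmetic against their own build, here is a minimal sketch. It assumes the symbol base 0x18b8 from the objdump listing at the end of this mail; `sym_plus_offset` and `print_do_sve_acc_frame` are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: resolve a "symbol+offset" backtrace entry to an
 * address in the object file's disassembly. */
static uint64_t sym_plus_offset(uint64_t sym_base, uint64_t offset)
{
	return sym_base + offset;
}

/* Print the resolved address for the do_sve_acc+0x64 backtrace entry,
 * using the base address of do_sve_acc from the objdump listing. */
static void print_do_sve_acc_frame(void)
{
	const uint64_t do_sve_acc_base = 0x18b8; /* from objdump of fpsimd.o */

	printf("do_sve_acc+0x64 = 0x%llx\n",
	       (unsigned long long)sym_plus_offset(do_sve_acc_base, 0x64));
}
```

Calling `print_do_sve_acc_frame()` from a `main()` prints the resolved address, which can then be looked up in the listing. The result is only meaningful if the code generation matches the kernel that produced the oops.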
So IIUC what's happening here is that sve_alloc() has been called, its entry has been traced, its body has been run, and in the process of tracing its return an IRQ has preempted the task and caused a reschedule.
So unless sve_alloc() failed, at the instant the IRQ was taken:
* `task->thread.sve_state` should be non-NULL
* `task->thread_info.flags & TIF_SVE` should be 0
... so if `task->thread.sve_state` becomes NULL, I wonder if we end up accidentally blatting that as part of the context switch? I can't immediately see how.
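To make the two states under discussion concrete, here is a rough userspace model. It is only a sketch under stated assumptions: `struct model_task` and both predicates are hypothetical names, not kernel code, and the flag bit is taken from the 0x800000 constant that do_sve_acc ORs into the thread flags in the listing, which is assumed to correspond to TIF_SVE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit taken from the listing: do_sve_acc sets 0x800000 (bit 23) in the
 * thread flags word. Assumed here to correspond to TIF_SVE. */
#define MODEL_TIF_SVE (UINT64_C(1) << 23)

/* Hypothetical stand-in for the two pieces of task state discussed
 * above; this is NOT the kernel's struct task_struct. */
struct model_task {
	uint64_t flags;   /* models task->thread_info.flags */
	void *sve_state;  /* models task->thread.sve_state */
};

/* State expected at the instant of the IRQ if sve_alloc() succeeded:
 * storage allocated, flag not yet set. */
static bool model_state_expected_at_irq(const struct model_task *t)
{
	return t->sve_state != NULL && !(t->flags & MODEL_TIF_SVE);
}

/* State the oops suggests: flagged as having SVE state, but with no
 * backing storage allocated. */
static bool model_state_flagged_without_storage(const struct model_task *t)
{
	return (t->flags & MODEL_TIF_SVE) != 0 && t->sve_state == NULL;
}
```

The two predicates are mutually exclusive, so the open question is which code path could move a task from the first state to the second across the context switch.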
Mark.
00000000000018b8 <do_sve_acc>:
    18b8:	d503201f 	nop
    18bc:	d503201f 	nop
    18c0:	d503233f 	paciasp
    18c4:	a9be7bfd 	stp	x29, x30, [sp, #-32]!
    18c8:	910003fd 	mov	x29, sp
    18cc:	1400000a 	b	18f4 <do_sve_acc+0x3c>
    18d0:	d503201f 	nop
    18d4:	f9408022 	ldr	x2, [x1, #256]
    18d8:	d2800003 	mov	x3, #0x0                   	// #0
    18dc:	52800080 	mov	w0, #0x4                   	// #4
    18e0:	52800021 	mov	w1, #0x1                   	// #1
    18e4:	94000000 	bl	0 <force_signal_inject>
    18e8:	a8c27bfd 	ldp	x29, x30, [sp], #32
    18ec:	d50323bf 	autiasp
    18f0:	d65f03c0 	ret
    18f4:	90000000 	adrp	x0, 0 <system_cpucaps>
    18f8:	f9400000 	ldr	x0, [x0]
    18fc:	b6f7fec0 	tbz	x0, #62, 18d4 <do_sve_acc+0x1c>
    1900:	f9000bf3 	str	x19, [sp, #16]
    1904:	d5384113 	mrs	x19, sp_el0
    1908:	f9400260 	ldr	x0, [x19]
    190c:	37b005e0 	tbnz	w0, #22, 19c8 <do_sve_acc+0x110>
    1910:	aa1303e0 	mov	x0, x19
    1914:	52800021 	mov	w1, #0x1                   	// #1
    1918:	94000000 	bl	1140 <sve_alloc>
    191c:	f946be60 	ldr	x0, [x19, #3448]
    1920:	b4000480 	cbz	x0, 19b0 <do_sve_acc+0xf8>
    1924:	97fffb59 	bl	688 <get_cpu_fpsimd_context>
    1928:	14000016 	b	1980 <do_sve_acc+0xc8>
    192c:	d2a01000 	mov	x0, #0x800000              	// #8388608
    1930:	f8e03260 	ldsetal	x0, x0, [x19]
    1934:	36b80040 	tbz	w0, #23, 193c <do_sve_acc+0x84>
    1938:	d4210000 	brk	#0x800
    193c:	d5384113 	mrs	x19, sp_el0
    1940:	f9400260 	ldr	x0, [x19]
    1944:	371802c0 	tbnz	w0, #3, 199c <do_sve_acc+0xe4>
    1948:	b94d8a73 	ldr	w19, [x19, #3464]
    194c:	53047e73 	lsr	w19, w19, #4
    1950:	51000673 	sub	w19, w19, #0x1
    1954:	aa1303e0 	mov	x0, x19
    1958:	94000000 	bl	0 <sve_set_vq>
    195c:	aa1303e1 	mov	x1, x19
    1960:	52800020 	mov	w0, #0x1                   	// #1
    1964:	94000000 	bl	0 <sve_flush_live>
    1968:	97fffbb4 	bl	838 <fpsimd_bind_task_to_cpu>
    196c:	97fffb61 	bl	6f0 <put_cpu_fpsimd_context>
    1970:	f9400bf3 	ldr	x19, [sp, #16]
    1974:	a8c27bfd 	ldp	x29, x30, [sp], #32
    1978:	d50323bf 	autiasp
    197c:	d65f03c0 	ret
    1980:	f9800271 	prfm	pstl1strm, [x19]
    1984:	c85f7e60 	ldxr	x0, [x19]
    1988:	b2690001 	orr	x1, x0, #0x800000
    198c:	c802fe61 	stlxr	w2, x1, [x19]
    1990:	35ffffa2 	cbnz	w2, 1984 <do_sve_acc+0xcc>
    1994:	d5033bbf 	dmb	ish
    1998:	17ffffe7 	b	1934 <do_sve_acc+0x7c>
    199c:	aa1303e0 	mov	x0, x19
    19a0:	97fffaf2 	bl	568 <fpsimd_to_sve>
    19a4:	52800040 	mov	w0, #0x2                   	// #2
    19a8:	b90d7260 	str	w0, [x19, #3440]
    19ac:	17fffff0 	b	196c <do_sve_acc+0xb4>
    19b0:	52800120 	mov	w0, #0x9                   	// #9
    19b4:	94000000 	bl	0 <force_sig>
    19b8:	f9400bf3 	ldr	x19, [sp, #16]
    19bc:	a8c27bfd 	ldp	x29, x30, [sp], #32
    19c0:	d50323bf 	autiasp
    19c4:	d65f03c0 	ret
    19c8:	d4210000 	brk	#0x800
    19cc:	f9400bf3 	ldr	x19, [sp, #16]
    19d0:	17ffffc1 	b	18d4 <do_sve_acc+0x1c>
    19d4:	d503201f 	nop
    19d8:	d503201f 	nop
    19dc:	d503201f 	nop