On Wed, 11 Jan 2023 at 13:48, Arnd Bergmann <arnd@arndb.de> wrote:
On Wed, Jan 11, 2023, at 07:16, Naresh Kamboju wrote:
On Tue, 10 Jan 2023 at 23:36, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
Results from Linaro’s test farm. Regressions on arm64 Raspberry Pi 4 Model B.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
While running the LTP controllers cgroup_fj_stress_blkio test cases, the message "Insufficient stack space to handle exception!" appeared, followed by a kernel panic, on an arm64 Raspberry Pi 4 Model B with a clang-15 built kernel Image.
The full boot and test log is attached to this email, and the build and Kconfig links are provided at the bottom of this email.
I will try to reproduce this reported issue and get back to you.
I looked at the log between 6.0.18 and 6.0.19-rc1, but don't see any arm64 or memory management patches that could result in this. Do you know if 6.0.18 ran successfully?
Yes, it ran successfully on 6.0.18.
The same kernel, 6.0.19-rc1, built with gcc-12 did not hit this panic. The reported issue is specific to the clang-15 build.
[ 2893.044339] Insufficient stack space to handle exception!
[ 2893.044351] ESR: 0x0000000096000047 -- DABT (current EL)
[ 2893.044360] FAR: 0xffff8000128180d0
[ 2893.044364] Task stack: [0xffff800012a18000..0xffff800012a1c000]
[ 2893.044370] IRQ stack: [0xffff80000a798000..0xffff80000a79c000]
[ 2893.044375] Overflow stack: [0xffff0000f77c4310..0xffff0000f77c5310]
...
[ 2893.044413] pc : el1h_64_sync+0x0/0x68
[ 2893.044430] lr : wp_page_copy+0xf8/0x90c
[ 2893.044445] sp : ffff8000128180d0
...
[ 2893.044692] el1h_64_sync+0x0/0x68
[ 2893.044700] do_wp_page+0x4a0/0x5c8
[ 2893.044708] handle_mm_fault+0x7fc/0x14dc
[ 2893.044718] do_page_fault+0x29c/0x450
[ 2893.044727] do_mem_abort+0x4c/0xf8
[ 2893.044741] el0_da+0x48/0xa8
[ 2893.044750] el0t_64_sync_handler+0xcc/0xf0
[ 2893.044759] el0t_64_sync+0x18c/0x190
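As a quick sanity check on the numbers in the log, the sketch below tests whether the reported sp lies inside any of the three printed stack ranges and how far it is from the task stack. The addresses are copied from the log; the helper names and the choice to treat each range as half-open [low, high) are assumptions made for illustration only.

```python
# Stack ranges and stack pointer copied verbatim from the panic log.
# Treating each range as half-open [low, high) is an assumption of this sketch.
STACKS = {
    "task": (0xffff800012a18000, 0xffff800012a1c000),
    "irq": (0xffff80000a798000, 0xffff80000a79c000),
    "overflow": (0xffff0000f77c4310, 0xffff0000f77c5310),
}
SP = 0xffff8000128180d0  # the log reports the same value as FAR


def containing_stack(sp, stacks):
    """Return the name of the stack whose range holds sp, or None."""
    for name, (low, high) in stacks.items():
        if low <= sp < high:
            return name
    return None


def bytes_below(sp, stack):
    """Distance from sp up to the lowest stack address (positive if sp is below)."""
    low, _high = stack
    return low - sp


if __name__ == "__main__":
    task = STACKS["task"]
    print("task stack size:", task[1] - task[0], "bytes")
    print("sp is on stack:", containing_stack(SP, STACKS))
    print("sp is", hex(bytes_below(SP, task)), "bytes below the task stack")
```

With these values, sp falls in none of the three ranges and sits 0x1fff30 bytes (just under 2 MiB) below the 16 KiB task stack. The arithmetic alone does not say how sp got there.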
It claims that the stack overflow happened in do_wp_page(), but that has a really short call chain. It would be good to have the source line for do_wp_page+0x4a0/0x5c8 and wp_page_copy+0xf8/0x90c to see where exactly it was.
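One way to get from a frame such as do_wp_page+0x4a0/0x5c8 to a source line is to turn it into an absolute address using System.map and pass that address to addr2line with the matching vmlinux. The following is a minimal sketch of the first half only. The function names are hypothetical, and the System.map addresses in the usage note are made up because the real build artifacts were not available.

```python
import re

# Matches backtrace entries of the form symbol+0xoffset/0xsize.
FRAME_RE = re.compile(
    r"^(?P<name>[A-Za-z_.$][\w.$]*)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)$"
)


def parse_frame(frame):
    """Split 'symbol+0xoff/0xsize' into (symbol, offset, size)."""
    m = FRAME_RE.match(frame.strip())
    if m is None:
        raise ValueError(f"not a symbol+offset/size frame: {frame!r}")
    return m["name"], int(m["off"], 16), int(m["size"], 16)


def load_system_map(text):
    """Parse System.map text ('address type name' per line) into {name: address}."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _sym_type, name = parts
            # Keep the first occurrence; duplicate static names are not handled here.
            symbols.setdefault(name, int(addr, 16))
    return symbols


def resolve(frame, symbols):
    """Return the absolute address for a frame, given a symbol table."""
    name, off, size = parse_frame(frame)
    if off >= size:
        raise ValueError(f"offset {off:#x} is outside {name} (size {size:#x})")
    return symbols[name] + off
```

For example, with a hypothetical System.map line `ffff800008200000 T do_wp_page`, `resolve("do_wp_page+0x4a0/0x5c8", symbols)` yields 0xffff8000082004a0, which could then be given to `addr2line -e vmlinux -i`. The kernel tree's scripts/faddr2line accepts the symbol+offset/size form directly and is likely the more convenient route when a vmlinux with debug info is at hand.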
[ 2893.285975] WARNING: CPU: 2 PID: 315758 at kernel/sched/core.c:3119 set_task_cpu+0x14c/0x208
....
[ 2893.286117] CPU: 2 PID: 315758 Comm: cgroup_fj_stres Not tainted
[ 2893.286416] arch_timer_handler_phys+0x44/0x54
[ 2893.286427] handle_percpu_devid_irq+0x90/0x220
[ 2893.286439] generic_handle_domain_irq+0x38/0x50
[ 2893.286447] gic_handle_irq+0x68/0xe8
[ 2893.286455] el1_interrupt+0x88/0xc8
[ 2893.286464] el1h_64_irq_handler+0x18/0x24
[ 2893.286474] el1h_64_irq+0x64/0x68
[ 2893.286482] panic+0x2d8/0x374
This looks like a second, unrelated bug: the kernel still processes timer interrupts after calling panic(), and this apparently fails because the system is already unusable.
artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2K9JDtix2mHMoYRjNkBef...
Adding " / " at the end works. https://storage.tuxsuite.com/public/linaro/lkft/builds/2K9JDtix2mHMoYRjNkBef...
file not found. I tried to get the vmlinux file to look at the disassembly but the artifacts appear to be gone already.
System.map: https://storage.tuxsuite.com/public/linaro/lkft/builds/2K9JDtix2mHMoYRjNkBef...
vmlinux: https://storage.tuxsuite.com/public/linaro/lkft/builds/2K9JDtix2mHMoYRjNkBef...
Sorry for the trouble.
- Naresh
Arnd