[ my two cents ] While running the LTP pty07 test case on arm64 Juno-r2 with Linux next-20230919, the following kernel crash was noticed.
I have been seeing this issue intermittently on Juno-r2 for more than a month. Has anyone else noticed this crash?
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
[ 0.000000] Linux version 6.6.0-rc2-next-20230919 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 13.2.0-2) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41) #1 SMP PREEMPT @1695107157
[ 0.000000] KASLR disabled due to lack of seed
[ 0.000000] Machine model: ARM Juno development board (r2)
...
LTP running pty
...
pty07.c:92: TINFO: Saving active console 1
../../../include/tst_fuzzy_sync.h:640: TINFO: Stopped sampling at 552 (out of 1024) samples, sampling time reached 50% of the total time limit
../../../include/tst_fuzzy_sync.h:307: TINFO: loop = 552, delay_bias = 0
../../../include/tst_fuzzy_sync.h:295: TINFO: start_a - start_b: { avg = 127ns, avg_dev = 84ns, dev_ratio = 0.66 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - start_a : { avg = 17296156ns, avg_dev = 5155058ns, dev_ratio = 0.30 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_b - start_b : { avg = 101202336ns, avg_dev = 6689286ns, dev_ratio = 0.07 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - end_b : { avg = -83906064ns, avg_dev = 10230694ns, dev_ratio = 0.12 }
../../../include/tst_fuzzy_sync.h:295: TINFO: spins : { avg = 2765565 , avg_dev = 339285 , dev_ratio = 0.12 }
[ 384.133538] Unable to handle kernel execute from non-executable memory at virtual address ffff8000834c13a0
[ 384.133559] Mem abort info:
[ 384.133568] ESR = 0x000000008600000f
[ 384.133578] EC = 0x21: IABT (current EL), IL = 32 bits
[ 384.133590] SET = 0, FnV = 0
[ 384.133600] EA = 0, S1PTW = 0
[ 384.133610] FSC = 0x0f: level 3 permission fault
[ 384.133621] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082375000
[ 384.133634] [ffff8000834c13a0] pgd=10000009fffff003, p4d=10000009fffff003, pud=10000009ffffe003, pmd=10000009ffff8003, pte=00780000836c1703
[ 384.133697] Internal error: Oops: 000000008600000f [#1] PREEMPT SMP
[ 384.133707] Modules linked in: tda998x onboard_usb_hub cec hdlcd crct10dif_ce drm_dma_helper drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables
[ 384.133767] CPU: 3 PID: 589 Comm: (udev-worker) Not tainted 6.6.0-rc2-next-20230919 #1
[ 384.133779] Hardware name: ARM Juno development board (r2) (DT)
[ 384.133784] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 384.133796] pc : in_lookup_hashtable+0x178/0x2000
[ 384.133818] lr : rcu_core (arch/arm64/include/asm/preempt.h:13 (discriminator 1) kernel/rcu/tree.c:2146 (discriminator 1) kernel/rcu/tree.c:2403 (discriminator 1))
[ 384.133832] sp : ffff800083533e60
[ 384.133836] x29: ffff800083533e60 x28: ffff0008008a6180 x27: 000000000000000a
[ 384.133854] x26: 0000000000000000 x25: 0000000000000000 x24: ffff800083533f10
[ 384.133871] x23: ffff800082404008 x22: ffff800082ebea80 x21: ffff800082f55940
[ 384.133889] x20: ffff00097ed75440 x19: 0000000000000001 x18: 0000000000000000
[ 384.133905] x17: ffff8008fc95c000 x16: ffff800083530000 x15: 00003d0900000000
[ 384.133922] x14: 0000000000030d40 x13: 0000000000000000 x12: 003d090000000000
[ 384.133939] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff80008015b05c
[ 384.133955] x8 : ffff800083533da8 x7 : 0000000000000000 x6 : 0000000000000100
[ 384.133971] x5 : ffff800082ebf000 x4 : ffff800082ebf2e8 x3 : 0000000000000000
[ 384.133987] x2 : ffff000825bf8618 x1 : ffff8000834c13a0 x0 : ffff00082b6d7170
[ 384.134005] Call trace:
[ 384.134009] in_lookup_hashtable+0x178/0x2000
[ 384.134022] rcu_core_si (kernel/rcu/tree.c:2421)
[ 384.134035] __do_softirq (arch/arm64/include/asm/jump_label.h:21 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:554)
[ 384.134046] ____do_softirq (arch/arm64/kernel/irq.c:81)
[ 384.134058] call_on_irq_stack (arch/arm64/kernel/entry.S:888)
[ 384.134070] do_softirq_own_stack (arch/arm64/kernel/irq.c:86)
[ 384.134082] irq_exit_rcu (arch/arm64/include/asm/percpu.h:44 kernel/softirq.c:612 kernel/softirq.c:634 kernel/softirq.c:644)
[ 384.134094] el0_interrupt (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:763)
[ 384.134110] __el0_irq_handler_common (arch/arm64/kernel/entry-common.c:769)
[ 384.134124] el0t_64_irq_handler (arch/arm64/kernel/entry-common.c:774)
[ 384.134137] el0t_64_irq (arch/arm64/kernel/entry.S:592)
[ 384.134153] Code: 00000000 00000000 00000000 00000000 (2b6d7170)
All code
========
...
10: 70 71 jo 0x83
12: 6d insl (%dx),%es:(%rdi)
13: 2b .byte 0x2b

Code starting with the faulting instruction
===========================================
0: 70 71 jo 0x73
2: 6d insl (%dx),%es:(%rdi)
3: 2b .byte 0x2b
[ 384.134161] ---[ end trace 0000000000000000 ]---
[ 384.134168] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 384.134173] SMP: stopping secondary CPUs
[ 384.134184] Kernel Offset: disabled
[ 384.134187] CPU features: 0x8000020c,3c020000,0000421b
[ 384.134194] Memory Limit: none
Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230919/tes...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230919/tes...
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2VbZdpWwncUx8oSxsSXCW...
- https://lkft.validation.linaro.org/scheduler/job/6666807#L2461
-- Linaro LKFT https://lkft.linaro.org
Hi Naresh,
On Wed, Sep 20, 2023 at 11:29:12AM +0200, Naresh Kamboju wrote:
[ my two cents ] While running LTP pty07 test cases on arm64 juno-r2 with Linux next-20230919 the following kernel crash was noticed.
I have been noticing this issue intermittently on Juno-r2 for more than a month. Anyone have noticed this crash ?
How intermittent is this? 1/2, 1/10, 1/100, rarer still?
Are you running *just* the pty07 test, or are you running a whole LTP suite and the issue first occurs around pty07?
Given you've been hitting this for a month, have you tried testing mainline? Do you have a known-good kernel that we can start a bisect from?
Do you *only* see this on Juno-r2 and are you testing on other hardware?
[ 384.133538] Unable to handle kernel execute from non-executable memory at virtual address ffff8000834c13a0
[...]
[ 384.133796] pc : in_lookup_hashtable+0x178/0x2000
This indicates that the faulting address ffff8000834c13a0 is in_lookup_hashtable+0x178/0x2000, which would mean we've somehow marked the kernel text as non-executable, which we never do intentionally.
I suspect that implies memory corruption. Have you tried running this with KASAN enabled?
[...]
[ 384.134153] Code: 00000000 00000000 00000000 00000000 (2b6d7170)
All code
========
...
10: 70 71 jo 0x83
12: 6d insl (%dx),%es:(%rdi)
13: 2b .byte 0x2b

Code starting with the faulting instruction
===========================================
0: 70 71 jo 0x73
2: 6d insl (%dx),%es:(%rdi)
3: 2b .byte 0x2b
As a general thing, can you *please* fix this code dump to decode arm64 as arm64?
Given the instructions before this are all UDF #0, I suspect the page table entry has been corrupted and this is pointing at entirely the wrong page.
Thanks, Mark.
On 20/09/2023 3:32 pm, Mark Rutland wrote:
[ 384.133818] lr : rcu_core (arch/arm64/include/asm/preempt.h:13 (discriminator 1) kernel/rcu/tree.c:2146 (discriminator 1) kernel/rcu/tree.c:2403 (discriminator 1))
For the record, this LR appears to be the expected return address of the "f(rhp);" call within rcu_do_batch() (if CONFIG_DEBUG_LOCK_ALLOC=n), so it looks like a case of a bogus or corrupted RCU callback. The PC is in the middle of a data symbol (in_lookup_hashtable is an array), so NX is expected and I wouldn't imagine the pagetables have gone wrong, just regular data corruption or use-after-free somewhere.
Robin.
linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Wed, Sep 20, 2023 at 05:26:33PM +0100, Robin Murphy wrote:
For the record, this LR appears to be the expected return address of the "f(rhp);" call within rcu_do_batch() (if CONFIG_DEBUG_LOCK_ALLOC=n), so it looks like a case of a bogus or corrupted RCU callback. The PC is in the middle of a data symbol (in_lookup_hashtable is an array), so NX is expected and I wouldn't imagine the pagetables have gone wrong, just regular data corruption or use-after-free somewhere.
Is it possible to use either KASAN or CONFIG_DEBUG_OBJECTS_RCU_HEAD=y here?
Thanx, Paul
On Wed, Sep 20, 2023 at 7:02 PM Paul E. McKenney paulmck@kernel.org wrote:
Is it possible to use either KASAN or CONFIG_DEBUG_OBJECTS_RCU_HEAD=y here?
CKI has also been running into issues during pty07 runs lately. This is from an aarch64 debug kernel:
[ 5537.660548] LTP: starting pty07
[-- MARK -- Mon Sep 18 14:30:00 2023]
[ 5807.450507] ==================================================================
[ 5807.450515] BUG: KASAN: slab-use-after-free in d_alloc_parallel+0xbfc/0xdf8
[ 5807.450524] Read of size 4 at addr ffff000169d87630 by task (udev-worker)/280492
[ 5807.450527]
[ 5807.450530] CPU: 4 PID: 280492 Comm: (udev-worker) Not tainted 6.6.0-0.rc2.20.test.eln.aarch64+debug #1
[ 5807.450534] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 5807.450536] Call trace:
[ 5807.450537] dump_backtrace+0xa0/0x128
[ 5807.450542] show_stack+0x20/0x38
[ 5807.450544] dump_stack_lvl+0xe8/0x178
[ 5807.450550] print_address_description.constprop.0+0x84/0x3a0
[ 5807.450555] print_report+0xb0/0x278
[ 5807.450557] kasan_report+0x90/0xd0
[ 5807.450559] __asan_report_load4_noabort+0x20/0x30
[ 5807.450562] d_alloc_parallel+0xbfc/0xdf8
[ 5807.450564] lookup_open.isra.0+0x6e0/0xe88
[ 5807.450567] open_last_lookups+0x740/0xe88
[ 5807.450571] path_openat+0x16c/0x538
[ 5807.450573] do_filp_open+0x174/0x340
[ 5807.450576] do_sys_openat2+0x134/0x180
[ 5807.450579] __arm64_sys_openat+0x138/0x1d0
[ 5807.450581] invoke_syscall.constprop.0+0xdc/0x1e0
[ 5807.450584] do_el0_svc+0x154/0x1d0
[ 5807.450586] el0_svc+0x58/0x118
[ 5807.450590] el0t_64_sync_handler+0x120/0x130
[ 5807.450593] el0t_64_sync+0x1a4/0x1a8
[ 5807.450595]
[ 5807.450596] Allocated by task 280071:
[ 5807.450598] kasan_save_stack+0x3c/0x68
[ 5807.450604] kasan_set_track+0x2c/0x40
[ 5807.450607] kasan_save_alloc_info+0x24/0x38
[ 5807.450609] __kasan_slab_alloc+0x8c/0x90
[ 5807.450613] kmem_cache_alloc+0x144/0x300
[ 5807.450618] alloc_empty_file+0x6c/0x180
[ 5807.450621] path_openat+0xd0/0x538
[ 5807.450623] do_filp_open+0x174/0x340
[ 5807.450626] do_sys_openat2+0x134/0x180
[ 5807.450628] __arm64_sys_openat+0x138/0x1d0
[ 5807.450630] invoke_syscall.constprop.0+0xdc/0x1e0
[ 5807.450632] do_el0_svc+0x154/0x1d0
[ 5807.450634] el0_svc+0x58/0x118
[ 5807.450637] el0t_64_sync_handler+0x120/0x130
[ 5807.450639] el0t_64_sync+0x1a4/0x1a8
[ 5807.450642]
[ 5807.450643] Freed by task 79:
[ 5807.450645] kasan_save_stack+0x3c/0x68
[ 5807.450649] kasan_set_track+0x2c/0x40
[ 5807.450652] kasan_save_free_info+0x38/0x60
[ 5807.450654] __kasan_slab_free+0xe4/0x150
[ 5807.450658] slab_free_freelist_hook+0xf4/0x1d0
[ 5807.450662] kmem_cache_free+0x1d0/0x3e8
[ 5807.450665] file_free_rcu+0xa4/0x120
[ 5807.450668] rcu_do_batch+0x4e0/0x1860
[ 5807.450671] rcu_core+0x408/0x5b0
[ 5807.450673] rcu_core_si+0x18/0x30
[ 5807.450676] __do_softirq+0x2e0/0xed0
[ 5807.450678]
[ 5807.450678] Last potentially related work creation:
[ 5807.450680] kasan_save_stack+0x3c/0x68
[ 5807.450683] __kasan_record_aux_stack+0x9c/0xc8
[ 5807.450686] kasan_record_aux_stack_noalloc+0x14/0x20
[ 5807.450689] __call_rcu_common.constprop.0+0x100/0x940
[ 5807.450691] call_rcu+0x18/0x30
[ 5807.450693] __fput+0x404/0x848
[ 5807.450696] __fput_sync+0x7c/0x98
[ 5807.450698] __arm64_sys_close+0x74/0xd0
[ 5807.450700] invoke_syscall.constprop.0+0xdc/0x1e0
[ 5807.450702] do_el0_svc+0x154/0x1d0
[ 5807.450704] el0_svc+0x58/0x118
[ 5807.450707] el0t_64_sync_handler+0x120/0x130
[ 5807.450710] el0t_64_sync+0x1a4/0x1a8
[ 5807.450712]
[ 5807.450712] Second to last potentially related work creation:
[ 5807.450714] kasan_save_stack+0x3c/0x68
[ 5807.450717] __kasan_record_aux_stack+0x9c/0xc8
[ 5807.450719] kasan_record_aux_stack_noalloc+0x14/0x20
[ 5807.450722] __call_rcu_common.constprop.0+0x100/0x940
[ 5807.450724] call_rcu+0x18/0x30
[ 5807.450726] __fput+0x404/0x848
[ 5807.450728] __fput_sync+0x7c/0x98
[ 5807.450731] __arm64_sys_close+0x74/0xd0
[ 5807.450733] invoke_syscall.constprop.0+0xdc/0x1e0
[ 5807.450735] do_el0_svc+0x154/0x1d0
[ 5807.450737] el0_svc+0x58/0x118
[ 5807.450740] el0t_64_sync_handler+0x120/0x130
[ 5807.450742] el0t_64_sync+0x1a4/0x1a8
[ 5807.450744]
[ 5807.450745] The buggy address belongs to the object at ffff000169d87480
[ 5807.450745] which belongs to the cache filp of size 464
[ 5807.450748] The buggy address is located 432 bytes inside of
[ 5807.450748] freed 464-byte region [ffff000169d87480, ffff000169d87650)
[ 5807.450751]
[ 5807.450752] The buggy address belongs to the physical page:
[ 5807.450754] page:00000000310d19d2 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1a9d84
[ 5807.450758] head:00000000310d19d2 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 5807.450760] memcg:ffff0001350cb601
[ 5807.450762] flags: 0x2fffff00000840(slab|head|node=0|zone=2|lastcpupid=0xfffff)
[ 5807.450766] page_type: 0xffffffff()
[ 5807.450770] raw: 002fffff00000840 ffff0000d225eb40 fffffc0004b0cc00 dead000000000002
[ 5807.450772] raw: 0000000000000000 0000000000190019 00000001ffffffff ffff0001350cb601
[ 5807.450774] page dumped because: kasan: bad access detected
[ 5807.450775]
[ 5807.450776] Memory state around the buggy address:
[ 5807.450778] ffff000169d87500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 5807.450779] ffff000169d87580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 5807.450781] >ffff000169d87600: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc
[ 5807.450783] ^
[ 5807.450784] ffff000169d87680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 5807.450786] ffff000169d87700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5807.450787] ==================================================================
[ 5807.450790] Disabling lock debugging due to kernel taint
[ 5807.458789] Unable to handle kernel execute from non-executable memory at virtual address ffffb66294dcf728
[ 5807.486215] KASAN: maybe wild-memory-access in range [0x0001b314a6e7b940-0x0001b314a6e7b947]
[ 5807.486221] Mem abort info:
[ 5807.486223] ESR = 0x000000008600000e
[ 5807.486226] EC = 0x21: IABT (current EL), IL = 32 bits
[ 5807.486228] SET = 0, FnV = 0
[ 5807.486230] EA = 0, S1PTW = 0
[ 5807.486232] FSC = 0x0e: level 2 permission fault
[ 5807.486235] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000012b3603000
[ 5807.486238] [ffffb66294dcf728] pgd=100000233ffff003, p4d=100000233ffff003, pud=100000233fffe003, pmd=00680012b7600f01
[ 5807.486250] Internal error: Oops: 000000008600000e [#1] SMP
[ 5807.486254] Modules linked in: n_hdlc slcan can_dev slip slhc nfsv3 nfs_acl nfs lockd grace fscache netfs tun brd overlay exfat ext4 mbcache jbd2 rfkill sunrpc vfat fat loop fuse dm_mod xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_blk virtio_console virtio_net net_failover failover virtio_mmio [last unloaded: hwpoison_inject]
[ 5807.486304] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G B ------- --- 6.6.0-0.rc2.20.test.eln.aarch64+debug #1
[ 5807.486308] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 5807.486310] pstate: 10400005 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5807.486313] pc : in_lookup_hashtable+0x13c8/0x2020
[ 5807.486320] lr : rcu_do_batch+0x4e0/0x1860
[ 5807.486324] sp : ffff800080097cb0
[ 5807.486325] x29: ffff800080097cb0 x28: 0000000000000009 x27: ffff0000d293c008
[ 5807.486330] x26: ffffb6628fb04ca0 x25: ffffb66292266000 x24: ffffb66291de8fc8
[ 5807.486334] x23: ffffb66291de9a20 x22: 0000000000000066 x21: dfff800000000000
[ 5807.486338] x20: ffffb66290e73008 x19: ffff0002f4ba4418 x18: 0000000000000000
[ 5807.486342] x17: ffff49bb6df11000 x16: ffff800080090000 x15: ffff001dfed8b168
[ 5807.486345] x14: ffff001dfed8b0e8 x13: ffff001dfed8b068 x12: ffff76cc529d0793
[ 5807.486349] x11: 1ffff6cc529d0792 x10: ffff76cc529d0792 x9 : ffffb66291de9000
[ 5807.486353] x8 : 0000000000000007 x7 : 0000000000000000 x6 : ffffb6628d8d0488
[ 5807.486357] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1fffe003bfdb2e10
[ 5807.486360] x2 : 1fffe0001a527801 x1 : ffffb66294dcf728 x0 : ffff0002f4ba4418
[ 5807.486364] Call trace:
[ 5807.486365] in_lookup_hashtable+0x13c8/0x2020
[ 5807.486368] rcu_core+0x408/0x5b0
[ 5807.486371] rcu_core_si+0x18/0x30
[ 5807.486373] __do_softirq+0x2e0/0xed0
[ 5807.486376] ____do_softirq+0x18/0x30
[ 5807.486379] call_on_irq_stack+0x24/0x30
[ 5807.486384] do_softirq_own_stack+0x24/0x38
[ 5807.486386] __irq_exit_rcu+0x1f8/0x580
[ 5807.486391] irq_exit_rcu+0x1c/0x90
[ 5807.486393] el1_interrupt+0x4c/0xb0
[ 5807.486398] el1h_64_irq_handler+0x18/0x28
[ 5807.486400] el1h_64_irq+0x78/0x80
[ 5807.486402] arch_local_irq_enable+0x8/0x20
[ 5807.486406] cpuidle_idle_call+0x26c/0x370
[ 5807.486409] do_idle+0x1ac/0x208
[ 5807.486411] cpu_startup_entry+0x2c/0x40
[ 5807.486413] secondary_start_kernel+0x240/0x360
[ 5807.486417] __secondary_switched+0xb8/0xc0
[ 5807.486423] Code: 00000000 00000000 00000000 00000000 (f4ba4418)
[ 5807.486426] ---[ end trace 0000000000000000 ]---
[ 5807.486428] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 5807.486430] SMP: stopping secondary CPUs
[ 5807.486467] Kernel Offset: 0x36620d560000 from 0xffff800080000000
[ 5807.486469] PHYS_OFFSET: 0x40000000
[ 5807.486470] CPU features: 0x00000001,70020143,1001720b
[ 5807.486472] Memory Limit: none
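The offsets in the KASAN report above can be checked by hand. The sketch below is a minimal C illustration, assuming generic KASAN's 8-byte shadow granules and that each row of the "Memory state" dump covers 128 bytes (16 shadow bytes) starting on a 128-byte boundary, as the rows at ffff000169d87500, ffff000169d87580 and ffff000169d87600 do here. The helper names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte describes one 8-byte granule. */
#define KASAN_GRANULE_SIZE 8u
/* Each row of the "Memory state" dump shows 16 shadow bytes. */
#define SHADOW_BYTES_PER_ROW 16u
#define ROW_SPAN (SHADOW_BYTES_PER_ROW * KASAN_GRANULE_SIZE) /* 128 bytes */

/* Byte offset of a bad access from the start of its slab object. */
static uint64_t offset_in_object(uint64_t obj_start, uint64_t addr)
{
    assert(addr >= obj_start);
    return addr - obj_start;
}

/* Exclusive end of an object, matching the "[start, end)" range KASAN prints. */
static uint64_t object_end(uint64_t obj_start, uint64_t obj_size)
{
    return obj_start + obj_size;
}

/* Start address of the dump row containing addr (rows are ROW_SPAN-aligned here). */
static uint64_t shadow_row(uint64_t addr)
{
    return addr & ~(uint64_t)(ROW_SPAN - 1);
}

/* Column (0-15) of the shadow byte that describes addr within its row. */
static unsigned shadow_column(uint64_t addr)
{
    return (unsigned)((addr - shadow_row(addr)) / KASAN_GRANULE_SIZE);
}
```

Plugging in the object start ffff000169d87480 and the bad address ffff000169d87630 gives an offset of 432 and an object end of ffff000169d87650, matching the report. The bad address falls in column 6 of the ffff000169d87600 row (an "fb" byte), while the object end lands in column 10, where the "fc" bytes begin; these are generic KASAN's markers for freed slab memory and slab redzone, respectively.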
On Thu, Sep 21, 2023 at 11:01:06AM +0200, Jan Stancek wrote:
On Wed, Sep 20, 2023 at 7:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
On Wed, Sep 20, 2023 at 05:26:33PM +0100, Robin Murphy wrote:
On 20/09/2023 3:32 pm, Mark Rutland wrote:
Hi Naresh,
On Wed, Sep 20, 2023 at 11:29:12AM +0200, Naresh Kamboju wrote:
[ my two cents ] While running LTP pty07 test cases on arm64 juno-r2 with Linux next-20230919 the following kernel crash was noticed.
I have been noticing this issue intermittently on Juno-r2 for more than a month. Has anyone else noticed this crash?
How intermittent is this? 1/2, 1/10, 1/100, rarer still?
Are you running *just* the pty07 test, or are you running a whole LTP suite and the issue first occurs around pty07?
Given you've been hitting this for a month, have you tried testing mainline? Do you have a known-good kernel that we can start a bisect from?
Do you *only* see this on Juno-r2 and are you testing on other hardware?
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
[ 0.000000] Linux version 6.6.0-rc2-next-20230919 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 13.2.0-2) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41) #1 SMP PREEMPT @1695107157
[ 0.000000] KASLR disabled due to lack of seed
[ 0.000000] Machine model: ARM Juno development board (r2)
... LTP running pty ...
pty07.c:92: TINFO: Saving active console 1
../../../include/tst_fuzzy_sync.h:640: TINFO: Stopped sampling at 552 (out of 1024) samples, sampling time reached 50% of the total time limit
../../../include/tst_fuzzy_sync.h:307: TINFO: loop = 552, delay_bias = 0
../../../include/tst_fuzzy_sync.h:295: TINFO: start_a - start_b: { avg = 127ns, avg_dev = 84ns, dev_ratio = 0.66 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - start_a : { avg = 17296156ns, avg_dev = 5155058ns, dev_ratio = 0.30 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_b - start_b : { avg = 101202336ns, avg_dev = 6689286ns, dev_ratio = 0.07 }
../../../include/tst_fuzzy_sync.h:295: TINFO: end_a - end_b : { avg = -83906064ns, avg_dev = 10230694ns, dev_ratio = 0.12 }
../../../include/tst_fuzzy_sync.h:295: TINFO: spins : { avg = 2765565 , avg_dev = 339285 , dev_ratio = 0.12 }
[ 384.133538] Unable to handle kernel execute from non-executable memory at virtual address ffff8000834c13a0
[ 384.133559] Mem abort info:
[ 384.133568] ESR = 0x000000008600000f
[ 384.133578] EC = 0x21: IABT (current EL), IL = 32 bits
[ 384.133590] SET = 0, FnV = 0
[ 384.133600] EA = 0, S1PTW = 0
[ 384.133610] FSC = 0x0f: level 3 permission fault
[ 384.133621] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082375000
[ 384.133634] [ffff8000834c13a0] pgd=10000009fffff003, p4d=10000009fffff003, pud=10000009ffffe003, pmd=10000009ffff8003, pte=00780000836c1703
[ 384.133697] Internal error: Oops: 000000008600000f [#1] PREEMPT SMP
[ 384.133707] Modules linked in: tda998x onboard_usb_hub cec hdlcd crct10dif_ce drm_dma_helper drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables
[ 384.133767] CPU: 3 PID: 589 Comm: (udev-worker) Not tainted 6.6.0-rc2-next-20230919 #1
[ 384.133779] Hardware name: ARM Juno development board (r2) (DT)
[ 384.133784] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 384.133796] pc : in_lookup_hashtable+0x178/0x2000
This indicates that the faulting address ffff8000834c13a0 is in_lookup_hashtable+0x178/0x2000, which would mean we've somehow marked the kernel text as non-executable, something we never do intentionally.
I suspect that implies memory corruption. Have you tried running this with KASAN enabled?
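The "Mem abort info" lines in the oops are a decoding of the ESR value. As a minimal sketch, assuming the AArch64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, and, for instruction aborts, the fault status code IFSC in ISS bits [5:0]), the fields can be extracted as follows. The function names are hypothetical helpers, not the kernel's own decoders.

```c
#include <assert.h>
#include <stdint.h>

/* Exception class for an instruction abort taken without a change in EL. */
#define ESR_EC_IABT_CUR 0x21u

/* ESR_ELx.EC, bits [31:26]: the reason the exception was taken. */
static unsigned esr_ec(uint64_t esr)
{
    return (unsigned)((esr >> 26) & 0x3f);
}

/* ESR_ELx.IL, bit 25: 1 means a 32-bit instruction length. */
static unsigned esr_il(uint64_t esr)
{
    return (unsigned)((esr >> 25) & 0x1);
}

/* For instruction aborts, ISS bits [5:0] hold the IFSC fault status code. */
static unsigned esr_ifsc(uint64_t esr)
{
    return (unsigned)(esr & 0x3f);
}

/* IFSC 0b0011xx is a permission fault; the low two bits give the lookup level. */
static int ifsc_is_permission_fault(unsigned ifsc)
{
    return (ifsc & 0x3c) == 0x0c;
}

static unsigned ifsc_level(unsigned ifsc)
{
    assert(ifsc_is_permission_fault(ifsc));
    return ifsc & 0x3;
}
```

For ESR 0x8600000f these helpers give EC 0x21 and a level 3 permission fault, matching the Juno log; the 0x8600000e in the CKI report differs only in the fault level (2).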
[ 384.133818] lr : rcu_core (arch/arm64/include/asm/preempt.h:13 (discriminator 1) kernel/rcu/tree.c:2146 (discriminator 1) kernel/rcu/tree.c:2403 (discriminator 1))
For the record, this LR appears to be the expected return address of the "f(rhp);" call within rcu_do_batch() (if CONFIG_DEBUG_LOCK_ALLOC=n), so it looks like a case of a bogus or corrupted RCU callback. The PC is in the middle of a data symbol (in_lookup_hashtable is an array), so NX is expected and I wouldn't imagine the pagetables have gone wrong, just regular data corruption or use-after-free somewhere.
Is it possible to use either KASAN or CONFIG_DEBUG_OBJECTS_RCU_HEAD=y here?
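The reading of the LR above can be illustrated with a userspace model of a callback batch loop. This is a minimal sketch, not the kernel's rcu_do_batch(): struct cb_head, free_cb(), known_cbs[] and drain_batch() are hypothetical names. Each queued head carries a function pointer that the loop loads and then calls, so if the memory holding a head is freed and reused before its callback runs, ->func can hold an arbitrary address, such as one inside a data array like in_lookup_hashtable. The allow-list check exists only so the model can count such entries instead of jumping to them; it does not reflect how CONFIG_DEBUG_OBJECTS_RCU_HEAD works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct cb_head {
    struct cb_head *next;
    void (*func)(struct cb_head *head);
};

static int freed_count;

/* A legitimate callback, playing the role of something like file_free_rcu(). */
static void free_cb(struct cb_head *head)
{
    (void)head;
    freed_count++;
}

/* Hypothetical allow-list of valid callbacks (illustration only). */
static void (*const known_cbs[])(struct cb_head *) = { free_cb };

static int is_known_cb(void (*f)(struct cb_head *))
{
    for (size_t i = 0; i < sizeof(known_cbs) / sizeof(known_cbs[0]); i++)
        if (known_cbs[i] == f)
            return 1;
    return 0;
}

/* Drain a callback list: load ->func, then call it, as a batch loop does.
 * Returns how many entries carried a pointer not in the allow-list; a real
 * kernel has no such check and would branch straight to that address. */
static int drain_batch(struct cb_head *list)
{
    int bogus = 0;

    while (list) {
        struct cb_head *next = list->next; /* read first: f() may free list */
        void (*f)(struct cb_head *) = list->func;

        if (is_known_cb(f))
            f(list); /* the model's equivalent of the "f(rhp);" call */
        else
            bogus++;
        list = next;
    }
    return bogus;
}
```

For example, a two-entry list whose first head has ->func overwritten with the address of a static array, followed by an intact head pointing at free_cb(), makes drain_batch() report one bogus entry and run free_cb() once. Building that corrupted pointer requires converting an integer to a function pointer, which C leaves implementation-defined; GCC and Clang accept it.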
CKI has also been running into issues during pty07 runs lately. This is from an aarch64 debug kernel:
These might well be related, so perhaps fixing one will fix the other.
Thanx, Paul
Hello,
On an sdm845 Cheza board running 6.5.4 (previously running 6.4.x), I saw this issue too. It happened only once, but I can try to stress-test it to see how often it pops up. The whole log is here [1]. The kernel used is referenced in the full log; the extra kernel patches are at [2].
...
23-09-22 10:48:53 R SERIAL-CPU> deviceName: Turnip Adreno (TM) 630
23-09-22 10:48:55 R SERIAL-CPU> Running dEQP on 8 threads in 500-test groups
23-09-22 10:48:57 R SERIAL-CPU> Running dEQP on 8 threads in 500-test groups
23-09-22 10:49:00 R SERIAL-CPU> Running dEQP on 8 threads in 500-test groups
23-09-22 10:49:03 R SERIAL-CPU> Running dEQP on 8 threads in 188-test groups
23-09-22 10:49:05 R SERIAL-CPU> Running dEQP on 8 threads in 10-test groups
23-09-22 10:49:08 R SERIAL-CPU> Running dEQP on 8 threads in 378-test groups
23-09-22 10:49:10 R SERIAL-CPU> Running dEQP on 8 threads in 500-test groups
23-09-22 10:49:10 R SERIAL-CPU> Pass: 0, Duration: 0
23-09-22 10:49:28 R SERIAL-CPU> ERROR - dEQP error: SPIR-V WARNING:
23-09-22 10:49:28 R SERIAL-CPU> ERROR - dEQP error: In file ../src/compiler/spirv/spirv_to_nir.c:1492
23-09-22 10:49:28 R SERIAL-CPU> ERROR - dEQP error: Image Type operand of OpTypeSampledImage should not have a Dim of Buffer.
23-09-22 10:49:28 R SERIAL-CPU> ERROR - dEQP error: 456 bytes into the SPIR-V binary
23-09-22 10:49:28 R SERIAL-CPU> Pass: 222, Skip: 278, Duration: 17, Remaining: 46:28
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: SPIR-V WARNING:
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: In file ../src/compiler/spirv/spirv_to_nir.c:4772
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: Unsupported SPIR-V capability: SpvCapabilityUniformAndStorageBuffer16BitAccess (4434)
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: 36 bytes into the SPIR-V binary
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: SPIR-V WARNING:
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: In file ../src/compiler/spirv/spirv_to_nir.c:4772
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: Unsupported SPIR-V capability: SpvCapabilityUniformAndStorageBuffer16BitAccess (4434)
23-09-22 10:49:43 R SERIAL-CPU> ERROR - dEQP error: 36 bytes into the SPIR-V binary
23-09-22 10:49:43 R SERIAL-CPU> Pass: 1949, Skip: 2551, Duration: 32, Remaining: 9:15
23-09-22 10:49:44 R SERIAL-CPU> [ 73.300176] Unable to handle kernel execute from non-executable memory at virtual address ffffaa213674cd88
23-09-22 10:49:44 R SERIAL-CPU> [ 73.310124] Mem abort info:
23-09-22 10:49:44 R SERIAL-CPU> [ 73.313003] ESR = 0x000000008600000f
23-09-22 10:49:44 R SERIAL-CPU> [ 73.316859] EC = 0x21: IABT (current EL), IL = 32 bits
23-09-22 10:49:44 R SERIAL-CPU> [ 73.322318] SET = 0, FnV = 0
23-09-22 10:49:44 R SERIAL-CPU> [ 73.325464] EA = 0, S1PTW = 0
23-09-22 10:49:44 R SERIAL-CPU> [ 73.328703] FSC = 0x0f: level 3 permission fault
23-09-22 10:49:44 R SERIAL-CPU> [ 73.333628] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081b1c000
23-09-22 10:49:44 R SERIAL-CPU> [ 73.340513] [ffffaa213674cd88] pgd=100000027ffff003, p4d=100000027ffff003, pud=100000027fffe003, pmd=100000027fff9003, pte=007800008274cf03
23-09-22 10:49:44 R SERIAL-CPU> [ 73.353372] Internal error: Oops: 000000008600000f [#1] PREEMPT SMP
23-09-22 10:49:44 R SERIAL-CPU> [ 73.359808] Modules linked in:
23-09-22 10:49:44 R SERIAL-CPU> [ 73.362954] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 6.5.4-g8a16969a8434 #1
23-09-22 10:49:44 R SERIAL-CPU> [ 73.371705] Hardware name: Google Cheza (rev3+) (DT)
23-09-22 10:49:44 R SERIAL-CPU> [ 73.376801] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
23-09-22 10:49:44 R SERIAL-CPU> [ 73.383944] pc : in_lookup_hashtable+0x538/0x2000
23-09-22 10:49:44 R SERIAL-CPU> [ 73.388787] lr : rcu_core+0x250/0x640
23-09-22 10:49:44 R SERIAL-CPU> [ 73.392559] sp : ffff80008000bec0
23-09-22 10:49:44 R SERIAL-CPU> [ 73.395962] x29: ffff80008000bec0 x28: ffffaa2134116b28 x27: ffffaa2136381840
23-09-22 10:49:44 R SERIAL-CPU> [ 73.403289] x26: 000000000000000a x25: ffff266077371b38 x24: 0000000000000000
23-09-22 10:49:44 R SERIAL-CPU> [ 73.410614] x23: 0000000000000003 x22: ffff80008000bf30 x21: ffff266077371ac0
23-09-22 10:49:44 R SERIAL-CPU> [ 73.417940] x20: ffff265f00190000 x19: 0000000000000004 x18: 0000000000000000
23-09-22 10:49:44 R SERIAL-CPU> [ 73.425266] x17: ffff7c3f416ea000 x16: ffff800080008000 x15: 0000000000000000
23-09-22 10:49:44 R SERIAL-CPU> [ 73.432593] x14: 0000000000000000 x13: 0000000000000078 x12: 0000000000000000
23-09-22 10:49:44 R SERIAL-CPU> [ 73.439920] x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000000
23-09-22 10:49:44 R SERIAL-CPU> [ 73.447245] x8 : ffff80008000be50 x7 : 0000000000000000 x6 : ffff266077371b48
23-09-22 10:49:44 R SERIAL-CPU> [ 73.454571] x5 : ffffaa2134216444 x4 : fffffc99800bcb20 x3 : ffffaa2135c8be78
23-09-22 10:49:45 R SERIAL-CPU> [ 73.461902] x2 : ffff265f5d7ca700 x1 : ffffaa213674cd88 x0 : ffff265f0773e3b0
23-09-22 10:49:45 R SERIAL-CPU> [ 73.469231] Call trace:
23-09-22 10:49:45 R SERIAL-CPU> [ 73.471751] in_lookup_hashtable+0x538/0x2000
23-09-22 10:49:45 R SERIAL-CPU> [ 73.476235] rcu_core_si+0x10/0x1c
23-09-22 10:49:45 R SERIAL-CPU> [ 73.479739] __do_softirq+0x10c/0x284
23-09-22 10:49:45 R SERIAL-CPU> [ 73.483508] ____do_softirq+0x10/0x1c
23-09-22 10:49:45 R SERIAL-CPU> [ 73.487276] call_on_irq_stack+0x24/0x4c
23-09-22 10:49:45 R SERIAL-CPU> [ 73.491310] do_softirq_own_stack+0x1c/0x28
23-09-22 10:49:45 R SERIAL-CPU> [ 73.495609] irq_exit_rcu+0xd8/0xf4
23-09-22 10:49:45 R SERIAL-CPU> [ 73.499197] el1_interrupt+0x38/0x68
23-09-22 10:49:45 R SERIAL-CPU> [ 73.502871] el1h_64_irq_handler+0x18/0x24
23-09-22 10:49:45 R SERIAL-CPU> [ 73.507086] el1h_64_irq+0x64/0x68
23-09-22 10:49:45 R SERIAL-CPU> [ 73.510587] cpuidle_enter_state+0x134/0x2e0
23-09-22 10:49:45 R SERIAL-CPU> [ 73.514973] cpuidle_enter+0x38/0x50
23-09-22 10:49:45 R SERIAL-CPU> [ 73.518648] do_idle+0x1f4/0x264
23-09-22 10:49:45 R SERIAL-CPU> [ 73.521970] cpu_startup_entry+0x28/0x2c
23-09-22 10:49:45 R SERIAL-CPU> [ 73.526005] secondary_start_kernel+0x130/0x150
23-09-22 10:49:45 R SERIAL-CPU> [ 73.530660] __secondary_switched+0xb8/0xbc
23-09-22 10:49:45 R SERIAL-CPU> [ 73.534965] Code: 00000000 00000000 00000000 00000000 (0773e3b0)
23-09-22 10:49:45 R SERIAL-CPU> [ 73.541226] ---[ end trace 0000000000000000 ]---
23-09-22 10:49:45 R SERIAL-CPU> [ 73.545975] Kernel panic - not syncing: Oops: Fatal exception in interrupt
23-09-22 10:49:45 R SERIAL-CPU> [ 73.553032] SMP: stopping secondary CPUs
23-09-22 10:49:45 R SERIAL-CPU> [ 73.557142] Kernel Offset: 0x2a20b4000000 from 0xffff800080000000
23-09-22 10:49:45 R SERIAL-CPU> [ 73.563396] PHYS_OFFSET: 0xffffd9a200000000
23-09-22 10:49:45 R SERIAL-CPU> [ 73.567687] CPU features: 0x00000000,800140a1,8800721b
23-09-22 10:49:45 R SERIAL-CPU> [ 73.572962] Memory Limit: none
23-09-22 10:49:45 R SERIAL-CPU> [ 73.576106] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
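All three reports show the PC inside in_lookup_hashtable, which was identified earlier in the thread as an array rather than a function. The minimal C sketch below parses the "symbol+offset/size" notation from a "pc :" line and converts the offset into an array index. It assumes (our reading of fs/dcache.c) that the array elements are struct hlist_bl_head, which holds a single pointer, 8 bytes on arm64; parse_sym_off() and the other helpers are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* "symbol+offset/size" as printed on oops "pc :" and call-trace lines. */
struct sym_off {
    uint64_t off;
    uint64_t size;
};

/* Assumed element size: struct hlist_bl_head holds one pointer (8 bytes on arm64). */
#define HLIST_BL_HEAD_SIZE 8u

/* Parse the text after the last '+', e.g. "in_lookup_hashtable+0x538/0x2000".
 * Returns 1 on success, 0 if the string does not have that shape. */
static int parse_sym_off(const char *s, struct sym_off *out)
{
    const char *plus = strrchr(s, '+');

    if (!plus)
        return 0;
    /* %x accepts an optional 0x prefix, as strtoul() with base 16 does. */
    return sscanf(plus + 1, "%" SCNx64 "/%" SCNx64, &out->off, &out->size) == 2;
}

/* True if the offset lies within the symbol's reported size. */
static int inside_symbol(const struct sym_off *s)
{
    return s->off < s->size;
}

/* Index of the array element the offset falls in. */
static uint64_t element_index(const struct sym_off *s)
{
    return s->off / HLIST_BL_HEAD_SIZE;
}

/* Byte offset within that element; 0 means the start of an entry. */
static uint64_t offset_in_element(const struct sym_off *s)
{
    return s->off % HLIST_BL_HEAD_SIZE;
}
```

Under that assumption, the Cheza report's in_lookup_hashtable+0x538/0x2000 is entry 167 of 1024 (0x2000 / 8), and the Juno report's +0x178 is entry 47; the CKI report's +0x13c8 is likewise a multiple of 8. In each case the PC sits at the start of an 8-byte entry.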
David
[1] https://gitlab.freedesktop.org/mesa/mesa/-/jobs/49333832
[2] https://gitlab.freedesktop.org/gfx-ci/linux/-/commits/v6.5-for-mesa-ci/