The following kernel crash was reported on an arm64 juno-r2 device with the kselftest-merge config while booting the Linux next-20220513 kernel [1].
[ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
[ 0.000000] Linux version 5.18.0-rc6-next-20220513 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Fri May 13 08:34:42 UTC 2022
[ 0.000000] Machine model: ARM Juno development board (r2)
[ 0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '')
[ 0.000000] printk: bootconsole [pl11] enabled
[ 0.000000] efi: UEFI not found.
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x00000009ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x9fefce600-0x9fefd0fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000080000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x00000009ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x00000000feffffff]
[ 0.000000] node 0: [mem 0x0000000880000000-0x00000009ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000009ffffffff]
[ 0.000000] On node 0, zone Normal: 4096 pages in unavailable ranges
[ 0.000000] cma: Reserved 32 MiB at 0x00000000fd000000
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.1 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.0
[ 0.000000] percpu: Embedded 31 pages/cpu s89632 r8192 d29152 u126976
[ 0.000000] pcpu-alloc: s89632 r8192 d29152 u126976 alloc=31*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: ARM erratum 843419
[ 0.000000] CPU features: detected: ARM erratum 845719
[ 0.000000] Fallback order for Node 0: 0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2060288
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: console=ttyAMA0,115200n8 root=/dev/nfs rw nfsroot=10.66.16.125:/var/lib/lava/dispatcher/tmp/5021955/extract-nfsrootfs-23zdukp_,tcp,hard,vers=3 rootwait earlycon=pl011,0x7ff80000 debug systemd.log_target=null user_debug=31 androidboot.hardware=juno loglevel=9 sky2.mac_address=0x00,0x02,0xF7,0x00,0x68,0x3F ip=dhcp
[ 0.000000] Unknown kernel command line parameters "user_debug=31", will be passed to user space.
[ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 0.000000] Stack Depot early init allocating hash table with memblock_alloc, 8388608 bytes
[ 0.000000] software IO TLB: mapped [mem 0x00000000f9000000-0x00000000fd000000] (64MB)
[ 0.000000] Memory: 8038640K/8372224K available (22784K kernel code, 5468K rwdata, 11824K rodata, 11520K init, 11734K bss, 300816K reserved, 32768K cma-reserved)
[ 0.000000] **********************************************************
[ 0.000000] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 0.000000] ** **
[ 0.000000] ** This system shows unhashed kernel memory addresses **
[ 0.000000] ** via the console, logs, and other interfaces. This **
[ 0.000000] ** might reduce the security of your system. **
[ 0.000000] ** **
[ 0.000000] ** If you see this message and you are not debugging **
[ 0.000000] ** the kernel, report this immediately to your system **
[ 0.000000] ** administrator! **
[ 0.000000] ** **
[ 0.000000] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 0.000000] **********************************************************
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1
[ 0.000000] ftrace: allocating 69398 entries in 272 pages
[ 0.000000] ftrace: allocated 272 pages with 2 groups
[ 0.000000] trace event string verifier disabled
[ 0.000000] Running RCU self tests
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU event tracing is enabled.
[ 0.000000] rcu: RCU lockdep checking is enabled.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=6.
[ 0.000000] Trampoline variant of Tasks RCU enabled.
[ 0.000000] Rude variant of Tasks RCU enabled.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] Root IRQ handler: gic_handle_irq
[ 0.000000] GIC: Using split EOI/Deactivate mode
[ 0.000000] Unexpected kernel BRK exception at EL1
[ 0.000000] Internal error: BRK handler: f20003e8 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.0-rc6-next-20220513 #1
[ 0.000000] Hardware name: ARM Juno development board (r2) (DT)
[ 0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : gic_dist_config+0x4c/0x68
[ 0.000000] lr : gic_init_bases+0xd4/0x248
[ 0.000000] sp : ffff80000ad33b90
[ 0.000000] x29: ffff80000ad33b90 x28: ffff80000a2dd8d8 x27: ffff80000ad442d0
[ 0.000000] x26: dead000000000100 x25: ffff00097efdeb10 x24: 0000000000000000
[ 0.000000] x23: ffff80000a0ef000 x22: ffff80000a0ef000 x21: 0000000000000000
[ 0.000000] x20: ffff800008005000 x19: 0000000000000180 x18: 0000000000000000
[ 0.000000] x17: ffff80000a2237c4 x16: ffff80000a223210 x15: ffff8000087f8fa0
[ 0.000000] x14: ffff80000815dfdc x13: ffff80000a1f03ec x12: 000000000004ff45
[ 0.000000] x11: ffff80000b33e680 x10: 0000000000000032 x9 : ffff80000b29b7b0
[ 0.000000] x8 : ffff80000bcfb24c x7 : 00000000ffffffff x6 : ffff80000ad44244
[ 0.000000] x5 : ffff80000ad4d100 x4 : ffff80000ade3508 x3 : 0000000004040404
[ 0.000000] x2 : 0000000000000180 x1 : 0000000000000180 x0 : ffff800008005c5c
[ 0.000000] Call trace:
[ 0.000000]  gic_dist_config+0x4c/0x68
[ 0.000000]  gic_init_bases+0xd4/0x248
[ 0.000000]  __gic_init_bases+0xac/0x16c
[ 0.000000]  gic_of_init+0x28c/0x380
[ 0.000000]  of_irq_init+0x208/0x3f8
[ 0.000000]  irqchip_init+0x1c/0x40
[ 0.000000]  init_IRQ+0xf0/0x108
[ 0.000000]  start_kernel+0x53c/0x740
[ 0.000000]  __primary_switched+0xc0/0xc8
[ 0.000000] Code: b900001f 11004042 6b02027f 54ffff48 (d4207d00)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git commit: 1e1b28b936aed946122b4e0991e7144fdbbfd77e
  git describe: next-20220513
  make_kernelversion: 5.18.0-rc6
  kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-next...
  build-location: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-next...
  toolchain: aarch64-linaro-linux 7.%
--
Linaro LKFT
https://lkft.linaro.org
[1] https://lkft.validation.linaro.org/scheduler/job/5021955#L931
On Mon, 16 May 2022 07:16:22 +0100, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
Huh. Who inserts random BRKs like this?
Please provide a disassembly of this function.
Thanks,
M.
Hi Marc,
Thanks for looking into this report.
On Mon, 16 May 2022 at 12:38, Marc Zyngier <maz@kernel.org> wrote:
<trim>
The objdump snippet is here: http://ix.io/3XUW
The vmlinux file is located at this URL; please make use of it: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-next...
- Naresh
Hi Marc,
FYI,
On Mon, 16 May 2022 at 19:28, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
This happens only with the gcc-7.3.x-built kernel on the arm64 juno-r2 device.
- Naresh
On Mon, 16 May 2022 14:58:28 +0100, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
Wrong function (I wasn't clear that I wanted the breaking function, not the caller).
ffff8000087f9908 <gic_dist_config>:
ffff8000087f9908:  a9bd7bfd  stp   x29, x30, [sp, #-48]!
ffff8000087f990c:  910003fd  mov   x29, sp
ffff8000087f9910:  a90153f3  stp   x19, x20, [sp, #16]
ffff8000087f9914:  f90013f5  str   x21, [sp, #32]
ffff8000087f9918:  2a0103f3  mov   w19, w1
ffff8000087f991c:  aa0003f4  mov   x20, x0
ffff8000087f9920:  aa0203f5  mov   x21, x2
ffff8000087f9924:  aa1e03e0  mov   x0, x30
ffff8000087f9928:  97e0de72  bl    ffff8000080312f0 <_mcount>
ffff8000087f992c:  7100827f  cmp   w19, #0x20
ffff8000087f9930:  54000149  b.ls  ffff8000087f9958 <gic_dist_config+0x50>  // b.plast
ffff8000087f9934:  52800402  mov   w2, #0x20                    // #32
ffff8000087f9938:  53027c40  lsr   w0, w2, #2
ffff8000087f993c:  91300000  add   x0, x0, #0xc00
ffff8000087f9940:  8b000280  add   x0, x20, x0
ffff8000087f9944:  b900001f  str   wzr, [x0]
ffff8000087f9948:  11004042  add   w2, w2, #0x10
ffff8000087f994c:  6b02027f  cmp   w19, w2
ffff8000087f9950:  54ffff48  b.hi  ffff8000087f9938 <gic_dist_config+0x30>  // b.pmore
ffff8000087f9954:  d4207d00  brk   #0x3e8
What the hell is this??? This function has no WARN_ON and no BUG_ON, and the allowed values for the immediate are:
#define KPROBES_BRK_IMM            0x004
#define UPROBES_BRK_IMM            0x005
#define KPROBES_BRK_SS_IMM         0x006
#define FAULT_BRK_IMM              0x100
#define KGDB_DYN_DBG_BRK_IMM       0x400
#define KGDB_COMPILED_DBG_BRK_IMM  0x401
#define BUG_BRK_IMM                0x800
#define KASAN_BRK_IMM              0x900
#define KASAN_BRK_MASK             0x0ff
and 0x3e8 isn't one of them. This seems like a GCC 'division by zero' hack, but there are no divisions by zero here. Your kernel is also full of the stuff.
What sort of odd options do you have? I can't help but notice that you have the Rust stuff in your tree. Can you please start by disabling this, just in case there is an interaction with your toolchain?
Thanks,
M.