On 3/23/2023 1:43 PM, Conor Dooley wrote:
On Thu, Mar 23, 2023 at 08:51:25AM +0100, Vlastimil Babka wrote:
On 3/23/23 08:35, Naresh Kamboju wrote:
The following kernel crash was noticed on arm x15, arm64 hikey-6220, Juno-r2, x86_64 and i386 devices on Linux next-20230323.
To add one more to the sample size, it's falling over on RISC-V too!
It's failing on AMD too, with the trace below:
[    2.510619] BUG: unable to handle page fault for address: 0000000008100111
[    2.513951] #PF: supervisor read access in kernel mode
[    2.521156] usb 3-1.1: New USB device found, idVendor=1604, idProduct=10c0, bcdDevice= 0.00
[    2.513951] #PF: error_code(0x0000) - not-present page
[    2.530981] usb 3-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.513951] PGD 0 P4D 0
[    2.513951] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    2.513951] CPU: 95 PID: 868 Comm: modprobe Not tainted 6.3.0-rc3-next-20230323-next-20230323-814642c #1
[    2.513951] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.8.5 08/18/2022
[    2.513951] RIP: 0010:vma_merge+0xe4/0xc50
[    2.513951] Code: 0f 84 59 08 00 00 48 8b 45 88 49 39 47 08 0f 84 27 02 00 00 4d 85 f6 74 0a 4d 39 6e 08 0f 84 a7 01 00 00 31 c9 48 85 db 74 79 <48> 8b b3 a0 00 00 00 4c 39 e6 0f 84 98 00 00 00 4c 89 e7 88 8d 4f
[    2.577270] hub 3-1.1:1.0: USB hub found
[    2.513951] RSP: 0018:ffffb5e98ec47c88 EFLAGS: 00010206
[    2.513951] RAX: 0000000000000000 RBX: 0000000008100071 RCX: 0000000000000000
[    2.513951] RDX: ffff9857508c2c30 RSI: 0000000000100001 RDI: ffff985754569870
[    2.513951] RBP: ffffb5e98ec47d40 R08: 00000000000001bb R09: 0000000000000000
[    2.513951] R10: 0000000000000000 R11: ffff98575452ef0c R12: 0000000000000000
[    2.513951] R13: 00007f8be41f4000 R14: ffff985754569870 R15: ffff985754568958
[    2.594281] hub 3-1.1:1.0: 4 ports detected
[    2.513951] FS: 00007f8be4f64740(0000) GS:ffff987640dc0000(0000) knlGS:0000000000000000
[    2.513951] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.513951] CR2: 0000000008100111 CR3: 0008000114566002 CR4: 0000000000770ee0
[    2.513951] PKRU: 55555554
[    2.620194] Call Trace:
[    2.620194] <TASK>
[    2.620194] mprotect_fixup+0x13e/0x320
[    2.620194] do_mprotect_pkey+0x43c/0x4d0
[    2.620194] ? do_user_addr_fault+0x34f/0x8e0
[    2.620194] ? exit_to_user_mode_prepare+0x32/0x190
[    2.620194] __x64_sys_mprotect+0x23/0x30
[    2.688176] usb 3-1.4: new high-speed USB device number 4 using xhci_hcd
[    2.620194] do_syscall_64+0x3e/0x90
[    2.620194] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[    2.620194] RIP: 0033:0x7f8be4d40ebb
[    2.620194] Code: 73 01 c3 48 8d 0d 2d e3 22 00 f7 d8 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa b8 0a 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8d 0d fd e2 22 00 f7 d8 89 01 48 83
[    2.620194] RSP: 002b:00007fff1da6b298 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
[    2.620194] RAX: ffffffffffffffda RBX: 00007f8be4f6a3d0 RCX: 00007f8be4d40ebb
[    2.620194] RDX: 0000000000000001 RSI: 0000000000004000 RDI: 00007f8be41f4000
[    2.620194] RBP: 00007fff1da6b3c0 R08: 0000000000000000 R09: 00007f8be3e39000
[    2.620194] R10: 00007f8be4f6a3d0 R11: 0000000000000206 R12: 0000000000000000
[    2.620194] R13: 00007f8be41f8018 R14: 00007f8be4f6a3d0 R15: 00007f8be4f6a3d0
[    2.620194] </TASK>
[    2.620194] Modules linked in:
[    2.620194] CR2: 0000000008100111
[    2.620194] ---[ end trace 0000000000000000 ]---
[    2.620194] pstore: backend (erst) writing error (-28)
[    2.854021] usb 3-1.4: New USB device found, idVendor=1604, idProduct=10c0, bcdDevice= 0.00
[    2.620194] RIP: 0010:vma_merge+0xe4/0xc50
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
crash log on arm64:
[   19.281223] Unable to handle kernel paging request at virtual address 0000000000100111
[   19.289189] Mem abort info:
[   19.291995]   ESR = 0x0000000096000006
[   19.295757]   EC = 0x25: DABT (current EL), IL = 32 bits
[   19.301086]   SET = 0, FnV = 0
[   19.304151]   EA = 0, S1PTW = 0
[   19.307302]   FSC = 0x06: level 2 translation fault
[   19.312194] Data abort info:
[   19.315083]   ISV = 0, ISS = 0x00000006
[   19.318930]   CM = 0, WnR = 0
[   19.321901] user pgtable: 4k pages, 48-bit VAs, pgdp=00000008a23c5000
[   19.328374] [0000000000100111] pgd=08000008a14c5003, p4d=08000008a14c5003, pud=08000008a14c6003, pmd=0000000000000000
[   19.339037] Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
[   19.345315] Modules linked in:
[   19.348373] CPU: 2 PID: 1 Comm: init Not tainted 6.3.0-rc3-next-20230323 #1
next-20230323 seems to contain v2 of Lorenzo's vma_merge cleanups
[   19.355347] Hardware name: ARM Juno development board (r2) (DT)
[   19.361273] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   19.368246] pc : vma_merge (mm/mmap.c:952 (discriminator 1))
And this is a line involving 'next', and Liam pointed out a possibly uninitialized 'next' in v2, so that's probably it. Andrew replaced it with a fixed version, so it should make its way to -next as well.
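For what it's worth, the AMD trace above is consistent with a load through a bogus pointer at a fixed offset. A rough sketch of that check in Python, decoding by hand the bytes that follow the <48> marker on the "Code:" line and adding the displacement to RBX from the register dump. The variable names are made up for illustration, and which field sits at that offset is not something the trace tells us.

```python
# Bytes at the faulting RIP, copied from the "Code:" line starting at <48>.
insn = bytes.fromhex("48 8b b3 a0 00 00 00")

rex, opcode, modrm = insn[0], insn[1], insn[2]
assert rex == 0x48 and opcode == 0x8B  # REX.W + MOV r64, r/m64

mod = modrm >> 6          # 0b10: base register plus 32-bit displacement
reg = (modrm >> 3) & 0x7  # destination register number (6 is RSI)
rm = modrm & 0x7          # base register number (3 is RBX)
disp = int.from_bytes(insn[3:7], "little", signed=True)

rbx = 0x0000000008100071  # RBX from the register dump
fault_addr = rbx + disp   # address the load would touch

print(f"mov {disp:#x}(%rbx),%rsi -> {fault_addr:#018x}")
```

The result can be compared against the CR2 value in the oops. The "48 85 db 74 79" just before the marker is a test of RBX against zero followed by a conditional jump, so the pointer was non-NULL but not a valid kernel address.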
Cool, hopefully it is fixed tomorrow :)
Thanks, will keep an eye on it.

Srikanth Aithal
Cheers, Conor.
[   19.371917] lr : vma_merge (mm/mmap.c:945)
[   19.375670] sp : ffff80000b37bb40
[   19.378985] x29: ffff80000b37bb40 x28: ffff000820c0ff20 x27: 0000000000000000
[   19.386139] x26: ffff000820c17210 x25: ffff000800bfac00 x24: 0000ffff8e8b7000
[   19.393293] x23: 0000000000100071 x22: ffff000800898d80 x21: 0000000000100071
[   19.400446] x20: ffff80000b37bd18 x19: 0000ffff8e8ba000 x18: ffff80000b37bd18
[   19.407599] x17: 0000000000000000 x16: ffff8000099a58c8 x15: 0000ffff8e9aefff
[   19.414752] x14: 0000ffff8e8b7000 x13: 1fffe001041bb361 x12: ffff80000b37baf8
[   19.421905] x11: ffff000822473400 x10: ffff000820dd9b08 x9 : ffff80000830fc64
[   19.429057] x8 : 0000ffff8e8b7000 x7 : 0000ffff8e8b7000 x6 : ffff000820dd9b50
[   19.436210] x5 : ffff000820c0ff20 x4 : 0000000000000187 x3 : ffff000800bfac00
[   19.443363] x2 : 0000000000000000 x1 : 0000000000100071 x0 : 0000000000000000
[   19.450515] Call trace:
[   19.452960] vma_merge (mm/mmap.c:952 (discriminator 1))
[   19.456279] mprotect_fixup (mm/mprotect.c:676)
[   19.460034] do_mprotect_pkey.constprop.0 (mm/mprotect.c:862)
[   19.465094] __arm64_sys_mprotect (mm/mprotect.c:880)
[   19.469283] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57)
[   19.473041] el0_svc_common (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[   19.476796] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[   19.480117] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638)
[   19.483177] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[   19.487454] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[   19.491123] Code: eb18001f 54000800 52800002 b40004d7 (f94052e1)
All code
========
   0:*  1f                      (bad)           <-- trapping instruction
   1:   00 18                   add    %bl,(%rax)
   3:   eb 00                   jmp    0x5
   5:   08 00                   or     %al,(%rax)
   7:   54                      push   %rsp
   8:   02 00                   add    (%rax),%al
   a:   80 52 d7 04             adcb   $0x4,-0x29(%rdx)
   e:   00                      .byte 0x0
   f:   b4 e1                   mov    $0xe1,%ah
  11:   52                      push   %rdx
  12:   40 f9                   rex stc

Code starting with the faulting instruction
   0:   e1 52                   loope  0x54
   2:   40 f9                   rex stc
Looks like an x86 decodecode of arm64 code :) calling a wrong objdump or something?
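Since the automated decode is not usable, here is a rough hand decode of the parenthesised word from the arm64 "Code:" line, as a small Python sketch. It assumes the word is an A64 LDR (immediate, unsigned offset) in its 64-bit form, which the top ten bits are checked against. The variable names are mine, and x23 is copied from the register dump.

```python
word = 0xF94052E1  # the parenthesised (faulting) word on the "Code:" line

# LDR (immediate, unsigned offset), 64-bit variant: bits 31..22 are 1111100101.
assert word >> 22 == 0b1111100101

imm12 = (word >> 10) & 0xFFF  # unsigned offset, in units of 8 bytes
rn = (word >> 5) & 0x1F       # base register number
rt = word & 0x1F              # destination register number
offset = imm12 * 8            # byte offset applied to the base register

x23 = 0x0000000000100071      # x23 from the register dump
fault_addr = x23 + offset     # address the load would touch

print(f"ldr x{rt}, [x{rn}, #{offset:#x}] -> {fault_addr:#018x}")
```

The computed address can be compared with the "Unable to handle kernel paging request at virtual address" line of the same oops.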
[   19.497226] ---[ end trace 0000000000000000 ]---
[   19.501883] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   19.509551] SMP: stopping secondary CPUs
[   19.513665] Kernel Offset: disabled
[   19.517152] CPU features: 0x400002,0c3c0400,0000421b
[   19.522123] Memory Limit: none
[   19.525181] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
metadata:
  git_ref: master
  git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git_sha: 7c4a254d78f89546d0e74a40617ef24c6151c8d1
  git_describe: next-20230323
  kernel_version: 6.3.0-rc3
  kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2NOjwfRUa0fjWWZBWCUG4...
  build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/815177945
  artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2NOjwfRUa0fjWWZBWCUG4...
  toolchain: gcc-11
--
Linaro LKFT
https://lkft.linaro.org