The QEMU-arm64 boot has failed with the Linux next-20241017 tag. The boot log is incomplete, and no kernel crash was detected. However, the system did not proceed far enough to reach the login prompt.
Please find the incomplete boot log links below for your reference. The QEMU version is 9.0.2. Boots on arm64 devices pass.
This is always reproducible.
First seen on Linux next-20241017 tag.
  Good: next-20241016
  Bad:  next-20241017

qemu-arm64-protected:
  boot:
    * clang-19-lkftconfig
    * gcc-13-lkftconfig
    * clang-nightly-lkftconfig
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Boot log:
---------
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
[ 0.000000] Linux version 6.12.0-rc3-next-20241017 (tuxmake@tuxmake) (Debian clang version 19.1.2 (++20241001023520+d5498c39fe6a-1~exp1~20241001143639.51), Debian LLD 19.1.2) #1 SMP PREEMPT @1729156545
[ 0.000000] KASLR enabled
[ 0.000000] random: crng init done
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] efi: UEFI not found.
[ 0.000000] Capping linear region to 51 bits for KVM in nVHE mode on LVA capable hardware.
...
[ 0.000000] Kernel command line: console=ttyAMA0,115200 rootwait root=/dev/vda debug verbose console_msg_format=syslog systemd.log_level=warning rw kvm-arm.mode=protected earlycon
...
<6>[ 0.305549] SME: maximum available vector length 256 bytes per vector
<6>[ 0.306214] SME: default vector length 32 bytes per vector
**
ERROR:target/arm/internals.h:923:regime_is_user: code should not be reached
Bail out! ERROR:target/arm/internals.h:923:regime_is_user: code should not be reached
<nothing after this>
Boot failed log links,
-------------
dmesg log: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241017/tes...
test details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241017/tes...
Build image:
-----------
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2nYi294C2rkwmj8hWZ0Xn...
Steps to reproduce:
------------
/usr/bin/qemu-system-aarch64 -cpu max,pauth-impdef=on \
    -machine virt,virtualization=on,gic-version=3,mte=on \
    -nographic -nic none -m 4G -monitor none -no-reboot -smp 2 \
    -kernel Image -append "console=ttyAMA0,115200 rootwait root=/dev/vda debug verbose console_msg_format=syslog systemd.log_level=warning rw kvm-arm.mode=protected earlycon" \
    -drive file=arm64_rootfs.ext4,if=none,format=raw,id=hd0 -device virtio-blk-device,drive=hd0
metadata:
----
git describe: next-20241017
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git sha: 7df1e7189cecb6965ce672e820a5ec6cf499b65b
kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2nYi294C2rkwmj8hWZ0Xn...
build url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2nYi294C2rkwmj8hWZ0Xn...
toolchain: clang-19, gcc-13 and clang-nightly
config: defconfig
arch: arm64
--
Linaro LKFT
https://lkft.linaro.org
On Fri, Oct 18, 2024 at 12:56:01PM +0530, Naresh Kamboju wrote:
> Boot log:
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
> [ 0.000000] Linux version 6.12.0-rc3-next-20241017 (tuxmake@tuxmake) (Debian clang version 19.1.2 (++20241001023520+d5498c39fe6a-1~exp1~20241001143639.51), Debian LLD 19.1.2) #1 SMP PREEMPT @1729156545
> [ 0.000000] KASLR enabled
> [ 0.000000] random: crng init done
> [ 0.000000] Machine model: linux,dummy-virt
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] Capping linear region to 51 bits for KVM in nVHE mode on LVA capable hardware.
> ...
> [ 0.000000] Kernel command line: console=ttyAMA0,115200 rootwait root=/dev/vda debug verbose console_msg_format=syslog systemd.log_level=warning rw kvm-arm.mode=protected earlycon
> ...
> <6>[ 0.305549] SME: maximum available vector length 256 bytes per vector
> <6>[ 0.306214] SME: default vector length 32 bytes per vector
> **
> ERROR:target/arm/internals.h:923:regime_is_user: code should not be reached
> Bail out! ERROR:target/arm/internals.h:923:regime_is_user: code should not be reached
> <nothing after this>
Qemu bug. See this email from Peter:
https://lore.kernel.org/r/CAFEAcA8uJL1t2MDjaJL7u5oW4ns23_E+sk7987x4gAcs3dSZO...
Naresh Kamboju <naresh.kamboju@linaro.org> writes:

> The QEMU-arm64 boot has failed with the Linux next-20241017 tag. The boot log is incomplete, and no kernel crash was detected. However, the system did not proceed far enough to reach the login prompt.
>
> Please find the incomplete boot log links below for your reference. The QEMU version is 9.0.2. Boots on arm64 devices pass.
Can confirm it also fails on the current master of QEMU:
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff4a3ae9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff49ebfb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff49d6472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff6e47ec8 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007ffff6ea7e1a in g_assertion_message_expr () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x0000555555f45732 in regime_is_user (env=0x555557f805f0, mmu_idx=ARMMMUIdx_E10_0) at ../../target/arm/internals.h:978
#7  0x0000555555f5b0f1 in aa64_va_parameters (env=0x555557f805f0, va=18446744073709551615, mmu_idx=ARMMMUIdx_E10_0, data=true, el1_is_aa32=false) at ../../target/arm/helper.c:12048
#8  0x0000555555f4e3e5 in tlbi_aa64_get_range (env=0x555557f805f0, mmuidx=ARMMMUIdx_E10_0, value=107271103184929) at ../../target/arm/helper.c:5214
#9  0x0000555555f4e5a4 in do_rvae_write (env=0x555557f805f0, value=107271103184929, idxmap=21, synced=true) at ../../target/arm/helper.c:5260
#10 0x0000555555f4e6d9 in tlbi_aa64_rvae1is_write (env=0x555557f805f0, ri=0x555557ffda90, value=107271103184929) at ../../target/arm/helper.c:5302
#11 0x00005555560553c8 in helper_set_cp_reg64 (env=0x555557f805f0, rip=0x555557ffda90, value=107271103184929) at ../../target/arm/tcg/op_helper.c:965
#12 0x00007fff60fc3939 in code_gen_buffer ()
while with:
./qemu-system-aarch64 \
    -machine type=virt,virtualization=on,gic-version=3,mte=on \
    -cpu max,pauth-impdef=on \
    -smp 4 \
    -accel tcg \
    -serial mon:stdio \
    -m 8192 \
    -kernel /home/alex/lsrc/qemu.git/builds/all/Image -append "root=/dev/sda2 console=ttyAMA0 kvm-arm.mode=protected earlycon" \
    -display none
Specifically kvm-arm.mode=protected has to be on.
With more detail I can see:
(gdb) p/x value
$1 = 0x619000000021
(gdb) p *ri
$2 = {name = 0x555557ffdb28 "TLBI_RVAALE1IS", cp = 19 '\023', crn = 8 '\b', crm = 2 '\002',
  opc0 = 1 '\001', opc1 = 0 '\000', opc2 = 7 '\a', state = ARM_CP_STATE_AA64, type = 1024,
  access = PL1_W, secure = ARM_CP_SECSTATE_NS, fgt = FGT_TLBIRVAALE1IS, nv2_redirect_offset = 0,
  opaque = 0x0, resetvalue = 0, fieldoffset = 0, bank_fieldoffsets = {0, 0},
  accessfn = 0x555555f46703 <access_ttlbis>, readfn = 0x0,
  writefn = 0x555555f4e6a2 <tlbi_aa64_rvae1is_write>, raw_readfn = 0x0, raw_writefn = 0x0,
  resetfn = 0x0, orig_readfn = 0x0, orig_writefn = 0x0, orig_accessfn = 0x0}
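[Editorial sketch, not part of the original message.] To make the register value above easier to read, the following Python snippet splits 0x619000000021 into the fields of a TLBI range operand. The field layout (ASID in bits 63:48, TG 47:46, SCALE 45:44, NUM 43:39, TTL 38:37, BaseADDR 36:0) is assumed from the Arm architecture's description of the range-based TLBI instructions; the function names are hypothetical and this is not QEMU's code.

```python
def extract(value: int, start: int, length: int) -> int:
    """Return `length` bits of `value`, starting at bit `start`."""
    return (value >> start) & ((1 << length) - 1)


def decode_tlbi_range_operand(value: int) -> dict:
    """Split a TLBI range operand into fields (assumed layout, see lead-in)."""
    fields = {
        "ASID": extract(value, 48, 16),
        "TG": extract(value, 46, 2),       # 1 = 4KB, 2 = 16KB, 3 = 64KB granule
        "SCALE": extract(value, 44, 2),
        "NUM": extract(value, 39, 5),
        "TTL": extract(value, 37, 2),
        "BaseADDR": extract(value, 0, 37),
    }
    # Number of granule-sized pages in the range: (NUM + 1) * 2^(5 * SCALE + 1)
    fields["pages"] = (fields["NUM"] + 1) << (5 * fields["SCALE"] + 1)
    # Top bit of BaseADDR, i.e. bit 36 of the operand
    fields["base_top_bit"] = extract(value, 36, 1)
    return fields


# The value printed by gdb in the message above.
decoded = decode_tlbi_range_operand(0x619000000021)
for name, field in decoded.items():
    print(f"{name:>12} = {field:#x}")
```

Under that assumed layout the operand describes a 4KB-granule range (TG = 1) with SCALE = 2 and NUM = 3, so 8192 pages, and bit 36 of the operand is set. That would be consistent with the all-ones va seen in frame #7, if that argument is derived from this bit.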
It seems the assert fires because:
    case ARMMMUIdx_E10_0:
    case ARMMMUIdx_E10_1:
    case ARMMMUIdx_E10_1_PAN:
        g_assert_not_reached();
But the function:
static int vae1_tlbmask(CPUARMState *env)
{
    uint64_t hcr = arm_hcr_el2_eff(env);
    uint16_t mask;

    if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
        mask = ARMMMUIdxBit_E20_2 |
               ARMMMUIdxBit_E20_2_PAN |
               ARMMMUIdxBit_E20_0;
    } else {
        mask = ARMMMUIdxBit_E10_1 |
               ARMMMUIdxBit_E10_1_PAN |
               ARMMMUIdxBit_E10_0;
    }
    return mask;
}
returns that mask while handling tlbi_aa64_rvae1is_write(). I don't have an Arm ARM handy with me at the airport. Peter/Richard, can you check what the logic should be, and whether this is a QEMU bug or the kernel doing something it shouldn't?
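[Editorial sketch, not part of the original message.] The link between the mask and the backtrace can be modelled in a few lines of Python. The bit positions below are an assumption chosen for illustration; they are at least consistent with idxmap=21 in frame #9. That the range code looks at the lowest set bit of idxmap is likewise an inference from frames #8 and #9 (idxmap=21, mmuidx=ARMMMUIdx_E10_0), not something stated in the thread. All names are hypothetical stand-ins for the QEMU ones.

```python
# Assumed bit position of each MMU index inside idxmap (illustrative only).
MMU_IDX_BIT = {
    "E10_0": 0,
    "E20_0": 1,
    "E10_1": 2,
    "E20_2": 3,
    "E10_1_PAN": 4,
    "E20_2_PAN": 5,
}

# The cases quoted above that end in g_assert_not_reached().
ASSERTING_CASES = {"E10_0", "E10_1", "E10_1_PAN"}


def vae1_tlbmask(e2h_and_tge: bool) -> int:
    """Model of the quoted vae1_tlbmask(): pick the EL2&0 or EL1&0 indexes."""
    if e2h_and_tge:
        names = ("E20_2", "E20_2_PAN", "E20_0")
    else:
        names = ("E10_1", "E10_1_PAN", "E10_0")
    mask = 0
    for name in names:
        mask |= 1 << MMU_IDX_BIT[name]
    return mask


def lowest_index_name(idxmap: int) -> str:
    """Name of the MMU index at the lowest set bit of idxmap."""
    bit = (idxmap & -idxmap).bit_length() - 1
    for name, pos in MMU_IDX_BIT.items():
        if pos == bit:
            return name
    raise ValueError(f"no MMU index at bit {bit}")


def hits_assert(name: str) -> bool:
    """True if the quoted switch would reach g_assert_not_reached()."""
    return name in ASSERTING_CASES


idxmap = vae1_tlbmask(e2h_and_tge=False)
picked = lowest_index_name(idxmap)
print(idxmap, picked, hits_assert(picked))
```

With HCR_EL2.{E2H,TGE} not both set, this model yields idxmap 21 and selects E10_0, one of the cases that asserts in the switch quoted above. It only illustrates how the two snippets meet; it says nothing about what the correct behaviour should be.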
On Fri, 18 Oct 2024 at 10:46, Alex Bennée <alex.bennee@linaro.org> wrote:
> Can confirm it also fails on the current master of QEMU:
>
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1  0x00007ffff4a3ae9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007ffff49ebfb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x00007ffff49d6472 in __GI_abort () at ./stdlib/abort.c:79
> #4  0x00007ffff6e47ec8 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #5  0x00007ffff6ea7e1a in g_assertion_message_expr () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #6  0x0000555555f45732 in regime_is_user (env=0x555557f805f0, mmu_idx=ARMMMUIdx_E10_0) at ../../target/arm/internals.h:978
> #7  0x0000555555f5b0f1 in aa64_va_parameters (env=0x555557f805f0, va=18446744073709551615, mmu_idx=ARMMMUIdx_E10_0, data=true, el1_is_aa32=false) at ../../target/arm/helper.c:12048
> #8  0x0000555555f4e3e5 in tlbi_aa64_get_range (env=0x555557f805f0, mmuidx=ARMMMUIdx_E10_0, value=107271103184929) at ../../target/arm/helper.c:5214
I investigated this yesterday when Catalin reported it and sent a patch: https://patchew.org/QEMU/20241017172331.822587-1-peter.maydell@linaro.org/
thanks
-- PMM
Peter Maydell <peter.maydell@linaro.org> writes:
> I investigated this yesterday when Catalin reported it and sent a patch: https://patchew.org/QEMU/20241017172331.822587-1-peter.maydell@linaro.org/
And here was I thinking I was being efficient ;-)