The QEMU-ARMv7 boot has failed with the Linux next-20241017 tag. The boot log is incomplete, and no kernel crash was detected. However, the system did not proceed far enough to reach the login prompt.
Please find the incomplete boot log links below for your reference. The QEMU version is 9.0.2. Boot passes on the TI beaglebone x15 arm devices.
This is always reproducible. First seen on Linux next-20241017 tag.
Good: next-20241016
Bad: next-20241017
qemu-armv7:
  boot:
    * clang-19-lkftconfig
    * gcc-13-lkftconfig
    * clang-nightly-lkftconfig
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Boot log:
-------
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 6.12.0-rc3-next-20241017 (tuxmake@tuxmake) (arm-linux-gnueabihf-gcc (Debian 13.3.0-5) 13.3.0, GNU ld (GNU Binutils for Debian) 2.43.1) #1 SMP @1729156545
[ 0.000000] CPU: ARMv7 Processor [414fc0f0] revision 0 (ARMv7), cr=10c5387d
[ 0.000000] CPU: div instructions available: patching division code
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[ 0.000000] OF: fdt: Machine model: linux,dummy-virt
[ 0.000000] random: crng init done
[ 0.000000] earlycon: pl11 at MMIO 0x09000000 (options '')
[ 0.000000] printk: legacy bootconsole [pl11] enabled
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Size (0x04000000) of region at 0x00000000 exceeds limit (0x00000000)
[ 0.000000] cma: Failed to reserve 64 MiB on node -1
<nothing after this>
Boot log link,
-----
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241017/tes...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241017/tes...
Build images:
------
- https://storage.tuxsuite.com/public/linaro/lkft/tests/2nYi2nidfMq35VigDlxJbl...
Steps to reproduce via qemu:
----------------
/usr/bin/qemu-system-arm -cpu cortex-a15 \
  -machine virt,gic-version=3 \
  -nographic -nic none -m 4G -monitor none \
  -no-reboot -smp 2 \
  -kernel zImage \
  -append "console=ttyAMA0,115200 rootwait root=/dev/vda debug verbose console_msg_format=syslog systemd.log_level=warning rw earlycon" \
  -drive file=debian_trixie_armhf_rootfs.ext4,if=none,format=raw,id=hd0 \
  -device virtio-blk-device,drive=hd0
Steps to reproduce with tuxrun reproducer:
---------------
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2nYi2nidfMq...
Boot history compare link:
------------------------
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241017/tes...
metadata:
----
git describe: next-20241017
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git sha: 7df1e7189cecb6965ce672e820a5ec6cf499b65b
kernel config: https://storage.tuxsuite.com/public/linaro/lkft/tests/2nYi2nidfMq35VigDlxJbl...
build url: https://storage.tuxsuite.com/public/linaro/lkft/tests/2nYi2nidfMq35VigDlxJbl...
toolchain: clang-19, gcc-13 and clang-nightly
config: lkftconfig
arch: arm
--
Linaro LKFT
https://lkft.linaro.org
Naresh Kamboju naresh.kamboju@linaro.org writes:
Is this a highmem-related thing? Passing -m 2G allows it to get further, and 4G is obviously at the limit of what 32 bits can address.
On Fri, 18 Oct 2024 at 12:35, Naresh Kamboju naresh.kamboju@linaro.org wrote:
Anders bisected this boot regression and found:
# first bad commit: [efe8419ae78d65e83edc31aad74b605c12e7d60c] vdso: Introduce vdso/page.h
We are investigating why this commit causes the boot failure.
Has anyone noticed similar qemu-arm boot regressions with the Linux next-20241017 and next-20241018 tags?
- Naresh
On 10/20/24 10:39, Naresh Kamboju wrote:
Probably fixed on qemu master with
commit 67d762e716a7127ecc114e9708254316dd521911
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 27 09:10:51 2024 +0200

    target/arm: Avoid target_ulong for physical address lookups
r~
On Sun, Oct 20, 2024, at 17:39, Naresh Kamboju wrote:
Anders and I analyzed this. The problem turned out to be the early_init_dt_add_memory_arch() function in drivers/of/fdt.c, which does bitwise operations with PAGE_MASK on a 'u64' instead of a phys_addr_t:
void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
{
        const u64 phys_offset = MIN_MEMBLOCK_ADDR;

        if (size < PAGE_SIZE - (base & ~PAGE_MASK)) {
                pr_warn("Ignoring memory block 0x%llx - 0x%llx\n",
                        base, base + size);
                return;
        }

        if (!PAGE_ALIGNED(base)) {
                size -= PAGE_SIZE - (base & ~PAGE_MASK);
                base = PAGE_ALIGN(base);
        }
On non-LPAE arm32, this broke the existing behavior for large 32-bit memory sizes. The obvious fix is to change the PAGE_MASK definition for 32-bit arm back to a signed number.
mips32, ppc32 and hexagon had the same definition as well, so I think we should change at least those in order to restore the previous behavior in case they are affected by the same bug (or a different one).
x86-32 and arc got flipped the other way by the patch, from unsigned to signed, when CONFIG_ARC_HAS_PAE40 or CONFIG_X86_PAE are set. I think we should keep the 'signed' behavior as this was a bugfix by itself, but we may want to change arc and x86-32 with short phys_addr_t the same way for consistency.
On csky, m68k, microblaze, nios2, openrisc, parisc32, riscv32, sh, sparc32, um and xtensa, we've always used the 'unsigned' PAGE_MASK, and there is no 64-bit phys_addr_t, so I would lean towards staying with 'unsigned' in order to not introduce a regression. Alternatively we could choose to go with the 'signed' version on all 32-bit architectures unconditionally for consistency. Any preferences?
Arnd