On 20/10/23 17:57, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 11:02, Arnd Bergmann <arnd@arndb.de> wrote:
On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann <arnd@arndb.de> wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
We boot qemu-x86_64 and x86_64 with a 64-bit kernel and a 32-bit rootfs; we call this compat mode boot testing. Recently it started failing to reach the login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to the first bad commit, 546694b8f658 ("autofs: add autofs_parse_fd()").
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I will try what you suggested.
Is this information helpful? On linux-next, the regression appeared after next-20230925.
GOOD: next-20230925
BAD: next-20230926
$ git log --oneline next-20230925..next-20230926 -- fs/autofs/
dede367149c4 autofs: fix protocol sub version setting
e6ec453bd0f0 autofs: convert autofs to use the new mount api
1f50012d9c63 autofs: validate protocol version
9b2731666d1d autofs: refactor parse_options()
7efd93ea790e autofs: reformat 0pt enum declaration
a7467430b4de autofs: refactor super block info init
546694b8f658 autofs: add autofs_parse_fd()
bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
Right, and it looks like the bottom five patches of this should be fairly harmless, as they only move code around in preparation for the later changes. Even the other ones should not cause any difference between a 32-bit and a 64-bit /sbin/mount binary.
If the native (full 64-bit or full 32-bit) test run still works with the same version, there may be some other difference here.
What are the exact mount options you pass to autofs in your fstab?
The mount output shows this:
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
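For anyone who wants to pull only the autofs entries out of `mount` output like the line above, here is a minimal sketch. The helper name `autofs_mounts` is made up for illustration, and it assumes mount points without spaces.

```shell
# Hypothetical helper: filter `mount`-style lines on stdin down to autofs entries.
# Prints "<mount point> <options>" for each match.
# Assumes mount points contain no whitespace.
autofs_mounts() {
    awk '$2 == "on" && $4 == "type" && $5 == "autofs" { print $3, $6 }'
}
```

Typical use would be `mount | autofs_mounts`, which makes it easy to compare the option string between a good and a bad boot.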
This is only the binfmt-misc mount, which should not prevent your rootfs from getting mounted, but it's possible that failure to mount this prevents you from running 32-bit binaries.
I see this comes from the "proc-sys-fs-binfmt_misc.automount" service in systemd. I see this is defined in https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.a... but I don't know exactly what its purpose is here. On a 64-bit system, you normally use compat_binfmt_elf.ko to run 32-bit binaries, and this does not require any specific mount points. Alternatively, you could use binfmt_misc.ko with the procfs mount to configure running arbitrary binary formats such as arm32 on x86_64 with qemu-user emulation.
I double-checked your rootfs image from https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz to ensure that this indeed contains i386 executables rather than arm32 ones, and that is all fine.
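A check of this kind can be sketched with plain `od`, reading the `e_machine` field of the ELF header. This is only an illustration, not necessarily how the image was checked here; the helper name `elf_machine` is made up, and it assumes a little-endian ELF file and does not verify the ELF magic.

```shell
# Hypothetical helper: report the target machine of an ELF file.
# Reads e_machine (header bytes 18 and 19) and assumes a little-endian ELF,
# which holds for i386, x86_64 and little-endian arm32 binaries.
# It does not check the ELF magic, so only run it on files known to be ELF.
elf_machine() {
    bytes=$(od -An -tu1 -j18 -N2 "$1") || return 1
    set -- $bytes
    [ "$#" -eq 2 ] || { echo unknown; return 0; }
    case $(( $1 + 256 * $2 )) in
        3)  echo i386 ;;     # EM_386
        40) echo arm ;;      # EM_ARM
        62) echo x86_64 ;;   # EM_X86_64
        *)  echo unknown ;;
    esac
}
```

Run against a binary copied out of the rootfs (for example its `/sbin/init` target), an i386 image should report `i386` rather than `arm`.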
I also see in your log file at https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/tes... that it is running the i386 binaries from the rootfs, but it does get stuck soon after trying to set up the binfmt-misc mount at the end of the log:
[ OK ] Reached target local-fs.target - Local File Systems.
Starting systemd-binfmt.se…et Up Additional Binary Formats...
Starting systemd-tmpfiles-… Volatile Files and Directories...
Starting systemd-udevd.ser…ger for Device Events and Files...
[ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP)
[ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1
[ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left
quit
I'm a bit out of ideas at this point. My best guess now is that your bisection points to something in autofs that makes it hang while setting up autofs, but that neither autofs nor binfmt-misc is actually being used otherwise.
Maybe you can try to modify your rootfs to disable or remove the systemd-binfmt.service, to confirm that autofs is not actually needed here but does cause the crash?
I removed systemd-binfmt.service from the rootfs and booted 546694b8f658 ("autofs: add autofs_parse_fd()") and now it booted fine.
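For reference, one way to take a unit out of play in an offline rootfs is to mask it by hand. The sketch below is an assumption about how this could be done, not a record of what was done for this test; the helper name `mask_unit` and the mount point in the usage note are made up.

```shell
# Hypothetical helper: mask a systemd unit inside an unpacked or loop-mounted rootfs.
# Mirrors what `systemctl mask` does: a symlink to /dev/null in etc/systemd/system.
mask_unit() {
    root=$1
    unit=$2
    mkdir -p "$root/etc/systemd/system"
    ln -sf /dev/null "$root/etc/systemd/system/$unit"
}
```

With the image mounted at, say, `/mnt/rootfs`, `mask_unit /mnt/rootfs systemd-binfmt.service` would keep the service from starting on the next boot. Masking is reversible by deleting the symlink, which makes it easier to flip between the two cases than deleting the unit file.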
I don't suppose you could try an automount after the boot is completed?
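Something along these lines might do for a quick check, assuming the automount point is still set up after boot. The helper name `probe_path` and the 10 second limit are arbitrary choices for illustration.

```shell
# Hypothetical helper: touch a path with a time limit so a hung automount
# does not block the shell indefinitely. The 10 second limit is arbitrary.
probe_path() {
    if timeout 10 ls "$1" >/dev/null 2>&1; then
        echo ok
    else
        echo "failed or timed out"
    fi
}
```

Running `probe_path /proc/sys/fs/binfmt_misc` should trigger the automount on access. Note that `timeout` cannot kill a process stuck in uninterruptible sleep, so a hang in that state would still block.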
It seems a bit odd. It must be some sort of object lifetime inconsistency,
but if that were the case, automounts would at least fail to function. Hmm ...
Ian