The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
steps to reproduce: ---------------- # To install tuxrun to your home directory at ~/.local/bin: # pip3 install -U --user tuxrun==0.49.2 # # Or install a deb/rpm depending on the running distribution # See https://tuxmake.org/install-deb/ or # https://tuxmake.org/install-rpm/ # # See https://tuxrun.org/ for complete documentation. #
tuxrun --runtime podman --device qemu-x86_64 --boot-args rw --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2WyQyyM0OvXnnbI0d84HL... --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2WyQyyM0OvXnnbI0d84HL... --rootfs https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz --parameters SKIPFILE=skipfile-lkft.yaml --parameters SHARD_NUMBER=10 --parameters SHARD_INDEX=3 --image docker.io/linaro/tuxrun-dispatcher:v0.49.2 --tests ltp-cve --timeouts boot=15 --overlay https://storage.tuxboot.com/overlays/debian/bookworm/i386/ltp/20230929/ltp.t...
Please find related links to test and results comparison.
Links: - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231019/tes... - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/tes... - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2WyR2MGbyTa... - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2WyR2MGbyTa... -- Linaro LKFT https://lkft.linaro.org
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
What are the exact mount options you pass to autofs in your fstab?
Arnd
On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I will try your suggested ways.
Is this information helpful ? Linux-next the regression started happening from next-20230925.
GOOD: next-20230925 BAD: next-20230926
$ git log --oneline next-20230925..next-20230926 -- fs/autofs/ dede367149c4 autofs: fix protocol sub version setting e6ec453bd0f0 autofs: convert autofs to use the new mount api 1f50012d9c63 autofs: validate protocol version 9b2731666d1d autofs: refactor parse_options() 7efd93ea790e autofs: reformat 0pt enum declaration a7467430b4de autofs: refactor super block info init 546694b8f658 autofs: add autofs_parse_fd() bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
What are the exact mount options you pass to autofs in your fstab?
mount output shows like this, systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
More information: ------------ + mkdir -p /scratch + mount /dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGL0F /scratch [ 38.217507] EXT4-fs (sda): mounted filesystem 13d24d11-02a0-4cc0-842e-d339637e2564 r/w with ordered data mode. Quota mode: none. + echo mounted mounted + df -h Filesystem Size Used Avail Use% Mounted on 10.66.16.116:/var/lib/lava/dispatcher/tmp/6951354/extract-nfsrootfs-fsw9ebbf 559G 72G 487G 13% / devtmpfs 7.8G 0 7.8G 0% /dev tmpfs 7.8G 0 7.8G 0% /dev/shm tmpfs 3.2G 640K 3.2G 1% /run tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 1.6G 0 1.6G 0% /run/user/0 /dev/sda 916G 28K 870G 1% /scratch + mount 10.66.16.116:/var/lib/lava/dispatcher/tmp/6951354/extract-nfsrootfs-fsw9ebbf on / type nfs (rw,relatime,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.116,mountvers=3,mountproto=tcp,local_lock=all,addr=10.66.16.116) devtmpfs on /dev type devtmpfs (rw,relatime,size=8165164k,nr_inodes=2041291,mode=755) proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime) selinuxfs on /sys/fs/selinux type selinuxfs (rw,nosuid,noexec,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000) tmpfs on /run type tmpfs (rw,nosuid,nodev,size=3266844k,nr_inodes=819200,mode=755) tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k) cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M) mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime) debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime) tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime) ramfs on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700) ramfs on /run/credentials/systemd-sysusers.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700) ramfs on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700) ramfs on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700) binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime) tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=1633420k,nr_inodes=408355,mode=700) /dev/sda on /scratch type ext4 (rw,relatime)
Here is the bisect log for your reference,
# bad: [e3b18f7200f45d66f7141136c25554ac1e82009b] Add linux-next specific files for 20231013 # good: [d3b4075b173f033387b614297bb4d998cf22c8bd] drm/msm/dp: use correct lifetime device for devm_drm_bridge_add git bisect start 'next-20231013' 'd3b4075b173f033387b614297bb4d998cf22c8bd' # bad: [9949a257fbbef4bf21f77649d4585b6ba0d0abae] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git git bisect bad 9949a257fbbef4bf21f77649d4585b6ba0d0abae # good: [39f9dc8a2d8e789ea31e343070a3b51c3f7ae3a2] Merge branch 'xtensa-for-next' of git://github.com/jcmvbkbc/linux-xtensa.git git bisect good 39f9dc8a2d8e789ea31e343070a3b51c3f7ae3a2 # good: [4b32d7970b9865ad711a65094609a8c0ec9867a8] bcachefs: Change journal_io.c assertion to error message git bisect good 4b32d7970b9865ad711a65094609a8c0ec9867a8 # good: [79592709a731af69f33613490a7d88168fd52efb] bcachefs: Improve key_visible_in_snapshot() git bisect good 79592709a731af69f33613490a7d88168fd52efb # good: [7ea2bcad6a0d5d15801abc318738d7eec3164d0a] Merge branch 'linux-next' of git://git.linux-nfs.org/projects/anna/linux-nfs.git git bisect good 7ea2bcad6a0d5d15801abc318738d7eec3164d0a # bad: [be636e0fa5fdb395366fd73a078e959d2587c476] Merge branch 'vfs.super' into vfs.all git bisect bad be636e0fa5fdb395366fd73a078e959d2587c476 # good: [fea0e8fc7829dc85f82c8a1a8249630f6fb85553] fs: rename inode i_atime and i_mtime fields git bisect good fea0e8fc7829dc85f82c8a1a8249630f6fb85553 # bad: [18fa6d4d5ed5e6d219aeab9b92f602fa60020d95] Merge branch 'vfs.iov_iter' into vfs.all git bisect bad 18fa6d4d5ed5e6d219aeab9b92f602fa60020d95 # good: [13f8510ba3215d2fb2ee7d1c64c0d51827ac28bd] ovl: rely on SB_I_NOUMASK git bisect good 13f8510ba3215d2fb2ee7d1c64c0d51827ac28bd # good: [b5f0e20f444cd150121e0ce912ebd3f2dabd12bc] iov_iter, net: Move hash_and_copy_to_iter() to net/ git bisect good b5f0e20f444cd150121e0ce912ebd3f2dabd12bc # bad: [1f50012d9c63c690f25956239bd25d10236405f8] autofs: validate protocol version git bisect bad 1f50012d9c63c690f25956239bd25d10236405f8 # bad: [a7467430b4de0985b7d1de8f1e50f8dd47eb6c4a] autofs: refactor super block info init git bisect bad a7467430b4de0985b7d1de8f1e50f8dd47eb6c4a # bad: [546694b8f65807427a0104154abd8cdc633b36e3] autofs: add autofs_parse_fd() git bisect bad 546694b8f65807427a0104154abd8cdc633b36e3 # good: [bc69fdde0ae1aff595590d802b6ef39114f2b260] autofs: refactor autofs_prepare_pipe() git bisect good bc69fdde0ae1aff595590d802b6ef39114f2b260 # first bad commit: [546694b8f65807427a0104154abd8cdc633b36e3] autofs: add autofs_parse_fd()
- Naresh
On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I will try your suggested ways.
Is this information helpful ? Linux-next the regression started happening from next-20230925.
GOOD: next-20230925 BAD: next-20230926
$ git log --oneline next-20230925..next-20230926 -- fs/autofs/ dede367149c4 autofs: fix protocol sub version setting e6ec453bd0f0 autofs: convert autofs to use the new mount api 1f50012d9c63 autofs: validate protocol version 9b2731666d1d autofs: refactor parse_options() 7efd93ea790e autofs: reformat 0pt enum declaration a7467430b4de autofs: refactor super block info init 546694b8f658 autofs: add autofs_parse_fd() bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
Right, and it looks like the bottom five patches of this should be fairly harmless as they only try to move code around in preparation of the later changes, and even the other ones should not cause any difference between a 32-bit or a 64-bit /sbin/mount binary.
If the native (full 64-bit or full 32-bit) test run still works with the same version, there may be some other difference here.
What are the exact mount options you pass to autofs in your fstab?
mount output shows like this, systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
This is only the binfmt-misc mount, which should not prevent your rootfs from getting mounted, but it's possible that failure to mount this prevents you from running 32-bit binaries.
I see this comes from the "proc-sys-fs-binfmt_misc.automount" service in systemd. I see this is defined in https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.a... but I don't know exactly what its purpose is here. On a 64-bit system, you normally use compat_binfmt_elf.ko to run 32-bit binaries, and this does not require any specific mount points. Alternatively, you could use binfmt_misc.ko with the procfs mount to configure running arbitrary binary formats such as arm32 on x86_64 with qemu-user emulation.
I double-checked your rootfs image from https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz to ensure that this indeed contains i386 executables rather than arm32 ones, and that is all fine.
I also see in your log file at https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/tes... that it is running the i386 binaries from the rootfs, but it does get stuck soon after trying to set up the binfmt-misc mount at the end of the log:
[[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems. Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats... Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories... Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files... [ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP) [ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1 [ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left quit
I'm a bit out of ideas at that point, my best guess now is that your bisection points to something in autofs that makes it hang while setting up autofs, but that neither autofs nor binfmt-misc are actually being used otherwise.
Maybe you can try to modify your rootfs to disable or remove the systemd-binfmt.service, to confirm that autofs is not actually needed here but does cause the crash?
Arnd
On Fri, 20 Oct 2023 at 11:02, Arnd Bergmann arnd@arndb.de wrote:
On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I will try your suggested ways.
Is this information helpful ? Linux-next the regression started happening from next-20230925.
GOOD: next-20230925 BAD: next-20230926
$ git log --oneline next-20230925..next-20230926 -- fs/autofs/ dede367149c4 autofs: fix protocol sub version setting e6ec453bd0f0 autofs: convert autofs to use the new mount api 1f50012d9c63 autofs: validate protocol version 9b2731666d1d autofs: refactor parse_options() 7efd93ea790e autofs: reformat 0pt enum declaration a7467430b4de autofs: refactor super block info init 546694b8f658 autofs: add autofs_parse_fd() bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
Right, and it looks like the bottom five patches of this should be fairly harmless as they only try to move code around in preparation of the later changes, and even the other ones should not cause any difference between a 32-bit or a 64-bit /sbin/mount binary.
If the native (full 64-bit or full 32-bit) test run still works with the same version, there may be some other difference here.
What are the exact mount options you pass to autofs in your fstab?
mount output shows like this, systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
This is only the binfmt-misc mount, which should not prevent your rootfs from getting mounted, but it's possible that failure to mount this prevents you from running 32-bit binaries.
I see this comes from the "proc-sys-fs-binfmt_misc.automount" service in systemd. I see this is defined in https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.a... but I don't know exactly what its purpose is here. On a 64-bit system, you normally use compat_binfmt_elf.ko to run 32-bit binaries, and this does not require any specific mount points. Alternatively, you could use binfmt_misc.ko with the procfs mount to configure running arbitrary binary formats such as arm32 on x86_64 with qemu-user emulation.
I double-checked your rootfs image from https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz to ensure that this indeed contains i386 executables rather than arm32 ones, and that is all fine.
I also see in your log file at https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/tes... that it is running the i386 binaries from the rootfs, but it does get stuck soon after trying to set up the binfmt-misc mount at the end of the log:
[[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems. Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats... Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories... Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files... [ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP) [ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1 [ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left quit
I'm a bit out of ideas at that point, my best guess now is that your bisection points to something in autofs that makes it hang while setting up autofs, but that neither autofs nor binfmt-misc are actually being used otherwise.
Maybe you can try to modify your rootfs to disable or remove the systemd-binfmt.service, to confirm that autofs is not actually needed here but does cause the crash?
I removed systemd-binfmt.service from the rootfs and booted 546694b8f658 ("autofs: add autofs_parse_fd()") and now it booted fine.
Cheers, Anders
On 20/10/23 17:57, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 11:02, Arnd Bergmann arnd@arndb.de wrote:
On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I will try your suggested ways.
Is this information helpful ? Linux-next the regression started happening from next-20230925.
GOOD: next-20230925 BAD: next-20230926
$ git log --oneline next-20230925..next-20230926 -- fs/autofs/ dede367149c4 autofs: fix protocol sub version setting e6ec453bd0f0 autofs: convert autofs to use the new mount api 1f50012d9c63 autofs: validate protocol version 9b2731666d1d autofs: refactor parse_options() 7efd93ea790e autofs: reformat 0pt enum declaration a7467430b4de autofs: refactor super block info init 546694b8f658 autofs: add autofs_parse_fd() bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
Right, and it looks like the bottom five patches of this should be fairly harmless as they only try to move code around in preparation of the later changes, and even the other ones should not cause any difference between a 32-bit or a 64-bit /sbin/mount binary.
If the native (full 64-bit or full 32-bit) test run still works with the same version, there may be some other difference here.
What are the exact mount options you pass to autofs in your fstab?
mount output shows like this, systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
This is only the binfmt-misc mount, which should not prevent your rootfs from getting mounted, but it's possible that failure to mount this prevents you from running 32-bit binaries.
I see this comes from the "proc-sys-fs-binfmt_misc.automount" service in systemd. I see this is defined in https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.a... but I don't know exactly what its purpose is here. On a 64-bit system, you normally use compat_binfmt_elf.ko to run 32-bit binaries, and this does not require any specific mount points. Alternatively, you could use binfmt_misc.ko with the procfs mount to configure running arbitrary binary formats such as arm32 on x86_64 with qemu-user emulation.
I double-checked your rootfs image from https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz to ensure that this indeed contains i386 executables rather than arm32 ones, and that is all fine.
I also see in your log file at https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/tes... that it is running the i386 binaries from the rootfs, but it does get stuck soon after trying to set up the binfmt-misc mount at the end of the log:
[[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems. Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats... Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories... Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files... [ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP) [ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1 [ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left quit
I'm a bit out of ideas at that point, my best guess now is that your bisection points to something in autofs that makes it hang while setting up autofs, but that neither autofs nor binfmt-misc are actually being used otherwise.
Maybe you can try to modify your rootfs to disable or remove the systemd-binfmt.service, to confirm that autofs is not actually needed here but does cause the crash?
I removed systemd-binfmt.service from the rootfs and booted 546694b8f658 ("autofs: add autofs_parse_fd()") and now it booted fine.
I don't suppose you could try an automount after the boot is completed?
It seems a bit odd, it must be some sort of object lifetime inconsistency
but if that was the case automounts would at least fail to function mmm ...
Ian
On 20/10/23 17:02, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I will try your suggested ways.
Is this information helpful ? Linux-next the regression started happening from next-20230925.
GOOD: next-20230925 BAD: next-20230926
$ git log --oneline next-20230925..next-20230926 -- fs/autofs/ dede367149c4 autofs: fix protocol sub version setting e6ec453bd0f0 autofs: convert autofs to use the new mount api 1f50012d9c63 autofs: validate protocol version 9b2731666d1d autofs: refactor parse_options() 7efd93ea790e autofs: reformat 0pt enum declaration a7467430b4de autofs: refactor super block info init 546694b8f658 autofs: add autofs_parse_fd() bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
Right, and it looks like the bottom five patches of this should be fairly harmless as they only try to move code around in preparation of the later changes, and even the other ones should not cause any difference between a 32-bit or a 64-bit /sbin/mount binary.
If the native (full 64-bit or full 32-bit) test run still works with the same version, there may be some other difference here.
What are the exact mount options you pass to autofs in your fstab?
mount output shows like this, systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
This is only the binfmt-misc mount, which should not prevent your rootfs from getting mounted, but it's possible that failure to mount this prevents you from running 32-bit binaries.
I see this comes from the "proc-sys-fs-binfmt_misc.automount" service in systemd. I see this is defined in https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.a... but I don't know exactly what its purpose is here. On a 64-bit system, you normally use compat_binfmt_elf.ko to run 32-bit binaries, and this does not require any specific mount points. Alternatively, you could use binfmt_misc.ko with the procfs mount to configure running arbitrary binary formats such as arm32 on x86_64 with qemu-user emulation.
I double-checked your rootfs image from https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz to ensure that this indeed contains i386 executables rather than arm32 ones, and that is all fine.
I also see in your log file at https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/tes... that it is running the i386 binaries from the rootfs, but it does get stuck soon after trying to set up the binfmt-misc mount at the end of the log:
[[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems. Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats... Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories... Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files... [ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP) [ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1 [ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left quit
Were there any console log messages at the time the problem occurred?
I'm a bit out of ideas at that point, my best guess now is that your bisection points to something in autofs that makes it hang while setting up autofs, but that neither autofs nor binfmt-misc are actually being used otherwise.
Maybe you can try to modify your rootfs to disable or remove the systemd-binfmt.service, to confirm that autofs is not actually needed here but does cause the crash?
Arnd
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I tried these two patches again: 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
What are the exact mount options you pass to autofs in your fstab?
cat /etc/fstab # UNCONFIGURED FSTAB FOR BASE SYSTEM
Cheers, Anders
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I tried these two patches again: 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
One difference that I notice between those two patches is that we no long call autofs_prepare_pipe(). We just call autofs_check_pipe().
regards, dan carpenter
On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I tried these two patches again: 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
One difference that I notice between those two patches is that we no long call autofs_prepare_pipe(). We just call autofs_check_pipe().
Indeed, so some of the f_flags end up being different. I assumed this was done intentionally, but it might be worth checking if the patch below makes any difference when the flags get put back the way they were. This is probably not the correct fix, but may help figure out what is going on. It should apply to anything from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current linux-next:
--- a/fs/autofs/inode.c +++ b/fs/autofs/inode.c @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc) pr_debug("pipe fd = %d, pgrp = %u\n", sbi->pipefd, pid_nr(sbi->oz_pgrp));
+ /* We want a packet pipe */ + sbi->pipe->f_flags |= O_DIRECT; + /* We don't expect -EAGAIN */ + sbi->pipe->f_flags &= ~O_NONBLOCK; + sbi->flags &= ~AUTOFS_SBI_CATATONIC;
/*
On 20/10/23 19:23, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I tried these two patches again: 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
One difference that I notice between those two patches is that we no long call autofs_prepare_pipe(). We just call autofs_check_pipe().
Indeed, so some of the f_flags end up being different. I assumed this was done intentionally, but it might be worth checking if the patch below makes any difference when the flags get put back the way they were. This is probably not the correct fix, but may help figure out what is going on. It should apply to anything from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current linux-next:
--- a/fs/autofs/inode.c +++ b/fs/autofs/inode.c @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc) pr_debug("pipe fd = %d, pgrp = %u\n", sbi->pipefd, pid_nr(sbi->oz_pgrp));
/* We want a packet pipe */
sbi->pipe->f_flags |= O_DIRECT;
/* We don't expect -EAGAIN */
sbi->pipe->f_flags &= ~O_NONBLOCK;
That makes sense, we do want a packet pipe and that does also mean
we don't want a non-blocking pipe, it will be interesting to see
if that makes a difference. It's been a long time since Linus
implemented that packet pipe and I can't remember now what the
case was that lead to it.
Ian
sbi->flags &= ~AUTOFS_SBI_CATATONIC;
/*
On 20/10/23 21:09, Ian Kent wrote:
On 20/10/23 19:23, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I tried these two patches again: 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
One difference that I notice between those two patches is that we no long call autofs_prepare_pipe(). We just call autofs_check_pipe().
Indeed, so some of the f_flags end up being different. I assumed this was done intentionally, but it might be worth checking if the patch below makes any difference when the flags get put back the way they were. This is probably not the correct fix, but may help figure out what is going on. It should apply to anything from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current linux-next:
--- a/fs/autofs/inode.c +++ b/fs/autofs/inode.c @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc) pr_debug("pipe fd = %d, pgrp = %u\n", sbi->pipefd, pid_nr(sbi->oz_pgrp)); + /* We want a packet pipe */ + sbi->pipe->f_flags |= O_DIRECT; + /* We don't expect -EAGAIN */ + sbi->pipe->f_flags &= ~O_NONBLOCK;
That makes sense, we do want a packet pipe and that does also mean
we don't want a non-blocking pipe, it will be interesting to see
if that makes a difference. It's been a long time since Linus
implemented that packet pipe and I can't remember now what the
case was that lead to it.
After thinking about this over the weekend I'm pretty sure my mistake
is dropping the call to autofs_prepare_pipe() without adding the tail
end of it into autofs_parse_fd().
To explain a bit of history which I'll include in the fix description.
During autofs v5 development I decided to stay with the existing usage
instead of changing to a packed structure for autofs <=> user space
communications which turned out to be a mistake on my part.
Problems arose and they were fixed by allowing for the 64 bit to 32 bit
size difference in the automount(8) code.
Along the way systemd started to use autofs and eventually encountered
this problem too. systemd refused to compensate for the length difference
insisting it be fixed in the kernel. Fortunately Linus implemented the
packetized pipe which resolved the problem in a straight forward and
simple way.
So I pretty sure that the cause of the problem is the inadvertent dropping
of the flags setting in autofs_fill_super() that Arnd spotted although I
don't think putting it in autofs_fill_super() is the right thing to do.
I'll produce a patch today which includes most of this explanation for
future travelers ...
Ian
sbi->flags &= ~AUTOFS_SBI_CATATONIC; /*
On 23/10/23 08:48, Ian Kent wrote:
On 20/10/23 21:09, Ian Kent wrote:
On 20/10/23 19:23, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote: > The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit > rootfs we call > it as compat mode boot testing. Recently it started to failed to > get login > prompt. > > We have not seen any kernel crash logs. > > Anders, bisection is pointing to first bad commit, > 546694b8f658 autofs: add autofs_parse_fd() > > Reported-by: Linux Kernel Functional Testing lkft@linaro.org > Reported-by: Anders Roxell anders.roxell@linaro.org I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
I tried these two patches again: 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
One difference that I notice between those two patches is that we no long call autofs_prepare_pipe(). We just call autofs_check_pipe().
Indeed, so some of the f_flags end up being different. I assumed this was done intentionally, but it might be worth checking if the patch below makes any difference when the flags get put back the way they were. This is probably not the correct fix, but may help figure out what is going on. It should apply to anything from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current linux-next:
--- a/fs/autofs/inode.c +++ b/fs/autofs/inode.c @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc) pr_debug("pipe fd = %d, pgrp = %u\n", sbi->pipefd, pid_nr(sbi->oz_pgrp)); + /* We want a packet pipe */ + sbi->pipe->f_flags |= O_DIRECT; + /* We don't expect -EAGAIN */ + sbi->pipe->f_flags &= ~O_NONBLOCK;
That makes sense, we do want a packet pipe and that does also mean
we don't want a non-blocking pipe, it will be interesting to see
if that makes a difference. It's been a long time since Linus
implemented that packet pipe and I can't remember now what the
case was that lead to it.
After thinking about this over the weekend I'm pretty sure my mistake
is dropping the call to autofs_prepare_pipe() without adding the tail
end of it into autofs_parse_fd().
To explain a bit of history which I'll include in the fix description.
During autofs v5 development I decided to stay with the existing usage
instead of changing to a packed structure for autofs <=> user space
communications which turned out to be a mistake on my part.
Problems arose and they were fixed by allowing for the 64 bit to 32 bit
size difference in the automount(8) code.
Along the way systemd started to use autofs and eventually encountered
this problem too. systemd refused to compensate for the length difference
insisting it be fixed in the kernel. Fortunately Linus implemented the
packetized pipe which resolved the problem in a straight forward and
simple way.
So I pretty sure that the cause of the problem is the inadvertent dropping
of the flags setting in autofs_fill_super() that Arnd spotted although I
don't think putting it in autofs_fill_super() is the right thing to do.
I'll produce a patch today which includes most of this explanation for
future travelers ...
So I have a patch.
I'm of two minds whether to try and use the instructions to reproduce this
or not because of experiences I have had with other similar testing automation
systems that claim to provide a reproducer and end up a huge waste of time and
are significantly frustrating.
Can someone please perform a test for me once I provide the patch?
Ian
Ian
sbi->flags &= ~AUTOFS_SBI_CATATONIC; /*
On Mon, 23 Oct 2023 at 09:35, Ian Kent raven@themaw.net wrote:
On 23/10/23 08:48, Ian Kent wrote:
On 20/10/23 21:09, Ian Kent wrote:
On 20/10/23 19:23, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote: > On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote: >> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit >> rootfs we call >> it as compat mode boot testing. Recently it started to failed to >> get login >> prompt. >> >> We have not seen any kernel crash logs. >> >> Anders, bisection is pointing to first bad commit, >> 546694b8f658 autofs: add autofs_parse_fd() >> >> Reported-by: Linux Kernel Functional Testing lkft@linaro.org >> Reported-by: Anders Roxell anders.roxell@linaro.org > I tried to find something in that commit that would be different > in compat mode, but don't see anything at all -- this appears > to be just a simple refactoring of the code, unlike the commits > that immediately follow it and that do change the mount > interface. > > Unfortunately this makes it impossible to just revert the commit > on top of linux-next. Can you double-check your bisection by > testing 546694b8f658 and the commit before it again? I tried these two patches again: 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
One difference that I notice between those two patches is that we no long call autofs_prepare_pipe(). We just call autofs_check_pipe().
Indeed, so some of the f_flags end up being different. I assumed this was done intentionally, but it might be worth checking if the patch below makes any difference when the flags get put back the way they were. This is probably not the correct fix, but may help figure out what is going on. It should apply to anything from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current linux-next:
--- a/fs/autofs/inode.c +++ b/fs/autofs/inode.c @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc) pr_debug("pipe fd = %d, pgrp = %u\n", sbi->pipefd, pid_nr(sbi->oz_pgrp));
/* We want a packet pipe */
sbi->pipe->f_flags |= O_DIRECT;
/* We don't expect -EAGAIN */
sbi->pipe->f_flags &= ~O_NONBLOCK;
That makes sense, we do want a packet pipe and that does also mean
we don't want a non-blocking pipe, it will be interesting to see
if that makes a difference. It's been a long time since Linus
implemented that packet pipe and I can't remember now what the
case was that lead to it.
After thinking about this over the weekend I'm pretty sure my mistake
is dropping the call to autofs_prepare_pipe() without adding the tail
end of it into autofs_parse_fd().
To explain a bit of history which I'll include in the fix description.
During autofs v5 development I decided to stay with the existing usage
instead of changing to a packed structure for autofs <=> user space
communications which turned out to be a mistake on my part.
Problems arose and they were fixed by allowing for the 64 bit to 32 bit
size difference in the automount(8) code.
Along the way systemd started to use autofs and eventually encountered
this problem too. systemd refused to compensate for the length difference
insisting it be fixed in the kernel. Fortunately Linus implemented the
packetized pipe which resolved the problem in a straight forward and
simple way.
So I pretty sure that the cause of the problem is the inadvertent dropping
of the flags setting in autofs_fill_super() that Arnd spotted although I
don't think putting it in autofs_fill_super() is the right thing to do.
I'll produce a patch today which includes most of this explanation for
future travelers ...
So I have a patch.
I'm of two minds whether to try and use the instructions to reproduce this
or not because of experiences I have had with other similar testing automation
systems that claim to provide a reproducer and end up a huge waste of time and
are significantly frustrating.
Can someone please perform a test for me once I provide the patch?
Just tested it, and it passed our tests. Added a tested by flag in your patch.
Thanks for the prompt fix.
Cheers, Anders
Ian
Ian
sbi->flags &= ~AUTOFS_SBI_CATATONIC; /*
On 23/10/23 21:57, Anders Roxell wrote:
On Mon, 23 Oct 2023 at 09:35, Ian Kent raven@themaw.net wrote:
On 23/10/23 08:48, Ian Kent wrote:
On 20/10/23 21:09, Ian Kent wrote:
On 20/10/23 19:23, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote: > On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann arnd@arndb.de wrote: >> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote: >>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit >>> rootfs we call >>> it as compat mode boot testing. Recently it started to failed to >>> get login >>> prompt. >>> >>> We have not seen any kernel crash logs. >>> >>> Anders, bisection is pointing to first bad commit, >>> 546694b8f658 autofs: add autofs_parse_fd() >>> >>> Reported-by: Linux Kernel Functional Testing lkft@linaro.org >>> Reported-by: Anders Roxell anders.roxell@linaro.org >> I tried to find something in that commit that would be different >> in compat mode, but don't see anything at all -- this appears >> to be just a simple refactoring of the code, unlike the commits >> that immediately follow it and that do change the mount >> interface. >> >> Unfortunately this makes it impossible to just revert the commit >> on top of linux-next. Can you double-check your bisection by >> testing 546694b8f658 and the commit before it again? > I tried these two patches again: > 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot > bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots > One difference that I notice between those two patches is that we no long call autofs_prepare_pipe(). We just call autofs_check_pipe().
Indeed, so some of the f_flags end up being different. I assumed this was done intentionally, but it might be worth checking if the patch below makes any difference when the flags get put back the way they were. This is probably not the correct fix, but may help figure out what is going on. It should apply to anything from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current linux-next:
--- a/fs/autofs/inode.c +++ b/fs/autofs/inode.c @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc) pr_debug("pipe fd = %d, pgrp = %u\n", sbi->pipefd, pid_nr(sbi->oz_pgrp));
/* We want a packet pipe */
sbi->pipe->f_flags |= O_DIRECT;
/* We don't expect -EAGAIN */
sbi->pipe->f_flags &= ~O_NONBLOCK;
That makes sense, we do want a packet pipe and that does also mean
we don't want a non-blocking pipe, it will be interesting to see
if that makes a difference. It's been a long time since Linus
implemented that packet pipe and I can't remember now what the
case was that lead to it.
After thinking about this over the weekend I'm pretty sure my mistake
is dropping the call to autofs_prepare_pipe() without adding the tail
end of it into autofs_parse_fd().
To explain a bit of history which I'll include in the fix description.
During autofs v5 development I decided to stay with the existing usage
instead of changing to a packed structure for autofs <=> user space
communications which turned out to be a mistake on my part.
Problems arose and they were fixed by allowing for the 64 bit to 32 bit
size difference in the automount(8) code.
Along the way systemd started to use autofs and eventually encountered
this problem too. systemd refused to compensate for the length difference
insisting it be fixed in the kernel. Fortunately Linus implemented the
packetized pipe which resolved the problem in a straight forward and
simple way.
So I pretty sure that the cause of the problem is the inadvertent dropping
of the flags setting in autofs_fill_super() that Arnd spotted although I
don't think putting it in autofs_fill_super() is the right thing to do.
I'll produce a patch today which includes most of this explanation for
future travelers ...
So I have a patch.
I'm of two minds whether to try and use the instructions to reproduce this
or not because of experiences I have had with other similar testing automation
systems that claim to provide a reproducer and end up a huge waste of time and
are significantly frustrating.
Can someone please perform a test for me once I provide the patch?
Just tested it, and it passed our tests. Added a tested by flag in your patch.
Thanks for the prompt fix.
That's great to hear Anders, thanks for doing the testing, ;)
Ian
On 20/10/23 14:36, Arnd Bergmann wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call it as compat mode boot testing. Recently it started to failed to get login prompt.
We have not seen any kernel crash logs.
Anders, bisection is pointing to first bad commit, 546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
I tried to find something in that commit that would be different in compat mode, but don't see anything at all -- this appears to be just a simple refactoring of the code, unlike the commits that immediately follow it and that do change the mount interface.
Unfortunately this makes it impossible to just revert the commit on top of linux-next. Can you double-check your bisection by testing 546694b8f658 and the commit before it again?
What are the exact mount options you pass to autofs in your fstab?
I've only just seen this report, sorry about that.
I'll have a go at trying to reproduce it but debugging in an environment
like this will be something of a challenge I think.
Ian