While running the LTP syscalls tests on the Linux next-20250924 tag build, the following kernel oops was noticed on arm64 and x86_64 devices.

First seen on next-20250924
Good: next-20250923
Bad: next-20250924

Regression Analysis:
- New regression? yes
- Reproducibility? yes
Test regression: next-20250924: Internal error: Oops: mnt_ns_release (fs/namespace.c:148) __arm64_sys_listmount (fs/namespace.c:5936)
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

$ git log --oneline next-20250923..next-20250924 -- fs/namespace.c
c54644c3221b6 (next/fs-next) Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
1f28cc19559a8 Merge branch 'namespace-6.18' into vfs.all
e2c277f720291 Merge branch 'kernel-6.18.clone3' into vfs.all
b2af83d5b8223 Merge branch 'vfs-6.18.mount' into vfs.all
29ecd1ca48ec2 Merge branch 'vfs-6.18.misc' into vfs.all
d7610cb7454bb ns: simplify ns_common_init() further
59bfb66816809 listmount: don't call path_put() under namespace semaphore
2bc5bfbfd3f27 statmount: don't call path_put() under namespace semaphore
## Test log
[ 41.821877] Internal error: Oops: 0000000096000005 [#1] SMP
[ 41.919038] Modules linked in: cdc_ether usbnet sm3_ce sha3_ce nvme xhci_pci_renesas nvme_core arm_cspmu_module arm_spe_pmu ipmi_devintf ipmi_msghandler arm_cmn cppc_cpufreq drm fuse backlight
[ 41.944048] CPU: 14 UID: 0 PID: 6416 Comm: listmount04 Not tainted 6.17.0-rc7-next-20250924 #1 PREEMPT
[ 41.958197] Hardware name: Inspur NF5280R7/Mitchell MB, BIOS 04.04.00004001 2025-02-04 22:23:30 02/04/2025
[ 41.967837] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 41.974958] pc : mnt_ns_release (arch/arm64/include/asm/atomic_lse.h:62 (discriminator 1) arch/arm64/include/asm/atomic_lse.h:76 (discriminator 1) arch/arm64/include/asm/atomic.h:51 (discriminator 1) include/linux/atomic/atomic-arch-fallback.h:944 (discriminator 1) include/linux/atomic/atomic-instrumented.h:401 (discriminator 1) include/linux/refcount.h:389 (discriminator 1) include/linux/refcount.h:432 (discriminator 1) include/linux/refcount.h:450 (discriminator 1) fs/namespace.c:148 (discriminator 1))
[ 41.981910] lr : __arm64_sys_listmount (fs/namespace.c:5936)
[ 41.993467] sp : ffff8000ff5afd50
[ 42.000329] x29: ffff8000ff5afd50 x28: fff00001bd947380 x27: 0000000000000000
[ 42.007454] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000100
[ 42.030726] x23: 0000000000000000 x22: 0000000000000020 x21: ffff8000ff5afdc8
[ 42.038281] x20: 0000aaaabd6a1110 x19: 0000000000000000 x18: 0000000000000000
[ 42.045405] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 42.052528] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 42.075541] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffda68dcdbbe30
[ 42.082835] x8 : ffff8000ff5afda0 x7 : fefefefefefefefe x6 : ffffda68df5e9000
[ 42.096212] x5 : fff00001bd947380
[ 42.108978] x4 : fff00001bd947380 x3 : 0000000000000000
[ 42.114449] x2 : 0000000000000000 x1 : 00000000ffffffff x0 : 00000000000000b8
[ 42.134515] Call trace:
[ 42.139725] mnt_ns_release (arch/arm64/include/asm/atomic_lse.h:62 (discriminator 1) arch/arm64/include/asm/atomic_lse.h:76 (discriminator 1) arch/arm64/include/asm/atomic.h:51 (discriminator 1) include/linux/atomic/atomic-arch-fallback.h:944 (discriminator 1) include/linux/atomic/atomic-instrumented.h:401 (discriminator 1) include/linux/refcount.h:389 (discriminator 1) include/linux/refcount.h:432 (discriminator 1) include/linux/refcount.h:450 (discriminator 1) fs/namespace.c:148 (discriminator 1)) (P)
[ 42.143811] __arm64_sys_listmount (fs/namespace.c:5936)
[ 42.148327] invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
[ 42.159193] do_el0_svc (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
[ 42.163970] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:102 (discriminator 1) arch/arm64/kernel/entry-common.c:745 (discriminator 1))
[ 42.173791] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:764)
[ 42.185342] el0t_64_sync (arch/arm64/kernel/entry.S:596)
[ 42.189165] Code: aa0003f3 9102e000 d503201f 12800001 (b8610001)
All code
========
   0: aa0003f3 mov x19, x0
   4: 9102e000 add x0, x0, #0xb8
   8: d503201f nop
   c: 12800001 mov w1, #0xffffffff // #-1
  10:* b8610001 ldaddl w1, w1, [x0] <-- trapping instruction

Code starting with the faulting instruction
===========================================
   0: b8610001 ldaddl w1, w1, [x0]
[ 42.211485] ---[ end trace 0000000000000000 ]---
## Source
* Kernel version: 6.17.0-rc7
* Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
* Git describe: 6.17.0-rc7-next-20250924
* Git commit: b5a4da2c459f79a2c87c867398f1c0c315779781
* Architectures: arm64, x86_64
* Toolchains: gcc-13
* Kconfigs: defconfig+lkftconfig

## Build
* Test log arm64: https://qa-reports.linaro.org/api/testruns/30007634/log_file/
* Test log x86_64: https://qa-reports.linaro.org/api/testruns/30000230/log_file/
* Test details: https://regressions.linaro.org/lkft/linux-next-master-ampere/next-20250924/l...
* Build plan: https://tuxapi.tuxsuite.com/v1/groups/ampere/projects/ci/tests/339teV8pAwrsg...
* Build link: https://storage.tuxsuite.com/public/ampere/ci/builds/339teBhKZ4DENKbJJNnbWKh...
* Kernel config: https://storage.tuxsuite.com/public/ampere/ci/builds/339teBhKZ4DENKbJJNnbWKh...
-- Linaro LKFT
On Fri, Sep 26, 2025 at 12:00:08AM +0530, Naresh Kamboju wrote:
[snip]
With 59bfb6681680 "listmount: don't call path_put() under namespace semaphore" we get this:
static void __free_klistmount_free(const struct klistmount *kls)
{
	path_put(&kls->root);
	kvfree(kls->kmnt_ids);
	mnt_ns_release(kls->ns);
}
...
SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
		u64 __user *, mnt_ids, size_t, nr_mnt_ids, unsigned int, flags)
{
	struct klistmount kls __free(klistmount_free) = {};
	const size_t maxcount = 1000000;
	struct mnt_id_req kreq;
	ssize_t ret;

	if (flags & ~LISTMOUNT_REVERSE)
		return -EINVAL;
which will oops if it takes that failure exit - if you are initializing something with any kind of cleanup on it, you'd better make sure the cleanup will survive being called for the initial value...
Christian, that's your branch and I don't want to play with rebasing it - had it been mine, the fix would be folded into commit in question, with the rest of the branch cherry-picked on top of fixed commit, but everyone got their own preferences in how to do such stuff.
Minimal fix would be to make mnt_ns_release(NULL) a no-op.
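To illustrate the rule and the proposed fix outside the kernel, here is a minimal userspace sketch. All names (`fake_ns`, `fake_klist`, `fake_listmount`, `release_calls`) are hypothetical stand-ins, not kernel code, and it relies on the GCC/Clang `cleanup` variable attribute. The early return leaves the owner zero-initialised, so the handler has to cope with a NULL namespace pointer:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted namespace object. */
struct fake_ns {
	int refcount;
};

static int release_calls; /* counts real releases, for demonstration only */

/* Drops one reference; a NULL pointer means "nothing to release". */
static void fake_ns_release(struct fake_ns *ns)
{
	if (!ns)
		return; /* the guard that makes a zero-initialised owner safe */
	assert(ns->refcount > 0);
	release_calls++;
	if (--ns->refcount == 0)
		free(ns);
}

/* Hypothetical stand-in for the syscall's state: a buffer plus a namespace ref. */
struct fake_klist {
	void *ids;
	struct fake_ns *ns;
};

/* Cleanup handler: runs on every scope exit, including the earliest one. */
static void fake_klist_free(struct fake_klist *kls)
{
	free(kls->ids);           /* free(NULL) is defined to do nothing */
	fake_ns_release(kls->ns); /* must tolerate NULL as well */
}

/* Returns -1 on a failure exit, 0 after acquiring the resources. */
static int fake_listmount(unsigned int flags)
{
	struct fake_klist kls __attribute__((cleanup(fake_klist_free))) = {0};

	if (flags & ~1u)
		return -1; /* kls is still all zeroes, yet cleanup still runs */

	kls.ns = malloc(sizeof(*kls.ns));
	if (!kls.ns)
		return -1;
	kls.ns->refcount = 1;

	kls.ids = malloc(16);
	if (!kls.ids)
		return -1;
	return 0;
}
```

Calling `fake_listmount(2)` takes the early exit without touching any namespace object; removing the `if (!ns)` guard would make that same call dereference NULL, which is the shape of the reported oops.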
BTW, I suspect that one of the sources of confusion had been the fact that __free(mnt_ns_release) *does* treat NULL as no-op; in statmount(2) you are using that and get away with NULL as initializer. In listmount(2), OTOH, you are dealing with the function call - same identifier, different behaviour...
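The "same identifier, different behaviour" point can be sketched in userspace as follows. The macros `MY_DEFINE_FREE` and `my_free` are hypothetical imitations of the scope-based cleanup idea, written for this example and not the kernel's definitions; `fake_ns_put` is likewise an invented stand-in. The NULL check sits in the generated wrapper, so the bare function stays unsafe when called directly with NULL:

```c
#include <assert.h>
#include <stddef.h>

struct fake_ns {
	int refcount;
};

static int strict_calls; /* how many times the strict function actually ran */

/* Strict release: dereferences unconditionally, so NULL would crash it. */
static void fake_ns_put(struct fake_ns *ns)
{
	strict_calls++;
	ns->refcount--;
}

/*
 * Hypothetical macro pair: generates a handler named after the function.
 * The handler receives a pointer to the annotated variable, loads its
 * value into T_ and runs the given statement.
 */
#define MY_DEFINE_FREE(name, type, stmt) \
	static void my_free_##name(void *p) { type T_ = *(type *)p; stmt; }
#define my_free(name) __attribute__((cleanup(my_free_##name)))

/* The NULL check lives in the wrapper, not in fake_ns_put() itself. */
MY_DEFINE_FREE(fake_ns_put, struct fake_ns *, if (T_) fake_ns_put(T_))

/* Scope exit with a NULL pointer: the wrapper skips the strict function. */
static int wrapper_with_null(void)
{
	{
		struct fake_ns *ns my_free(fake_ns_put) = NULL;
		(void)ns;
	} /* wrapper runs here, sees NULL, does nothing */
	return strict_calls;
}

/* Scope exit with a real object: the wrapper forwards to fake_ns_put(). */
static int wrapper_with_object(void)
{
	struct fake_ns obj = { .refcount = 2 };
	{
		struct fake_ns *ns my_free(fake_ns_put) = &obj;
		(void)ns;
	} /* wrapper runs here and calls fake_ns_put(&obj) */
	return obj.refcount;
}
```

A struct-level cleanup handler that calls `fake_ns_put(kls->ns)` directly bypasses this wrapper, so it gets none of the NULL protection even though the name looks the same.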
On Fri, Sep 26, 2025 at 07:48:01AM +0100, Al Viro wrote:
[snip]
Ah, fuck me. Thanks for spotting that! I'll take care of it.
linux-kselftest-mirror@lists.linaro.org