Regressions were noticed intermittently on AmpereOne while running the cgroup selftests on Linux next-20250805; the same warning was seen earlier on the next-20250722 tag.

Regression Analysis:
- New regression? Yes
- Reproducibility? Intermittent

First seen on next-20250722 and seen again on next-20250805.

Test regression: next-20250805 ampere WARNING kernel cgroup cpuset.c at remote_partition_disable

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
## Test log
selftests: cgroup: test_cpuset_prs.sh
Running state tRunning state transition test ...
ransition test ...
Running test 0 ...
Running test 1 ...
Running test 2 ...
Running test 3 ...
Running test 4 ...
Running test 5 ...
Running test 6 ...
Running test 7 ...
Running test 8 ...
Running test 9 ...
Running test 10 ...
Running test 11 ...
Running test 12 ...
Running test 13 ...
Running test 14 ...
Running test 15 ...
Running test 16 ...
Running test 17 ...
Running test 18 ...
Running test 19 ...
[ 137.504549] psci: CPU2 killed (polled 0 ms)
[ 137.747094] Detected PIPT I-cache on CPU2
[ 137.747214] GICv3: CPU2: found redistributor 3500 region 0:0x0000400201cc0000
[ 137.747312] CPU2: Booted secondary processor 0x0000003500 [0xc00fac40]
<>
Running test 63 ...
Running test 64 ...
Running test 66 ...
[ 174.929535] psci: CPU3 killed (polled 0 ms)
[ 175.263087] Detected PIPT I-cache on CPU3
[ 175.263203] GICv3: CPU3: found redistributor 3501 region 0:0x0000400201d00000
[ 175.263300] CPU3: Booted secondary processor 0x0000003501 [0xc00fac40]
[ 175.434129] workqueue: Interrupted when creating a worker thread "kworker/u1028:0"
** replaying previous printk message **
[ 175.434129] workqueue: Interrupted when creating a worker thread "kworker/u1028:0"
[ 175.440230] ------------[ cut here ]------------
[ 175.440234] WARNING: kernel/cgroup/cpuset.c:1352 at remote_partition_disable+0x120/0x160, CPU#170: rmdir/33763
[ 175.467456] Modules linked in: cdc_ether usbnet sm3_ce sha3_ce nvme nvme_core xhci_pci_renesas arm_cspmu_module ipmi_devintf arm_spe_pmu ipmi_msghandler arm_cmn cppc_cpufreq fuse drm backlight
[ 175.484676] CPU: 170 UID: 0 PID: 33763 Comm: rmdir Not tainted 6.16.0-next-20250805 #1 PREEMPT
[ 175.493365] Hardware name: Inspur NF5280R7/Mitchell MB, BIOS 04.04.00004001 2025-02-04 22:23:30 02/04/2025
[ 175.503178] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
not ok 12 selftests: cgroup: test_cpuset_prs.sh TIMEOUT 45 seconds
[ 175.510130] pc : remote_partition_disable (kernel/cgroup/cpuset.c:1352 (discriminator 1) kernel/cgroup/cpuset.c:1342 (discriminator 1) kernel/cgroup/cpuset.c:1514 (discriminator 1))
[ 175.518032] lr : remote_partition_disable (kernel/cgroup/cpuset.c:1352 (discriminator 1) kernel/cgroup/cpuset.c:1514 (discriminator 1))
[ 175.525849] sp : ffff8000c853bb90
[ 175.529585] x29: ffff8000c853bb90 x28: ffff00017badc800 x27: 0000000000000000
timeout set to 45
[ 175.536713] x26: 0000000000000000 x25: ffff00014c422540 x24: ffffb1c71020b000
[ 175.545489] x23: ffff000113769c00 x22: 0000000000000001 x21: ffffb1c71020b5c0
[ 175.552615] x20: ffff8000c853bbd0 x19: ffff000113769a00 x18: 00000000ffffffff
selftests: cgroup: test_cpuset_v1_hp.sh
[ 175.559910] x17: 31752f72656b726f x16: 776b222064616572 x15: 68742072656b726f
[ 175.569900] x14: 0000000000000004 x13: ffffb1c70fb4f160 x12: 0000000000000000
cpuset v1 mount point not found!
[ 175.577888] x11: 000002f6b9bf58c3 x10: 0000000000000023 x9 : ffffb1c70d6bdff8
Test SKIPPED
ok 13 selftests: cgroup: test_cpuset_v1_hp.sh #SKIP
[ 175.587877] x8 : ffff8000c853bad0 x7 : 0000000000000000 x6 : 0000000000000001
[ 175.597864] x5 : ffffb1c70e87a488 x4 : fffffdffc40a88e0 x3 : 000000000080007d
[ 175.607849] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffff4
[ 175.615578] Call trace:
[ 175.618013] remote_partition_disable (kernel/cgroup/cpuset.c:1352 (discriminator 1) kernel/cgroup/cpuset.c:1342 (discriminator 1) kernel/cgroup/cpuset.c:1514 (discriminator 1)) (P)
[ 175.623057] update_prstate (include/linux/spinlock.h:376 kernel/cgroup/cpuset.c:2963)
[ 175.626799] cpuset_css_killed (kernel/cgroup/cpuset.c:3598)
[ 175.630713] kill_css.part.0 (kernel/cgroup/cgroup.c:5968)
[ 175.634464] cgroup_destroy_locked (kernel/cgroup/cgroup.c:6058 (discriminator 4))
[ 175.638810] cgroup_rmdir (kernel/cgroup/cgroup.c:6102)
[ 175.642376] kernfs_iop_rmdir (fs/kernfs/dir.c:1286)
[ 175.646203] vfs_rmdir (fs/namei.c:4461 fs/namei.c:4438)
[ 175.649515] do_rmdir (fs/namei.c:4516 (discriminator 1))
[ 175.652823] __arm64_sys_unlinkat (fs/namei.c:4690 (discriminator 2) fs/namei.c:4684 (discriminator 2) fs/namei.c:4684 (discriminator 2))
[ 175.656998] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:54)
[ 175.660738] el0_svc_common.constprop.0 (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2))
[ 175.665431] do_el0_svc (arch/arm64/kernel/syscall.c:152)
[ 175.668735] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:169 (discriminator 1) arch/arm64/kernel/entry-common.c:182 (discriminator 1) arch/arm64/kernel/entry-common.c:880 (discriminator 1))
[ 175.671877] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:899)
[ 175.676052] el0t_64_sync (arch/arm64/kernel/entry.S:596)
[ 175.679705] ---[ end trace 0000000000000000 ]---
## Source
* Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
* Git sha: afec768a6a8fe7fb02a08ffce5f2f556f51d4b52
* Git describe: next-20250805
* Architectures: arm64
* Toolchains: gcc-13
* Kconfigs: defconfig+selftests/*/configs
## Build
* Test log 1: https://qa-reports.linaro.org/api/testruns/29220998/log_file/
* Test log 2: https://qa-reports.linaro.org/api/testruns/29395866/log_file/
* LAVA log: https://lkft-staging.validation.linaro.org/scheduler/job/187100#L6621
* Test history: https://regressions.linaro.org/lkft/linux-next-master-ampere/next-20250805/l...
* Test plan: https://tuxapi.tuxsuite.com/v1/groups/ampere/projects/ci/tests/30rj0dIdTXUiG...
* Build link: https://storage.tuxsuite.com/public/ampere/ci/builds/30rj0OYSDUMeT0cyTDioTe5...
* Kernel config: https://storage.tuxsuite.com/public/ampere/ci/builds/30rj0OYSDUMeT0cyTDioTe5...
--
Linaro LKFT
https://lkft.linaro.org
On 8/7/25 4:27 AM, Naresh Kamboju wrote:
Running test 63 ...
Running test 64 ...
Running test 66 ...
[ 174.929535] psci: CPU3 killed (polled 0 ms)
[ 175.263087] Detected PIPT I-cache on CPU3
[ 175.263203] GICv3: CPU3: found redistributor 3501 region 0:0x0000400201d00000
[ 175.263300] CPU3: Booted secondary processor 0x0000003501 [0xc00fac40]
[ 175.434129] workqueue: Interrupted when creating a worker thread "kworker/u1028:0"
** replaying previous printk message **
[ 175.434129] workqueue: Interrupted when creating a worker thread "kworker/u1028:0"
[ 175.440230] ------------[ cut here ]------------
[ 175.440234] WARNING: kernel/cgroup/cpuset.c:1352 at remote_partition_disable+0x120/0x160, CPU#170: rmdir/33763
[ 175.467456] Modules linked in: cdc_ether usbnet sm3_ce sha3_ce nvme nvme_core xhci_pci_renesas arm_cspmu_module ipmi_devintf arm_spe_pmu ipmi_msghandler arm_cmn cppc_cpufreq fuse drm backlight
[ 175.484676] CPU: 170 UID: 0 PID: 33763 Comm: rmdir Not tainted 6.16.0-next-20250805 #1 PREEMPT
[ 175.493365] Hardware name: Inspur NF5280R7/Mitchell MB, BIOS 04.04.00004001 2025-02-04 22:23:30 02/04/2025
[ 175.503178] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
not ok 12 selftests: cgroup: test_cpuset_prs.sh TIMEOUT 45 seconds
[ 175.510130] pc : remote_partition_disable (kernel/cgroup/cpuset.c:1352 (discriminator 1) kernel/cgroup/cpuset.c:1342 (discriminator 1) kernel/cgroup/cpuset.c:1514 (discriminator 1))
The warning is triggered by workqueue_unbound_exclude_cpumask() returning an error, which should not normally happen. The log shows a "workqueue: Interrupted when creating a worker thread" message just before the warning, which may point to a problem in the workqueue code that leads to this error. That particular message is printed when kthread_create_on_node() fails to create the requested worker kthread.
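As a rough cross-check of the explanation above, the register dump in the log can be decoded offline. The sketch below is plain Python, not kernel code. It assumes that x0 still held the checked return value when the warning fired, and that x15 to x17 held leftover message bytes; neither is confirmed by the log. The helper names as_s32 and as_ascii are made up for illustration.

```python
import errno
import struct

# Register values copied from the register dump in the log.
REGS = {
    "x0": 0x00000000FFFFFFF4,
    "x15": 0x68742072656B726F,
    "x16": 0x776B222064616572,
    "x17": 0x31752F72656B726F,
}


def as_s32(value):
    """Interpret the low 32 bits of a register as a signed C int."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_ascii(value):
    """Interpret a 64-bit register as eight little-endian ASCII bytes."""
    return struct.pack("<Q", value).decode("ascii", errors="replace")


ret = as_s32(REGS["x0"])
# Kernel functions return negative errno values, so negate before the lookup.
print("x0 as int:", ret, errno.errorcode.get(-ret, "unknown"))
print("x15..x17 :", "".join(as_ascii(REGS[r]) for r in ("x15", "x16", "x17")))
```

Read this way, x0 is -12 (ENOMEM), and x15 to x17 decode to `orker thread "kworker/u1`, a fragment of the workqueue message. That would be consistent with a failed worker creation surfacing as an allocation-style error, but it is an inference from the dump, not something the log states.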
The test itself uses the hotplug code rather heavily, offlining and onlining CPUs to exercise the hotplug-related cpuset code. I don't know whether that is part of the problem. In any case, there hasn't been any big change in the cpuset code recently, so I think the real bug may lie in another kernel area that the cpuset code uses.
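For anyone trying to narrow this down, the offline/online cycling can be approximated outside the selftest. The following is a minimal sketch, not what test_cpuset_prs.sh actually runs: it writes to the standard sysfs file /sys/devices/system/cpu/cpuN/online, needs root on a real system, and the function names and the sysfs_root parameter are hypothetical conveniences so the logic can be tried against a scratch directory first.

```python
from pathlib import Path

DEFAULT_SYSFS_ROOT = "/sys/devices/system/cpu"


def set_cpu_online(cpu, online, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Write 1 (online) or 0 (offline) to cpu<N>/online and return the path."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "online"
    path.write_text("1" if online else "0")
    return path


def toggle_cpu(cpu, rounds, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Offline and then online one CPU `rounds` times, leaving it online."""
    for _ in range(rounds):
        set_cpu_online(cpu, False, sysfs_root)
        set_cpu_online(cpu, True, sysfs_root)
```

Calling toggle_cpu(3, 100) as root would cycle CPU3 the way the psci "CPU3 killed" / "Booted secondary processor" pairs in the log suggest. Whether hotplug cycling alone is enough to reproduce the warning is not known.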
Cheers,
Longman
linux-kselftest-mirror@lists.linaro.org