As part of LKFT’s re-validation of known issues, we have observed that the selftests: cgroup suite is consistently failing across almost all LKFT-supported devices due to:
- Test timeouts (45 seconds limit reached)
- OOM-killer invocation
## Key Questions for Discussion:
- Would it be beneficial to increase the test timeout to ~180 seconds to allow sufficient execution time?
- Should we enhance logging to explicitly print failure reasons when a test fails?
- Are there any missing dependencies that could be causing these failures?

Note: The required selftests/cgroup/config options were included in LKFT's build and test plans.
## Devices Affected:
The following DUTs consistently experience these failures:
- dragonboard-410c (arm64)
- dragonboard-845c (arm64)
- e850-96 (arm64)
- juno-r2 (arm64)
- qemu-arm64 (arm64)
- qemu-armv7
- qemu-x86_64
- rk3399-rock-pi-4b (arm64)
- x15 (arm)
- x86_64
Regression Analysis:
- New regression? No (these failures have been observed for months/years).
- Reproducibility? Yes, the failures occur consistently.
- Test suite affected? selftests: cgroup (timeouts and OOM-related failures).
Test regression: selftests cgroup fails timeout and oom-killer
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
## Test log:
# selftests: cgroup: test_cpu
# ok 1 test_cpucg_subtree_control
# ok 2 test_cpucg_stats
# ok 3 test_cpucg_nice
# not ok 4 test_cpucg_weight_overprovisioned
# ok 5 test_cpucg_weight_underprovisioned
# ok 6 test_cpucg_nested_weight_overprovisioned
# ok 7 test_cpucg_nested_weight_underprovisioned
# not ok 2 selftests: cgroup: test_cpu # TIMEOUT 45 seconds
<trim>
# selftests: cgroup: test_freezer
# ok 1 test_cgfreezer_simple
# ok 2 test_cgfreezer_tree
# ok 3 test_cgfreezer_forkbomb
# ok 4 test_cgfreezer_mkdir
# ok 5 test_cgfreezer_rmdir
# ok 6 test_cgfreezer_migrate
# Cgroup /sys/fs/cgroup/cg_test_ptrace isn't frozen
# not ok 7 test_cgfreezer_ptrace
# ok 8 test_cgfreezer_stopped
# ok 9 test_cgfreezer_ptraced
# ok 10 test_cgfreezer_vfork
not ok 4 selftests: cgroup: test_freezer # exit=1
<trim>
selftests: cgroup: test_kmem
# not ok 7 selftests: cgroup: test_kmem # TIMEOUT 45 seconds
<trim>
# selftests: cgroup: test_memcontrol
# ok 1 test_memcg_subtree_control
# not ok 2 test_memcg_current_peak
# not ok 3 test_memcg_min
# not ok 4 test_memcg_low
# not ok 5 test_memcg_high
# ok 6 test_memcg_high_sync
[ 270.699078] test_memcontrol invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 270.699921] CPU: 1 UID: 0 PID: 946 Comm: test_memcontrol Not tainted 6.14.0-rc5-next-20250303 #1
[ 270.699930] Hardware name: Radxa ROCK Pi 4B (DT)
<trim>
[ 270.729527] Memory cgroup out of memory: Killed process 946 (test_memcontrol) total-vm:104840kB, anon-rss:30596kB, file-rss:1056kB, shmem-rss:0kB, UID:0 pgtables:104kB oom_score_adj:0
# not ok 7 test_memcg_max
# not ok 8 test_memcg_reclaim
<trim>
not ok 8 selftests: cgroup: test_memcontrol # exit=1
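The logs above are TAP-style: subtest results appear with a leading `#`, and each test binary ends with a `not ok N selftests: cgroup: <name>` line that may carry a directive such as `TIMEOUT 45 seconds` or `exit=1`. If the goal is to surface failure reasons more explicitly, one low-effort option on the reporting side is to pull out the `not ok` lines and their directives. A minimal sketch, assuming only the line shapes seen in these logs; `summarize_failures` is a hypothetical helper, not part of kselftest:

```python
import re
from typing import List, Tuple

# "ok N name" / "not ok N name", optionally followed by "# <directive>".
RESULT_RE = re.compile(r"^(not ok|ok)\s+(\d+)\s+(.*?)(?:\s+#\s*(.*))?$")


def summarize_failures(log: str) -> List[Tuple[str, str]]:
    """Return (name, directive) for every 'not ok' line in a kselftest log.

    Lines may carry one leading '#' (nested TAP output); kernel messages,
    '<trim>' markers and free-form diagnostics do not match and are skipped.
    """
    failures = []
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            line = line[1:].strip()  # drop one level of nesting
        m = RESULT_RE.match(line)
        if m and m.group(1) == "not ok":
            failures.append((m.group(3), m.group(4) or ""))
    return failures


if __name__ == "__main__":
    sample = "\n".join([
        "# ok 3 test_cpucg_nice",
        "# not ok 4 test_cpucg_weight_overprovisioned",
        "# not ok 2 selftests: cgroup: test_cpu # TIMEOUT 45 seconds",
        "not ok 4 selftests: cgroup: test_freezer # exit=1",
    ])
    for name, directive in summarize_failures(sample):
        print(f"{name}: {directive or 'no directive'}")
```

A natural extension would be to keep the `#` diagnostic printed just before a `not ok` line (such as the "isn't frozen" message from test_freezer), which is the closest thing these tests currently emit to a failure reason.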
## Source
* Kernel version: 6.14.0-rc5-next-20250303
* Git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
* Git sha: cd3215bbcb9d4321def93fea6cfad4d5b42b9d1d
* Git describe: 6.14.0-rc5-next-20250303
* Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/
## Test data
* Test log: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/tes...
* Test history: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/tes...
* Test details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/tes...
* Test logs rock pi: https://lkft.validation.linaro.org/scheduler/job/8148789#L1774
* Test logs x86: https://lkft.validation.linaro.org/scheduler/job/8148731#L1948
--
Linaro LKFT
https://lkft.linaro.org
Hello Naresh.
On Tue, Mar 04, 2025 at 05:26:45PM +0530, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> As part of LKFT’s re-validation of known issues, we have observed that the selftests: cgroup suite is consistently failing across almost all LKFT-supported devices due to:
> - Test timeouts (45 seconds limit reached)
> - OOM-killer invocation
Thanks for reporting the issues with the tests.
> ## Key Questions for Discussion:
> - Would it be beneficial to increase the test timeout to ~180 seconds to allow sufficient execution time?
That depends.
test_cpu has some lengthier checks, and in sum they can exceed 45 s. It might be better to shorten them (within the precision margin) instead of raising the limit.
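Assuming the 45 s figure is the kselftest runner's default per-test timeout, and that a test directory can override it with a `timeout=` line in a `settings` file next to its tests, the trade-off can be made explicit with a small model of that precedence. This Python sketch is not the runner's implementation (which is a shell script); `effective_timeout` and its parsing rules are assumptions for illustration:

```python
from typing import Optional

DEFAULT_TIMEOUT = 45  # seconds, the limit reported as "TIMEOUT 45 seconds"


def effective_timeout(settings_text: Optional[str], default: int = DEFAULT_TIMEOUT) -> int:
    """Per-test timeout implied by a kselftest-style 'settings' file.

    Assumed rules (illustrative only): key=value lines, '#' starts a comment,
    the last 'timeout=' line wins, and a missing file (None) keeps the default.
    """
    timeout = default
    if settings_text is None:
        return timeout
    for line in settings_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep and key.strip() == "timeout":
            timeout = int(value.strip())
    return timeout


if __name__ == "__main__":
    print(effective_timeout(None))             # 45: no settings file
    print(effective_timeout("timeout=180\n"))  # 180: the value proposed in the report
```

Raising the value to 180 would quadruple the budget for every test in the directory; the alternative suggested here is to trim test_cpu's checks so they fit the existing limit.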
test_kmem: it shouldn't take that long. If anything, I'd suspect /proc/kpagecgroup. Do your systems have more than 100 GiB of memory? (That's my rough estimate of the size at which these reads would take longer than the limit.)
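To make the 100 GiB estimate concrete: according to the kernel's pagemap documentation, /proc/kpagecgroup holds one 64-bit entry (the inode number of the charged memory cgroup) per page frame, so the amount of data a full read returns grows linearly with RAM. A back-of-the-envelope helper, assuming 4 KiB pages and a contiguous physical address space; `kpagecgroup_bytes` is a hypothetical name:

```python
GIB = 1 << 30
MIB = 1 << 20
ENTRY_SIZE = 8  # bytes: one 64-bit memcg inode number per page frame


def kpagecgroup_bytes(ram_bytes: int, page_size: int = 4096) -> int:
    """Approximate number of bytes a full read of /proc/kpagecgroup returns.

    Assumes physical memory is contiguous; holes in the physical address
    space can make the real amount larger than this estimate.
    """
    return (ram_bytes // page_size) * ENTRY_SIZE


if __name__ == "__main__":
    for ram_gib in (4, 16, 100):
        size = kpagecgroup_bytes(ram_gib * GIB)
        print(f"{ram_gib:>3} GiB RAM -> {size / MIB:.0f} MiB of kpagecgroup data")
```

At 100 GiB this comes to 26,214,400 frames, i.e. 200 MiB of entries. How long that takes to read is board-dependent; the arithmetic only shows how the volume scales with memory size.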
(Are there any other timeouts?)
OOM: some tests are supposed to trigger a memcg OOM.
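When triaging, it may help to separate the expected memcg OOM kills from a system-wide OOM. A minimal sketch, assuming the kernel's kill message begins with "Memory cgroup out of memory" for a memcg OOM (as in the test_memcontrol log above) and with "Out of memory" for a global one; `classify_oom_line` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Assumed shape: "<prefix>: Killed process <pid> (<comm>) ..."
KILL_RE = re.compile(
    r"(Memory cgroup out of memory|Out of memory): Killed process (\d+) \(([^)]+)\)"
)


def classify_oom_line(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (scope, pid, comm) for an OOM-kill log line, or None otherwise."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    scope = "memcg" if m.group(1).startswith("Memory cgroup") else "global"
    return scope, int(m.group(2)), m.group(3)


if __name__ == "__main__":
    line = ("[ 270.729527] Memory cgroup out of memory: Killed process 946 "
            "(test_memcontrol) total-vm:104840kB, anon-rss:30596kB")
    print(classify_oom_line(line))
```

A memcg-scoped kill while test_memcontrol runs is therefore not, on its own, evidence of a problem; a global one would deserve a closer look.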
> - Should we enhance logging to explicitly print failure reasons when a test fails?
These tests are useful when run by developers them_selves_. In such a case, it's handy to obtain more information by running them under strace (since they're so simple).
> - Are there any missing dependencies that could be causing these failures?
> Note: The required selftests/cgroup/config options were included in LKFT's build and test plans.
The deps are rather minimal: only some coreutils (the cgroup selftests should be covered by, e.g., this list [1]).
> ## Devices Affected:
> The following DUTs consistently experience these failures:
> - dragonboard-410c (arm64)
> - dragonboard-845c (arm64)
> - e850-96 (arm64)
> - juno-r2 (arm64)
> - qemu-arm64 (arm64)
> - qemu-armv7
> - qemu-x86_64
> - rk3399-rock-pi-4b (arm64)
> - x15 (arm)
> - x86_64
> Regression Analysis:
> - New regression? No (these failures have been observed for months/years).
Actually, I noticed the test_memcontrol failures yesterday (with a ~mainline kernel), but I remember they were passing until rather recently. I haven't had time to look into it, but at least that one may be a regression (in the code or in the test).
> - Reproducibility? Yes, the failures occur consistently.
+/-, as that may depend on nr_cpus or totalram.
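Since results may vary with the CPU count and total RAM, recording both next to each run could make reports easier to compare. A minimal standard-library sketch for Linux; note that `os.cpu_count()` reports the logical CPUs the OS exposes, which is not necessarily the same thing as an `nr_cpus=` boot limit, and `machine_profile` is a hypothetical helper:

```python
import os
from typing import Dict


def machine_profile() -> Dict[str, int]:
    """Collect CPU and memory figures that test outcomes may depend on (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # on glibc, derived from sysinfo() totalram
    return {
        "cpus": os.cpu_count() or 0,                  # logical CPUs reported by the OS
        "usable_cpus": len(os.sched_getaffinity(0)),  # CPUs this process may run on
        "totalram_bytes": page_size * phys_pages,
    }


if __name__ == "__main__":
    for key, value in machine_profile().items():
        print(f"{key}={value}")
```

Printing these as `key=value` lines keeps them easy to attach to an LKFT job log or grep out later.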
> - Test suite affected? selftests: cgroup (timeouts and OOM-related failures).
Michal
linux-kselftest-mirror@lists.linaro.org