Hello Naresh.
On Tue, Mar 04, 2025 at 05:26:45PM +0530, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
As part of LKFT’s re-validation of known issues, we have observed that the selftests: cgroup suite is consistently failing across almost all LKFT-supported devices due to:
- Test timeouts (45 seconds limit reached)
- OOM-killer invocation
Thanks for reporting the issues with the tests.
## Key Questions for Discussion:
- Would it be beneficial to increase the test timeout to ~180 seconds to allow sufficient execution time?
That depends.
test_cpu has some lengthier checks, and together they can exceed 45s. It might be better to shorten them (within the precision margin) instead of prolonging the limit.
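One way to reason about "shorten instead of prolong" is to compute how much the checks would have to shrink to fit the runner's budget. The sketch below assumes the kselftest runner's default 45 s per-test limit and a hypothetical 20% safety headroom. The per-check durations are made-up numbers for illustration and are not measurements of test_cpu.

```python
# Default per-test limit of the kselftest runner (a per-directory "settings"
# file can override it).
KSELFTEST_DEFAULT_TIMEOUT_S = 45


def shrink_factor(durations_s, timeout_s=KSELFTEST_DEFAULT_TIMEOUT_S, headroom=0.8):
    """Return the factor (<= 1.0) by which every check's duration would have to
    shrink so that their sum fits into headroom * timeout_s.

    headroom is a hypothetical safety margin, not a kselftest setting.
    """
    total = sum(durations_s)
    budget = timeout_s * headroom
    if total <= budget:
        return 1.0  # already fits, nothing to shorten
    return budget / total


# Hypothetical per-check durations in seconds, for illustration only.
checks = [10, 10, 10, 10, 10]
print(sum(checks), shrink_factor(checks))
```

Shortening a measurement window only helps if the comparison tolerance still absorbs the extra noise. The factor above is therefore an upper bound on what can be trimmed, not a recommendation.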
test_kmem -- it shouldn't take this long. If anything, I'd suspect reading /proc/kpagecgroup. Are your systems larger than 100GiB of memory (that's my rough estimate of the size at which these reads exceed the limit)?
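The file size behind a RAM-based estimate like this follows from the /proc/kpagecgroup layout. The file holds one 64-bit memcg inode number per PFN, so it grows linearly with physical memory. The sketch below assumes the test scans the whole file, a 4 KiB page size (arm64 kernels may use 16 KiB or 64 KiB), and a PFN range that roughly matches installed RAM. It shows the file size and the read throughput a full scan would need to finish within the limit.

```python
KPAGECGROUP_ENTRY_BYTES = 8  # one 64-bit memcg inode number per PFN
GIB = 2**30
MIB = 2**20


def kpagecgroup_bytes(ram_bytes, page_size=4096):
    """Approximate size of /proc/kpagecgroup, assuming the PFN range roughly
    matches installed RAM (holes in the physical map make the file larger)."""
    return (ram_bytes // page_size) * KPAGECGROUP_ENTRY_BYTES


def required_read_rate(ram_bytes, timeout_s=45, page_size=4096):
    """Throughput (bytes/s) a full scan needs in order to finish within timeout_s."""
    return kpagecgroup_bytes(ram_bytes, page_size) / timeout_s


size = kpagecgroup_bytes(100 * GIB)
print(size / MIB, "MiB;", round(required_read_rate(100 * GIB) / MIB, 2), "MiB/s")
```

Under these assumptions, 100 GiB of RAM yields a 200 MiB file. Whether that exceeds 45 s depends on the per-page cost of generating each entry on the device, which the sketch does not model.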
(Are there any other timeouts?)
OOM -- some tests are supposed to trigger memcg OOM.
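To separate expected memcg OOMs from unexpected ones in a report, the cgroup v2 memory.events counters can be inspected. In the cgroup v2 documentation, "oom" counts how often the cgroup hit its own limit with an allocation about to fail, while "oom_kill" counts processes in the cgroup killed by any OOM killer. The sketch below relies on that reading of the documentation. The sample file content is illustrative, and the path the text would be read from (e.g. /sys/fs/cgroup/<test cgroup>/memory.events) is an assumption about where cgroup2 is mounted.

```python
def parse_memory_events(text):
    """Parse a cgroup v2 memory.events file ("<key> <count>" per line)."""
    events = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key and value:
            events[key] = int(value)
    return events


def classify_oom(events):
    """Rough classification of OOM activity seen by one cgroup.

    "memcg":    the cgroup reached its own limit (expected in some tests)
    "external": processes were killed although this cgroup's own limit was
                not hit, e.g. by an ancestor's limit or the global OOM killer
    "none":     no OOM activity recorded
    """
    if events.get("oom", 0) > 0:
        return "memcg"
    if events.get("oom_kill", 0) > 0:
        return "external"
    return "none"


# Illustrative memory.events content, not taken from an LKFT run.
sample = "low 0\nhigh 0\nmax 12\noom 1\noom_kill 1\noom_group_kill 0\n"
print(classify_oom(parse_memory_events(sample)))
```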
- Should we enhance logging to explicitly print failure reasons when a test fails?
These tests are useful when run by developers _themselves_. In such a case it's handy to get more info by running them under strace (since they're so simple).
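For a test harness that wants to capture this kind of detail automatically, a failing test can be re-run under strace. The sketch below only uses standard strace options: -f follows forked children, -tt adds microsecond timestamps, and -o writes the trace to a file. The helper names and the trace file name are hypothetical, and strace must be installed on the DUT.

```python
import shutil
import subprocess


def strace_argv(test_binary, trace_file):
    """Build the command line for running one selftest binary under strace."""
    # -f: also trace forked children, since many cgroup tests fork helpers
    # -tt: timestamp every syscall, which helps show where the time goes
    # -o: write the trace to a file instead of stderr
    return ["strace", "-f", "-tt", "-o", trace_file, test_binary]


def run_under_strace(test_binary, trace_file="trace.txt"):
    """Run the test under strace and return its exit status."""
    if shutil.which("strace") is None:
        raise RuntimeError("strace not found in PATH")
    return subprocess.run(strace_argv(test_binary, trace_file)).returncode
```

For example, `run_under_strace("./test_memcontrol")` would leave the full syscall trace of that test in trace.txt.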
- Are there any missing dependencies that could be causing these failures? Note: The required selftests/cgroup/config options were included in LKFT's build and test plans.
The deps are rather minimal, only some coreutils (cgroup selftests should be covered by e.g. this list [1]).
## Devices Affected:
The following DUTs consistently experience these failures:
- dragonboard-410c (arm64)
- dragonboard-845c (arm64)
- e850-96 (arm64)
- juno-r2 (arm64)
- qemu-arm64 (arm64)
- qemu-armv7
- qemu-x86_64
- rk3399-rock-pi-4b (arm64)
- x15 (arm)
- x86_64
## Regression Analysis:
- New regression? No (these failures have been observed for months/years).
Actually, I noticed a test_memcontrol failure yesterday (with a ~mainline kernel), but I remember it used to pass rather recently too. I haven't had time to look into it, but at least that one may be a regression (in the code or in the test).
- Reproducibility? Yes, the failures occur consistently.
+/- as that may depend on nr_cpus or totalram.
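Because the outcome may depend on nr_cpus or totalram, logging those values with every run could make results easier to compare across DUTs. The sketch below is a minimal example using Python's os module on Linux. SC_PHYS_PAGES times SC_PAGE_SIZE approximates total RAM, and os.cpu_count() reports the CPUs in the system rather than the test's affinity mask. The helper name is hypothetical.

```python
import os


def environment_summary():
    """Collect host parameters that the cgroup selftests' behaviour may
    depend on, so they can be logged next to each result."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return {
        "nr_cpus": os.cpu_count(),
        "totalram_bytes": page_size * phys_pages,
        "page_size": page_size,
    }


print(environment_summary())
```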
- Test suite affected? selftests: cgroup (timeouts and OOM-related failures).
Michal