Hi all,
This patch series makes the cgroup selftests more robust by adding a helper function that handles asynchronous updates to cgroup statistics. Counters in files such as memory.stat and cgroup.stat can lag behind the events that change them (e.g., because of RCU callbacks and asynchronous rstat flushing), so a test that reads them only once can see a stale value.
Patch 1/3 adds cg_read_key_long_poll(), a generic helper to poll a numeric key in a cgroup file until it reaches an expected value or a retry limit is hit. Patches 2/3 and 3/3 convert existing tests to use this helper, making them more robust on busy systems.
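For readers who have not opened patch 1/3, the polling pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the series: read_key_long() and poll_key_long() are hypothetical names, they take a plain file path instead of a cgroup and control file, and the parameter list is a guess at what "retries and configurable intervals" could look like.

```c
#define _DEFAULT_SOURCE		/* for usleep() under strict -std= modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the first line of a flat keyed file that
 * starts with @key and parse the number following it.
 * Returns -1 if the file cannot be opened or the key is missing.
 */
static long read_key_long(const char *path, const char *key)
{
	char line[256];
	size_t klen = strlen(key);
	long val = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, key, klen) == 0) {
			val = atol(line + klen);
			break;
		}
	}
	fclose(f);
	return val;
}

/*
 * Hypothetical sketch of the polling pattern: re-read @key until it
 * equals @expected or @retries reads have been made, sleeping
 * @wait_interval_us between reads. Returns the last value read, so the
 * caller still compares the result against @expected.
 */
static long poll_key_long(const char *path, const char *key, long expected,
			  int retries, useconds_t wait_interval_us)
{
	long val = -1;
	int i;

	for (i = 0; i < retries; i++) {
		val = read_key_long(path, key);
		if (val < 0)
			return val;	/* read error: retrying will not help */
		if (val == expected)
			break;
		usleep(wait_interval_us);
	}
	return val;
}
```

A test would call something like poll_key_long(path, "sock ", 0, retries, interval_us) and fail only if the returned value still differs from 0. The worst-case wait is roughly retries * interval, which is why the series expresses both as macros.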
v5:
 - Drop the "/* 3s total */" comment from MEMCG_SOCKSTAT_WAIT_RETRIES as
   suggested by Shakeel, so it does not become stale if the wait interval
   changes.
 - Elaborate in the commit message of patch 3/3 on the rationale behind
   the 8s timeout in test_kmem_dead_cgroups(), and add a comment next to
   KMEM_DEAD_WAIT_RETRIES explaining that it is a generous upper bound
   derived from stress testing and not tied to a specific kernel constant.
 - Add Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> to this series.
 - Link to v4: https://lore.kernel.org/all/20251124123816.486164-1-zhangguopeng@kylinos.cn/

v4:
 - Patch 1/3: Add the cg_read_key_long_poll() helper to poll cgroup keys
   with retries and configurable intervals.
 - Patch 2/3: Update test_memcg_sock() to use cg_read_key_long_poll() for
   handling delayed "sock " counter updates in memory.stat.
 - Patch 3/3: Replace the sleep-and-retry logic in
   test_kmem_dead_cgroups() with cg_read_key_long_poll() for waiting on
   nr_dying_descendants.
 - Link to v3: https://lore.kernel.org/all/p655qedqjaakrnqpytc6dltejfluxo6jrffcltfz2ivonmk6...

v3:
 - Move MEMCG_SOCKSTAT_WAIT_* defines after the #include block as
   suggested.
 - Link to v2: https://lore.kernel.org/all/5ad2b75f-748a-4e93-8d11-63295bda0cbf@linux.dev/

v2:
 - Clarify the rationale for the 3s timeout and mention the periodic
   rstat flush interval (FLUSH_TIME = 2 * HZ) in the comment.
 - Replace hardcoded retry count and wait interval with macros to avoid
   magic numbers and make the timeout calculation explicit.
 - Link to v1: https://lore.kernel.org/all/20251119122758.85610-1-ioworker0@gmail.com/
Thanks to Michal Koutný for the suggestion, and to Lance Yang and Shakeel Butt for their reviews and feedback.
Guopeng Zhang (3):
  selftests: cgroup: Add cg_read_key_long_poll() to poll a cgroup key with retries
  selftests: cgroup: make test_memcg_sock robust against delayed sock stats
  selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants
 .../selftests/cgroup/lib/cgroup_util.c        | 21 ++++++++++++
 .../cgroup/lib/include/cgroup_util.h          |  5 +++
 tools/testing/selftests/cgroup/test_kmem.c    | 33 +++++++++----------
 .../selftests/cgroup/test_memcontrol.c        | 20 ++++++++++-
 4 files changed, 60 insertions(+), 19 deletions(-)