Hi all,
This patch series introduces improvements to the cgroup selftests by adding a helper function to handle asynchronously updated cgroup statistics more reliably. This is especially useful for stat files such as memory.stat and cgroup.stat, whose values can be updated with a delay (e.g. by RCU callbacks and asynchronous rstat flushing).
Patch 1/3 adds cg_read_key_long_poll(), a generic helper to poll a numeric key in a cgroup file until it reaches an expected value or a retry limit is hit. Patches 2/3 and 3/3 convert existing tests to use this helper, making them more robust on busy systems.
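For reference, the helper's prototype as added in patch 1/3:

	long cg_read_key_long_poll(const char *cgroup, const char *control,
				   const char *key, long expected, int retries,
				   useconds_t wait_interval_us);

It returns the last value read: the expected value on success, the last observed value if the retries are exhausted, or a negative value if the read itself fails.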
v5:
- Drop the "/* 3s total */" comment from MEMCG_SOCKSTAT_WAIT_RETRIES as
  suggested by Shakeel, so it does not become stale if the wait interval
  changes.
- Elaborate in the commit message of patch 3/3 on the rationale behind the
  8s timeout in test_kmem_dead_cgroups(), and add a comment next to
  KMEM_DEAD_WAIT_RETRIES explaining that it is a generous upper bound
  derived from stress testing and not tied to a specific kernel constant.
- Add Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> to this series.
- Link to v4: https://lore.kernel.org/all/20251124123816.486164-1-zhangguopeng@kylinos.cn/

v4:
- Patch 1/3: Add the cg_read_key_long_poll() helper to poll cgroup keys
  with retries and configurable intervals.
- Patch 2/3: Update test_memcg_sock() to use cg_read_key_long_poll() for
  handling delayed "sock " counter updates in memory.stat.
- Patch 3/3: Replace the sleep-and-retry logic in test_kmem_dead_cgroups()
  with cg_read_key_long_poll() for waiting on nr_dying_descendants.
- Link to v3: https://lore.kernel.org/all/p655qedqjaakrnqpytc6dltejfluxo6jrffcltfz2ivonmk6...

v3:
- Move MEMCG_SOCKSTAT_WAIT_* defines after the #include block as suggested.
- Link to v2: https://lore.kernel.org/all/5ad2b75f-748a-4e93-8d11-63295bda0cbf@linux.dev/

v2:
- Clarify the rationale for the 3s timeout and mention the periodic rstat
  flush interval (FLUSH_TIME = 2 * HZ) in the comment.
- Replace the hardcoded retry count and wait interval with macros to avoid
  magic numbers and make the timeout calculation explicit.
- Link to v1: https://lore.kernel.org/all/20251119122758.85610-1-ioworker0@gmail.com/
Thanks to Michal Koutný for the suggestion, and to Lance Yang and Shakeel Butt for their reviews and feedback.
Guopeng Zhang (3):
  selftests: cgroup: Add cg_read_key_long_poll() to poll a cgroup key
    with retries
  selftests: cgroup: make test_memcg_sock robust against delayed sock
    stats
  selftests: cgroup: Replace sleep with cg_read_key_long_poll() for
    waiting on nr_dying_descendants

 .../selftests/cgroup/lib/cgroup_util.c        | 21 ++++++++++++
 .../cgroup/lib/include/cgroup_util.h          |  5 +++
 tools/testing/selftests/cgroup/test_kmem.c    | 33 +++++++++----------
 .../selftests/cgroup/test_memcontrol.c        | 20 ++++++++++-
 4 files changed, 60 insertions(+), 19 deletions(-)
Introduce a new helper function `cg_read_key_long_poll()` in cgroup_util.h. This function polls the specified key in a cgroup file until it matches the expected value or the retry limit is reached, with configurable wait intervals between retries.
This helper is particularly useful for handling asynchronously updated cgroup statistics (e.g., memory.stat), where immediate reads may observe stale values, especially on busy systems. It allows tests and other utilities to handle such cases more flexibly.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 .../selftests/cgroup/lib/cgroup_util.c        | 21 +++++++++++++++++++
 .../cgroup/lib/include/cgroup_util.h          |  5 +++++
 2 files changed, 26 insertions(+)

diff --git a/tools/testing/selftests/cgroup/lib/cgroup_util.c b/tools/testing/selftests/cgroup/lib/cgroup_util.c
index 44c52f620fda..ce6c2642fd9b 100644
--- a/tools/testing/selftests/cgroup/lib/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/lib/cgroup_util.c
@@ -168,6 +168,27 @@ long cg_read_key_long(const char *cgroup, const char *control, const char *key)
 	return atol(ptr + strlen(key));
 }
 
+long cg_read_key_long_poll(const char *cgroup, const char *control,
+			   const char *key, long expected, int retries,
+			   useconds_t wait_interval_us)
+{
+	long val = -1;
+	int i;
+
+	for (i = 0; i < retries; i++) {
+		val = cg_read_key_long(cgroup, control, key);
+		if (val < 0)
+			return val;
+
+		if (val == expected)
+			break;
+
+		usleep(wait_interval_us);
+	}
+
+	return val;
+}
+
 long cg_read_lc(const char *cgroup, const char *control)
 {
 	char buf[PAGE_SIZE];
diff --git a/tools/testing/selftests/cgroup/lib/include/cgroup_util.h b/tools/testing/selftests/cgroup/lib/include/cgroup_util.h
index 7ab2824ed7b5..77f386dab5e8 100644
--- a/tools/testing/selftests/cgroup/lib/include/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/lib/include/cgroup_util.h
@@ -17,6 +17,8 @@
 #define CG_NAMED_NAME "selftest"
 #define CG_PATH_FORMAT (!cg_test_v1_named ? "0::%s" : (":name=" CG_NAMED_NAME ":%s"))
 
+#define DEFAULT_WAIT_INTERVAL_US (100 * 1000) /* 100 ms */
+
 /*
  * Checks if two given values differ by less than err% of their sum.
  */
@@ -64,6 +66,9 @@ extern int cg_read_strstr(const char *cgroup, const char *control,
 extern long cg_read_long(const char *cgroup, const char *control);
 extern long cg_read_long_fd(int fd);
 long cg_read_key_long(const char *cgroup, const char *control, const char *key);
+long cg_read_key_long_poll(const char *cgroup, const char *control,
+			   const char *key, long expected, int retries,
+			   useconds_t wait_interval_us);
 extern long cg_read_lc(const char *cgroup, const char *control);
 extern int cg_write(const char *cgroup, const char *control, char *buf);
 extern int cg_open(const char *cgroup, const char *control, int flags);
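For illustration, a caller waiting on an asynchronously updated counter would look roughly like this (a minimal sketch mirroring the conversions in patches 2/3 and 3/3; wait_for_sock_zero() is a hypothetical helper, not part of this series):

	/*
	 * Hypothetical example: wait up to 3s (30 * 100ms) for the
	 * "sock " key in memory.stat of @memcg to drop to 0.
	 */
	static int wait_for_sock_zero(const char *memcg)
	{
		long val;

		val = cg_read_key_long_poll(memcg, "memory.stat", "sock ", 0,
					    30, DEFAULT_WAIT_INTERVAL_US);
		if (val)	/* still non-zero, or a negative read error */
			return -1;

		return 0;
	}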
test_memcg_sock() currently requires that memory.stat's "sock " counter is exactly zero immediately after the TCP server exits. On a busy system this assumption is too strict:
- Socket memory may be freed with a small delay (e.g. RCU callbacks).
- memcg statistics are updated asynchronously via the rstat flushing
  worker, so the "sock " value in memory.stat can stay non-zero for a
  short period of time even after all socket memory has been uncharged.
As a result, test_memcg_sock() can intermittently fail even though socket memory accounting is working correctly.
Make the test more robust by polling memory.stat for the "sock " counter and allowing it some time to drop to zero instead of checking it only once. The timeout is set to 3 seconds to cover the periodic rstat flush interval (FLUSH_TIME = 2*HZ by default) plus some scheduling slack. If the counter does not become zero within the timeout, the test still fails as before.
On my test system, running test_memcontrol 50 times produced:
- Before this patch: 6/50 runs passed.
- After this patch: 50/50 runs passed.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 .../selftests/cgroup/test_memcontrol.c        | 20 ++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 4e1647568c5b..2fb096a2a9f9 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -21,6 +21,8 @@
 #include "kselftest.h"
 #include "cgroup_util.h"
 
+#define MEMCG_SOCKSTAT_WAIT_RETRIES 30
+
 static bool has_localevents;
 static bool has_recursiveprot;
 
@@ -1384,6 +1386,7 @@ static int test_memcg_sock(const char *root)
 	int bind_retries = 5, ret = KSFT_FAIL, pid, err;
 	unsigned short port;
 	char *memcg;
+	long sock_post = -1;
 
 	memcg = cg_name(root, "memcg_test");
 	if (!memcg)
@@ -1432,7 +1435,22 @@ static int test_memcg_sock(const char *root)
 	if (cg_read_long(memcg, "memory.current") < 0)
 		goto cleanup;
 
-	if (cg_read_key_long(memcg, "memory.stat", "sock "))
+	/*
+	 * memory.stat is updated asynchronously via the memcg rstat
+	 * flushing worker, which runs periodically (every 2 seconds,
+	 * see FLUSH_TIME). On a busy system, the "sock " counter may
+	 * stay non-zero for a short period of time after the TCP
+	 * connection is closed and all socket memory has been
+	 * uncharged.
+	 *
+	 * Poll memory.stat for up to 3 seconds (~FLUSH_TIME plus some
+	 * scheduling slack) and require that the "sock " counter
+	 * eventually drops to zero.
+	 */
+	sock_post = cg_read_key_long_poll(memcg, "memory.stat", "sock ", 0,
+					  MEMCG_SOCKSTAT_WAIT_RETRIES,
+					  DEFAULT_WAIT_INTERVAL_US);
+	if (sock_post)
 		goto cleanup;
 
 	ret = KSFT_PASS;
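For clarity, the effective timeout follows directly from the two macros (MEMCG_SOCKSTAT_WAIT_RETRIES above and DEFAULT_WAIT_INTERVAL_US from patch 1/3):

	/*
	 * MEMCG_SOCKSTAT_WAIT_RETRIES * DEFAULT_WAIT_INTERVAL_US
	 *   = 30 * 100ms = 3s
	 * i.e. one periodic rstat flush (FLUSH_TIME = 2 * HZ) plus about
	 * a second of scheduling slack.
	 */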
Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups() with the new helper `cg_read_key_long_poll()`. This change improves the robustness of the test by polling the "nr_dying_descendants" counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.
Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
based on testing results:

- With 5-second timeout: 4/20 runs passed.
- With 8-second timeout: 20/20 runs passed.
The 8-second timeout is based on stress testing of test_kmem_dead_cgroups() under load: 5 seconds was occasionally not enough for reclaim of dying descendants to complete, whereas 8 seconds consistently covered the observed latencies. This value is intended as a generous upper bound for the asynchronous reclaim and is not tied to any specific kernel constant, so it can be adjusted in the future if reclaim behavior changes.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 tools/testing/selftests/cgroup/test_kmem.c | 33 ++++++++++------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index ca38525484e3..eeabd34bf083 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -26,6 +26,7 @@
  */
 #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
 
+#define KMEM_DEAD_WAIT_RETRIES 80
 
 static int alloc_dcache(const char *cgroup, void *arg)
 {
@@ -306,9 +307,7 @@ static int test_kmem_dead_cgroups(const char *root)
 {
 	int ret = KSFT_FAIL;
 	char *parent;
-	long dead;
-	int i;
-	int max_time = 20;
+	long dead = -1;
 
 	parent = cg_name(root, "kmem_dead_cgroups_test");
 	if (!parent)
@@ -323,21 +322,19 @@ static int test_kmem_dead_cgroups(const char *root)
 	if (cg_run_in_subcgroups(parent, alloc_dcache, (void *)100, 30))
 		goto cleanup;
 
-	for (i = 0; i < max_time; i++) {
-		dead = cg_read_key_long(parent, "cgroup.stat",
-					"nr_dying_descendants ");
-		if (dead == 0) {
-			ret = KSFT_PASS;
-			break;
-		}
-		/*
-		 * Reclaiming cgroups might take some time,
-		 * let's wait a bit and repeat.
-		 */
-		sleep(1);
-		if (i > 5)
-			printf("Waiting time longer than 5s; wait: %ds (dead: %ld)\n", i, dead);
-	}
+	/*
+	 * Allow up to ~8s for reclaim of dying descendants to complete.
+	 * This is a generous upper bound derived from stress testing, not
+	 * from a specific kernel constant, and can be adjusted if reclaim
+	 * behavior changes in the future.
+	 */
+	dead = cg_read_key_long_poll(parent, "cgroup.stat",
+				     "nr_dying_descendants ", 0,
+				     KMEM_DEAD_WAIT_RETRIES,
+				     DEFAULT_WAIT_INTERVAL_US);
+	if (dead)
+		goto cleanup;
+
+	ret = KSFT_PASS;
 
 cleanup:
 	cg_destroy(parent);
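Likewise, the 8s upper bound follows from the retry macro and the shared wait interval:

	/*
	 * KMEM_DEAD_WAIT_RETRIES * DEFAULT_WAIT_INTERVAL_US
	 *   = 80 * 100ms = 8s
	 */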
On Wed, Dec 03, 2025 at 07:56:29PM +0800, Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
> Introduce a new helper function `cg_read_key_long_poll()` in
> cgroup_util.h. This function polls the specified key in a cgroup file
> until it matches the expected value or the retry limit is reached, with
> configurable wait intervals between retries.
>
> This helper is particularly useful for handling asynchronously updated
> cgroup statistics (e.g., memory.stat), where immediate reads may observe
> stale values, especially on busy systems. It allows tests and other
> utilities to handle such cases more flexibly.
>
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> Suggested-by: Michal Koutný <mkoutny@suse.com>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
>
>  .../selftests/cgroup/lib/cgroup_util.c        | 21 +++++++++++++++++++
>  .../cgroup/lib/include/cgroup_util.h          |  5 +++++
>  2 files changed, 26 insertions(+)

Acked-by: Michal Koutný <mkoutny@suse.com>
On Wed, Dec 03, 2025 at 07:56:30PM +0800, Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
> test_memcg_sock() currently requires that memory.stat's "sock " counter
> is exactly zero immediately after the TCP server exits. On a busy system
> this assumption is too strict:
>
> - Socket memory may be freed with a small delay (e.g. RCU callbacks).
> - memcg statistics are updated asynchronously via the rstat flushing
>   worker, so the "sock " value in memory.stat can stay non-zero for a
>   short period of time even after all socket memory has been uncharged.
>
> As a result, test_memcg_sock() can intermittently fail even though
> socket memory accounting is working correctly.
>
> Make the test more robust by polling memory.stat for the "sock " counter
> and allowing it some time to drop to zero instead of checking it only
> once. The timeout is set to 3 seconds to cover the periodic rstat flush
> interval (FLUSH_TIME = 2*HZ by default) plus some scheduling slack. If
> the counter does not become zero within the timeout, the test still
> fails as before.
>
> On my test system, running test_memcontrol 50 times produced:
>
> - Before this patch: 6/50 runs passed.
> - After this patch: 50/50 runs passed.
>
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> Suggested-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
>
>  .../selftests/cgroup/test_memcontrol.c        | 20 ++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)

Acked-by: Michal Koutný <mkoutny@suse.com>
On Wed, Dec 03, 2025 at 07:56:31PM +0800, Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
> Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups()
> with the new helper `cg_read_key_long_poll()`. This change improves the
> robustness of the test by polling the "nr_dying_descendants" counter in
> `cgroup.stat` until it reaches 0 or the timeout is exceeded.
>
> Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
> based on testing results:
>
> - With 5-second timeout: 4/20 runs passed.
> - With 8-second timeout: 20/20 runs passed.
>
> The 8-second timeout is based on stress testing of
> test_kmem_dead_cgroups() under load: 5 seconds was occasionally not
> enough for reclaim of dying descendants to complete, whereas 8 seconds
> consistently covered the observed latencies. This value is intended as
> a generous upper bound for the asynchronous reclaim and is not tied to
> any specific kernel constant, so it can be adjusted in the future if
> reclaim behavior changes.

Great!

> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
>
>  tools/testing/selftests/cgroup/test_kmem.c | 33 ++++++++++------------
>  1 file changed, 15 insertions(+), 18 deletions(-)

Acked-by: Michal Koutný <mkoutny@suse.com>