test_memcg_sock() currently requires that memory.stat's "sock " counter is exactly zero immediately after the TCP server exits. On a busy system this assumption is too strict:
- Socket memory may be freed with a small delay (e.g. RCU callbacks).

- memcg statistics are updated asynchronously via the rstat flushing
  worker, so the "sock " value in memory.stat can stay non-zero for a
  short period of time even after all socket memory has been uncharged.
As a result, test_memcg_sock() can intermittently fail even though socket memory accounting is working correctly.
Make the test more robust by polling memory.stat for the "sock " counter and allowing it some time to drop to zero instead of checking it only once. If the counter does not become zero within the timeout, the test still fails as before.
On my test system, running test_memcontrol 50 times produced:
- Before this patch: 6/50 runs passed.
- After this patch: 50/50 runs passed.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
 .../selftests/cgroup/test_memcontrol.c | 24 ++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 4e1647568c5b..86d9981cddd8 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -1384,6 +1384,8 @@ static int test_memcg_sock(const char *root)
 	int bind_retries = 5, ret = KSFT_FAIL, pid, err;
 	unsigned short port;
 	char *memcg;
+	long sock_post = -1;
+	int i, retries = 30;
 
 	memcg = cg_name(root, "memcg_test");
 	if (!memcg)
@@ -1432,7 +1434,27 @@ static int test_memcg_sock(const char *root)
 	if (cg_read_long(memcg, "memory.current") < 0)
 		goto cleanup;
 
-	if (cg_read_key_long(memcg, "memory.stat", "sock "))
+	/*
+	 * memory.stat is updated asynchronously via the memcg rstat
+	 * flushing worker, so the "sock " counter may stay non-zero
+	 * for a short period of time after the TCP connection is
+	 * closed and all socket memory has been uncharged.
+	 *
+	 * Poll memory.stat for up to 3 seconds and require that the
+	 * "sock " counter eventually drops to zero.
+	 */
+	for (i = 0; i < retries; i++) {
+		sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
+		if (sock_post < 0)
+			goto cleanup;
+
+		if (!sock_post)
+			break;
+
+		usleep(100 * 1000); /* 100ms */
+	}
+
+	if (sock_post)
 		goto cleanup;
 
 	ret = KSFT_PASS;
From: Lance Yang <lance.yang@linux.dev>
On Wed, 19 Nov 2025 18:52:16 +0800, Guopeng Zhang wrote:
> test_memcg_sock() currently requires that memory.stat's "sock "
> counter is exactly zero immediately after the TCP server exits. On a
> busy system this assumption is too strict:
>
> - Socket memory may be freed with a small delay (e.g. RCU callbacks).
>
> - memcg statistics are updated asynchronously via the rstat flushing
>   worker, so the "sock " value in memory.stat can stay non-zero for a
>   short period of time even after all socket memory has been uncharged.
>
> As a result, test_memcg_sock() can intermittently fail even though
> socket memory accounting is working correctly.
>
> Make the test more robust by polling memory.stat for the "sock "
> counter and allowing it some time to drop to zero instead of checking
> it only once. If the counter does not become zero within the timeout,
> the test still fails as before.
>
> On my test system, running test_memcontrol 50 times produced:
>
> - Before this patch: 6/50 runs passed.
> - After this patch: 50/50 runs passed.
Good catch! Thanks!
With more CPU cores, updates may be distributed across cores, making it slower to reach the per-CPU flush threshold, IIUC :)
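A self-contained illustration of that effect (the batch constant mirrors MEMCG_CHARGE_BATCH from recent kernels, but treat the exact numbers as assumptions rather than the current in-tree logic):

	#include <stdio.h>

	/* The memcg rstat flush is skipped until the count of pending
	 * updates crosses a threshold that scales with the number of
	 * online CPUs, so the same workload crosses it later on a
	 * bigger machine and has to wait for the periodic flush. */
	#define MEMCG_CHARGE_BATCH	64

	static int needs_flush(long pending_updates, int online_cpus)
	{
		return pending_updates > (long)MEMCG_CHARGE_BATCH * online_cpus;
	}

	int main(void)
	{
		printf("4 CPUs:  flush? %d\n", needs_flush(1000, 4));	/* 1 */
		printf("64 CPUs: flush? %d\n", needs_flush(1000, 64));	/* 0 */
		return 0;
	}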
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> ---
>  .../selftests/cgroup/test_memcontrol.c | 24 ++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> index 4e1647568c5b..86d9981cddd8 100644
> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> @@ -1384,6 +1384,8 @@ static int test_memcg_sock(const char *root)
>  	int bind_retries = 5, ret = KSFT_FAIL, pid, err;
>  	unsigned short port;
>  	char *memcg;
> +	long sock_post = -1;
> +	int i, retries = 30;
>
>  	memcg = cg_name(root, "memcg_test");
>  	if (!memcg)
> @@ -1432,7 +1434,27 @@ static int test_memcg_sock(const char *root)
>  	if (cg_read_long(memcg, "memory.current") < 0)
>  		goto cleanup;
>
> -	if (cg_read_key_long(memcg, "memory.stat", "sock "))
> +	/*
> +	 * memory.stat is updated asynchronously via the memcg rstat
> +	 * flushing worker, so the "sock " counter may stay non-zero
> +	 * for a short period of time after the TCP connection is
> +	 * closed and all socket memory has been uncharged.
> +	 *
> +	 * Poll memory.stat for up to 3 seconds and require that the
> +	 * "sock " counter eventually drops to zero.
It might be worth mentioning that the current periodic rstat flush happens every 2 seconds (#define FLUSH_TIME (2UL*HZ)). Adding this context to the comment would explain why the 3-second timeout was chosen ;)
> +	 */
> +	for (i = 0; i < retries; i++) {
> +		sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
> +		if (sock_post < 0)
> +			goto cleanup;
> +
> +		if (!sock_post)
> +			break;
> +
> +		usleep(100 * 1000); /* 100ms */
Nit: It would be better to define the retry count and interval as macros (e.g., MAX_RETRIES, WAIT_INTERVAL) to avoid magic numbers and make the 3s timeout calculation explicit.
> +	}
> +
> +	if (sock_post)
>  		goto cleanup;
>
>  	ret = KSFT_PASS;
Thanks,
Lance
On 11/19/25 20:27, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> On Wed, 19 Nov 2025 18:52:16 +0800, Guopeng Zhang wrote:
>> test_memcg_sock() currently requires that memory.stat's "sock "
>> counter is exactly zero immediately after the TCP server exits. On a
>> busy system this assumption is too strict:
>>
>> - Socket memory may be freed with a small delay (e.g. RCU callbacks).
>>
>> - memcg statistics are updated asynchronously via the rstat flushing
>>   worker, so the "sock " value in memory.stat can stay non-zero for a
>>   short period of time even after all socket memory has been uncharged.
>>
>> As a result, test_memcg_sock() can intermittently fail even though
>> socket memory accounting is working correctly.
>>
>> Make the test more robust by polling memory.stat for the "sock "
>> counter and allowing it some time to drop to zero instead of checking
>> it only once. If the counter does not become zero within the timeout,
>> the test still fails as before.
>>
>> On my test system, running test_memcontrol 50 times produced:
>>
>> - Before this patch: 6/50 runs passed.
>> - After this patch: 50/50 runs passed.
Hi Lance,
Thanks a lot for your review and helpful comments!
> Good catch! Thanks!
>
> With more CPU cores, updates may be distributed across cores, making
> it slower to reach the per-CPU flush threshold, IIUC :)
Yes, that matches what I’ve seen as well: on larger systems it does take longer for the stats to converge, due to per-CPU distribution and the flush threshold.
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>>  .../selftests/cgroup/test_memcontrol.c | 24 ++++++++++++++++++-
>>  1 file changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
>> index 4e1647568c5b..86d9981cddd8 100644
>> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
>> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
>> @@ -1384,6 +1384,8 @@ static int test_memcg_sock(const char *root)
>>  	int bind_retries = 5, ret = KSFT_FAIL, pid, err;
>>  	unsigned short port;
>>  	char *memcg;
>> +	long sock_post = -1;
>> +	int i, retries = 30;
>>
>>  	memcg = cg_name(root, "memcg_test");
>>  	if (!memcg)
>> @@ -1432,7 +1434,27 @@ static int test_memcg_sock(const char *root)
>>  	if (cg_read_long(memcg, "memory.current") < 0)
>>  		goto cleanup;
>>
>> -	if (cg_read_key_long(memcg, "memory.stat", "sock "))
>> +	/*
>> +	 * memory.stat is updated asynchronously via the memcg rstat
>> +	 * flushing worker, so the "sock " counter may stay non-zero
>> +	 * for a short period of time after the TCP connection is
>> +	 * closed and all socket memory has been uncharged.
>> +	 *
>> +	 * Poll memory.stat for up to 3 seconds and require that the
>> +	 * "sock " counter eventually drops to zero.
>
> It might be worth mentioning that the current periodic rstat flush
> happens every 2 seconds (#define FLUSH_TIME (2UL*HZ)). Adding this
> context to the comment would explain why the 3-second timeout was
> chosen ;)
Good idea, I actually started with a 2-second timeout to match the rstat flush interval (FLUSH_TIME = 2*HZ), and that already reduced the flakiness compared to the original code. However, on my test system there were still a few intermittent failures over multiple runs. Bumping the timeout to 3 seconds made the test stable (50/50 runs passed), so I went with a slightly larger value to cover the periodic flush plus scheduling noise on busy machines.

I’ll update the comment in v2 to spell out this rationale, something like the sketch below.
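A possible wording for that v2 comment (a sketch only, the exact text is open to bikeshedding):

	/*
	 * memory.stat is flushed asynchronously: the memcg rstat
	 * flushing worker runs every 2 seconds (FLUSH_TIME == 2UL*HZ
	 * in mm/memcontrol.c), so the "sock " counter can stay
	 * non-zero for up to one full flush period after all socket
	 * memory has been uncharged. Poll for up to 3 seconds (one
	 * flush period plus slack for scheduling delays on busy
	 * systems) before declaring failure.
	 */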
>> +	 */
>> +	for (i = 0; i < retries; i++) {
>> +		sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
>> +		if (sock_post < 0)
>> +			goto cleanup;
>> +
>> +		if (!sock_post)
>> +			break;
>> +
>> +		usleep(100 * 1000); /* 100ms */
>
> Nit: It would be better to define the retry count and interval as
> macros (e.g., MAX_RETRIES, WAIT_INTERVAL) to avoid magic numbers and
> make the 3s timeout calculation explicit.
Makes sense. I’ll introduce macros for the retry count and the wait interval in v2 so that the 3s timeout is explicit and we don’t rely on magic numbers.
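For instance, the v2 loop could look something like this (a rough sketch; the macro names are from your suggestion, and the rest reuses the selftest's existing cg_read_key_long() helper and variables):

	#define MAX_RETRIES	30	/* 30 * 100ms = 3s total timeout */
	#define WAIT_INTERVAL	100	/* ms between memory.stat reads */

	for (i = 0; i < MAX_RETRIES; i++) {
		sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
		if (sock_post < 0)
			goto cleanup;
		if (!sock_post)
			break;
		usleep(WAIT_INTERVAL * 1000);
	}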
I’ll send a v2 shortly.
Best regards,
Guopeng
>> +	}
>> +
>> +	if (sock_post)
>>  		goto cleanup;
>>
>>  	ret = KSFT_PASS;
> Thanks,
> Lance
On Thu, Nov 20, 2025 at 10:12 AM Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
> On 11/19/25 20:27, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> On Wed, 19 Nov 2025 18:52:16 +0800, Guopeng Zhang wrote:
>>> test_memcg_sock() currently requires that memory.stat's "sock "
>>> counter is exactly zero immediately after the TCP server exits. On
>>> a busy system this assumption is too strict:
>>>
>>> - Socket memory may be freed with a small delay (e.g. RCU callbacks).
>>>
>>> - memcg statistics are updated asynchronously via the rstat flushing
>>>   worker, so the "sock " value in memory.stat can stay non-zero for
>>>   a short period of time even after all socket memory has been
>>>   uncharged.
>>>
>>> As a result, test_memcg_sock() can intermittently fail even though
>>> socket memory accounting is working correctly.
>>>
>>> Make the test more robust by polling memory.stat for the "sock "
>>> counter and allowing it some time to drop to zero instead of
>>> checking it only once. If the counter does not become zero within
>>> the timeout, the test still fails as before.
>>>
>>> On my test system, running test_memcontrol 50 times produced:
>>>
>>> - Before this patch: 6/50 runs passed.
>>> - After this patch: 50/50 runs passed.
> Hi Lance,
>
> Thanks a lot for your review and helpful comments!
>> Good catch! Thanks!
>>
>> With more CPU cores, updates may be distributed across cores, making
>> it slower to reach the per-CPU flush threshold, IIUC :)
> Yes, that matches what I’ve seen as well: on larger systems it does
> take longer for the stats to converge, due to per-CPU distribution
> and the flush threshold.
Me too.
I previously proposed a potential solution to explicitly flush stats via a new interface, "memory.stat_refresh" [1]. However, improving the existing flush mechanism would likely be a better long-term direction.
Links:
[1] https://lore.kernel.org/linux-mm/20251110101948.19277-1-leon.huangfu@shopee....
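For reference, using such an interface from the selftest would presumably look something like the sketch below. Note that memory.stat_refresh is not merged, and both the file name and the write semantics here are assumptions based on the proposal, not an existing kernel ABI.

	#include <stdio.h>

	/* Hypothetical: force an immediate memcg stats flush by writing
	 * to the proposed memory.stat_refresh file (NOT in mainline;
	 * name and semantics taken from the RFC linked above). */
	static int refresh_memcg_stats(const char *cgroup_dir)
	{
		char path[512];
		FILE *f;
		int ret = 0;

		snprintf(path, sizeof(path), "%s/memory.stat_refresh", cgroup_dir);
		f = fopen(path, "w");
		if (!f)
			return -1;
		if (fputs("1", f) == EOF)
			ret = -1;
		fclose(f);
		return ret;
	}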
Thanks,
Leon
[...]