On Thu, Nov 20, 2025 at 10:12 AM Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
On 11/19/25 20:27, Lance Yang wrote:
From: Lance Yang <lance.yang@linux.dev>
On Wed, 19 Nov 2025 18:52:16 +0800, Guopeng Zhang wrote:
test_memcg_sock() currently requires that memory.stat's "sock " counter is exactly zero immediately after the TCP server exits. On a busy system this assumption is too strict:
- Socket memory may be freed with a small delay (e.g. RCU callbacks).
- memcg statistics are updated asynchronously via the rstat flushing worker, so the "sock " value in memory.stat can stay non-zero for a short period of time even after all socket memory has been uncharged.
As a result, test_memcg_sock() can intermittently fail even though socket memory accounting is working correctly.
Make the test more robust by polling memory.stat for the "sock " counter and allowing it some time to drop to zero instead of checking it only once. If the counter does not become zero within the timeout, the test still fails as before.
On my test system, running test_memcontrol 50 times produced:
- Before this patch: 6/50 runs passed.
- After this patch: 50/50 runs passed.
Hi Lance,
Thanks a lot for your review and helpful comments!
Good catch! Thanks!
With more CPU cores, updates may be distributed across cores, making it slower to reach the per-CPU flush threshold, IIUC :)
Yes, that matches what I’ve seen as well: on larger systems it indeed takes longer for the stats to converge due to per-CPU distribution and the flush threshold.
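To make the scaling argument concrete, here is a toy model (not kernel code) of the heuristic as we understand it. Each CPU accumulates its own count of pending stat updates and folds it into a shared counter once it reaches a per-CPU batch, and a reader of memory.stat only triggers a flush once the shared counter exceeds batch * nr_cpus. The function name `updates_until_flush()`, the round-robin update pattern and the batch values are assumptions for illustration only.

```c
#include <stdlib.h>

/*
 * Toy model (not kernel code): update number n (size 1) goes to CPU
 * n % ncpus. A CPU's pending count is folded into a shared counter once
 * it reaches `batch`; a reader would trigger a flush only once the
 * shared counter exceeds batch * ncpus. Returns the number of updates
 * needed before that happens, or -1 on allocation failure.
 */
static long updates_until_flush(int ncpus, long batch)
{
	long *pending = calloc((size_t)ncpus, sizeof(*pending));
	long shared = 0, n = 0;

	if (!pending)
		return -1;
	for (;;) {
		int cpu = (int)(n % ncpus);

		n++;
		if (++pending[cpu] >= batch) {
			/* Fold this CPU's pending count into the shared one. */
			shared += pending[cpu];
			pending[cpu] = 0;
			if (shared > batch * ncpus)
				break;
		}
	}
	free(pending);
	return n;
}
```

In this model, round-robin updates need (2 * batch - 1) * ncpus + 1 updates before a read would flush: with a batch of 64 that is 509 updates on 4 CPUs versus 8129 on 64 CPUs, so a stale "sock " value can linger much longer on a larger machine until the flushing worker catches up.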
Me too.
I previously proposed a potential solution to explicitly flush stats via a new interface, "memory.stat_refresh" [1]. However, improving the existing flush mechanism would likely be a better long-term direction.
Links: [1] https://lore.kernel.org/linux-mm/20251110101948.19277-1-leon.huangfu@shopee....
Thanks, Leon
[...]