The test_memcontrol selftest consistently fails its test_memcg_low sub-test because two of its test child cgroups, which have a memory.low of 0 or an effective memory.low of 0, still have low events generated for them, since mem_cgroup_below_low() uses the ">=" operator when comparing to elow.
The two failing cases are as follows:
1) memory.low is set to 0, but low events can still be triggered and so the cgroup may have a non-zero low event count.
2) memory.low is set to a non-zero value but the cgroup has no task in it, so it has an effective low value of 0. Again, it may have a non-zero low event count if memory reclaim happens. This is probably not a result that users expect, and it is doubtful that users will check an empty cgroup with no task in it and expect some non-zero event counts.
In the first case, even though memory.low isn't set, the cgroup may still have some low protection if memory.low is set in the parent and the cgroup2 memory_recursiveprot mount option is enabled, so low events may still be recorded. The test_memcontrol.c test has to be modified to account for that.
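As a rough illustration of the test-side adjustment described above, the sketch below looks for the memory_recursiveprot option in a mounts file and picks the index of the first child that is expected to have no low events. The helper names mounts_contain() and pick_no_low_events_index() are hypothetical and are not the selftest's own helpers; only the "2 : 1" selection mirrors the change in this patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if any line of the given mounts
 * file (e.g. /proc/self/mounts) contains the option string.
 * A missing or unreadable file is treated as "option not present".
 */
static bool mounts_contain(const char *path, const char *option)
{
	char line[4096];
	bool found = false;
	FILE *f = fopen(path, "r");

	if (!f)
		return false;

	while (fgets(line, sizeof(line), f)) {
		if (strstr(line, option)) {
			found = true;
			break;
		}
	}
	fclose(f);
	return found;
}

/*
 * Hypothetical helper mirroring the selection made in the patch:
 * with memory_recursiveprot, the child with memory.low=0 can still
 * see low events, so only children from index 2 on must have none.
 */
static int pick_no_low_events_index(bool has_recursiveprot)
{
	return has_recursiveprot ? 2 : 1;
}
```

A caller could compute the flag once, for example with mounts_contain("/proc/self/mounts", "memory_recursiveprot"), and pass it to pick_no_low_events_index().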
For the second case, it really doesn't make sense to have a non-zero low event count if the cgroup has 0 usage. So we need to skip this corner case in shrink_node_memcgs() by skipping the !usage case.
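The corner case can be modelled in isolation with plain integers. This is only a sketch of the ">=" comparison named above and of the !usage skip; the function names are hypothetical, and the real reclaim path in shrink_node_memcgs() performs further checks that are not modelled here.

```c
#include <stdbool.h>

/*
 * Simplified model of the comparison described above:
 * a cgroup counts as "below low" when elow >= usage.
 */
static bool model_below_low(unsigned long elow, unsigned long usage)
{
	return elow >= usage;
}

/*
 * Same comparison, but with the !usage skip applied first,
 * as this patch does before the protection checks.
 */
static bool model_below_low_skip_unused(unsigned long elow,
					unsigned long usage)
{
	if (!usage)
		return false;	/* nothing to reclaim, nothing to report */

	return model_below_low(elow, usage);
}
```

With elow == 0 and usage == 0 the plain comparison is true (0 >= 0), which is how an empty cgroup can look protected; with the skip in place the same inputs no longer reach the comparison, while a cgroup with non-zero usage is evaluated as before.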
With this patch applied, the test_memcg_low sub-test finishes successfully without failure in most cases, though both the test_memcg_low and test_memcg_min sub-tests may still fail occasionally if the memory.current values fall outside of the expected ranges.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/internal.h                                    |  9 +++++++++
 mm/memcontrol-v1.h                               |  2 --
 mm/vmscan.c                                      |  4 ++++
 tools/testing/selftests/cgroup/test_memcontrol.c | 16 +++++++++++-----
 4 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 50c2f590b2d0..c06fb0e8d75c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1535,6 +1535,15 @@ void __meminit __init_page_from_nid(unsigned long pfn, int nid);
 unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
 			  int priority);
 
+#ifdef CONFIG_MEMCG
+unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
+#else
+static inline unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
+{
+	return 1UL;
+}
+#endif
+
 #ifdef CONFIG_SHRINKER_DEBUG
 static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
 			struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb416..e92b21af92b1 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -22,8 +22,6 @@
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
-
 void drain_all_stock(struct mem_cgroup *root_memcg);
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..a771a0145a12 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		/* Skip memcg with no usage */
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..5a5dcbe57b56 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
  *
  * Then it checks actual memory usages and expects that:
  * A/B    memory.current ~= 50M
- * A/B/C  memory.current ~= 29M
- * A/B/D  memory.current ~= 21M
- * A/B/E  memory.current ~= 0
- * A/B/F  memory.current  = 0
+ * A/B/C  memory.current ~= 29M [memory.events:low > 0]
+ * A/B/D  memory.current ~= 21M [memory.events:low > 0]
+ * A/B/E  memory.current ~= 0   [memory.events:low == 0 if !memory_recursiveprot, > 0 otherwise]
+ * A/B/F  memory.current  = 0   [memory.events:low == 0]
  * (for origin of the numbers, see model in memcg_protection.m.)
  *
  * After that it tries to allocate more than there is
@@ -525,8 +525,14 @@ static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
 
+	/*
+	 * Child 2 has memory.low=0, but some low protection is still being
+	 * distributed down from its parent with memory.low=50M if cgroup2
+	 * memory_recursiveprot mount option is enabled. So the low event
+	 * count will be non-zero in this case.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
-		int no_low_events_index = 1;
+		int no_low_events_index = has_recursiveprot ? 2 : 1;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");