v3:
 - Take up Johannes' suggestion to just skip the !usage case and fix the
   test_memcontrol selftest to address the rest of the min/low failures.

v4:
 - Add "#ifdef CONFIG_MEMCG" directives around shrink_node_memcgs() to
   avoid compilation problems with !CONFIG_MEMCG configs.
The test_memcontrol selftest consistently fails its test_memcg_low sub-test and sporadically fails its test_memcg_min sub-test. This patchset fixes the test_memcg_min and test_memcg_low failures by skipping the !usage case in shrink_node_memcgs() and adjusting the test_memcontrol selftest to fix the other causes of the test failures.
Note that I decided not to use the suggested mem_cgroup_usage() call, as it is a real function call defined in mm/memcontrol.c that is used mainly by cgroup v1 code.
Waiman Long (2):
  mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()
  selftests: memcg: Increase error tolerance of child memory.current
    check in test_memcg_protection()

 mm/vmscan.c                                      | 10 ++++++++++
 tools/testing/selftests/cgroup/test_memcontrol.c | 11 ++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)
The test_memcontrol selftest consistently fails its test_memcg_low sub-test because two of its test child cgroups, which have a memory.low of 0 or an effective memory.low of 0, still have low events generated for them, since mem_cgroup_below_low() uses the ">=" operator when comparing to elow.
The two failed use cases are as follows:
1) memory.low is set to 0, but low events can still be triggered and so the cgroup may have a non-zero low event count. I doubt users are looking for that as they didn't set memory.low at all.
2) memory.low is set to a non-zero value but the cgroup has no task in it so that it has an effective low value of 0. Again it may have a non-zero low event count if memory reclaim happens. This is probably not a result expected by the users, and it is really doubtful that users will check an empty cgroup with no task in it and expect some non-zero event counts.
In the first case, even though memory.low isn't set, the cgroup may still have some low protection if memory.low is set in the parent. So low events may still be recorded. The test_memcontrol.c test has to be modified to account for that.
For the second case, it really doesn't make sense to have a non-zero low event count if the cgroup has 0 usage. So we need to skip this corner case in shrink_node_memcgs() by skipping the !usage case. The "#ifdef CONFIG_MEMCG" directive is added to avoid a build problem in the non-CONFIG_MEMCG case.
With this patch applied, the test_memcg_low sub-test finishes successfully without failure in most cases. However, both the test_memcg_low and test_memcg_min sub-tests may still fail occasionally if the memory.current values fall outside of the expected ranges.
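The second case can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct memcg_model and both helpers are hypothetical names, and the model assumes the reclaim pass has already fallen back to reclaiming low-protected cgroups, so every "below low" visit records an event.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-memcg counters. */
struct memcg_model {
	unsigned long usage;		/* charged pages, cf. page_counter_read() */
	unsigned long elow;		/* effective memory.low protection */
	unsigned long low_events;	/* cf. the "low" key in memory.events */
};

/* Models the ">=" comparison described above: 0 >= 0 is true. */
static bool model_below_low(const struct memcg_model *m)
{
	return m->elow >= m->usage;
}

/*
 * One simplified reclaim visit; returns true if the memcg was scanned.
 * skip_unused models the "!usage" check added by this patch.
 */
static bool model_reclaim_visit(struct memcg_model *m, bool skip_unused)
{
	if (skip_unused && m->usage == 0)
		return false;		/* nothing to reclaim, no event */

	if (model_below_low(m))
		m->low_events++;	/* counted as a low event */
	return true;
}
```

In this model an empty cgroup with elow == 0 picks up a low event on every visit unless skip_unused is set, while a cgroup with non-zero usage and elow == 0 is scanned without any event either way.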
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/vmscan.c                                      | 10 ++++++++++
 tools/testing/selftests/cgroup/test_memcontrol.c |  7 ++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..65dee0ad6627 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5926,6 +5926,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	return inactive_lru_pages > pages_for_compaction;
 }
 
+#ifdef CONFIG_MEMCG
 static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 {
 	struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
@@ -5963,6 +5964,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		/* Skip memcg with no usage */
+		if (!page_counter_read(&memcg->memory))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
@@ -6004,6 +6009,11 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 		}
 	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, partial)));
 }
+#else
+static inline void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
+{
+}
+#endif /* CONFIG_MEMCG */
 
 static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 {
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..bab826b6b7b0 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -525,8 +525,13 @@ static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
 
+	/*
+	 * Child 2 has memory.low=0, but some low protection is still being
+	 * distributed down from its parent with memory.low=50M. So the low
+	 * event count will be non-zero.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
-		int no_low_events_index = 1;
+		int no_low_events_index = 2;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");
On Sun, Apr 06, 2025 at 09:41:58PM -0400, Waiman Long wrote:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..65dee0ad6627 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5926,6 +5926,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	return inactive_lru_pages > pages_for_compaction;
 }
 
+#ifdef CONFIG_MEMCG
 static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 {
 	struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
@@ -5963,6 +5964,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		/* Skip memcg with no usage */
+		if (!page_counter_read(&memcg->memory))
+			continue;
+
Please use mem_cgroup_usage() like I had originally suggested.
The !CONFIG_MEMCG case can be done like its root cgroup branch.
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
@@ -6004,6 +6009,11 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 		}
 	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, partial)));
 }
+#else
+static inline void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
+{
+}
+#endif /* CONFIG_MEMCG */

You made the entire reclaim path a nop for !CONFIG_MEMCG.
On 4/7/25 10:24 AM, Johannes Weiner wrote:
Please use mem_cgroup_usage() like I had originally suggested.
The !CONFIG_MEMCG case can be done like its root cgroup branch.
Will do that.
You made the entire reclaim path a nop for !CONFIG_MEMCG.
Yes, that is probably not right. Will fix that.
Cheers, Longman
Hi Waiman.
On Sun, Apr 06, 2025 at 09:41:58PM -0400, Waiman Long longman@redhat.com wrote: ...
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..bab826b6b7b0 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
I'd suggest also updating the header of the test for clarity and then conditionally exempting Child 2 ('E') from the comparisons, something like:
@@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
  *
  * Then it checks actual memory usages and expects that:
  * A/B memory.current ~= 50M
- * A/B/C memory.current ~= 29M
- * A/B/D memory.current ~= 21M
- * A/B/E memory.current ~= 0
- * A/B/F memory.current = 0
+ * A/B/C memory.current ~= 29M, memory.events:low > 0
+ * A/B/D memory.current ~= 21M, memory.events:low > 0
+ * A/B/E memory.current ~= 0, memory.events:low not specified (==0 w/out memory_recursiveprot)
+ * A/B/F memory.current = 0, memory.events:low == 0
  * (for origin of the numbers, see model in memcg_protection.m.)
  *
  * After that it tries to allocate more than there is
@@ -527,6 +527,7 @@ static int test_memcg_protection(const char *root, bool min)
 
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
 		int no_low_events_index = 1;
+		int ignore_low_events_index = has_recursiveprot ? 2 : -1;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");
@@ -534,6 +535,8 @@ static int test_memcg_protection(const char *root, bool min)
 
 		if (oom)
 			goto cleanup;
+		if (i == ignore_low_events_index)
+			continue;
 		if (i <= no_low_events_index && low <= 0)
 			goto cleanup;
 		if (i > no_low_events_index && low)
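The per-child decision in the suggested hunk can be pulled out into a standalone predicate to make the intent easier to see. This is only a sketch: low_events_ok() is a hypothetical helper that does not exist in the selftest, and it assumes the final "if (i > no_low_events_index && low)" branch also ends in "goto cleanup".

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if the memory.events "low" count
 * read for child i is acceptable. Children 0 and 1 must have seen low
 * events, children above index 1 must not, and child 2 is exempted
 * when memory_recursiveprot is in effect.
 */
static bool low_events_ok(int i, long low, bool has_recursiveprot)
{
	int no_low_events_index = 1;
	int ignore_low_events_index = has_recursiveprot ? 2 : -1;

	if (i == ignore_low_events_index)
		return true;	/* count is unspecified, accept anything */
	if (i <= no_low_events_index && low <= 0)
		return false;	/* protected child saw no low events */
	if (i > no_low_events_index && low)
		return false;	/* unprotected child saw low events */
	return true;
}
```

With has_recursiveprot set, any low count is accepted for child 2; without it, child 2 is held to the same "no low events" rule as child 3.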
On 4/7/25 11:25 AM, Michal Koutný wrote:
@@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
  *
  * Then it checks actual memory usages and expects that:
  * A/B memory.current ~= 50M
- * A/B/C memory.current ~= 29M
- * A/B/D memory.current ~= 21M
- * A/B/E memory.current ~= 0
- * A/B/F memory.current = 0
+ * A/B/C memory.current ~= 29M, memory.events:low > 0
+ * A/B/D memory.current ~= 21M, memory.events:low > 0
+ * A/B/E memory.current ~= 0, memory.events:low not specified (==0 w/out memory_recursiveprot)
+ * A/B/F memory.current = 0, memory.events:low == 0
  * (for origin of the numbers, see model in memcg_protection.m.)

Sorry for the late reply. I think it is a good idea to update the header as well. This function is actually used by both test_memcg_low and test_memcg_min. So I will use low/min instead.
Cheers, Longman
On Fri, 11 Apr 2025 17:08:33 -0400 Waiman Long llong@redhat.com wrote:
Sorry for the late reply. I think it is a good idea to update the header as well. This function is actually used by both test_memcg_low and test_memcg_min. So I will use low/min instead.
It appears that quite a few updates are expected for this series, so I'll drop v4 from mm.git.
The test_memcg_protection() function is used for the test_memcg_min and test_memcg_low sub-tests. This function generates a set of parent/child cgroups like:
  parent:  memory.min/low = 50M
  child 0: memory.min/low = 75M,  memory.current = 50M
  child 1: memory.min/low = 25M,  memory.current = 50M
  child 2: memory.min/low = 0,    memory.current = 50M
After applying memory pressure, the function expects the following actual memory usages.
  parent:  memory.current ~= 50M
  child 0: memory.current ~= 29M
  child 1: memory.current ~= 21M
  child 2: memory.current ~= 0
In reality, the actual memory usages can differ quite a bit from the expected values. The test uses an error tolerance of 10% with the values_close() helper.
Both the test_memcg_min and test_memcg_low sub-tests can fail sporadically because the actual memory usage exceeds the 10% error tolerance. Below is a sample of the usage data from test runs that failed.
  Child   Actual usage   Expected usage   %err
  -----   ------------   --------------   ----
    1       16990208         22020096    -12.9%
    1       17252352         22020096    -12.1%
    0       37699584         30408704    +10.7%
    1       14368768         22020096    -21.0%
    1       16871424         22020096    -13.2%
The current 10% error tolerance might have been right when test_memcontrol.c was first introduced in the v4.18 kernel, but memory reclaim has certainly evolved quite a bit since then, which may result in a bit more run-to-run variation than previously expected.
Increase the error tolerance to 15% for child 0 and 20% for child 1 to minimize the chance of this type of failure. The tolerance is bigger for child 1 because an upswing in child 0 corresponds to a smaller %err than a similar downswing in child 1 due to the way %err is used in values_close().
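The %err figures and the asymmetry between the two children can be reproduced with a short sketch. It assumes values_close() compares |a - b| against err percent of (a + b), which matches the %err column in the table above; err_permille() and the local MB() macro are hypothetical helpers written for this illustration only.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MB(x) ((long)(x) << 20)

/* Assumed shape of the selftest helper: err is a percentage of a + b. */
static bool values_close(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}

/*
 * Hypothetical helper: signed error in tenths of a percent, using the
 * same a + b denominator, truncated toward zero.
 */
static long err_permille(long actual, long expected)
{
	long long diff = (long long)actual - expected;

	return (long)(diff * 1000 / ((long long)actual + expected));
}
```

Under that assumption, a 5M shift from child 1 to child 0 is an error of 5/63 (about 7.9%) for child 0 but 5/37 (about 13.5%) for child 1, which is why child 1 gets the larger tolerance. The child 0 sample and the first child 1 sample in the table fail at 10% and pass at 15% and 20% respectively, while the -21.0% sample would still fall just outside a 20% tolerance.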
Before this patch, 100 test runs of test_memcontrol produced the following results:

     17 not ok 1 test_memcg_min
     22 not ok 2 test_memcg_low

After applying this patch, there were no test failures for test_memcg_min and test_memcg_low in 100 test runs.
Signed-off-by: Waiman Long <longman@redhat.com>
---
 tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index bab826b6b7b0..8f4f2479650e 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -495,10 +495,10 @@ static int test_memcg_protection(const char *root, bool min)
 	for (i = 0; i < ARRAY_SIZE(children); i++)
 		c[i] = cg_read_long(children[i], "memory.current");
 
-	if (!values_close(c[0], MB(29), 10))
+	if (!values_close(c[0], MB(29), 15))
 		goto cleanup;
 
-	if (!values_close(c[1], MB(21), 10))
+	if (!values_close(c[1], MB(21), 20))
 		goto cleanup;
 
 	if (c[3] != 0)