On Thu 21-05-20 11:55:16, Michal Hocko wrote:
> On Wed 20-05-20 20:09:06, Chris Down wrote:
> > Hi Naresh,
> >
> > Naresh Kamboju writes:
> > > As part of the investigation into this issue, LKFT teammate Anders
> > > Roxell git-bisected the problem and found the bad commit(s) that
> > > caused it.
> > >
> > > We reverted the following two patches on next-20200519, reran the
> > > reproduction steps, and confirmed that the test case mkfs -t ext4
> > > now passes (the oom-killer is no longer invoked).
> > >
> > > Revert "mm, memcg: avoid stale protection values when cgroup is above protection"
> > > This reverts commit 23a53e1c02006120f89383270d46cbd040a70bc6.
> > >
> > > Revert "mm, memcg: decouple e{low,min} state mutations from protection checks"
> > > This reverts commit 7b88906ab7399b58bb088c28befe50bcce076d82.
> >
> > Thanks Anders and Naresh for tracking this down and reverting.
> >
> > I'll take a look tomorrow. I don't see anything immediately obviously
> > wrong in either of those commits from a (very) cursory glance, but
> > they should only be taking effect if protections are set.
>
> Agreed. If memory.{low,min} is not used then the patch should be
> effectively a nop.

I have been staring at the code and do not see anything wrong. Could you give the following debugging patch a try and see whether it triggers?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cc555903a332..df2e8df0eb71 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2404,6 +2404,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			 * sc->priority further than desirable.
 			 */
 			scan = max(scan, SWAP_CLUSTER_MAX);
+
+			trace_printk("scan:%lu protection:%lu\n", scan, protection);
 		} else {
 			scan = lruvec_size;
 		}
@@ -2648,6 +2650,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
 		if (mem_cgroup_below_min(memcg)) {
+			trace_printk("under min:%lu emin:%lu\n", memcg->memory.min, memcg->memory.emin);
 			/*
 			 * Hard protection.
 			 * If there is no reclaimable memory, OOM.
@@ -2660,6 +2663,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 			 * there is an unprotected supply
 			 * of reclaimable memory from other cgroups.
 			 */
+			trace_printk("under low:%lu elow:%lu\n", memcg->memory.low, memcg->memory.elow);
 			if (!sc->memcg_low_reclaim) {
 				sc->memcg_low_skipped = 1;
 				continue;
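
If it helps with checking whether any of the three messages fire, here is a minimal sketch for counting them in a saved copy of the trace buffer. It assumes the trace output has already been captured as text lines; the function name summarize() and the sample lines are made up for illustration and are not output from a real run. The regular expressions only mirror the three trace_printk() format strings in the patch above.

```python
import re
from collections import Counter

# Patterns mirror the trace_printk() format strings in the debugging patch.
SCAN_RE = re.compile(r"scan:(\d+) protection:(\d+)")
UNDER_RE = re.compile(r"under (min|low):(\d+) e(?:min|low):(\d+)")


def summarize(lines):
    """Count how many times each debug message appears in the given lines."""
    counts = Counter()
    for line in lines:
        if SCAN_RE.search(line):
            counts["scan"] += 1
            continue
        m = UNDER_RE.search(line)
        if m:
            # Key is "under_min" or "under_low", depending on the message.
            counts["under_" + m.group(1)] += 1
    return counts


# Hypothetical sample lines (NOT real trace output), only to show the shape.
SAMPLE = [
    "get_scan_count: scan:32 protection:1024",
    "shrink_node_memcgs: under min:0 emin:0",
    "some unrelated trace line",
]
print(dict(summarize(SAMPLE)))
```

An empty result would suggest that none of the protection paths were taken, which is what one would expect if memory.{low,min} is not in use.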