Re: mm: mkfs.ext4 invoked oom-killer on i386 - pagecache_get_page

29 May 2020


      On Fri, May 29, 2020 at 12:41 AM Chris Down chris@chrisdown.name wrote:
...
Naresh Kamboju writes:
...
On Thu, 28 May 2020 at 20:33, Michal Hocko mhocko@kernel.org wrote:
...
On Fri 22-05-20 02:23:09, Naresh Kamboju wrote:
...
My apology !
As per the test results history this problem started happening from
Bad : next-20200430 (still reproducible on next-20200519)
Good : next-20200429
The git tree / tag used for testing is from linux next-20200430 tag and reverted
following three patches and oom-killer problem fixed.
Revert "mm, memcg: avoid stale protection values when cgroup is above
protection"
Revert "mm, memcg: decouple e{low,min} state mutations from protectinn checks"
Revert "mm-memcg-decouple-elowmin-state-mutations-from-protection-checks-fix"
The discussion has fragmented and I got lost TBH.
In http://lkml.kernel.org/r/CA+G9fYuDWGZx50UpD+WcsDeHX9vi3hpksvBAWbMgRZadb0Pkww...
you have said that none of the added tracing output has triggered. Does
this still hold? Because I still have a hard time to understand how
those three patches could have the observed effects.
On the other email thread [1] this issue is concluded.
Yafang wrote on May 22 2020,
Regarding the root cause, my guess is it makes a similar mistake that
I tried to fix in the previous patch that the direct reclaimer read a
stale protection value.  But I don't think it is worth to add another
fix. The best way is to revert this commit.
This isn't a conclusion, just a guess (and one I think is unlikely). For this
to reliably happen, it implies that the same race happens the same way each
time.
Hi Chris,
Look at this patch[1] carefully you will find that it introduces the
same issue that I tried to fix in another patch [2]. Even more sad is
these two patches are in the same patchset. Although this issue isn't
related with the issue found by Naresh, we have to ask ourselves why
we always make the same mistake ?
One possible answer is that we always forget the lifecyle of
memory.emin before we read it. memory.emin doesn't have the same
lifecycle with the memcg, while it really has the same lifecyle with
the reclaimer. IOW, once a reclaimer begins the protetion value should
be set to 0, and after we traversal the memcg tree we calculate a
protection value for this reclaimer, finnaly it disapears after the
reclaimer stops. That is why I highly suggest to add an new protection
member in scan_control before.
[1]. https://lore.kernel.org/linux-mm/20200505084127.12923-3-laoar.shao@gmail.com...
[2]. https://lore.kernel.org/linux-mm/20200505084127.12923-2-laoar.shao@gmail.com...
-- 
Thanks
Yafang

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: mm: mkfs.ext4 invoked oom-killer on i386 - pagecache_get_page