On Thu, 28 May 2020 at 20:33, Michal Hocko <mhocko@kernel.org> wrote:
On Fri 22-05-20 02:23:09, Naresh Kamboju wrote:
My apologies! According to the test results history, this problem started happening with:
Bad: next-20200430 (still reproducible on next-20200519)
Good: next-20200429
The git tree used for testing is the linux-next next-20200430 tag. Reverting the following three patches fixed the oom-killer problem.
Revert "mm, memcg: avoid stale protection values when cgroup is above protection"
Revert "mm, memcg: decouple e{low,min} state mutations from protection checks"
Revert "mm-memcg-decouple-elowmin-state-mutations-from-protection-checks-fix"
The discussion has fragmented and I got lost, TBH. In http://lkml.kernel.org/r/CA+G9fYuDWGZx50UpD+WcsDeHX9vi3hpksvBAWbMgRZadb0Pkww... you said that none of the added tracing output had triggered. Does this still hold? I ask because I still have a hard time understanding how those three patches could have the observed effects.
This issue was concluded on the other email thread [1].
Yafang wrote on May 22 2020,
Regarding the root cause, my guess is it makes a similar mistake that I tried to fix in the previous patch that the direct reclaimer read a stale protection value. But I don't think it is worth to add another fix. The best way is to revert this commit.
[1] [PATCH v3 2/2] mm, memcg: Decouple e{low,min} state mutations from protection checks
https://lore.kernel.org/linux-mm/CALOAHbArZ3NsuR3mCnx_kbSF8ktpjhUF2kaaTa7Mb7...
- Naresh
-- Michal Hocko SUSE Labs