[Our emails have crossed]
On Wed 17-06-20 14:57:58, Chris Down wrote:
Naresh Kamboju writes:
mkfs -t ext4 /dev/disk/by-id/ata-TOSHIBA_MG04ACA100N_Y8RQK14KF6XF
mke2fs 1.43.8 (1-Jan-2018)
Creating filesystem with 244190646 4k blocks and 61054976 inodes
Filesystem UUID: 7c380766-0ed8-41ba-a0de-3c08e78f1891
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000, 214990848

Allocating group tables: 0/7453 done
Writing inode tables: 0/7453 done
Creating journal (262144 blocks): [ 51.544525] under min:0 emin:0
[ 51.845304] under min:0 emin:0
[ 51.848738] under min:0 emin:0
[ 51.858147] under min:0 emin:0
[ 51.861333] under min:0 emin:0
[ 51.862034] under min:0 emin:0
[ 51.862442] under min:0 emin:0
[ 51.862763] under min:0 emin:0
Thanks, this helps a lot. Somehow we're entering mem_cgroup_below_min even when min/emin is 0 (which should indeed be the case if you haven't set them in the hierarchy).
My guess is that page_counter_read(&memcg->memory) is 0, which means mem_cgroup_below_min will return 1.
Yes this is the case because this is likely the root memcg which skips all charges.
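To make the failure mode concrete, here is a minimal userspace sketch of the comparison being discussed. It is not the kernel code: struct counter_model and below_min_ge() are hypothetical stand-ins, with usage modelling page_counter_read(&memcg->memory) and emin modelling the effective min protection. It only assumes the check has the `>=` form mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two values compared in the check. */
struct counter_model {
	unsigned long usage;	/* models page_counter_read(&memcg->memory) */
	unsigned long emin;	/* models the effective min protection */
};

/* Assumed shape of the check: protected when emin >= usage. */
static bool below_min_ge(const struct counter_model *m)
{
	return m->emin >= m->usage;
}

/* The root memcg skips all charges, so its usage reads as 0. */
static void demo_root_case(void)
{
	struct counter_model root = { .usage = 0, .emin = 0 };
	struct counter_model charged = { .usage = 100, .emin = 0 };

	/* 0 >= 0 holds, so the uncharged root looks protected. */
	assert(below_min_ge(&root));
	/* A memcg with charges and no protection does not. */
	assert(!below_min_ge(&charged));
}
```

With min:0 and emin:0 as in the log above, the only way the model reports "below min" is a usage of 0, which matches the root memcg explanation.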
However, I don't know for sure why that should then result in the OOM killer coming along. My guess is that since this memcg has 0 pages to scan anyway, we enter premature OOM under some conditions. I don't know why we wouldn't have hit that with the old version of mem_cgroup_protected that returned MEMCG_PROT_* members, though.
Not really. There is likely no other memcg to reclaim from, so treating this one as protected by the min limit leaves no reclaimable memory and thus brings in the OOM killer.
Can you please try the patch with the `>=` checks in mem_cgroup_below_min and mem_cgroup_below_low changed to `>`? If that fixes it, then that gives a strong hint about what's going on here.
This would work, but I believe an explicit check for the root memcg would make the reasoning easier to spot.
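The two options can be compared side by side in a small userspace sketch. Everything here is illustrative: struct memcg_sketch, its is_root flag, and both function names are hypothetical, and neither function is the actual patch. Variant A models switching `>=` to `>`; variant B models keeping `>=` and returning early for the root memcg.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; is_root stands in for a root memcg test. */
struct memcg_sketch {
	unsigned long usage;	/* models page_counter_read(&memcg->memory) */
	unsigned long emin;	/* models the effective min protection */
	bool is_root;
};

/* Variant A: strict comparison, so 0 > 0 is never "below min". */
static bool below_min_strict(const struct memcg_sketch *m)
{
	return m->emin > m->usage;
}

/* Variant B: keep >=, but never treat the root memcg as protected. */
static bool below_min_root_check(const struct memcg_sketch *m)
{
	if (m->is_root)
		return false;
	return m->emin >= m->usage;
}

static void demo_fixes(void)
{
	struct memcg_sketch root = { .usage = 0, .emin = 0, .is_root = true };

	/* Both variants stop reporting the uncharged root as protected. */
	assert(!below_min_strict(&root));
	assert(!below_min_root_check(&root));
}
```

In this model the variants agree on the root case but differ elsewhere: a non-root memcg whose usage equals its emin is protected under variant B and not under variant A, so the explicit root check changes less behaviour and states the reason in the code.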