Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers

16 Jul 2024


Hello,
On Tue, Jul 16, 2024 at 01:10:14PM -0400, David Finkel wrote:
...
...
> > Swap still has a bad rep but there's nothing drastically worse about it than
> > page cache. i.e., if you're under memory pressure, you get thrashing one way
> > or another. If there's no swap, the system is just memlocking anon memory
> > even when it is a lot colder than page cache, so I'm skeptical that no
> > swap + mostly anon + kernel OOM kills is a good strategy in general,
> > especially given that the system behavior is not very predictable under OOM
> > conditions.
>
> The reason we need peak memory information is to let us schedule work in a
> way that we generally avoid OOM conditions. For the workloads I work on,
> we generally have very little in the page-cache, since the data isn't
> stored locally most of the time, but streamed from other storage/database
> systems. For those cases, demand-paging will cause large variations in
> servicing time, and we'd rather restart the process than have
> unpredictable latency. The same is true for the batch/queue-work system I
> wrote this patch to support. We keep very little data on the local disk,
> so the page cache is relatively small.

You can detect these conditions more reliably and *earlier* using PSI
triggers with swap enabled than hard allocations and OOM kills. Then, you
can take whatever decision you want to take including killing the job
without worrying about the whole system severely suffering. You can even do
things like freezing the cgroup and taking backtraces and collecting other
debug info to better understand why the memory usage is blowing up.

There are of course multiple ways to go about things but I think it's useful
to note that hard alloc based on peak usage + OOM kills likely isn't the
best way here.
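
To make the PSI suggestion above concrete, here is a minimal sketch of
registering a memory pressure trigger on a cgroup and freezing the cgroup when
it fires. It assumes cgroup2 with PSI enabled and follows the trigger format in
the kernel PSI documentation ("<some|full> <stall us> <window us>", written to
memory.pressure and polled with POLLPRI). The function names, the thresholds,
and the decision to freeze are illustrative assumptions, not something from
this thread.

```python
import os
import select

# Window limits as described in the kernel PSI documentation (500ms to 10s).
WINDOW_MIN_US = 500_000
WINDOW_MAX_US = 10_000_000


def psi_trigger_spec(kind, stall_us, window_us):
    """Build a PSI trigger string: "<some|full> <stall us> <window us>"."""
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    if not WINDOW_MIN_US <= window_us <= WINDOW_MAX_US:
        raise ValueError("window out of range")
    if not 0 < stall_us <= window_us:
        raise ValueError("stall threshold must be positive and <= window")
    # NUL-terminated, mirroring the C example in the kernel documentation.
    return f"{kind} {stall_us} {window_us}".encode() + b"\0"


def wait_and_freeze(cgroup_dir, spec, timeout_ms=None):
    """Wait for the trigger to fire, then freeze the cgroup.

    Returns True if the cgroup was frozen, False if the wait timed out.
    cgroup_dir is a hypothetical path such as /sys/fs/cgroup/myjob.
    """
    fd = os.open(os.path.join(cgroup_dir, "memory.pressure"),
                 os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, spec)  # registers the trigger on this fd
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        for _fd, event in poller.poll(timeout_ms):
            if event & select.POLLERR:
                raise OSError("PSI trigger source is gone")
            if event & select.POLLPRI:
                # Freeze so the job can be inspected before deciding to kill it.
                with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
                    f.write("1")
                return True
        return False
    finally:
        os.close(fd)  # closing the fd removes the trigger
```

For example, wait_and_freeze(path, psi_trigger_spec("some", 150_000, 1_000_000))
would fire once tasks in the cgroup have stalled on memory for 150ms within any
1s window. The right thresholds depend heavily on the workload.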
...
...
> I appreciate the ownership issues with the current resetting interface in
> the other locations. However, this peak RSS data is not used by all that
> many applications (as evidenced by the fact that the memory.peak file was
> only added a bit over a year ago). I think there are enough cases where
> ownership is enforced externally that mirroring the existing interface to
> cgroup2 is sufficient.

It's a fairly new addition and its utility is limited, so it's not that widely
used. Adding reset makes it more useful but in a way which can be
detrimental in the long term.
...
> I do think a more stateful interface would be nice, but I don't know
> whether I have enough knowledge of memcg to implement that in a reasonable
> amount of time.

Right, this probably isn't trivial.
...
> Ownership aside, I think being able to reset the high watermark of a
> process makes it significantly more useful. Creating new cgroups and
> moving processes around is significantly heavier-weight.

Yeah, the setup / teardown cost can be non-trivial for short-lived cgroups.
I agree that having some way of measuring peak in different time intervals
can be useful.
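
As a rough illustration of what measuring peak in different time intervals
means, the sketch below approximates it from userspace by sampling
memory.current. This is not the interface discussed in this thread; the helper
names and the sampling period are assumptions made for the example. It can
miss spikes that fall between samples, which is part of why a kernel-side
resettable peak is attractive.

```python
import time


def read_memory_current(cgroup_dir):
    """Read the cgroup's current memory usage in bytes (cgroup2 memory.current)."""
    with open(f"{cgroup_dir}/memory.current") as f:
        return int(f.read())


def interval_peak(sample, interval_s, period_s=0.1,
                  sleep=time.sleep, clock=time.monotonic):
    """Return the largest value of sample() seen over roughly interval_s seconds.

    sample is any zero-argument callable returning a number; sleep and clock
    are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + interval_s
    peak = sample()
    while clock() < deadline:
        sleep(period_s)
        peak = max(peak, sample())
    return peak
```

A caller might use interval_peak(lambda: read_memory_current(path), 5.0) once
per work item to get a per-item approximation without creating a new cgroup
each time.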
Thanks.
-- 
tejun
