Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers

16 Jul 2024


On Tue 16-07-24 06:44:11, Tejun Heo wrote:
...
Hello,
On Tue, Jul 16, 2024 at 03:48:17PM +0200, Michal Hocko wrote:
...
...
...
This behavior is particularly useful for work scheduling systems that
need to track memory usage of worker processes/cgroups per-work-item.
Since memory can't be squeezed like CPU can (the OOM-killer has
opinions), these systems need to track the peak memory usage to compute
system/container fullness when binpacking workitems.
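The binpacking use case described above can be sketched roughly as follows. This is a minimal illustration, not code from the patch or from any real scheduler. It assumes each work item's peak has already been measured (for example, by reading its worker cgroup's memory.peak after the item finished), and the names `Container` and `binpack` are hypothetical. Items are packed first-fit decreasing by peak. Summing per-item peaks is conservative: individual peaks need not coincide, so their sum is an upper bound on the combined peak.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical container (e.g. a worker cgroup) with a fixed memory limit."""
    limit: int
    items: list[tuple[str, int]] = field(default_factory=list)

    @property
    def committed(self) -> int:
        # Sum of the recorded peaks of every work item placed here. This is an
        # upper bound on the combined peak, since item peaks need not coincide.
        return sum(peak for _, peak in self.items)

    def fullness(self) -> float:
        return self.committed / self.limit


def binpack(peaks: dict[str, int], limit: int) -> list[Container]:
    """First-fit decreasing packing of work items by their measured peak usage."""
    containers: list[Container] = []
    # Place the largest items first; this tends to reduce fragmentation.
    for name, peak in sorted(peaks.items(), key=lambda kv: kv[1], reverse=True):
        if peak > limit:
            raise ValueError(f"{name}: peak {peak} exceeds limit {limit}")
        for container in containers:
            if container.committed + peak <= container.limit:
                container.items.append((name, peak))
                break
        else:
            # No existing container has room: open a new one.
            containers.append(Container(limit, [(name, peak)]))
    return containers


# Example peaks in arbitrary units (hypothetical measurements).
packed = binpack({"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}, limit=10)
for i, container in enumerate(packed):
    print(i, [name for name, _ in container.items], container.fullness())
```

With these inputs the five items fit into two containers, each exactly full. The scheme only works if each per-item peak reflects that item alone, which is why a per-work-item reset (or a fresh cgroup per item) matters here.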
Swap still has a bad reputation, but there's nothing drastically worse about it than
page cache, i.e., if you're under memory pressure, you get thrashing one way
or another. If there's no swap, the system is just memlocking anon memory
even when it is a lot colder than page cache, so I'm skeptical that no
swap + mostly anon + kernel OOM kills is a good strategy in general,
especially given that the system behavior is not very predictable under OOM
conditions.
Completely agree on this!
...
...
As mentioned further down the email thread, I consider the usefulness of the peak
value rather limited. It is misleading when memory is reclaimed. But
fundamentally I am not opposed to unifying the write behavior to reset
values.
The removal of resets was intentional. The problem was that it wasn't clear
who owned those counters and there's no way of telling who reset what when.
It was easy to accidentally end up with multiple entities that think they
can get timed measurement by resetting.
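The ownership problem with a single resettable counter can be shown with a toy model. `PeakCounter` is hypothetical and is not kernel code; it only mimics the semantics under discussion, where a reset restarts the one shared peak from the current usage for every reader. Two independent monitors each reset the counter to start their own measurement window, and the first monitor ends up missing a spike that happened inside its window.

```python
class PeakCounter:
    """Toy model of one shared peak counter whose reset affects every reader."""

    def __init__(self) -> None:
        self.usage = 0
        self.peak = 0

    def charge(self, nbytes: int) -> None:
        self.usage += nbytes
        self.peak = max(self.peak, self.usage)

    def uncharge(self, nbytes: int) -> None:
        self.usage -= nbytes

    def reset(self) -> None:
        # Restart tracking from the current usage. There is only one peak,
        # so this silently restarts every other user's window too.
        self.peak = self.usage


counter = PeakCounter()
counter.reset()        # monitor A starts its measurement window
counter.charge(100)    # usage spikes during A's window...
counter.uncharge(90)   # ...and falls back to 10
counter.reset()        # monitor B starts its own window, unaware of A
a_sees = counter.peak  # A ends its window and reads 10, not 100
print(a_sees)
```

Nothing in the counter records that B's reset happened, so A has no way to detect that its measurement was truncated.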
Yes, I understand and agree with you. Generally speaking, the peak value is
of very limited use. On the other hand, we already have it in v2, and
if it allows a reliable way to scale the workload (which seems to be the
case here), then resetting the value sounds cheaper than
tearing down the memcg and creating it again (with all the dead cgroups
headache that might follow). The reset interface doesn't cost much from
the maintenance POV, and if somebody wants to use it they surely need to
find a way to coordinate.
...
So, in general, I don't think this is a great idea. There are shortcomings
to how memory.peak behaves in that its meaningfulness quickly declines over
time. This is expected and the rationale behind adding memory.peak, IIRC,
was that it was difficult to tell the memory usage of a short-lived cgroup.
If we want to allow peak measurement of time periods, I wonder whether we
could do something similar to pressure triggers, i.e., let users register
watchers so that each user can define their own watch periods. This is more
involved but more useful and less error-inducing than adding reset to a
single counter.
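A watcher-based alternative in the spirit of the suggestion above might look like this sketch. `MemcgModel`, `register_watcher`, `unregister_watcher`, and `peak` are hypothetical names, not a proposed or existing kernel interface. Each watcher keeps its own peak, starting from the usage at the moment it registers, so one user's measurement window cannot clobber another's. In the example, monitor A registers, usage spikes to 100 and drops back, then monitor B registers; A still sees the spike while B sees only what happened after it started.

```python
class MemcgModel:
    """Toy cgroup model where each watcher tracks its own peak (hypothetical API)."""

    def __init__(self) -> None:
        self.usage = 0
        self._peaks: dict[int, int] = {}  # watcher id -> peak seen by that watcher
        self._next_id = 0

    def register_watcher(self) -> int:
        # A new watch period starts at the current usage, independent of others.
        wid = self._next_id
        self._next_id += 1
        self._peaks[wid] = self.usage
        return wid

    def unregister_watcher(self, wid: int) -> int:
        # Ends the watch period and returns the peak observed during it.
        return self._peaks.pop(wid)

    def peak(self, wid: int) -> int:
        return self._peaks[wid]

    def charge(self, nbytes: int) -> None:
        self.usage += nbytes
        # Every active watcher sees the new usage; none can reset another.
        self._peaks = {wid: max(p, self.usage) for wid, p in self._peaks.items()}

    def uncharge(self, nbytes: int) -> None:
        self.usage -= nbytes


memcg = MemcgModel()
a = memcg.register_watcher()  # monitor A's window begins at usage 0
memcg.charge(100)             # spike seen by A
memcg.uncharge(90)            # usage falls back to 10
b = memcg.register_watcher()  # monitor B's window begins at usage 10
memcg.charge(5)               # usage rises to 15
print(memcg.peak(a), memcg.peak(b))
```

Updating every watcher on each charge costs time proportional to the number of watchers, which hints at why this is "more involved" than a single counter; a kernel implementation would presumably need to keep the charge path much cheaper than this.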
I would rather not get back to that unless we have many more users that
really need it. The absolute value of memory consumption is a long-lived
concept that doesn't make much sense most of the time. People
just tend to still use it because it is much simpler to compare two
values than something as dynamic as PSI-like metrics.
-- 
Michal Hocko
SUSE Labs