On Wed, Jul 17, 2024 at 2:26 AM Michal Hocko mhocko@suse.com wrote:
On Tue 16-07-24 18:06:17, David Finkel wrote:
On Tue, Jul 16, 2024 at 4:00 PM Tejun Heo tj@kernel.org wrote:
Hello,
On Tue, Jul 16, 2024 at 08:00:52PM +0200, Michal Hocko wrote: ...
If we want to allow peak measurement of time periods, I wonder whether we could do something similar to pressure triggers - ie. let users register watchers so that each user can define their own watch periods. This is more involved but more useful and less error-inducing than adding reset to a single counter.
I would rather not get back to that unless we have many more users that really need that. Absolute value of the memory consumption is a long living concept that doesn't make much sense most of the time. People just tend to still use it because it is much simpler to compare two different values rather than something as dynamic as PSI similar metrics.
The initial justification for adding memory.peak was that it's mostly to monitor short lived cgroups. Adding reset would make it used more widely, which isn't necessarily a bad thing and people most likely will find ways to use it creatively. I'm mostly worried that that's going to create a mess down the road. Yeah, so, it's not widely useful now but adding reset makes it more useful and in a way which can potentially paint us into a corner.
This is a pretty low-overhead feature as-is, but we can reduce it further by changing it so instead of resetting the watermark on any non-empty string, we reset only on one particular string.
I'm thinking of something like "global_reset\n", so if we do something like the PSI interface later, users can write "fd_local_reset\n", and get that nicer behavior.
This also has the benefit of allowing "echo global_reset > /sys/fs/cgroup/.../memory.peak" to do the right thing. (better names welcome)
This would be a different behavior than in v1 and therefore confusing for those who rely on this in v1 already. So I wouldn't overengineer it and keep the semantic as simple as possible. If we decide to add PSI triggers they are completely independent on peak value because that is reclaim based interface which by definition makes peak value very dubious.
That's fair.
My only thought is that "write any non-empty string", is a very wide interface to support, and limits other possible behaviors later.
FWIW, I have patches with and without this narrowed interface ready to re-mail pending the outcome of this discussion. (both include additional use-case info in the changelog)
Keeping the "all non-empty writes" behavior: https://github.com/dfinkel/linux/commit/5edb21882a88693024a95bbd76b4a8d45613...
With the narrowed interface: https://github.com/dfinkel/linux/commit/f341c8c0cf5fbdcf7af30e20b334e532df74...
-- Michal Hocko SUSE Labs
Thanks,