On Tue, Jul 16, 2024 at 1:01 PM Roman Gushchin roman.gushchin@linux.dev wrote:
On Tue, Jul 16, 2024 at 06:44:11AM -1000, Tejun Heo wrote:
Hello,
On Tue, Jul 16, 2024 at 03:48:17PM +0200, Michal Hocko wrote: ...
The removal of resets was intentional. The problem was that it wasn't clear who owned those counters and there's no way of telling who reset what when. It was easy to accidentally end up with multiple entities that think they can get timed measurement by resetting.
So, in general, I don't think this is a great idea. There are shortcomings to how memory.peak behaves in that its meaningfulness quickly declines over time. This is expected and the rationale behind adding memory.peak, IIRC, was that it was difficult to tell the memory usage of a short-lived cgroup.
If we want to allow peak measurement of time periods, I wonder whether we could do something similar to pressure triggers - ie. let users register watchers so that each user can define their own watch periods. This is more involved but more useful and less error-inducing than adding reset to a single counter.
It's definitely a better user interface and I totally agree with you regarding the shortcomings of the proposed interface with a global reset. But if you let users to register a (potentially large) number of watchers, it might be quite bad for the overall performance, isn't it? To mitigate it, we'll need to reduce the accuracy of peak values. And then the question is why not just poll it periodically from userspace?
FWIW, as a stop-gap, we did implement periodic polling from userspace for the system that motivated this change, but that is unlikely to catch memory-usage spikes that have shorter timescales than the polling interval. For now, we're keeping it on cgroups v1, but that's looking like a long-term untenable position.
Thanks,