Hello, Michal.
On Thu, Sep 26, 2024 at 08:10:35PM +0200, Michal Koutný wrote: ...
On Tue, Sep 10, 2024 at 11:01:07AM GMT, Tejun Heo tj@kernel.org wrote:
I think it's as useful as system-wide nice metric is.
Exactly -- and I don't understand how that system-wide value (without any cgroups) is useful. If I don't know how many niced and non-niced tasks there are and what their runnable patterns are, the aggregated nice time can have ambiguous interpretations.
I think there are benefits to mirroring system wide metrics, at least ones as widely spread as nice.
I agree with the benefits of mirroring some system wide metrics when they are useful <del>but not all of them because it's difficult/impossible to take them away once they're exposed</del>. Actually, readers _should_ handle missing keys gracefully, so this may be just fine.
(Is this nice time widely spread? (I remember the field from `top`, still not sure how to use it.) Are other proc_stat(5) fields different?
A personal anecdote: I usually run compile jobs with nice and look at the nice utilization to see what the system is doing. I think it'd be similar for most folks. Because the number has always been there and is ubiquitous across many monitoring tools, people end up using it for something. It's not a great metric but a long-standing and widely available one, so it ends up with usages.
BTW, there are numbers which are actively silly - e.g. iowait, especially due to how it gets aggregated across multiple CPUs. That one, we want to actively drop, especially as the pressure metrics are the better substitute. I don't think nice is in that category. It's not the best metric there is but it's not useless or misleading.
I see how this can be the global analog on leaf cgroups but what about interpreting middle cgroups with children of different cpu.weights?)
I think aggregating per-thread numbers is the right thing to do. It's just the sum of CPU cycles spent by threads which got niced.
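To make the aggregation concrete, here is a rough Python sketch of what I mean. It is only an illustration, not kernel code: the `Thread` and `Cgroup` classes, the `user_usec` field and the `nice_usec()` helper are all made-up names, and it assumes a thread counts as niced when its nice value is above zero. A middle cgroup simply reports the sum over every niced thread below it, regardless of the children's cpu.weights.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thread:
    # Hypothetical per-thread record: nice value and user CPU time consumed.
    nice: int
    user_usec: int


@dataclass
class Cgroup:
    # Hypothetical cgroup node holding its own threads and child cgroups.
    threads: List[Thread] = field(default_factory=list)
    children: List["Cgroup"] = field(default_factory=list)


def nice_usec(cg: Cgroup) -> int:
    """Sum the CPU time of niced threads in cg and all of its descendants."""
    # Assumption: a thread is "niced" when its nice value is greater than 0.
    own = sum(t.user_usec for t in cg.threads if t.nice > 0)
    return own + sum(nice_usec(child) for child in cg.children)
```

The weights only influence how much CPU time each thread ends up getting; they play no part in the summation itself.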
Thanks.