Hello, Michal.
On Mon, Sep 02, 2024 at 05:45:39PM +0200, Michal Koutný wrote:
It makes (some) sense only on leaf cgroups (where variously nice'd tasks are competing against each other). Not so much on inner node cgroups (where it's a mere sum but sibling cgroups could have different weights, so the absolute times would contribute differently).
When all tasks have nice > 0 (or nice <= 0), it loses any information it could have had.
I think it's as useful as the system-wide nice metric is. It's not a versatile metric, but it is widely available and understood, and people use it. Maybe a workload is split across a sub-hierarchy and they want to collect how much low-priority threads are consuming. cpu.stat is available without the cpu controller being enabled, and people use it as a way to just aggregate metrics across a portion of the system.
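For illustration, a minimal sketch of the aggregation use case, assuming cgroup v2, where cpu.stat counters already include all descendants, so reading the sub-hierarchy root is enough. The field name "nice_usec" is an assumption for the proposed counter; kernels without it simply won't report that key. The helper names are hypothetical.

```python
from pathlib import Path


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse cgroup v2 cpu.stat content ("key value" per line) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            key, value = parts
            stats[key] = int(value)
    return stats


def read_cpu_stat(cgroup_dir: str) -> dict[str, int]:
    """Read cpu.stat of one cgroup; its counters cover the whole subtree."""
    return parse_cpu_stat((Path(cgroup_dir) / "cpu.stat").read_text())


def nice_fraction(stats: dict[str, int]) -> float:
    """Share of the subtree's total CPU usage spent by nice > 0 tasks.

    ASSUMPTION: "nice_usec" is a hypothetical name for the proposed field.
    Returns 0.0 if usage is zero or the field is absent.
    """
    usage = stats.get("usage_usec", 0)
    if usage == 0:
        return 0.0
    return stats.get("nice_usec", 0) / usage
```

E.g. for a subtree reporting usage_usec 1000 and nice_usec 250, nice_fraction() gives 0.25. A real monitor would sample twice and work on deltas, since these are cumulative counters.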
(Thus I don't know whether to commit to exposing that value via cgroups.)
I wonder, wouldn't your use case be equally served by some post-processing [1] of /sys/kernel/debug/sched/debug info which is already available?
...
The above is only for CPU nr=0. So post-processing would mean sampling that file across all CPUs and over time.
I think there are benefits to mirroring system wide metrics, at least ones as widely spread as nice.
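For comparison, the system-wide metric being mirrored is the "nice" column of the aggregate "cpu" line in /proc/stat (in USER_HZ ticks). A minimal sketch of how tools typically turn it into a share over an interval; guest and guest_nice are excluded from the total because the kernel already folds them into user and nice. Function names are hypothetical.

```python
import time
from pathlib import Path

# Column order of the "cpu" line in /proc/stat (older kernels may have fewer).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_proc_stat_cpu(text: str) -> dict[str, int]:
    """Return the aggregate (all-CPU) counters from /proc/stat content."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # skips per-CPU "cpu0", "cpu1", ...
            return dict(zip(FIELDS, map(int, parts[1:])))
    raise ValueError("no aggregate 'cpu' line found")


def nice_share(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of all CPU time (idle included) spent in nice'd user code."""
    delta = {k: after[k] - before[k] for k in before.keys() & after.keys()}
    # guest time is already counted in user/nice; skip it to avoid double counting
    total = sum(v for k, v in delta.items() if k not in ("guest", "guest_nice"))
    return delta["nice"] / total if total else 0.0


def sample_nice_share(interval: float = 1.0) -> float:
    """Sample /proc/stat twice, `interval` seconds apart."""
    before = parse_proc_stat_cpu(Path("/proc/stat").read_text())
    time.sleep(interval)
    after = parse_proc_stat_cpu(Path("/proc/stat").read_text())
    return nice_share(before, after)
```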
Thanks.