Andrey Ryabinin <arbn@yandex-team.com> writes:
cpuacct.stat in non-root cgroups shows user time without guest time included in it. This doesn't match the user time shown in the root cpuacct.stat and in /proc/<pid>/stat.
Yeah, that's inconsistent.
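If it helps to see why: here's account_guest_time() as it currently reads (quoting kernel/sched/cputime.c from mainline as I have it). Guest time is added to p->utime, which is why /proc/<pid>/stat's user time includes it, but on the cpustat side only the per-CPU kernel_cpustat is touched, never the task's cgroup:

void account_guest_time(struct task_struct *p, u64 cputime)
{
	u64 *cpustat = kcpustat_this_cpu->cpustat;

	/* Add guest time to process. */
	p->utime += cputime;
	account_group_user_time(p, cputime);
	p->gtime += cputime;

	/* Add guest time to cpustat. */
	if (task_nice(p) > 0) {
		cpustat[CPUTIME_NICE] += cputime;
		cpustat[CPUTIME_GUEST_NICE] += cputime;
	} else {
		cpustat[CPUTIME_USER] += cputime;
		cpustat[CPUTIME_GUEST] += cputime;
	}
}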
Make account_guest_time() add user time to the cgroup's cpustat to fix this.
Yep.
cgroup2's cpu.stat is broken the same way for child cgroups, and this happily fixes it. Probably deserves a mention in the changelog.
The problem with cgroup2 was: if the workload was mostly guest time, cpu.stat's user and system together reflected it, but it was split unevenly across the two. I think guest time wasn't actually included in either bucket; it was just that what little user and system time there was got scaled up in cgroup_base_stat_cputime_show() -> cputime_adjust() to match sum_exec_runtime, which did include it.
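To make the scaling concrete, here's a little userspace toy of cputime_adjust()'s proportional split. This is my own simplification with made-up numbers, not the kernel code; it skips the stime==0/utime==0 and monotonicity handling:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy model of the scaling step in cputime_adjust()
 * (kernel/sched/cputime.c): the raw utime/stime samples are stretched
 * so they sum to rtime (sum_exec_runtime).
 */
static void adjust(uint64_t utime, uint64_t stime, uint64_t rtime,
		   uint64_t *ut, uint64_t *st)
{
	/* stime keeps its share of rtime; utime takes the remainder.
	 * The kernel uses mul_u64_u64_div_u64() here to avoid overflow. */
	*st = stime * rtime / (stime + utime);
	*ut = rtime - *st;
}

int main(void)
{
	/*
	 * Hypothetical mostly-guest workload (arbitrary tick units):
	 * 9000 ticks of guest time are in sum_exec_runtime (10000 total)
	 * but not in the raw samples (600 user + 400 system). The
	 * adjustment inflates both buckets to cover the difference.
	 */
	uint64_t ut, st;

	adjust(600, 400, 10000, &ut, &st);
	printf("user=%" PRIu64 " system=%" PRIu64 "\n", ut, st);
	/* prints user=6000 system=4000: guest time split 60/40 */
	return 0;
}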
The stats look ok now for both cgroup1 and 2. Just slightly unsure whether we want to change the way both interfaces expose the accounting in case something out there depends on it. Seems like we should, but it'd be good to hear more opinions.
@@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime)
 	/* Add guest time to cpustat. */
 	if (task_nice(p) > 0) {
-		cpustat[CPUTIME_NICE] += cputime;
-		cpustat[CPUTIME_GUEST_NICE] += cputime;
+		task_group_account_field(p, CPUTIME_NICE, cputime);
+		task_group_account_field(p, CPUTIME_GUEST_NICE, cputime);
 	} else {
-		cpustat[CPUTIME_USER] += cputime;
-		cpustat[CPUTIME_GUEST] += cputime;
+		task_group_account_field(p, CPUTIME_USER, cputime);
+		task_group_account_field(p, CPUTIME_GUEST, cputime);
 	}
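For reference, the helper the patch switches to (task_group_account_field(), also in kernel/sched/cputime.c, quoted from mainline as I read it) charges both the per-CPU root stats and the cgroup: cgroup_account_cputime_field() is what feeds cpuacct on v1 and the rstat base stats on v2, which is why the same change fixes both hierarchies:

static inline void task_group_account_field(struct task_struct *p, int index,
					    u64 tmp)
{
	/*
	 * Since all updates are sure to touch the root cgroup, we
	 * get ourselves ahead and touch it first. If the root cgroup
	 * is the only cgroup, then nothing else should be necessary.
	 */
	__this_cpu_add(kernel_cpustat.cpustat[index], tmp);

	cgroup_account_cputime_field(p, index, tmp);
}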
Makes sense for _USER and _NICE, but it doesn't seem cgroup1 or 2 actually use _GUEST and _GUEST_NICE.
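A quick look seems to back that up: cpuacct.stat's user and system are assembled only from these buckets (abridged from cpuacct_stats_show() in kernel/sched/cpuacct.c as I read it; CPUTIME_GUEST and CPUTIME_GUEST_NICE are never read):

	val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_USER];
	val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_NICE];
	val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM];
	val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ];
	val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ];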
Could go either way. Consistency is nice, but I probably wouldn't convert the GUEST ones, so people aren't left wondering why those are accounted to the cgroup when nothing there reads them. It's also extra cycles for nothing, even if most of the touched data is probably already in the cache.