Hi Chen,
Thanks again for taking a look!
Chen Ridong chenridong@huaweicloud.com writes:
On 2025/8/22 14:14, Chen Ridong wrote:
On 2025/8/22 9:37, Tiffany Yang wrote:
There isn't yet a clear way to identify a set of "lost" time that everyone (or at least a wider group of users) cares about. However, users can perform some delay accounting by iterating over components of interest. This patch allows cgroup v2 freezing time to be one of those components.
Track the cumulative time that each v2 cgroup spends freezing and expose it to userland via a new local stat file in cgroupfs. Thank you to Michal, who provided the ASCII art in the updated documentation.
To access this value: $ mkdir /sys/fs/cgroup/test $ cat /sys/fs/cgroup/test/cgroup.stat.local freeze_time_total 0
Ensure consistent freeze time reads with freeze_seq, a per-cgroup sequence counter. Writes are serialized using the css_set_lock.
...
spin_lock_irq(&css_set_lock);
- if (freeze)
- write_seqcount_begin(&cgrp->freezer.freeze_seq);
- if (freeze) { set_bit(CGRP_FREEZE, &cgrp->flags);
- else
cgrp->freezer.freeze_start_nsec = ts_nsec;
- } else { clear_bit(CGRP_FREEZE, &cgrp->flags);
cgrp->freezer.frozen_nsec += (ts_nsec -
cgrp->freezer.freeze_start_nsec);
- }
- write_seqcount_end(&cgrp->freezer.freeze_seq); spin_unlock_irq(&css_set_lock);
Hello Tiffany,
I wanted to check if there are any specific considerations regarding how we should input the ts_nsec value.
Would it be possible to define this directly within the cgroup_do_freeze function rather than passing it as a parameter? This approach might simplify the implementation and potentially improve timing accuracy when it have lots of descendants.
I revisited v3, and this was Michal's point. p / | \ 1 ... n When we freeze the parent group p, is it expected that all descendant cgroups (1 to n) should share the same frozen timestamp?
Yes, this is the expectation from the current change. I understand your concern about the accuracy of this measurement (especially when there are many descendants), but I agree with Michal's point that the time to traverse the descendant cgroups is basically noise relative to the quantity we're trying to measure here.
If the cgroup tree structure is stable, the exact frozen time may not be really matter. However, if the tree is not stable, obtaining the same frozen time is acceptable?
I'm a little unclear as to what you mean about when the cgroup tree is unstable. In the case where a new descendant of p is being created, I believe the cgroup_mutex prevents that from happening at the same time as we are freezing p's other descendants. If it won the race, was created unfrozen under p, and then became frozen during cgroup_freeze, it would have the same timestamp as the other descendants. If it lost the race and was created as a frozen cgroup under p, it would get its own timestamp in cgroup_create, so its freezing duration would be slightly less than that of the others in the hierarchy. Both values would be acceptable for our purposes, but if there was a different case you had in mind, please let me know!
Thanks,