On 6/6/23 15:58, Tejun Heo wrote:
Hello, Waiman.
On Mon, Jun 05, 2023 at 10:47:08PM -0400, Waiman Long wrote: ...
I had a different idea on the semantics of the cpuset.cpus.exclusive at the beginning. My original thinking was that it was the actual exclusive CPUs that are allocated to the cgroup. Now if we treat this as a hint of what exclusive CPUs should be used and it becomes valid only if the cgroup can
I wouldn't call it a hint. It's still hard allocation of the CPUs to the cgroups that own them. Setting up a partition requires exclusive CPUs and thus would depend on exclusive allocations set up accordingly.
become a valid partition. I can see it as a value that can be hierarchically set throughout the whole cpuset hierarchy.
So a transition to a valid partition is possible iff
- cpuset.cpus.exclusive is a subset of cpuset.cpus and is a subset of
cpuset.cpus.exclusive of all its ancestors.
Yes.
- If its parent is not a partition root, none of the CPUs in
cpuset.cpus.exclusive are currently allocated to other partitions. This is the
Not just that, the CPUs aren't available to cgroups which don't have them set in the .exclusive file. IOW, if a CPU is in cpus.exclusive of some cgroups, it shouldn't appear in cpus.effective of cgroups which don't have the CPU in their cpus.exclusive.
So, .exclusive explicitly establishes exclusive ownership of CPUs and partitions depend on that with an implicit "turn CPUs exclusive" behavior in case the parent is a partition root for backward compatibility.
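The conditions discussed above can be sketched as a small model. This is only an illustration under stated assumptions, not the kernel implementation: the `Cpuset` class and the helper names are hypothetical, CPU masks are plain Python sets, and the root cgroup is assumed to own every CPU implicitly, so it is skipped in the ancestor check.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional, Set


@dataclass
class Cpuset:
    """Hypothetical stand-in for a cgroup's cpuset files (not a kernel structure)."""
    name: str
    cpus: Set[int]                                      # cpuset.cpus
    exclusive: Set[int] = field(default_factory=set)    # cpuset.cpus.exclusive
    effective: Set[int] = field(default_factory=set)    # cpuset.cpus.effective
    parent: Optional["Cpuset"] = None


def non_root_ancestors(cs: Cpuset) -> Iterator[Cpuset]:
    """Yield ancestors up to, but not including, the root (assumption: root owns all CPUs)."""
    node = cs.parent
    while node is not None and node.parent is not None:
        yield node
        node = node.parent


def exclusive_is_nested(cs: Cpuset) -> bool:
    """cpus.exclusive must be a non-empty subset of cpus and of every ancestor's cpus.exclusive."""
    if not cs.exclusive or not cs.exclusive <= cs.cpus:
        return False
    return all(cs.exclusive <= a.exclusive for a in non_root_ancestors(cs))


def leaked_cpus(cs: Cpuset, others: Iterable[Cpuset]) -> Set[int]:
    """CPUs of cs.exclusive that show up in cpus.effective of a cgroup that does not list them as exclusive."""
    leaked: Set[int] = set()
    for other in others:
        leaked |= (cs.exclusive & other.effective) - other.exclusive
    return leaked


def can_become_partition(cs: Cpuset, others: Iterable[Cpuset]) -> bool:
    """Both conditions from the discussion: nested exclusivity and no leakage elsewhere."""
    return exclusive_is_nested(cs) and not leaked_cpus(cs, others)
```

Here `others` is whichever set of cgroups the caller wants to check against; which cgroups the kernel would actually walk is not specified in this thread.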
The current CPU exclusive behavior is limited to sibling cgroups only. Because of the hierarchical nature of CPU distribution, the set of exclusive CPUs has to appear in all of the cgroup's ancestors. When a partition is enabled, we do a sibling exclusivity test at that point to verify that it is exclusive. It looks like you want to do an exclusivity test even when the partition isn't active. I can certainly do that when the file is being updated. However, the write will then fail if the exclusivity test fails, just like with the v1 cpuset.cpu_exclusive flag, if you are OK with that.
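A minimal sketch of what "fail the write if the exclusivity test fails" could look like, assuming sibling cgroups are modeled as a dictionary from name to cpus.exclusive set. The function name and the use of `ValueError` are illustrative assumptions; the actual error code returned by the kernel is not stated here.

```python
from typing import Dict, Set


def write_cpus_exclusive(siblings: Dict[str, Set[int]], name: str, new_value: Set[int]) -> None:
    """Reject the update if new_value overlaps any sibling's cpus.exclusive.

    Hypothetical model: `siblings` maps each sibling cgroup name to its
    current cpus.exclusive set, including the cgroup being written.
    """
    for other, owned in siblings.items():
        if other != name and owned & new_value:
            # Sibling exclusivity test failed: leave the old value untouched.
            raise ValueError(f"CPUs {sorted(owned & new_value)} are exclusive to {other}")
    siblings[name] = set(new_value)
```

Doing the test at write time means a conflicting value is never stored, at the cost of the write itself being able to fail.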
same remote partition concept in my v2 patch. If its parent is a partition root, part of its exclusive CPUs will be distributed to this child partition like the current behavior of cpuset partition.
Yes, similar in a sense. Please do away with the "once .reserve is used, the behavior is switched" part.
That behavior is already gone in my v2 patch.
Instead, it can be something like "if the parent is a partition root, cpuset implicitly tries to set all CPUs in its cpus file in its cpus.exclusive file" so that user-visible behavior does not depend on past history.
If the parent is a partition root, auto reservation will be done and cpus.exclusive will be set automatically, just like before. So existing applications that use partitions will not be affected.
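The backward-compatible behavior can be sketched roughly as follows. This is a simplified model, not the patch itself: the function name is hypothetical, and giving precedence to an explicitly written cpus.exclusive value is an assumption made for the example.

```python
from typing import Set


def resulting_exclusive(cpus: Set[int], written_exclusive: Set[int],
                        parent_is_partition_root: bool) -> Set[int]:
    """Return the cpus.exclusive value a child ends up with in this model."""
    if written_exclusive:
        # Assumption: an explicit user setting wins over the implicit one.
        return set(written_exclusive)
    if parent_is_partition_root:
        # Implicit case: all CPUs in cpuset.cpus are treated as exclusive.
        return set(cpus)
    # Parent is not a partition root and nothing was written: nothing is exclusive.
    return set()
```

With this rule a child of a partition root gets the same result whether or not anyone ever wrote to cpus.exclusive, which is the "no dependence on past history" property asked for above.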
Cheers, Longman