On 2025/11/27 3:43, Waiman Long wrote:
On 11/26/25 9:13 AM, Michal Koutný wrote:
On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@redhat.com> wrote:
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will
I have to admit that I prefer the current implementation.
At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would make it more difficult for users to understand why the cpuset.cpus they configured do not match the effective CPUs in use, and why different operation orders yield different results.
In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member) created under A1 will end up with empty effective CPUs, which is not desired behavior.
   root cgroup
        |
       A1
      /  \
    A2    A3...
#1> echo "0-1" > A1/cpuset.cpus
#2> echo "root" > A1/cpuset.cpus.partition
#3> echo "0-1" > A2/cpuset.cpus
#4> echo "root" > A2/cpuset.cpus.partition
mkdir A4
mkdir A5
echo "0" > A4/cpuset.cpus
echo $$ > A4/cgroup.procs
echo "1" > A5/cpuset.cpus
echo $$ > A5/cgroup.procs
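To check whether a sequence like this leaves a cpuset without usable CPUs, one can compare what was requested in cpuset.cpus with what shows up in cpuset.cpus.effective. The following is only a rough sketch: the helper name cpuset_state is made up for illustration, and the /sys/fs/cgroup mount point in the usage line is an assumption about the local setup.

```shell
# Print the requested CPUs, the effective CPUs, and the partition state
# of one cpuset. $1 is the cgroup directory to inspect.
# cpuset_state is a hypothetical helper name, not an existing tool.
cpuset_state() {
    for f in cpuset.cpus cpuset.cpus.effective cpuset.cpus.partition; do
        # Skip files that do not exist (the root cgroup lacks some of them).
        if [ -r "$1/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$1/$f")"
        fi
    done
}
```

Usage, assuming cgroup v2 is mounted at /sys/fs/cgroup: cpuset_state /sys/fs/cgroup/A1/A4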
[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its requirement?" --Michal.
make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come, first-served as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:
I'm skeptical of the FCFS behavior since I'm afraid it may be subject to race conditions in practice. BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior in this regard?
Modifications to cpumasks are all serialized by the cpuset_mutex. If you are referring to two or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can indeed vary depending on the order in which those operations end up being serialized.
One difference between cpuset.cpus and cpuset.cpus.exclusive is that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating a remote partition.
OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed, and cpuset.cpus alone is limited to the creation of a local partition.
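From a shell, the difference would show up roughly as follows: a write to cpuset.cpus.exclusive may itself be rejected, whereas a write to cpuset.cpus goes through and the outcome only becomes visible when reading cpuset.cpus.partition afterwards. This is a sketch only; write_cpus and partition_valid are hypothetical helper names, and the paths are whatever the local cgroup v2 hierarchy uses.

```shell
# Write a CPU list to a cpuset control file and report whether the write
# itself was accepted. $1 is the control file, $2 is the CPU list.
# write_cpus is a hypothetical helper name.
write_cpus() {
    if echo "$2" 2>/dev/null > "$1"; then
        echo "accepted"
    else
        echo "rejected"
    fi
}

# Report whether a cpuset is currently a valid partition root.
# $1 is the cgroup directory. "root" and "isolated" are treated as valid;
# "member" and any "... invalid (...)" state are reported as not valid.
partition_valid() {
    case "$(cat "$1/cpuset.cpus.partition" 2>/dev/null)" in
        root|isolated) echo "valid" ;;
        *)             echo "not valid" ;;
    esac
}
```

Usage, with B1 as in the earlier example: write_cpus B1/cpuset.cpus.exclusive "1-2" reports whether the write was taken at all, while write_cpus B1/cpuset.cpus "1-2" followed by partition_valid B1 shows whether B1 actually ended up as a valid partition root.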
Does that answer your question?
Cheers, Longman