On 12/8/25 9:32 AM, Michal Koutný wrote:
> Hi Waiman.
> On Wed, Nov 26, 2025 at 02:43:50PM -0500, Waiman Long <llong@redhat.com> wrote:
>> Modifications to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can vary depending on the order in which those operations end up being serialized.
> I meant the latter: the results can differ when concurrent tasks do the updates (e.g. two containers starting in parallel). I don't see an issue with the race wrt the consistency of in-kernel data. We're on the same page here.
>> One difference between cpuset.cpus and cpuset.cpus.exclusive is that writes to cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but once the write succeeds, becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. cpuset.cpus.exclusive must be used to create a remote partition.
>> OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed, and cpuset.cpus can only be used to create a local partition.
>> Does that answer your question?
> It does help my understanding. Do you envision that remote and local partitions should be used together (in one subtree)?
It should be rare to have both remote and local partitions enabled in the same system, though it is not disallowed. Local partitions should only be used on systems that run a small number of applications, with one or just a few of them needing partition support. On systems that run a large number of containerized applications, such as a Kubernetes-managed system, local partitions cannot be used because of the way container management is done: the actual cgroup associated with a container can be several levels away from the cgroup root, and a local partition can only be created directly under another partition root, so every level in between would have to become a partition root too. Remote partitions were created for exactly this use case, where local partitions cannot be used at all.
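For illustration, here is a minimal sketch of how a remote partition might be set up for a container cgroup several levels below the root. It assumes cgroup v2 is mounted at the given root, that every cgroup on the path already exists, and that the caller may write the control files; the helper name create_remote_partition and the example path and CPU list are hypothetical. The exclusive CPUs are written top-down into cpuset.cpus.exclusive of each non-root cgroup on the path, then "root" is written to the target's cpuset.cpus.partition. The state is read back afterwards because the partition may end up reported as invalid even if the write itself did not fail.

```python
from pathlib import Path


def create_remote_partition(cgroup_root: str, rel_path: str, cpus: str,
                            mode: str = "root") -> str:
    """Hypothetical helper: make cgroup_root/rel_path a remote cpuset partition.

    Assumes cgroup v2 is mounted at cgroup_root, every cgroup on rel_path
    already exists, and the caller may write the control files.
    """
    root = Path(cgroup_root)
    parts = Path(rel_path).parts

    # Enable the cpuset controller in the root and each intermediate cgroup
    # so that every level below exposes the cpuset.* files.
    cur = root
    for part in parts:
        (cur / "cgroup.subtree_control").write_text("+cpuset")
        cur = cur / part

    # Pass the exclusive CPUs down top-down. The root cgroup has no
    # cpuset.cpus.exclusive, so only the non-root levels are written.
    # On a real system, a value that is not exclusive WRT a sibling
    # makes the write fail, which surfaces here as an OSError.
    cur = root
    for part in parts:
        cur = cur / part
        (cur / "cpuset.cpus.exclusive").write_text(cpus)

    # Request partition root status on the target, then read back the
    # resulting state, which may report the partition as invalid.
    partition = cur / "cpuset.cpus.partition"
    partition.write_text(mode)
    return partition.read_text().strip()
```

On a real system this would be called with root privileges as something like create_remote_partition("/sys/fs/cgroup", "kubepods.slice/pod-a/ctr-1", "4-7"). Only the target cgroup becomes a partition root; its ancestors remain ordinary member cgroups that merely carry the exclusive CPUs down, which is what lets this work at any depth.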
Cheers, Longman