On 5/2/23 14:01, Michal Koutný wrote:
Hello.
The previous thread reached me incomplete, so I am responding to the last message only. Point me to a message URL if this was already covered.
On Fri, Apr 14, 2023 at 03:06:27PM -0400, Waiman Long <longman@redhat.com> wrote:
Below is a draft of the new cpuset.cpus.reserve cgroupfs file:
cpuset.cpus.reserve
A read-write multiple values file which exists on all cpuset-enabled cgroups.
It lists the reserved CPUs to be used for the creation of child partitions. See the section on "cpuset.cpus.partition" below for more information on cpuset partitions. These reserved CPUs should be a subset of "cpuset.cpus" and, when used, will be mutually exclusive with "cpuset.cpus.effective", since reserved CPUs cannot be used by tasks in the current cgroup.
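The relationship described in the draft text above can be sketched in a few lines of Python. This is only a toy model of the wording, not kernel code: `parse_cpulist` and `effective_after_reserve` are hypothetical helper names, and the sketch ignores hierarchy, hotplug, and everything else that feeds into the real "cpuset.cpus.effective".

```python
def parse_cpulist(text):
    """Parse a kernel-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def effective_after_reserve(cpus, reserve):
    """Hypothetical model of the draft wording: the reserve must be a subset
    of cpuset.cpus, and reserved CPUs are not usable by local tasks."""
    if not reserve <= cpus:
        raise ValueError("reserve must be a subset of cpuset.cpus")
    return cpus - reserve


cpus = parse_cpulist("0-7")        # models cpuset.cpus
reserve = parse_cpulist("4-5")     # models the proposed cpuset.cpus.reserve
effective = effective_after_reserve(cpus, reserve)
```

With these inputs, `effective` holds CPUs 0-3 and 6-7, and it shares no CPU with `reserve`.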
There are two modes for partition CPU reservation: auto and manual. The system starts up in auto mode, where "cpuset.cpus.reserve" is set automatically when valid child partitions are created, so users don't need to touch the file at all. This mode has the limitation that the parent of a partition must be a partition root itself, so child partitions have to be created one by one from the cgroup root down.
To enable the creation of a partition further down in the hierarchy without requiring the intermediate cgroups to be partition roots,
Why would this be needed? Owning a CPU (a resource) must logically be passed all the way from the root to the target cgroup, i.e. this is expressed by valid partitioning down to the given level.
one has to turn on the manual reservation mode by writing directly to "cpuset.cpus.reserve" with a value different from its current value. By distributing the reserved CPUs down the cgroup hierarchy to the parent of the target cgroup, this target cgroup can be switched to become a partition root if its "cpuset.cpus" is a subset of the set of valid reserved CPUs in its parent.
level n
`- level n+1
   cpuset.cpus            // these are actually configured by "owner" of level n
   cpuset.cpus.partition  // similarly here, level n decides if child is a partition
I.e. what would level n/cpuset.cpus.reserve be good for when level n can directly control level n+1/cpuset.cpus?
In the new scheme, the available CPUs are still passed directly down to a descendant cgroup. However, isolated CPUs (or more generally CPUs dedicated to a partition) have to be exclusive. So what cpuset.cpus.reserve does is identify those exclusive CPUs that can be excluded from the effective_cpus of the parent cgroups before they are claimed by a child partition. Currently this is done automatically when a child partition is created off a parent partition root. The new scheme will break it into two separate steps, without the requirement that the parent of a partition be a partition root itself.
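To make the two steps concrete, here is a rough Python sketch of how I read the proposal: first reserve exclusive CPUs along the path, then let the target cgroup claim them. The `Cgroup` class and its method names are purely illustrative assumptions, and the model does not enforce any relationship between reservations at different levels, which the real implementation would presumably need to do.

```python
class Cgroup:
    """Toy model of one cpuset cgroup; hypothetical, not kernel code."""

    def __init__(self, name, cpus, parent=None):
        self.name = name
        self.cpus = set(cpus)      # models cpuset.cpus
        self.reserve = set()       # models the proposed cpuset.cpus.reserve
        self.parent = parent
        self.is_partition_root = False

    def set_reserve(self, reserve):
        """Step 1: set aside exclusive CPUs for a future child partition."""
        reserve = set(reserve)
        if not reserve <= self.cpus:
            raise ValueError("reserve must be a subset of cpuset.cpus")
        self.reserve = reserve

    def effective(self):
        """CPUs still usable by tasks in this cgroup (reserved ones excluded)."""
        return self.cpus - self.reserve

    def become_partition_root(self):
        """Step 2: claim CPUs; allowed only if the parent has reserved them."""
        if (self.parent is None or not self.cpus
                or not self.cpus <= self.parent.reserve):
            return False
        self.is_partition_root = True
        return True


root = Cgroup("root", range(8))
mid = Cgroup("mid", range(8), parent=root)    # never becomes a partition root
leaf = Cgroup("leaf", {6, 7}, parent=mid)

before = leaf.become_partition_root()   # nothing reserved in "mid" yet
root.set_reserve({6, 7})                # distribute the reservation downwards
mid.set_reserve({6, 7})
after = leaf.become_partition_root()    # CPUs 6-7 are now reserved in "mid"
```

In this sketch the first attempt is refused and the second succeeds, while `mid` stays a plain cgroup whose effective CPUs are 0-5.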
Cheers,
Longman