On 2026/1/5 11:59, Waiman Long wrote:
On 1/4/26 8:35 PM, Chen Ridong wrote:
On 2026/1/5 5:48, Waiman Long wrote:
On 1/4/26 2:09 AM, Chen Ridong wrote:
On 2026/1/2 3:15, Waiman Long wrote:
Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition") introduced a new check to disallow the setting of a new cpuset.cpus.exclusive value that is a superset of a sibling's cpuset.cpus value so that there will be at least one CPU left in the sibling in case the cpuset becomes a valid partition root. This new check does have the side effect of failing a cpuset.cpus change that makes it a subset of a sibling's cpuset.cpus.exclusive value.
With v2, users are supposed to be allowed to set whatever value they want in cpuset.cpus without failure. To maintain this rule, the check is now restricted to the case where cpuset.cpus.exclusive is being changed, not when cpuset.cpus is changed.
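The restriction described above can be sketched as a toy model in C. This is not the kernel implementation: CPU lists are modeled as 64-bit masks, the names toy_cpuset, toy_subset and toy_check_sibling are hypothetical, and the -1 return value is a placeholder rather than the kernel's actual error code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): a CPU list is a 64-bit mask, bit n = CPU n. */
struct toy_cpuset {
    uint64_t cpus;       /* models cpuset.cpus */
    uint64_t exclusive;  /* models cpuset.cpus.exclusive */
};

/* True if every CPU in sub is also present in super. */
static bool toy_subset(uint64_t sub, uint64_t super)
{
    return (sub & ~super) == 0;
}

/*
 * Hypothetical sibling check. Returns 0 if the change is accepted and
 * -1 (placeholder error) if it is rejected.
 *
 * The superset test runs only when cpuset.cpus.exclusive is the value
 * being written; a plain cpuset.cpus write skips it entirely.
 */
static int toy_check_sibling(const struct toy_cpuset *trial,
                             const struct toy_cpuset *sibling,
                             bool exclusive_changed)
{
    if (!exclusive_changed)
        return 0;

    /* Reject if the new exclusive set would swallow all sibling CPUs. */
    if (trial->exclusive && sibling->cpus &&
        toy_subset(sibling->cpus, trial->exclusive))
        return -1;

    return 0;
}
```

In this model, the same trial values are rejected when exclusive_changed is true and accepted when it is false, which mirrors the intent that a cpuset.cpus write alone should not trip the check.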
Hi, Longman,
You've emphasized that modifying cpuset.cpus should never fail. However, I haven't found this explicitly documented. Should we add it?
More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU constraints? This seems to be the underlying assumption in this patch.
Before the introduction of cpuset partition, writing to cpuset.cpus would only fail if the cpu list was invalid, e.g. if it contained CPUs outside of the valid cpu range. What I mean by "never-fail" is that if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking existing applications.
There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left on device) and EBUSY.
I just want to ensure the behavior aligns with our design intent.
Consider this example:
    # cd /sys/fs/cgroup/
    # mkdir test
    # echo 1 > test/cpuset.cpus
    # echo $$ > test/cgroup.procs
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo > test/cpuset.cpus
    -bash: echo: write error: No space left on device
In cgroups v2, if the test cgroup becomes empty, it could inherit the parent's effective CPUs. My question is: Should we still fail to clear cpuset.cpus (returning an error) when the cgroup is populated?
Good catch. This error is for v1. It shouldn't apply for v2. Yes, I think we should fix that for v2.
The EBUSY check (through cpuset_cpumask_can_shrink) is necessary, correct?
Since the subsequent patch modifies exclusive checking for v1, should we consolidate all v1-related code into a separate function like cpuset1_validate_change() (maybe with some duplicate code)? It would allow us to isolate v1 logic and avoid having to account for v1 implementation details in future features.
In other words:
    validate_change(...)
    {
        if (!is_in_v2_mode())
            return cpuset1_validate_change(cur, trial);
        ...
        // only v2 code here
    }
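To make the proposed split concrete, here is a minimal compilable sketch of the dispatch, using the ENOSPC case from the example above as the only v1 rule. It is a toy model under stated assumptions, not kernel code: struct toy_cs, toy_v1_validate_change() and toy_validate_change() are hypothetical names, CPU lists are modeled as 64-bit masks, and the mode is passed in as a bool instead of calling is_in_v2_mode().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): just enough state to show the dispatch. */
struct toy_cs {
    uint64_t cpus;    /* models cpuset.cpus as a bitmask */
    bool populated;   /* true if the cgroup has tasks attached */
};

/*
 * v1-only rule discussed in the thread: a populated cpuset must not
 * have a non-empty cpus value cleared.
 */
static int toy_v1_validate_change(const struct toy_cs *cur,
                                  const struct toy_cs *trial)
{
    if (cur->populated && cur->cpus != 0 && trial->cpus == 0)
        return -ENOSPC;
    return 0;
}

/* Dispatcher: v1 rules live in their own function, v2 code stays clean. */
static int toy_validate_change(const struct toy_cs *cur,
                               const struct toy_cs *trial,
                               bool v2_mode)
{
    if (!v2_mode)
        return toy_v1_validate_change(cur, trial);

    /* v2-only checks would go here; none are modeled in this sketch. */
    return 0;
}
```

With this shape, clearing cpus on a populated cgroup returns -ENOSPC only in v1 mode, while the same write is accepted in v2 mode, where the cgroup could fall back to the parent's effective CPUs.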