In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.
This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), changing B1 cannot affect A1's exclusivity.
For example, assume a machine has 4 CPUs (0-3).
   root cgroup
     /    \
    A1    B1

 Case 1:
  Table 1.1: Before applying the patch
  Step                                       | A1's prstate | B1's prstate |
  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."
  Table 1.2: After applying the patch
  Step                                       | A1's prstate | B1's prstate |
  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
 Case 2: (This situation remains unchanged from before)
  Table 2.1: Before applying the patch
  Step                                       | A1's prstate | B1's prstate |
  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
  #3> echo "1-2" > B1/cpuset.cpus            | member       | member       |
  #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |

  Table 2.2: After applying the patch
  Step                                       | A1's prstate | B1's prstate |
  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
  #3> echo "1-2" > B1/cpuset.cpus            | member       | member       |
  #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
All other cases remain unaffected, for example cgroup v1, or cases where A1 and B1 are both exclusive or both non-exclusive.
---
v3 -> v4:
  - Adjust the test_cpuset_prs.sh test file to align with the current behavior.

v2 -> v3:
  - Ensure compliance with constraints such as cpuset.cpus.exclusive.
  - Link: https://lore.kernel.org/cgroups/20251113131434.606961-1-sunshaojie@kylinos.c...

v1 -> v2:
  - Keep the current cgroup v1 behavior unchanged.
  - Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat....
---
 kernel/cgroup/cpuset-internal.h               |  3 ++
 kernel/cgroup/cpuset-v1.c                     | 20 +++++++++
 kernel/cgroup/cpuset.c                        | 43 ++++++++++++++-----
 .../selftests/cgroup/test_cpuset_prs.sh       |  5 ++-
 4 files changed, 58 insertions(+), 13 deletions(-)
Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 337608f408ce..c53111998432 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
 #else
 static inline void fmeter_init(struct fmeter *fmp) {}
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
@@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+				struct cpuset *cs2) {return false; }
 #endif /* CONFIG_CPUSETS_V1 */
 
 #endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..5c1296bf6a34 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -373,6 +373,26 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
 	return ret;
 }
 
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ *                                to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+		if (cpumask_intersects(cs1->cpus_allowed,
+				       cs2->cpus_allowed))
+			return true;
+
+	return false;
+}
+
 #ifdef CONFIG_PROC_PID_CPUSET
 /*
  * proc_cpuset_show()
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..0fd803612513 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
 
 /**
  * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
- * @cs1: first cpuset to check
- * @cs2: second cpuset to check
+ * @cs1: current cpuset to check
+ * @cs2: cpuset involved in the check
  *
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
+ * For cgroup-v1:
+ *     see cpuset1_cpus_excl_conflict()
+ * For cgroup-v2:
+ * 1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive
  * 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs
+ * 4. if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset
+ *    of another's exclusive CPUs
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+	/* For cgroup-v1 */
+	if (!cpuset_v2())
+		return cpuset1_cpus_excl_conflict(cs1, cs2);
+
+	/* If cs1 are exclusive, check if they are mutually exclusive */
+	if (is_cpu_exclusive(cs1))
 		return !cpusets_are_exclusive(cs1, cs2);
 
+	/* The following check applies when either
+	 * both cs1 and cs2 are non-exclusive,or
+	 * only cs2 is exclusive.
+	 */
+
 	/* Exclusive_cpus cannot intersect */
 	if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
 		return true;
 
-	/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
-	if (!cpumask_empty(cs1->cpus_allowed) &&
-	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
-		return true;
-
+	/* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */
 	if (!cpumask_empty(cs2->cpus_allowed) &&
 	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
 		return true;
 
+	/* If cs2 is exclusive, check finished here */
+	if (is_cpu_exclusive(cs2))
+		return false;
+
+	/* The following check applies only if both cs1 and cs2 are non-exclusive. */
+
+	/* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */
+	if (!cpumask_empty(cs1->cpus_allowed) &&
+	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
+		return true;
+
 	return false;
 }
 
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..b848bc0729cf 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -388,10 +388,11 @@ TEST_MATRIX=(
 	" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	# But a exclusive cpuset.cpus change will invalidate itself.
 	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
 	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	" C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
On 2025/11/17 9:57, Sun Shaojie wrote:
> After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually

B1 (0-3) --> B1(0) ?

> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 52468d2c178a..0fd803612513 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>
>  /**
>   * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
> - * @cs1: first cpuset to check
> - * @cs2: second cpuset to check
> + * @cs1: current cpuset to check
> + * @cs2: cpuset involved in the check
>   *
>   * Returns: true if CPU exclusivity conflict exists, false otherwise
>   *
>   * Conflict detection rules:
> - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
> + * For cgroup-v1:
> + *     see cpuset1_cpus_excl_conflict()
> + * For cgroup-v2:
> + * 1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive
>   * 2. exclusive_cpus masks cannot intersect between cpusets
> - * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
> + * 3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs
> + * 4. if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset
> + *    of another's exclusive CPUs
>   */
The revised conflict detection rules are confusing to me. I thought cs1 and cs2 should be treated symmetrically, but that doesn’t seem to be the case here.
Shouldn’t the following rule apply regardless of whether cs1 or cs2 is exclusive: "The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs"?
>  static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>  {
> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> +	/* For cgroup-v1 */
> +	if (!cpuset_v2())
> +		return cpuset1_cpus_excl_conflict(cs1, cs2);
> +
> +	/* If cs1 are exclusive, check if they are mutually exclusive */
> +	if (is_cpu_exclusive(cs1))
>  		return !cpusets_are_exclusive(cs1, cs2);
>
> +	/* The following check applies when either
> +	 * both cs1 and cs2 are non-exclusive,or
> +	 * only cs2 is exclusive.
> +	 */
> +
>  	/* Exclusive_cpus cannot intersect */
>  	if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
>  		return true;
>
> -	/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
> -	if (!cpumask_empty(cs1->cpus_allowed) &&
> -	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
> -		return true;
> -
> +	/* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */
>  	if (!cpumask_empty(cs2->cpus_allowed) &&
>  	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
>  		return true;
>
> +	/* If cs2 is exclusive, check finished here */
> +	if (is_cpu_exclusive(cs2))
> +		return false;
> +
> +	/* The following check applies only if both cs1 and cs2 are non-exclusive. */
> +
> +	/* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */
> +	if (!cpumask_empty(cs1->cpus_allowed) &&
> +	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
> +		return true;
> +
>  	return false;
>  }
From your commit message, it appears you intend to modify "if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))" to "if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))".
However, I’m having trouble following the change.
On 2025/11/17 15:45, Chen Ridong Wrote:
On 2025/11/17 9:57, Sun Shaojie wrote:
>> After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually
>
> B1 (0-3) --> B1(0) ?

Sorry, that was a typo. It should indeed be B1 (0).

>> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."
> The revised conflict detection rules are confusing to me. I thought cs1 and cs2 should be treated symmetrically, but that doesn’t seem to be the case here.
>
> Shouldn’t the following rule apply regardless of whether cs1 or cs2 is exclusive: "The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs"?

Certainly, this rule applies regardless of whether cs1 or cs2 is exclusive, and the current implementation already handles it this way. The following two cases cover this rule:
"1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive"
"3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs"
> From your commit message, it appears you intend to modify "if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))" to "if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))".
>
> However, I’m having trouble following the change.

The current modification specifically addresses the scenario where one cpuset A1 is exclusive and its sibling cpuset B1 is non-exclusive. The goal is to ensure that when the non-exclusive cpuset B1 modifies its own "cpuset.cpus" or "cpuset.cpus.exclusive", it does not cause A1 to change from exclusive to non-exclusive.

The following three scenarios are not affected by this patch:
1. Both A1 and B1 are exclusive.
2. Both A1 and B1 are non-exclusive.
3. A1 is exclusive, B1 is non-exclusive, and "cpuset.cpus" or "cpuset.cpus.exclusive" of A1 is changed.
On 2025/11/17 18:00, Sun Shaojie wrote:
On 2025/11/17 15:45, Chen Ridong Wrote:
On 2025/11/17 9:57, Sun Shaojie wrote:
>> The revised conflict detection rules are confusing to me. I thought cs1 and cs2 should be treated symmetrically, but that doesn’t seem to be the case here.
>>
>> Shouldn’t the following rule apply regardless of whether cs1 or cs2 is exclusive: "The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs"?
>
> Certainly, this rule applies regardless of whether cs1 or cs2 is exclusive, and the current implementation already handles it this way. The following two cases cover this rule:
> "1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive"
> "3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs"

I believe this function should return the same result regardless of whether it is called as cpus_excl_conflict(A1, B1) or cpus_excl_conflict(B1, A1), which means cs1 and cs2 should be treated symmetrically. However, since cs1 and cs2 are handled differently, it is difficult to convince me that this implementation is correct.
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+	/* For cgroup-v1 */
+	if (!cpuset_v2())
+		return cpuset1_cpus_excl_conflict(cs1, cs2);
+
+	/* If cs1 are exclusive, check if they are mutually exclusive */
+	if (is_cpu_exclusive(cs1))
 		return !cpusets_are_exclusive(cs1, cs2);
 
+	/* The following check applies when either
+	 * both cs1 and cs2 are non-exclusive,or
+	 * only cs2 is exclusive.
+	 */
+
 	/* Exclusive_cpus cannot intersect */
 	if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
 		return true;
 
-	/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
-	if (!cpumask_empty(cs1->cpus_allowed) &&
-	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
-		return true;
-
+	/* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */
 	if (!cpumask_empty(cs2->cpus_allowed) &&
 	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
 		return true;
 
+	/* If cs2 is exclusive, check finished here */
+	if (is_cpu_exclusive(cs2))
+		return false;
+
+	/* The following check applies only if both cs1 and cs2 are non-exclusive. */
+
+	/* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */
+	if (!cpumask_empty(cs1->cpus_allowed) &&
+	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
+		return true;
+
 	return false;
 }
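One way to make the symmetry expectation concrete is to test it exhaustively on a small model. The sketch below is hypothetical and is not the kernel code: `struct toy_cpuset`, `allowed_inside_excl()`, `conflict_symmetric()` and `is_symmetric()` are illustrative names, and CPU sets are plain 4-bit masks rather than `struct cpumask`. It encodes only the rule quoted in the review ("the allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs") and checks whether a conflict function gives the same answer for (cs1, cs2) and (cs2, cs1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_MASKS 16u	/* all subsets of 4 CPUs (assumption: 4-CPU machine) */

/* Toy stand-in for struct cpuset: bit n set means CPU n is in the set. */
struct toy_cpuset {
	uint8_t cpus_allowed;
	uint8_t exclusive_cpus;
};

typedef bool (*conflict_fn)(const struct toy_cpuset *, const struct toy_cpuset *);

/* One-sided rule: a's non-empty allowed CPUs lie entirely inside b's exclusive CPUs. */
bool allowed_inside_excl(const struct toy_cpuset *a, const struct toy_cpuset *b)
{
	return a->cpus_allowed != 0 &&
	       (a->cpus_allowed & ~b->exclusive_cpus) == 0;
}

/* Symmetric rule: apply the one-sided rule in both directions. */
bool conflict_symmetric(const struct toy_cpuset *a, const struct toy_cpuset *b)
{
	return allowed_inside_excl(a, b) || allowed_inside_excl(b, a);
}

/* Return true if fn(a, b) == fn(b, a) for every pair of toy cpusets. */
bool is_symmetric(conflict_fn fn)
{
	assert(fn != NULL);
	for (unsigned int i = 0; i < TOY_NR_MASKS * TOY_NR_MASKS; i++) {
		for (unsigned int j = 0; j < TOY_NR_MASKS * TOY_NR_MASKS; j++) {
			struct toy_cpuset a = { (uint8_t)(i / TOY_NR_MASKS),
						(uint8_t)(i % TOY_NR_MASKS) };
			struct toy_cpuset b = { (uint8_t)(j / TOY_NR_MASKS),
						(uint8_t)(j % TOY_NR_MASKS) };

			if (fn(&a, &b) != fn(&b, &a))
				return false;
		}
	}
	return true;
}
```

Passing `conflict_symmetric` to `is_symmetric()` succeeds, while passing the one-sided `allowed_inside_excl` fails. A checker of this shape could in principle be pointed at a model of any proposed conflict function to see whether the argument order matters.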
From your commit message, it appears you intend to modify "if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))" to "if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))".
However, I’m having trouble following the change.
The current modification specifically addresses the scenario where one cpuset A1 is exclusive and its sibling cpuset B1 is non-exclusive. The goal is to ensure that when the non-exclusive cpuset B1 modifies its own "cpuset.cpus" or "cpuset.cpus.exclusive", it does not cause A1 to change from exclusive to non-exclusive.
The following three scenarios are not affected by this patch:
1. Both A1 and B1 are exclusive.
2. Both A1 and B1 are non-exclusive.
3. A1 is exclusive, B1 is non-exclusive, and "cpuset.cpus" or "cpuset.cpus.exclusive" of A1 is changed.
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..b848bc0729cf 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -388,10 +388,11 @@ TEST_MATRIX=(
 	" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	# But a exclusive cpuset.cpus change will invalidate itself.
 	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
 	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	" C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.
This patch applies only to the following two cases:
Assume the machine has 4 CPUs (0-3).
   root cgroup
     /    \
    A1    B1
Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
Table 1.1: Before applying this patch
Step                                       | A1's prstate | B1's prstate |
#1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
#3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying this patch
Step                                       | A1's prstate | B1's prstate |
#1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
#3> echo "0" > B1/cpuset.cpus              | root         | member       |
Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
Table 2.1: Before applying this patch
Step                                       | A1's prstate | B1's prstate |
#1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
#3> echo "2" > B1/cpuset.cpus              | root         | member       |
#4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
#5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".
Table 2.2: After applying this patch
Step                                       | A1's prstate | B1's prstate |
#1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
#3> echo "2" > B1/cpuset.cpus              | root         | member       |
#4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
#5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".
All other cases remain unaffected. For example, cgroup-v1.
Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
---
 kernel/cgroup/cpuset.c                            | 19 +------------------
 .../selftests/cgroup/test_cpuset_prs.sh           |  7 ++++---
 2 files changed, 5 insertions(+), 21 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..f6a834335ebf 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);
 
 	retval = validate_change(cs, trialcs);
 
 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..7d8941f65d84 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -388,10 +388,11 @@ TEST_MATRIX=(
 	" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	# An exclusive cpuset.cpus change will invalidate itself.
 	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
+	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	" C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
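The behavioural difference described in Case 1 and Case 2 of this patch can be sketched as a toy model. This is only an illustration under stated assumptions, not the kernel logic: `struct toy_cs`, `enum toy_prstate` and `toy_write_cpus()` are hypothetical names, CPU sets are 4-bit masks, only two siblings are modelled, and a "conflict" simply means the newly written CPUs intersect a valid sibling partition's CPUs. The `patched` flag selects between the old behaviour (conflicting sibling partitions are invalidated too) and the proposed one (only the writer is affected).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy partition states, mirroring "member", "root" and "root invalid". */
enum toy_prstate { TOY_MEMBER, TOY_ROOT, TOY_ROOT_INVALID };

/* Toy cpuset: bit n of cpus set means CPU n was requested via cpuset.cpus. */
struct toy_cs {
	uint8_t cpus;
	enum toy_prstate prs;
};

/*
 * Model a write to self's cpuset.cpus with one sibling.
 * patched == false: a conflict also invalidates the sibling partition.
 * patched == true:  a conflict only invalidates the writer (if it is a root).
 */
void toy_write_cpus(struct toy_cs *self, struct toy_cs *sibling,
		    uint8_t new_cpus, bool patched)
{
	assert(self && sibling);

	/* Simplified conflict test: overlap with a valid sibling partition. */
	bool conflict = sibling->prs == TOY_ROOT &&
			(new_cpus & sibling->cpus) != 0;

	self->cpus = new_cpus;		/* the write itself always succeeds */
	if (!conflict)
		return;

	if (self->prs == TOY_ROOT)
		self->prs = TOY_ROOT_INVALID;
	if (!patched)
		sibling->prs = TOY_ROOT_INVALID;
}
```

With A1 = {0-1, root} and B1 = {member}, writing "0" to B1 leaves A1 as "root" only when `patched` is true, which corresponds to Tables 1.1 and 1.2. With B1 = {2, root}, writing "1-2" to B1 invalidates both in the unpatched model and only B1 in the patched one, as in Tables 2.1 and 2.2. The model deliberately ignores re-validation, cpuset.cpus.exclusive and the parent hierarchy.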
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
(Thanks for working this out, Shaojie.)
In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".
Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served) and so the effective config cannot be derived just from the applied values :-/
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?
Thanks, Michal
On 2025/11/19 21:20, Michal Koutný wrote:
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served)
Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we may end up with different results depending on the order of operations.
and so the effective config cannot be derived just from the applied values :-/
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?
Thanks, Michal
Hi, Ridong,
On Thu, 20 Nov 2025 08:57:51, Chen Ridong wrote:
On 2025/11/19 21:20, Michal Koutný wrote:
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served)
Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we may end up with different results depending on the order of operations.
I don't understand the "order of operations" mentioned here. After reviewing the previous email content, are you referring to this?
On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
With the result you expect, would we observe the following behaviors:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > A1/cpuset.cpus.partition
#6> echo "root" > B1/cpuset.cpus.partition
# A1:root;B1:root invalid

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > B1/cpuset.cpus.partition
#6> echo "root" > A1/cpuset.cpus.partition
# A1:root invalid;B1:root
Do different operation orders yield different results? If so, this is not what we expect.
However, after applying this patch, the outcomes of these two examples are as follows:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#4> echo "1-2" > B1/cpuset.cpus            | member       | member       |
#5> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |
#6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#4> echo "1-2" > B1/cpuset.cpus            | member       | member       |
#5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |
#6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid |
Moreover, even without applying this patch, the result remains the same, because modifying cpuset.cpus.partition does not disable its siblings' partitions.
So, what are the specific issues that you believe would arise?
Thanks, Sun Shaojie
On 2025/11/20 21:07, Sun Shaojie wrote:
How about the following two sequences of operations:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
#5> echo "1-2" > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition

#1> mkdir -p A1
#2> mkdir -p B1
#5> echo "1-2" > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
Will these two sequences yield the same result?
As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states. We need to confirm if this holds true here.
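The requirement above can be phrased as a check: run the same set of writes in two different orders and compare the final states. The sketch below does this for a deliberately simple first-come-first-served policy. Everything in it is an assumption made for illustration (`toy_apply()`, `toy_run()`, `toy_same_outcome()` and the policy itself are hypothetical); it makes no claim about what the kernel produces for the two sequences above, with or without the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_prstate { TOY_MEMBER, TOY_ROOT, TOY_ROOT_INVALID };
enum toy_op_kind { TOY_SET_CPUS, TOY_SET_ROOT };

struct toy_cs { uint8_t cpus; enum toy_prstate prs; };

/* One write: target 0 is A1, target 1 is B1; cpus is used by TOY_SET_CPUS only. */
struct toy_op { int target; enum toy_op_kind kind; uint8_t cpus; };

/* Apply one write under a first-come-first-served policy (assumption). */
void toy_apply(struct toy_cs cs[2], const struct toy_op *op)
{
	struct toy_cs *self = &cs[op->target];
	struct toy_cs *sib = &cs[1 - op->target];

	if (op->kind == TOY_SET_CPUS)
		self->cpus = op->cpus;
	else if (self->prs == TOY_MEMBER)
		self->prs = TOY_ROOT;	/* tentative, validated just below */

	/* Whoever became a valid root first wins; the later writer loses. */
	if (self->prs != TOY_MEMBER) {
		bool conflict = sib->prs == TOY_ROOT &&
				(self->cpus & sib->cpus) != 0;

		self->prs = conflict ? TOY_ROOT_INVALID : TOY_ROOT;
	}
}

/* Run n writes on two fresh member cpusets; final states land in out[]. */
void toy_run(const struct toy_op *ops, size_t n, struct toy_cs out[2])
{
	assert(ops != NULL);
	out[0] = (struct toy_cs){ 0, TOY_MEMBER };
	out[1] = (struct toy_cs){ 0, TOY_MEMBER };
	for (size_t i = 0; i < n; i++)
		toy_apply(out, &ops[i]);
}

/* True if both orderings end in identical cpus and partition states. */
bool toy_same_outcome(const struct toy_op *x, const struct toy_op *y, size_t n)
{
	struct toy_cs rx[2], ry[2];

	toy_run(x, n, rx);
	toy_run(y, n, ry);
	return rx[0].cpus == ry[0].cpus && rx[0].prs == ry[0].prs &&
	       rx[1].cpus == ry[1].cpus && rx[1].prs == ry[1].prs;
}
```

Under this toy policy, the two orderings of "A1 = 0-1, root" and "B1 = 1-2, root" end in different states, which is exactly the property the requirement rules out. A model of the real rules would have to make `toy_same_outcome()` return true for every reordering of the same writes.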
Hi, Michal,
On Wed, 19 Nov 2025 14:20:25, Michal Koutný wrote:
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served) and so the effective config cannot be derived just from the applied values :-/
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?
Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.
However, I didn't fully understand what you meant by "implicit preference ordering between siblings (here first comes, first served)." Could you provide an example?
As for your point that "the effective config cannot be derived just from the applied values," even before this patch, we couldn't derive the final effective configuration solely from the applied values.
For example, consider the following scenario (without this patch applied):

Table 1:
Step                                       | A1's prstate | B1's prstate |
#1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
#3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |

Table 2:
Step                                       | A1's prstate | B1's prstate |
#1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
#2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
#3> echo "0-1" > A1/cpuset.cpus            | root         | member       |
After step #3, both Table 1 and Table 2 have identical value settings, yet A1's partition state differs between them.
Thanks, Sun Shaojie
On 2025/11/19 18:57, Sun Shaojie wrote:
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..f6a834335ebf 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);
 
 	retval = validate_change(cs, trialcs);
 
 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;
If we remove this logic, there is a scenario where the parent (a partition) could end up with empty effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to disable its siblings' partitions.
Hi, Ridong,
On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:
On 2025/11/19 18:57, Sun Shaojie wrote:
If we remove this logic, there is a scenario where the parent (a partition) could end up with empty effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to disable its siblings' partitions.
I have carefully considered the scenario where the parent's effective CPUs are empty, which corresponds to the following two cases (after applying this patch).

   root cgroup
        |
        A1
       /  \
     A2    A3

Case 1:
Step:
#1> echo "0-1" > A1/cpuset.cpus
#2> echo "root" > A1/cpuset.cpus.partition
#3> echo "0-1" > A2/cpuset.cpus
#4> echo "root" > A2/cpuset.cpus.partition
After step #4,
               | A1   | A2   | A3     |
cpus_allowed   | 0-1  | 0-1  |        |
effective_cpus |      | 0-1  |        |
prstate        | root | root | member |

After step #4, A3's effective CPUs are empty.
#5> echo "0-1" > A3/cpuset.cpus
After step #5,
               | A1   | A2   | A3     |
cpus_allowed   | 0-1  | 0-1  | 0-1    |
effective_cpus |      | 0-1  |        |
prstate        | root | root | member |

This patch affects step #5. After step #5, A3's effective CPUs are still empty. Since A3's effective CPUs can already be empty before step #5 (setting cpuset.cpus), it is acceptable for them to remain empty after step #5. Moreover, if A3 is aware that its parent's effective CPUs are empty, it should understand that the CPUs it requests may not be granted.
Case 2:
Step:
#1> echo "0-1" > A1/cpuset.cpus
#2> echo "root" > A1/cpuset.cpus.partition
#3> echo "0" > A2/cpuset.cpus
#4> echo "root" > A2/cpuset.cpus.partition
#5> echo "1" > A3/cpuset.cpus
#6> echo "root" > A3/cpuset.cpus.partition
After step #6,
               | A1   | A2   | A3   |
cpus_allowed   | 0-1  | 0    | 1    |
effective_cpus |      | 0    | 1    |
prstate        | root | root | root |
#7> echo "0-1" > A3/cpuset.cpus
After step #7,
               | A1   | A2   | A3           |
cpus_allowed   | 0-1  | 0    | 0-1          |
effective_cpus | 1    | 0    | 1            |
prstate        | root | root | root invalid |
This patch affects step #7. After step #7, A3 only affects itself, changing from "root" to "root invalid". However, since its effective CPUs remain 1 both before and after step #7, it doesn't matter even if A2 is not invalidated.
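The effective_cpus rows in the tables above follow a simple pattern that can be sketched in a few lines. This is a simplified model under stated assumptions, not the kernel's computation: `struct toy_child`, `toy_parent_effective()` and `toy_child_effective()` are hypothetical names, CPU sets are 4-bit masks, the parent is assumed to be a valid partition root, and a non-partition child whose requested CPUs do not intersect the parent's effective CPUs is assumed to inherit the parent's effective set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy child cpuset: requested CPUs and whether it is a valid partition root. */
struct toy_child {
	uint8_t cpus;
	bool valid_root;
};

/*
 * Effective CPUs left to a parent partition: its own CPUs minus the CPUs
 * handed over to children that are valid partition roots.
 */
uint8_t toy_parent_effective(uint8_t parent_cpus,
			     const struct toy_child *kids, size_t n)
{
	uint8_t eff = parent_cpus;

	assert(kids != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (kids[i].valid_root)
			eff &= (uint8_t)~kids[i].cpus;
	return eff;
}

/*
 * Effective CPUs of a child: a valid partition root keeps its own CPUs;
 * any other child gets its requested CPUs restricted to the parent's
 * effective CPUs, or the parent's effective CPUs if that is empty.
 */
uint8_t toy_child_effective(const struct toy_child *kid, uint8_t parent_eff)
{
	if (kid->valid_root)
		return kid->cpus;

	uint8_t eff = kid->cpus & parent_eff;

	return eff ? eff : parent_eff;
}
```

For Case 2 after step #7 (A2 = {0, valid root}, A3 = {0-1, invalid}), the model gives A1 an effective set of CPU 1 and A3 an effective set of CPU 1, matching the table. For Case 1 after step #5, A1's effective set is empty, so A3's is empty as well.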
The purpose of this patch is to ensure that modifying cpuset.cpus does not disable its siblings' partitions.
Thanks, Sun Shaojie
On 2025/11/20 21:07, Sun Shaojie wrote:
Hi, Ridong,
On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:
On 2025/11/19 18:57, Sun Shaojie wrote:
 kernel/cgroup/cpuset.c                        | 19 +------------------
 .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..f6a834335ebf 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);
 
 	retval = validate_change(cs, trialcs);
 
 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;

If we remove this logic, there is a scenario where the parent (a partition) could end up with empty effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to disable its siblings' partitions.
I have carefully considered the scenario where the parent's effective CPUs are empty, which corresponds to the following two cases (after applying this patch).

 root cgroup
      |
     A1
    /  \
   A2  A3

Case 1:
Step:
#1> echo "0-1" > A1/cpuset.cpus
#2> echo "root" > A1/cpuset.cpus.partition
#3> echo "0-1" > A2/cpuset.cpus
#4> echo "root" > A2/cpuset.cpus.partition

After step #4,

               | A1   | A2   | A3     |
cpus_allowed   | 0-1  | 0-1  |        |
effective_cpus |      | 0-1  |        |
prstate        | root | root | member |

After step #4, A3's effective CPUs are empty.

That may be an unexpected corner case.
#5> echo "0-1" > A3/cpuset.cpus
If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
After step #5,

               | A1   | A2   | A3     |
cpus_allowed   | 0-1  | 0-1  | 0-1    |
effective_cpus |      | 0-1  |        |
prstate        | root | root | member |

This patch affects step #5. After step #5, A3's effective CPUs are still empty. Since A3's effective CPUs can already be empty before step #5 (setting cpuset.cpus), it is acceptable for them to remain empty after step #5. Moreover, if A3 is aware that its parent's effective CPUs are empty, it should understand that the CPUs it requests may not be granted.

Case 2:
Step:
#1> echo "0-1" > A1/cpuset.cpus
#2> echo "root" > A1/cpuset.cpus.partition
#3> echo "0" > A2/cpuset.cpus
#4> echo "root" > A2/cpuset.cpus.partition
#5> echo "1" > A3/cpuset.cpus
#6> echo "root" > A3/cpuset.cpus.partition

After step #6,

               | A1   | A2   | A3   |
cpus_allowed   | 0-1  | 0    | 1    |
effective_cpus |      | 0    | 1    |
prstate        | root | root | root |

#7> echo "0-1" > A3/cpuset.cpus

After step #7,

               | A1   | A2   | A3           |
cpus_allowed   | 0-1  | 0    | 0-1          |
effective_cpus | 1    | 0    | 1            |
prstate        | root | root | root invalid |

This patch affects step #7. After step #7, A3 only affects itself, changing from "root" to "root invalid". However, since its effective CPUs remain 1 both before and after step #7, it doesn't matter even if A2 is not invalidated.
The purpose of this patch is to ensure that modifying cpuset.cpus does not disable its siblings' partitions.
Thanks, Sun Shaojie
Hi, Ridong,
On 2025/11/17 19:37, Chen Ridong wrote:
On 2025/11/17 18:00, Sun Shaojie wrote:
Certainly, this rule applies regardless of whether cs1 or cs2 is exclusive, and the current implementation already handles it this way. The following two cases cover this rule. "1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive" "3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs"
I believe this function should return the same result regardless of whether it is called as cpus_excl_conflict(A1, B1) or cpus_excl_conflict(B1, A1), which means cs1 and cs2 should be treated symmetrically. However, since cs1 and cs2 are handled differently, it is difficult to convince me that this implementation is correct.
In patch v5, modifications to the cpus_excl_conflict interface have been avoided, along with preventing the following ineffective scenario.
Both A1 and B1 are exclusive; changing B1's cpuset.cpus should not make A1 non-exclusive.
Looking forward to your feedback on patch v5. patch v5 : https://lore.kernel.org/cgroups/20251119105749.1385946-1-sunshaojie@kylinos....
Thanks, Sun Shaojie
On Mon, Nov 17, 2025 at 09:57:08AM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.
For example, assume a machine has 4 CPUs (0-3).

   root cgroup
     /    \
    A1    B1

Case 1:
Table 1.1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |
OK, this looks fine to me, based on this statement from the docs about cpuset.cpus.effective:
subset of "cpuset.cpus" unless none of the CPUs listed in "cpuset.cpus" can be granted. In this case, it will be treated just like an empty "cpuset.cpus".
I was likely confused by the eventual switch of B1 to root in your previous example. (Because if you continue, it should result in (after patch too): #4> echo "root" > B1/cpuset.partition | root invalid | root invalid | and end state should be invariant wrt A1,B1 or B1,A1 config order.)
All other cases remain unaffected, for example cgroup v1, or cases where A1 and B1 are both exclusive or both non-exclusive.

(Note, I'm only commenting on the concept here, I haven't checked that the code change actually achieves that and doesn't break anything else ;-)
Thanks, Michal
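The documentation rule quoted above for cpuset.cpus.effective can be sketched in user space as follows. This is only an illustration of the rule as quoted, not the kernel implementation; `cpumask_model_t`, `member_effective()` and `case1_example()` are hypothetical names, and the sketch assumes that an empty "cpuset.cpus" means inheriting the parent's effective CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU: bit 0 is CPU 0, bit 1 is CPU 1, and so on. */
typedef uint32_t cpumask_model_t;

/*
 * Effective CPUs of a non-partition ("member") cpuset: a subset of
 * cpuset.cpus, unless none of the listed CPUs can be granted, in which
 * case the cpuset is treated as if cpuset.cpus were empty and falls back
 * to the parent's effective CPUs (assumption of this sketch).
 */
cpumask_model_t member_effective(cpumask_model_t cpus_allowed,
				 cpumask_model_t parent_effective)
{
	cpumask_model_t granted = cpus_allowed & parent_effective;

	return granted ? granted : parent_effective;
}

/* Case 1, step #3: B1 asks for CPU 0 while its parent only has CPUs 2-3 left. */
void case1_example(void)
{
	assert(member_effective(0x1, 0xC) == 0xC);
}
```

Under this reading, B1 ends up on CPUs 2-3 after step #3 of Case 1, which is why A1 can stay a valid "root" partition.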
Hi, Michal,
On Tue, 18 Nov 2025 18:52:24 +0100, Michal Koutný wrote:
On Mon, Nov 17, 2025 at 09:57:08AM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.
For example, assume a machine has 4 CPUs (0-3).

   root cgroup
     /    \
    A1    B1

Case 1:
Table 1.1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |
OK, this looks fine to me, based on this statement from the docs about cpuset.cpus.effective:
subset of "cpuset.cpus" unless none of the CPUs listed in "cpuset.cpus" can be granted. In this case, it will be treated just like an empty "cpuset.cpus".
I was likely confused by the eventual switch of B1 to root in your previous example. (Because if you continue, it should result in (after patch too): #4> echo "root" > B1/cpuset.partition | root invalid | root invalid | and end state should be invariant wrt A1,B1 or B1,A1 config order.)
This patch is based on a version after v6.18.0-rc5. Whether or not this patch is applied, modifications to cpuset.partition do not affect the state of sibling partitions.
If we continue, the result should be as follows:
 #4> echo "root" > B1/cpuset.partition      | root         | root invalid |
I've updated patch v5 with some new ideas and look forward to your feedback.
patch v5 : https://lore.kernel.org/cgroups/20251119105749.1385946-1-sunshaojie@kylinos....
Thanks, Sun Shaojie
On 11/16/25 8:57 PM, Sun Shaojie wrote:
In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.
This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.
For example, assume a machine has 4 CPUs (0-3).

   root cgroup
     /    \
    A1    B1

Case 1:
Table 1.1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |

All other cases remain unaffected, for example cgroup v1, or cases where A1 and B1 are both exclusive or both non-exclusive.
Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn
 kernel/cgroup/cpuset-internal.h               |  3 ++
 kernel/cgroup/cpuset-v1.c                     | 20 +++++++++
 kernel/cgroup/cpuset.c                        | 43 ++++++++++++++-----
 .../selftests/cgroup/test_cpuset_prs.sh       |  5 ++-
 4 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 337608f408ce..c53111998432 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
 #else
 static inline void fmeter_init(struct fmeter *fmp) {}
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
@@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+				struct cpuset *cs2) {return false; }
 #endif /* CONFIG_CPUSETS_V1 */
 
 #endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..5c1296bf6a34 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -373,6 +373,26 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
 	return ret;
 }
 
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ *                                to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+		if (cpumask_intersects(cs1->cpus_allowed,
+				       cs2->cpus_allowed))
+			return true;
+
+	return false;
+}
+
 #ifdef CONFIG_PROC_PID_CPUSET
 /*
  * proc_cpuset_show()
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..0fd803612513 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
 
 /**
  * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
- * @cs1: first cpuset to check
- * @cs2: second cpuset to check
+ * @cs1: current cpuset to check
+ * @cs2: cpuset involved in the check
  *
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
+ * For cgroup-v1:
+ *     see cpuset1_cpus_excl_conflict()
+ * For cgroup-v2:
+ * 1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive
  * 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs
+ * 4. if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset
+ *    of another's exclusive CPUs
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
As cs1 and cs2 are going to be handled differently, their current naming will make it hard to understand why they are treated differently. I would recommend changing the parameter names to "trial, sibling", as the caller calls it with "cpus_excl_conflict(trial, c)", where trial is the new cpuset data to be tested and sibling is one of its sibling cpusets. It has to be clearly documented what each parameter is for, and that swapping the parameters will cause it to return an incorrect result.
{
- /* If either cpuset is exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
- /* For cgroup-v1 */
- if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);- /* If cs1 are exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);
A code change like the following can eliminate the need to introduce a new cpuset1_cpus_excl_conflict() helper.

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ec8bebc66469..201c70fb7401 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -599,9 +599,15 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
-		return !cpusets_are_exclusive(cs1, cs2);
+	/*
+	 * If trial is exclusive or sibling is exclusive & in v1,
+	 * check if they are mutually exclusive
+	 */
+	if (is_cpu_exclusive(trial) || (!cpuset_v2() && is_cpu_exclusive(sibling)))
+		return !cpusets_are_exclusive(trial, sibling);
+
+	if (!cpuset_v2())
+		return false;	/* The checking below is irrelevant to cpuset v1 */
 
 	/* Exclusive_cpus cannot intersect */
 	if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
- /* The following check applies when either
* both cs1 and cs2 are non-exclusive,or* only cs2 is exclusive.*/- /* Exclusive_cpus cannot intersect */ if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) return true;
- /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
- if (!cpumask_empty(cs1->cpus_allowed) &&
cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))return true;
- /* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */ if (!cpumask_empty(cs2->cpus_allowed) && cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;
- /* If cs2 is exclusive, check finished here */
- if (is_cpu_exclusive(cs2))
return false;- /* The following check applies only if both cs1 and cs2 are non-exclusive. */
- /* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */
"sibling's exclusive CPUs"
- if (!cpumask_empty(cs1->cpus_allowed) &&
cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))return true;
As said before, we can't fail a change to cpuset.cpus by default, but we can fail a change to cpuset.cpus.exclusive. So this additional check isn't OK unless it is under a special mode that is opted into via other means, such as an additional cgroup control file or a boot command line option.
Cheers, Longman
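To make the asymmetry between "trial" and "sibling" in the suggestion above concrete, here is a minimal user-space model built on plain bitmasks. It is a sketch under stated assumptions, not kernel code: `struct cpuset_model`, `xcpus()`, `subset_nonempty()` and `excl_conflict_model()` are hypothetical names, `xcpus()` assumes that user_xcpus() returns exclusive_cpus when it is non-empty and cpus_allowed otherwise, and the v2 flag is passed in explicitly instead of calling cpuset_v2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per CPU: bit 0 is CPU 0, bit 1 is CPU 1, and so on. */
typedef uint32_t cpumask_model_t;

/* Hypothetical stand-in for the fields the check reads. */
struct cpuset_model {
	cpumask_model_t cpus_allowed;	/* cpuset.cpus */
	cpumask_model_t exclusive_cpus;	/* cpuset.cpus.exclusive */
	bool cpu_exclusive;		/* what is_cpu_exclusive() would report */
};

/* Assumed user_xcpus() behaviour: exclusive_cpus if set, else cpus_allowed. */
static cpumask_model_t xcpus(const struct cpuset_model *cs)
{
	return cs->exclusive_cpus ? cs->exclusive_cpus : cs->cpus_allowed;
}

/* True if a is non-empty and every CPU in a is also in b. */
static bool subset_nonempty(cpumask_model_t a, cpumask_model_t b)
{
	return a && (a & ~b) == 0;
}

bool excl_conflict_model(const struct cpuset_model *trial,
			 const struct cpuset_model *sibling, bool v2)
{
	assert(trial != NULL && sibling != NULL);

	/* trial exclusive, or sibling exclusive on v1: masks must not overlap */
	if (trial->cpu_exclusive || (!v2 && sibling->cpu_exclusive))
		return (xcpus(trial) & xcpus(sibling)) != 0;

	if (!v2)
		return false;	/* the checks below do not apply to v1 */

	/* exclusive_cpus cannot intersect */
	if (trial->exclusive_cpus & sibling->exclusive_cpus)
		return true;

	/* allowed CPUs of one cannot be a subset of the other's exclusive CPUs */
	return subset_nonempty(trial->cpus_allowed, sibling->exclusive_cpus) ||
	       subset_nonempty(sibling->cpus_allowed, trial->exclusive_cpus);
}
```

Taking A1 as an exclusive cpuset with CPUs 0-1 and B1 as a non-exclusive cpuset with CPU 0, the model reports no conflict on v2 when B1 is the trial and A1 the sibling, but a conflict when the two arguments are swapped. This is the parameter-order dependence that the comment above asks to have documented.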
Hi, Longman,
On Tue, 18 Nov 2025 14:53:27 -0500, Longman wrote:
On 11/16/25 8:57 PM, Sun Shaojie wrote:
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..0fd803612513 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
 
 /**
  * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
- * @cs1: first cpuset to check
- * @cs2: second cpuset to check
+ * @cs1: current cpuset to check
+ * @cs2: cpuset involved in the check
  *
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
+ * For cgroup-v1:
+ *     see cpuset1_cpus_excl_conflict()
+ * For cgroup-v2:
+ * 1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive
  * 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs
+ * 4. if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset
+ *    of another's exclusive CPUs
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
As cs1 and cs2 are going to be handled differently, their current naming will make it hard to understand why they are treated differently. I would recommend changing the parameter names to "trial, sibling", as the caller calls it with "cpus_excl_conflict(trial, c)", where trial is the new cpuset data to be tested and sibling is one of its sibling cpusets. It has to be clearly documented what each parameter is for, and that swapping the parameters will cause it to return an incorrect result.
{
- /* If either cpuset is exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
- /* For cgroup-v1 */
- if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);- /* If cs1 are exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);
A code change like the following can eliminate the need to introduce a new cpuset1_cpus_excl_conflict() helper.

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ec8bebc66469..201c70fb7401 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -599,9 +599,15 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
-		return !cpusets_are_exclusive(cs1, cs2);
+	/*
+	 * If trial is exclusive or sibling is exclusive & in v1,
+	 * check if they are mutually exclusive
+	 */
+	if (is_cpu_exclusive(trial) || (!cpuset_v2() && is_cpu_exclusive(sibling)))
+		return !cpusets_are_exclusive(trial, sibling);
+
+	if (!cpuset_v2())
+		return false;	/* The checking below is irrelevant to cpuset v1 */
 
 	/* Exclusive_cpus cannot intersect */
 	if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
Thank you very much for your guidance and suggestions on the code.
I've updated patch v5 with some new ideas and look forward to your feedback.
patch v5 : https://lore.kernel.org/cgroups/20251119105749.1385946-1-sunshaojie@kylinos....
Thanks, Sun Shaojie
On 2025/11/17 9:57, Sun Shaojie wrote:
In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.
This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.
For example, assume a machine has 4 CPUs (0-3).

   root cgroup
     /    \
    A1    B1

Case 1:
Table 1.1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |

Case 2: (This situation remains unchanged from before)
Table 2.1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |

Table 2.2: After applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |

All other cases remain unaffected, for example cgroup v1, or cases where A1 and B1 are both exclusive or both non-exclusive.
v3 -> v4:
- Adjust the test_cpuset_prt.sh test file to align with the current behavior.
v2 -> v3:
- Ensure compliance with constraints such as cpuset.cpus.exclusive.
- Link: https://lore.kernel.org/cgroups/20251113131434.606961-1-sunshaojie@kylinos.c...
v1 -> v2:
- Keeps the current cgroup v1 behavior unchanged
- Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat....
kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)
Is this a cover letter?
The cover letter is labeled as v3, while the patch itself is v4.
For a single patch, I don’t think a cover letter is necessary.
On 2025/11/17 11:23, Chen Ridong wrote:
Is this a cover letter?
The cover letter is labeled as v3, while the patch itself is v4.
For a single patch, I don’t think a cover letter is necessary.
Hi, Ridong,
Thank you so much. I've made a mental note of this point.
Thanks, Sun Shaojie
Hi Waiman.
On Wed, Nov 26, 2025 at 02:43:50PM -0500, Waiman Long llong@redhat.com wrote:
Modifications to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can actually vary depending on the actual serialization results of those operations.

I meant the latter, i.e. the difference in results when concurrent tasks do the update (e.g. two containers start in parallel). I don't see an issue with the race wrt consistency of in-kernel data. We're on the same page here.
One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating remote partition.
OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed and is limited to the creation of local partition only.
Does that answer your question?
It does help my understanding. Do you envision that remote and local partitions should be used together (in one subtree)?
Thanks, Michal
Hi, Ridong,
On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
On 2025/11/20 21:07, Sun Shaojie wrote:
I have carefully considered the scenario where the parent's effective CPUs are empty, which corresponds to the following two cases (after applying this patch).

 root cgroup
      |
     A1
    /  \
   A2  A3

Case 1:
Step:
#1> echo "0-1" > A1/cpuset.cpus
#2> echo "root" > A1/cpuset.cpus.partition
#3> echo "0-1" > A2/cpuset.cpus
#4> echo "root" > A2/cpuset.cpus.partition

After step #4,

               | A1   | A2   | A3     |
cpus_allowed   | 0-1  | 0-1  |        |
effective_cpus |      | 0-1  |        |
prstate        | root | root | member |

After step #4, A3's effective CPUs are empty.

That may be an unexpected corner case.
#5> echo "0-1" > A3/cpuset.cpus
If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?
Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
So why does applying this patch enable such migration?
After step #5,
               | A1   | A2   | A3     |
cpus_allowed   | 0-1  | 0-1  | 0-1    |
effective_cpus |      | 0-1  |        |
prstate        | root | root | member |
Thanks, Sun Shaojie
Hi, Ridong,
Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:
On 2025/11/20 21:07, Sun Shaojie wrote:
I don't understand the "order of operations" mentioned here. After reviewing the previous email content, are you referring to this?
On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
With the result you expect, would we observe the following behaviors:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > A1/cpuset.cpus.partition
#6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > B1/cpuset.cpus.partition
#6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are as follows:

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#4> echo "1-2" > B1/cpuset.cpus            | member       | member       |
#5> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |
#6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#4> echo "1-2" > B1/cpuset.cpus            | member       | member       |
#5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |
#6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid |
How about the following two sequences of operations:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
#5> echo "1-2" > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition

#1> mkdir -p A1
#2> mkdir -p B1
#5> echo "1-2" > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
Will these two sequences yield the same result?
As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states. We need to confirm if this holds true here.
Is this truly a key requirement? It appears this requirement wasn't met even before applying my patch.
The example below, which does not use this patch, demonstrates how different sequences with identical configurations can still lead to different system states.
#1> mkdir -p A1
#2> mkdir -p B1
                                             | A1's prstate | B1's prstate |
#3> echo "0-1" > A1/cpuset.cpus              | member       | member       |
#4> echo "0-1" > A1/cpuset.cpus.exclusive    | member       | member       |
#5> echo "root" > A1/cpuset.cpus.partition   | root         | member       |
#6> echo "1-2" > B1/cpuset.cpus              | root invalid | member       |
#7> echo "2-3" > B1/cpuset.cpus.exclusive    | root invalid | member       |
#8> echo "root" > B1/cpuset.cpus.partition   | root invalid | root         |

#1> mkdir -p A1
#2> mkdir -p B1
                                             | A1's prstate | B1's prstate |
#3> echo "0-1" > A1/cpuset.cpus              | member       | member       |
#4> echo "0-1" > A1/cpuset.cpus.exclusive    | member       | member       |
#5> echo "2-3" > B1/cpuset.cpus.exclusive    | member       | member       |
#6> echo "root" > A1/cpuset.cpus.partition   | root         | member       |
#7> echo "1-2" > B1/cpuset.cpus              | root         | member       |
#8> echo "root" > B1/cpuset.cpus.partition   | root         | root         |
Even without this patch, the result can still differ.
Thanks, Sun Shaojie
On 2025/11/21 18:33, Sun Shaojie wrote:
Hi, Ridong,
Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:
On 2025/11/20 21:07, Sun Shaojie wrote:
I don't understand the "order of operations" mentioned here. After reviewing the previous email content, are you referring to this?
On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
With the result you expect, would we observe the following behaviors:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > A1/cpuset.cpus.partition
#6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > B1/cpuset.cpus.partition
#6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are as follows:

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#4> echo "1-2" > B1/cpuset.cpus            | member       | member       |
#5> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |
#6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
#4> echo "1-2" > B1/cpuset.cpus            | member       | member       |
#5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |
#6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid |
How about the following two sequences of operations:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
#5> echo "1-2" > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition

#1> mkdir -p A1
#2> mkdir -p B1
#5> echo "1-2" > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
Will these two sequences yield the same result?
As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states. We need to confirm if this holds true here.
Is this truly a key requirement? It appears this requirement wasn't met even before applying my patch.
I believe it is required; there may be some corner cases we should fix.

The example below, which does not use this patch, demonstrates how different sequences with identical configurations can still lead to different system states.

#1> mkdir -p A1
#2> mkdir -p B1
                                             | A1's prstate | B1's prstate |
#3> echo "0-1" > A1/cpuset.cpus              | member       | member       |
#4> echo "0-1" > A1/cpuset.cpus.exclusive    | member       | member       |
#5> echo "root" > A1/cpuset.cpus.partition   | root         | member       |
#6> echo "1-2" > B1/cpuset.cpus              | root invalid | member       |
#7> echo "2-3" > B1/cpuset.cpus.exclusive    | root invalid | member       |
#8> echo "root" > B1/cpuset.cpus.partition   | root invalid | root         |
IIUC, you've created this example with the expectation that both A1 and B1 should serve as root partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus, cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.
The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8 is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to passively and automatically transition A1 to a valid partition state.
 #1> mkdir -p A1
 #2> mkdir -p B1
                                            | A1's prstate | B1's prstate |
 #3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #4> echo "0-1" > A1/cpuset.cpus.exclusive  | member       | member       |
 #5> echo "2-3" > B1/cpuset.cpus.exclusive  | member       | member       |
 #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #7> echo "1-2" > B1/cpuset.cpus            | root         | member       |
 #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |
Even without this patch, the result can still differ.
Thanks, Sun Shaojie
On 2025/11/21 18:32, Sun Shaojie wrote:
Hi, Ridong,
On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
On 2025/11/20 21:07, Sun Shaojie wrote:
I have carefully considered the scenario where the parent's effective CPUs are empty, which corresponds to the following two cases (after applying this patch).
   root cgroup
        |
       A1
      /  \
    A2    A3
Case 1:
 Step:
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
After step #4,
                | A1   | A2   | A3     |
 cpus_allowed   | 0-1  | 0-1  |        |
 effective_cpus |      | 0-1  |        |
 prstate        | root | root | member |
After step #4, A3's effective CPUs are empty.
That may be an unexpected corner case.
#5> echo "0-1" > A3/cpuset.cpus
If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs
You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.
However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is a typical usage scenario.
Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
So why does applying this patch enable such migration?
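The rule that a cgroup with an empty cpuset.cpus.effective cannot receive processes can be checked from user space before attempting a migration. The following is a minimal sketch in C, assuming the content of cpuset.cpus.effective has already been read into a string. parse_cpulist() and can_attach() are hypothetical helper names used for illustration only (they are not kernel or libc functions), and only CPUs 0-63 are handled.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpulist such as "0-1,3" into a bitmask (bit n = CPU n).
 * Returns 0 for an empty list and, as a simplification, also for
 * malformed input or CPUs above 63.
 */
static uint64_t parse_cpulist(const char *s)
{
	uint64_t mask = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return 0;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			if (!isdigit((unsigned char)*s))
				return 0;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi > 63)
			return 0;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			mask |= UINT64_C(1) << cpu;
		s = end;
		if (*s == ',')
			s++;
	}
	return mask;
}

/* Hypothetical pre-check: migration can only succeed with a non-empty mask. */
static bool can_attach(const char *effective_cpus)
{
	return parse_cpulist(effective_cpus) != 0;
}
```

In the A4/A5 example, a caller would read A4/cpuset.cpus.effective and skip the write to A4/cgroup.procs when can_attach() returns false.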
Hi, Ridong,
On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:
On 2025/11/21 18:32, Sun Shaojie wrote:
Hi, Ridong,
On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
On 2025/11/20 21:07, Sun Shaojie wrote:
I have carefully considered the scenario where the parent's effective CPUs are empty, which corresponds to the following two cases (after applying this patch).
   root cgroup
        |
       A1
      /  \
    A2    A3
Case 1:
 Step:
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
After step #4,
                | A1   | A2   | A3     |
 cpus_allowed   | 0-1  | 0-1  |        |
 effective_cpus |      | 0-1  |        |
 prstate        | root | root | member |
After step #4, A3's effective CPUs are empty.
That may be an unexpected corner case.
#5> echo "0-1" > A3/cpuset.cpus
If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs
You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.
However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is a typical usage scenario.
I'm sorry, I didn't quite understand the point you were trying to make with this example.
If that's the case
      root cgroup
           |
          A1
     /  /    \  \
   A2  A3    A4  A5
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs   -> This will return an error because A4's effective CPUs are empty.
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs   -> This will return an error because A5's effective CPUs are empty.
Even with this patch applied, this result will not change.
Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
So why does applying this patch enable such migration?
Thanks, Sun Shaojie
Hi, Ridong,
On Sat, 22 Nov 2025 09:19:39, Chen Ridong wrote:
On 2025/11/21 18:33, Sun Shaojie wrote:
Is this truly a key requirement? It appears this requirement wasn't met even before applying my patch.
I believe it is a requirement; there may be some corner cases we should fix.
The example below, which does not use this patch, demonstrates how different sequences with identical configurations can still lead to different system states.
 #1> mkdir -p A1
 #2> mkdir -p B1
                                            | A1's prstate | B1's prstate |
 #3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #4> echo "0-1" > A1/cpuset.cpus.exclusive  | member       | member       |
 #5> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #6> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
 #7> echo "2-3" > B1/cpuset.cpus.exclusive  | root invalid | member       |
 #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root         |
IIUC, you've created this example with the expectation that both A1 and B1 should serve as root partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus, cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.
The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8 is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to passively and automatically transition A1 to a valid partition state.
So, was the original behavior of invalidating sibling partitions driven by this key requirement? (As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states.)
 #1> mkdir -p A1
 #2> mkdir -p B1
                                            | A1's prstate | B1's prstate |
 #3> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #4> echo "0-1" > A1/cpuset.cpus.exclusive  | member       | member       |
 #5> echo "2-3" > B1/cpuset.cpus.exclusive  | member       | member       |
 #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #7> echo "1-2" > B1/cpuset.cpus            | root         | member       |
 #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |
Even without this patch, the result can still differ.
Thanks, Sun Shaojie
On 2025/11/24 18:20, Sun Shaojie wrote:
Hi, Ridong,
On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:
On 2025/11/21 18:32, Sun Shaojie wrote:
Hi, Ridong,
On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
On 2025/11/20 21:07, Sun Shaojie wrote:
I have carefully considered the scenario where the parent's effective CPUs are empty, which corresponds to the following two cases (after applying this patch).
   root cgroup
        |
       A1
      /  \
    A2    A3
Case 1:
 Step:
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
After step #4,
                | A1   | A2   | A3     |
 cpus_allowed   | 0-1  | 0-1  |        |
 effective_cpus |      | 0-1  |        |
 prstate        | root | root | member |
After step #4, A3's effective CPUs are empty.
That may be an unexpected corner case.
#5> echo "0-1" > A3/cpuset.cpus
If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs
You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.
However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is a typical usage scenario.
I'm sorry, I didn't quite understand the point you were trying to make with this example.
If that's the case
      root cgroup
           |
          A1
     /  /    \  \
   A2  A3    A4  A5
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
If we don't apply your patch, A2 will be invalidated.
 echo $$ > A4/cgroup.procs   -> This will return an error because A4's effective CPUs are empty.
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs   -> This will return an error because A5's effective CPUs are empty.
Even with this patch applied, this result will not change.
You can try it yourself; this is the result I got:
 # mkdir A1
 # echo "0-1" > A1/cpuset.cpus
 # echo "root" > A1/cpuset.cpus.partition
 # cd A1/
 # mkdir A2
 # mkdir A4
 # mkdir A5
 # echo "0-1" > A2/cpuset.cpus
 # echo "root" > A2/cpuset.cpus.partition
 #
 # echo "0" > A4/cpuset.cpus
 # cat A2/cpuset.cpus
 0-1
 # cat A2/cpuset.cpus.partition
 root invalid
 # cat A4/cpuset.cpus.effective
 0
Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
So why does applying this patch enable such migration?
Thanks, Sun Shaojie
On 11/19/25 5:57 AM, Sun Shaojie wrote:
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.
This patch applies only to the following two cases:
Assume the machine has 4 CPUs (0-3).
   root cgroup
      /  \
    A1    B1
Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
Table 1.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to remain as "root."
Table 1.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |
Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
Table 2.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".
Table 2.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".
All other cases remain unaffected. For example, cgroup-v1.
This patch is relatively simple. As others have pointed out, there are inconsistencies depending on the operation ordering.
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:
 # echo "0-1" > A1/cpuset.cpus
 # echo "2" > B1/cpuset.cpus
 # echo "1-2" > B1/cpuset.cpus
 # echo "root" > A1/cpuset.cpus.partition
 # echo "root" > B1/cpuset.cpus.partition
To follow the "first-come first-serve" rule, A1 should be valid and B1 invalid. That is the inconsistency with your current patch. To fix that, we still need to relax the overlap checking rule similar to your v4 patch.
Cheers, Longman
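The "first-come first-serve" rule discussed above can be modeled in a few lines of C. This is only a sketch under simplifying assumptions: struct sibling and fcfs_resolve() are hypothetical names that do not exist in the kernel, CPUs are represented as a 64-bit mask, and hotplug, cpuset.cpus.exclusive and parent constraints are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one sibling cpuset; not the kernel's struct cpuset. */
struct sibling {
	uint64_t cpus;		/* requested cpuset.cpus, bit n = CPU n */
	bool wants_root;	/* "root" was written to cpuset.cpus.partition */
	bool valid;		/* result: valid partition root */
};

/*
 * Grant partitions in request order: a sibling becomes a valid root only
 * if its CPUs do not overlap any sibling that was granted earlier.
 * order[] holds indexes into s[] in the order the requests arrived.
 */
static void fcfs_resolve(struct sibling *s, const size_t *order, size_t n)
{
	uint64_t taken = 0;

	for (size_t i = 0; i < n; i++) {
		struct sibling *cur = &s[order[i]];

		cur->valid = cur->wants_root && cur->cpus != 0 &&
			     !(cur->cpus & taken);
		if (cur->valid)
			taken |= cur->cpus;
	}
}
```

With A1 requesting CPUs 0-1 and B1 requesting CPUs 1-2, resolving in the order A1, B1 leaves only A1 valid, while the order B1, A1 leaves only B1 valid. That is exactly the ordering dependence that would need to be documented.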
Hi, Ridong,
On Mon, 24 Nov 2025 19:33:54, Chen Ridong wrote:
On 2025/11/24 18:20, Sun Shaojie wrote:
I'm sorry, I didn't quite understand the point you were trying to make with this example.
If that's the case
      root cgroup
           |
          A1
     /  /    \  \
   A2  A3    A4  A5
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
If we don't apply your patch, A2 will be invalidated.
 echo $$ > A4/cgroup.procs   -> This will return an error because A4's effective CPUs are empty.
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs   -> This will return an error because A5's effective CPUs are empty.
Even with this patch applied, this result will not change.
You can try it yourself; this is the result I got:
 # mkdir A1
 # echo "0-1" > A1/cpuset.cpus
 # echo "root" > A1/cpuset.cpus.partition
 # cd A1/
 # mkdir A2
 # mkdir A4
 # mkdir A5
 # echo "0-1" > A2/cpuset.cpus
 # echo "root" > A2/cpuset.cpus.partition
 #
 # echo "0" > A4/cpuset.cpus
 # cat A2/cpuset.cpus
 0-1
 # cat A2/cpuset.cpus.partition
 root invalid
 # cat A4/cpuset.cpus.effective
 0
A4's cpuset.cpus.effective is 0 because A2 changed from root to root invalid. However, the purpose of this patch is precisely to keep A2 as "root".
Before 'echo "0" > A4/cpuset.cpus', A4 is aware that its cpuset.cpus.effective is empty and that its parent's cpuset.cpus.effective is also empty. Therefore, after executing 'echo "0" > A4/cpuset.cpus', A4 should anticipate the possibility that it may not be allocated any available CPUs.
Thanks, Sun Shaojie
Hi, Longman,
On Mon, 24 Nov 2025 17:30:47, Waiman Long wrote:
On 11/19/25 5:57 AM, Sun Shaojie wrote:
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.
This patch applies only to the following two cases:
Assume the machine has 4 CPUs (0-3).
   root cgroup
      /  \
    A1    B1
Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
Table 1.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to remain as "root."
Table 1.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |
Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
Table 2.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".
Table 2.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".
All other cases remain unaffected. For example, cgroup-v1.
This patch is relatively simple. As others have pointed out, there are inconsistencies depending on the operation ordering.
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:
 # echo "0-1" > A1/cpuset.cpus
 # echo "2" > B1/cpuset.cpus
 # echo "1-2" > B1/cpuset.cpus
 # echo "root" > A1/cpuset.cpus.partition
 # echo "root" > B1/cpuset.cpus.partition
To follow the "first-come first-serve" rule, A1 should be valid and B1 invalid. That is the inconsistency with your current patch. To fix that, we still need to relax the overlap checking rule similar to your v4 patch.
Thank you for your suggestion! Will update.
Thanks, Sun Shaojie
On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?
Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.
So you want to achieve a stable [1] set of CPUs for a cgroup that cannot be taken away from you by any sibling, correct? My reasoning is that the siblings should be under one management entity and therefore such overcommitment should be avoided already in the configuration. Invalidating all conflicting siblings is then the most fair result achievable. B1 is a second-class partition _only_ because it starts later or why is it OK to not fulfill its requirement?
[1] Note that A1 should still watch its cpuset.cpus.partition if it takes exclusivity seriously because its cpus may be taken away by hot(un)plug or ancestry reconfiguration.
As for your point that "the effective config cannot be derived just from the applied values," even before this patch, we couldn't derive the final effective configuration solely from the applied values.
For example, consider the following scenario (without this patch applied):
Table 1:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
Table 2:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
 #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |
After step #3, both Table 1 and Table 2 have identical value settings, yet A1's partition state differs between them.
Aha, I must admit I didn't expect that. IMO, nothing (documented) prevents the latter (Table 2) behavior (here I'm referring to cpuset.cpus, not sure about cpuset.cpus.exclusive). Which of Table 1 or Table 2 do you prefer?
Thanks, Michal
On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long llong@redhat.com wrote:
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:
I'm skeptical of the FCFS behavior since I'm afraid it may be subject to race conditions in practice. BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior in this regard?
Thanks, Michal
On 11/26/25 9:13 AM, Michal Koutný wrote:
On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long llong@redhat.com wrote:
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:
I'm skeptical of the FCFS behavior since I'm afraid it may be subject to race conditions in practice. BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior in this regard?
Modifications to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can actually vary depending on the actual serialization results of those operations.
One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating remote partition.
OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed and is limited to the creation of local partition only.
Does that answer your question?
Cheers, Longman
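The difference between the two control files described above can be illustrated with a small C model. It is a sketch only: struct cs_files, write_cpus_exclusive() and write_cpus() are hypothetical names, the sibling set is reduced to a single sibling, and the -EINVAL return value is an assumption made for illustration rather than a statement about what the kernel returns.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the two control files of one cpuset. */
struct cs_files {
	uint64_t cpus;	/* cpuset.cpus, bit n = CPU n */
	uint64_t xcpus;	/* cpuset.cpus.exclusive */
};

/*
 * Writing cpuset.cpus.exclusive: the write is rejected when the new
 * mask is not exclusive with respect to the sibling's exclusive CPUs.
 */
static int write_cpus_exclusive(struct cs_files *cs,
				const struct cs_files *sibling,
				uint64_t new_mask)
{
	if (new_mask & sibling->xcpus)
		return -EINVAL;	/* assumed error code */
	cs->xcpus = new_mask;
	return 0;
}

/*
 * Writing cpuset.cpus: the write itself never fails in this model;
 * whether the cpuset can become a valid partition root is decided later.
 */
static int write_cpus(struct cs_files *cs, uint64_t new_mask)
{
	cs->cpus = new_mask;
	return 0;
}
```

In this model, a failed write to cpuset.cpus.exclusive leaves the previous value untouched, whereas an overlapping cpuset.cpus value is stored and the conflict only surfaces when a partition state is evaluated.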
On 2025/11/27 3:43, Waiman Long wrote:
On 11/26/25 9:13 AM, Michal Koutný wrote:
On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long llong@redhat.com wrote:
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will
I have to admit that I prefer the current implementation.
At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would make it more difficult for users to understand why the cpuset.cpus they configured do not match the effective CPUs in use, and why different operation orders yield different results.
In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member) created under A1 will end up with empty effective CPUs—and this is not a desired behavior.
   root cgroup
        |
       A1
      /  \
    A2    A3...
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs
[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its requirement?" --Michal.
make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:
I'm skeptical of the FCFS behavior since I'm afraid it may be subject to race conditions in practice. BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior in this regard?
Modifications to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can actually vary depending on the actual serialization results of those operations.
One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating remote partition.
OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed and is limited to the creation of local partition only.
Does that answer your question?
Cheers, Longman
On 2025/11/26 22:13, Michal Koutný wrote:
On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?
Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.
So you want to achieve a stable [1] set of CPUs for a cgroup that cannot be taken away from you by any sibling, correct? My reasoning is that the siblings should be under one management entity and therefore such overcommitment should be avoided already in the configuration. Invalidating all conflicting siblings is then the most fair result achievable. B1 is a second-class partition _only_ because it starts later or why is it OK to not fulfill its requirement?
[1] Note that A1 should still watch its cpuset.cpus.partition if it takes exclusivity seriously because its cpus may be taken away by hot(un)plug or ancestry reconfiguration.
As for your point that "the effective config cannot be derived just from the applied values," even before this patch, we couldn't derive the final effective configuration solely from the applied values.
For example, consider the following scenario (without this patch applied):
Table 1:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
Table 2:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
 #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |
After step #3, both Table 1 and Table 2 have identical value settings, yet A1's partition state differs between them.
A corner case should be fixed, and I have sent the patch.
https://lore.kernel.org/cgroups/20251115093140.1121329-1-chenridong@huaweicl...
Aha, I must admit I didn't expect that. IMO, nothing (documented) prevents the latter (Table 2) behavior (here I'm referring to cpuset.cpus, not sure about cpuset.cpus.exclusive). Which of Table 1 or Table 2 do you prefer?
Thanks, Michal
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary.
For example: on a machine with 128 CPUs, there are m (m < 128) cpusets under the root cgroup. Each cpuset is used by a single user (user-1 uses A1, ..., user-m uses Am), and the partition states of these cpusets are configured as follows:
                     root cgroup
        /          /          \                  \
       A1         A2    ...    An                 Am
     (root)     (root)  ...  (root)   (root/root invalid/member)
Assume that A1 through Am have not set cpuset.cpus.exclusive. When user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition states from A1 to An to change from root to root invalid, as shown below.
                            root cgroup
          /                /               \                    \
         A1               A2      ...      An                   Am
   (root invalid)   (root invalid) ... (root invalid)  (root invalid/member)
This outcome is entirely undeserved for all users from A1 to An.
This patch prevents such outcomes by ensuring that modifications to cpuset.cpus do not affect the partition state of other sibling cpusets. Therefore, with this patch applied, when user-m configures Am's cpuset.cpus to "0-127", the result will be as follows.
                     root cgroup
        /          /          \                \
       A1         A2    ...    An               Am
     (root)     (root)  ...  (root)   (root invalid/member)
It is worth noting that, since this patch enforces the exclusivity of sibling cpusets, setting exclusivity now follows a "first-come, first-served" principle.
For example, consider the following four steps: before applying this patch, regardless of the order in which they are executed, the final partition state of both A1 and B1 would always be "root invalid."
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
After applying this patch, the first party to set "root" will maintain its exclusive validity. As follows:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |

 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > B1/cpuset.cpus.partition | member       | root         |
 #3> echo "1-2" > A1/cpuset.cpus            | member       | root         |
 #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |
In summary, if the current cpuset conflicts with its sibling cpusets on exclusive CPUs (If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs will be treated as exclusive CPUs), only the current cpuset should bear the consequences.
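The summary rule, including the fallback from exclusive CPUs to allowed CPUs, can be illustrated with a user-space model in C. This is a sketch and not the kernel code: struct cs_model, model_xcpus(), conflict_if_either() and conflict_if_both() are hypothetical names, CPUs are a 64-bit mask, and only the first conflict rule (mutual exclusivity) is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few cpuset fields used by the check. */
struct cs_model {
	uint64_t cpus_allowed;		/* cpuset.cpus, bit n = CPU n */
	uint64_t exclusive_cpus;	/* cpuset.cpus.exclusive */
	bool cpu_exclusive;		/* models is_cpu_exclusive() */
};

/* Exclusive CPUs, falling back to the allowed CPUs when none are set. */
static uint64_t model_xcpus(const struct cs_model *cs)
{
	return cs->exclusive_cpus ? cs->exclusive_cpus : cs->cpus_allowed;
}

/* Previous rule: a conflict is possible as soon as either side is exclusive. */
static bool conflict_if_either(const struct cs_model *a,
			       const struct cs_model *b)
{
	if (a->cpu_exclusive || b->cpu_exclusive)
		return (model_xcpus(a) & model_xcpus(b)) != 0;
	return false;
}

/* Relaxed rule: only two exclusive cpusets can conflict with each other. */
static bool conflict_if_both(const struct cs_model *a,
			     const struct cs_model *b)
{
	if (a->cpu_exclusive && b->cpu_exclusive)
		return (model_xcpus(a) & model_xcpus(b)) != 0;
	return false;
}
```

For an exclusive A1 with CPUs 0-1 and a non-exclusive B1 with CPU 0, conflict_if_either() reports a conflict while conflict_if_both() does not, so under the relaxed rule a write to B1's cpuset.cpus would no longer invalidate A1.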
Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn --- kernel/cgroup/cpuset-internal.h | 3 + kernel/cgroup/cpuset-v1.c | 19 ++++++ kernel/cgroup/cpuset.c | 60 ++++++++++++------- .../selftests/cgroup/test_cpuset_prs.sh | 12 ++-- 4 files changed, 65 insertions(+), 29 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 337608f408ce..c53111998432 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
 #else
 static inline void fmeter_init(struct fmeter *fmp) {}
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
@@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+				struct cpuset *cs2) {return false; }
 #endif /* CONFIG_CPUSETS_V1 */

 #endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..5aa0ac092ef6 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
 	return ret;
 }

+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ *                                to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+		return cpumask_intersects(cs1->cpus_allowed,
+					  cs2->cpus_allowed);
+
+	return false;
+}
+
 #ifdef CONFIG_PROC_PID_CPUSET
 /*
  * proc_cpuset_show()
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..e58dd26e074a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -586,14 +586,24 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
+ * For cgroup-v1:
+ *     see cpuset1_cpus_excl_conflict()
+ * For cgroup-v2:
+ * 1. If both cs1 and cs2 are exclusive, cs1 and cs2 must be mutually exclusive
  * 2. exclusive_cpus masks cannot intersect between cpusets
  * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 4. If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs
+ *    will be treated as exclusive CPUs; therefore, its allowed CPUs must not
+ *    intersect with another's exclusive CPUs.
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+	/* For cgroup-v1 */
+	if (!cpuset_v2())
+		return cpuset1_cpus_excl_conflict(cs1, cs2);
+
+	/* If cpusets are exclusive, check if they are mutually exclusive*/
+	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
 		return !cpusets_are_exclusive(cs1, cs2);

 	/* Exclusive_cpus cannot intersect */
@@ -609,6 +619,20 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
 		return true;

+	/*
+	 * When a cpuset is exclusive and its exclusive CPUs are empty,
+	 * its cpus_allowed cannot intersect with another cpuset's exclusive_cpus.
+	 */
+	if (is_cpu_exclusive(cs1) &&
+	    cpumask_empty(cs1->exclusive_cpus) &&
+	    cpumask_intersects(cs1->cpus_allowed, cs2->exclusive_cpus))
+		return true;
+
+	if (is_cpu_exclusive(cs2) &&
+	    cpumask_empty(cs2->exclusive_cpus) &&
+	    cpumask_intersects(cs2->cpus_allowed, cs1->exclusive_cpus))
+		return true;
+
 	return false;
 }
@@ -2411,34 +2435,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);

 	retval = validate_change(cs, trialcs);

 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;
@@ -2506,8 +2513,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (alloc_tmpmasks(&tmp))
 		return -ENOMEM;

-	compute_trialcs_excpus(trialcs, cs);
-	trialcs->prs_err = PERR_NONE;
+	/*
+	 * if there is exclusive CPUs conflict with the siblings,
+	 * we still allow the cpumask change to proceed while
+	 * invalidating the partition.
+	 */
+	if (compute_trialcs_excpus(trialcs, cs))
+		trialcs->prs_err = PERR_NOTEXCL;
+	else
+		trialcs->prs_err = PERR_NONE;

 	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
 	if (retval < 0)
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..75154e22c702 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -269,7 +269,7 @@ TEST_MATRIX=(
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
-	" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+	" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
 	" C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
@@ -318,7 +318,7 @@ TEST_MATRIX=(
 	# Invalid to valid local partition direct transition tests
 	" C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
 	" C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
-	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0"
 	" C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"

 	# Local partition invalidation tests
@@ -388,10 +388,10 @@ TEST_MATRIX=(
 	" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0"
+	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
+	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	" C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
Hi, Michal
On Wed, 26 Nov 2025 15:13:13, Michal Koutný wrote:
> On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>
> Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?

Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.

> So you want to achieve a stable [1] set of CPUs for a cgroup that cannot be taken away from you by any sibling, correct? My reasoning is that the siblings should be under one management entity and therefore such overcommitment should be avoided already in the configuration. Invalidating all conflicting siblings is then the most fair result achievable. B1 is a second-class partition _only_ because it starts later or why is it OK to not fulfill its requirement?

If the siblings are under a single management entity, that certainly works. But what if there are multiple administrative users? Should we really violate other users' requirements just to satisfy one user's requirement? Given this, first-come-first-served might be fairer.

> [1] Note that A1 should still watch its cpuset.cpus.partition if it takes exclusivity seriously because its cpus may be taken away by hot(un)plug or ancestry reconfiguration.
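Watching cpuset.cpus.partition, as the footnote suggests, comes down to reading the file and checking whether the state is still a valid partition. A minimal sketch, assuming the file reads back as "member", "root", "isolated", or one of the latter two followed by "invalid" and a reason (as in the "root invalid" entries of the tables in this patch); the function names are hypothetical and the path is whatever the caller passes in:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True when the text read from cpuset.cpus.partition names a valid partition. */
static bool partition_state_is_valid(const char *state)
{
	/* "root invalid (...)" and "isolated invalid (...)" are not valid */
	if (strstr(state, "invalid") != NULL)
		return false;

	/* "member" is not a partition at all */
	return strncmp(state, "root", 4) == 0 ||
	       strncmp(state, "isolated", 8) == 0;
}

/* Reads the first line of the given file into buf; false on any failure. */
static bool read_partition_state(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (f == NULL)
		return false;
	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	return ok;
}
```

A monitoring tool would call read_partition_state() on A1's cpuset.cpus.partition and treat a false result from partition_state_is_valid() as loss of exclusivity.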
Thanks, Sun Shaojie
Hi, Ridong,
On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:
> I have to admit that I prefer the current implementation.
>
> At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would make it more difficult for users to understand why the cpuset.cpus they configured do not match the effective CPUs in use, and why different operation orders yield different results.

As for "different operation orders yield different results", below is an example that is not a corner case.

     root cgroup
       /    \
      A1    B1

 #1> echo "0" > A1/cpuset.cpus
 #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error

 #1> echo "0-1" > B1/cpuset.cpus.exclusive
 #2> echo "0" > A1/cpuset.cpus
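The order dependence in the two sequences above can be reproduced in a small userspace model. This is a sketch under assumptions, not kernel code: `struct order_cs` and both `model_write_*` functions are hypothetical, the only rule modeled is "allowed CPUs must not be a subset of a sibling's exclusive CPUs", and the model assumes (as the first sequence shows and the patch comment about -EINVAL describes) that a conflicting write to cpuset.cpus.exclusive is rejected while a conflicting write to cpuset.cpus is accepted and only recorded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset; bit N of a mask means CPU N. */
struct order_cs {
	uint64_t cpus_allowed;   /* models cpuset.cpus */
	uint64_t exclusive_cpus; /* models cpuset.cpus.exclusive */
};

/* Non-empty allowed CPUs fully covered by a sibling's exclusive CPUs. */
static bool subset_conflict(uint64_t allowed, uint64_t sibling_excl)
{
	return allowed != 0 && (allowed & ~sibling_excl) == 0;
}

/* Models a write to cpuset.cpus.exclusive: rejected on conflict. */
static int model_write_exclusive(struct order_cs *cs,
				 const struct order_cs *sibling, uint64_t mask)
{
	if (subset_conflict(sibling->cpus_allowed, mask))
		return -EINVAL;
	cs->exclusive_cpus = mask;
	return 0;
}

/*
 * Models a v2 write to cpuset.cpus: the change is applied even when it
 * conflicts; the conflict is only reported through *conflict.
 */
static int model_write_cpus(struct order_cs *cs,
			    const struct order_cs *sibling, uint64_t mask,
			    bool *conflict)
{
	*conflict = subset_conflict(mask, sibling->exclusive_cpus);
	cs->cpus_allowed = mask;
	return 0;
}
```

Writing A1's cpus first makes the later exclusive write on B1 fail, while the reverse order lets both writes go through with the conflict merely flagged on A1.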
> In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member) created under A1 will end up with empty effective CPUs, and this is not a desired behavior.
>
>      root cgroup
>           |
>          A1
>         /  \
>       A2    A3...
>
>  #1> echo "0-1" > A1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition
>  #3> echo "0-1" > A2/cpuset.cpus
>  #4> echo "root" > A2/cpuset.cpus.partition
>  mkdir A4
>  mkdir A5
>  echo "0" > A4/cpuset.cpus
>  echo $$ > A4/cgroup.procs
>  echo "1" > A5/cpuset.cpus
>  echo $$ > A5/cgroup.procs

If A2...A5 all belong to the same user, and that user wants both A4 and A5 to have effective CPUs, then the user should also understand that A2 needs to be adjusted to "member" instead of "root".

If A2...A5 belong to different users, must satisfying user A4's requirement come at the expense of user A2's requirement? That is not fair.

> [1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its requirement?" --Michal.
Thanks, Sun Shaojie
Hello.
On Mon, Dec 01, 2025 at 05:44:47PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> As for "different operation orders yield different results", below is an example that is not a corner case.
>
>      root cgroup
>        /    \
>       A1    B1
>
>  #1> echo "0" > A1/cpuset.cpus
>  #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error
>
>  #1> echo "0-1" > B1/cpuset.cpus.exclusive
>  #2> echo "0" > A1/cpuset.cpus

Here it is a combination of remote vs local partitions. I'd like to treat the two approaches separately and better not consider their combination.

The idea (and permissions check AFAICS) behind remote partitions is to allow "stealing" CPU ownership so cpuset.cpus.exclusive has different behavior.

>>      root cgroup
>>           |
>>          A1         //MK: A4 A5 here?
>>         /  \
>>       A2    A3...   //MK: A4 A5 or here?
>>
>>  #1> echo "0-1" > A1/cpuset.cpus
>>  #2> echo "root" > A1/cpuset.cpus.partition
>>  #3> echo "0-1" > A2/cpuset.cpus
>>  #4> echo "root" > A2/cpuset.cpus.partition
>>  mkdir A4
>>  mkdir A5
>>  echo "0" > A4/cpuset.cpus
>>  echo $$ > A4/cgroup.procs
>>  echo "1" > A5/cpuset.cpus
>>  echo $$ > A5/cgroup.procs

> If A2...A5 all belong to the same user, and that user wants both A4 and A5 to have effective CPUs, then the user should also understand that A2 needs to be adjusted to "member" instead of "root".
>
> If A2...A5 belong to different users, must satisfying user A4's requirement come at the expense of user A2's requirement? That is not fair.

If A4 is a sibling at the level of A1, then A2 must be stripped of its CPUs to honor the hierarchy, hence the apparent unfairness.

If A4 is a sibling at the level of A2 and they have different owning users, their respective cpuset.cpus should only be writable by A1's user (the one who distributes the CPUs) so that any arbitration between the siblings is avoided.
0.02€, Michal
linux-kselftest-mirror@lists.linaro.org