[PATCH v3 0/1] cpuset: relax the overlap check for cgroup-v2

List overview All Threads
Download

newer

older

[PATCH 00/20] ublk: add support...

[PATCH net-next v12 00/12] vsock:...

Sun Shaojie

17 Nov 2025 17 Nov '25

1:57 a.m.

In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

--- v3 -> v4: - Adjust the test_cpuset_prt.sh test file to align with the current behavior.

v2 -> v3: - Ensure compliance with constraints such as cpuset.cpus.exclusive. - Link: https://lore.kernel.org/cgroups/20251113131434.606961-1-sunshaojie@kylinos.c...

v1 -> v2: - Keeps the current cgroup v1 behavior unchanged - Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat....

--- kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)

-- 2.25.1

Show replies by date

Sun Shaojie

17 Nov 17 Nov

1:57 a.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn --- kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h index 337608f408ce..c53111998432 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); #else static inline void fmeter_init(struct fmeter *fmp) {} static inline void cpuset1_update_task_spread_flags(struct cpuset *cs, @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, + struct cpuset *cs2) {return false; } #endif /* CONFIG_CPUSETS_V1 */

#endif /* __CPUSET_INTERNAL_H */ diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 12e76774c75b..5c1296bf6a34 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,26 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; }

/** * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts - * @cs1: first cpuset to check - * @cs2: second cpuset to check + * @cs1: current cpuset to check + * @cs2: cpuset involved in the check * * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive + * For cgroup-v1: + * see cpuset1_cpus_excl_conflict() + * For cgroup-v2: + * 1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive * 2. exclusive_cpus masks cannot intersect between cpusets - * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs + * 3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs + * 4. if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset + * of another's exclusive CPUs */ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) { - /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) + /* For cgroup-v1 */ + if (!cpuset_v2()) + return cpuset1_cpus_excl_conflict(cs1, cs2); + + /* If cs1 are exclusive, check if they are mutually exclusive */ + if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);

+ /* The following check applies when either + * both cs1 and cs2 are non-exclusive，or + * only cs2 is exclusive. + */ + /* Exclusive_cpus cannot intersect */ if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) return true;

- /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */ - if (!cpumask_empty(cs1->cpus_allowed) && - cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus)) - return true; - + /* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */ if (!cpumask_empty(cs2->cpus_allowed) && cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;

+ /* If cs2 is exclusive, check finished here */ + if (is_cpu_exclusive(cs2)) + return false; + + /* The following check applies only if both cs1 and cs2 are non-exclusive. */ + + /* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */ + if (!cpumask_empty(cs1->cpus_allowed) && + cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus)) + return true; + return false; }

diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..b848bc0729cf 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -388,10 +388,11 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

- # A non-exclusive cpuset.cpus change will invalidate partition and its siblings + # A non-exclusive cpuset.cpus change will not invalidate its siblings partition. + # But a exclusive cpuset.cpus change will invalidate itself. " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1" - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1" + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"

-- 2.25.1

Chen Ridong

7:45 a.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

On 2025/11/17 9:57, Sun Shaojie wrote:

...

In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: Table 1.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually

B1 (0-3) --> B1(0) ?

...

use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn

kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h index 337608f408ce..c53111998432 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); #else static inline void fmeter_init(struct fmeter *fmp) {} static inline void cpuset1_update_task_spread_flags(struct cpuset *cs, @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
		struct cpuset *cs2) {return false; }
#endif /* CONFIG_CPUSETS_V1 */ #endif /* __CPUSET_INTERNAL_H */ diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 12e76774c75b..5c1296bf6a34 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,26 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; } +/*
cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
                           to legacy (v1)
@cs1: first cpuset to check

@cs2: second cpuset to check

Returns: true if CPU exclusivity conflict exists, false otherwise

If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.

*/
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{
if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
if (cpumask_intersects(cs1->cpus_allowed,
		       cs2->cpus_allowed))
	return true;
return false;
+}

#ifdef CONFIG_PROC_PID_CPUSET /*

proc_cpuset_show()

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..0fd803612513 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) /**

cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts

@cs1: first cpuset to check

@cs2: second cpuset to check

@cs1: current cpuset to check

@cs2: cpuset involved in the check

Returns: true if CPU exclusivity conflict exists, false otherwise

Conflict detection rules:

If either cpuset is CPU exclusive, they must be mutually exclusive
For cgroup-v1:
see cpuset1_cpus_excl_conflict()
For cgroup-v2:

If cs1 is exclusive, cs1 and cs2 must be mutually exclusive

exclusive_cpus masks cannot intersect between cpusets
The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs

If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs

if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset

of another's exclusive CPUs

*/

The revised conflict detection rules are confusing to me. I thought cs1 and cs2 should be treated symmetrically, but that doesn’t seem to be the case here.

Shouldn’t the following rule apply regardless of whether cs1 or cs2 is exclusive: "The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs"?

...

static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) {

/* If either cpuset is exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
/* For cgroup-v1 */

if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);
/* If cs1 are exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);
/* The following check applies when either
* both cs1 and cs2 are non-exclusive，or
* only cs2 is exclusive.
*/
/* Exclusive_cpus cannot intersect */ if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) return true;
/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
/* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */ if (!cpumask_empty(cs2->cpus_allowed) && cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;
/* If cs2 is exclusive, check finished here */

if (is_cpu_exclusive(cs2))
return false;
/* The following check applies only if both cs1 and cs2 are non-exclusive. */

/* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
return false;
}

From your commit message, it appears you intend to modify "if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))" to "if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))".

However, I’m having trouble following the change.

...

diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..b848bc0729cf 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -388,10 +388,11 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

# A non-exclusive cpuset.cpus change will invalidate partition and its siblings

# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.

# But a exclusive cpuset.cpus change will invalidate itself. " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"

-- Best regards, Ridong

Sun Shaojie

10 a.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

On 2025/11/17 15:45, Chen Ridong Wrote:

...

On 2025/11/17 9:57, Sun Shaojie wrote:

...
In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: Table 1.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually

B1 (0-3) --> B1(0) ?

Sorry, that was a typo. It should indeed be B1 (0).

...

...
use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn

kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h index 337608f408ce..c53111998432 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); #else static inline void fmeter_init(struct fmeter *fmp) {} static inline void cpuset1_update_task_spread_flags(struct cpuset *cs, @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
		struct cpuset *cs2) {return false; }
#endif /* CONFIG_CPUSETS_V1 */ #endif /* __CPUSET_INTERNAL_H */ diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 12e76774c75b..5c1296bf6a34 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,26 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; } +/*
cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
                           to legacy (v1)
@cs1: first cpuset to check

@cs2: second cpuset to check

Returns: true if CPU exclusivity conflict exists, false otherwise

If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.

*/
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{
if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
if (cpumask_intersects(cs1->cpus_allowed,
		       cs2->cpus_allowed))
	return true;
return false;
+}

#ifdef CONFIG_PROC_PID_CPUSET /*

proc_cpuset_show()

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..0fd803612513 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) /**

cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts

@cs1: first cpuset to check

@cs2: second cpuset to check

@cs1: current cpuset to check

@cs2: cpuset involved in the check

Returns: true if CPU exclusivity conflict exists, false otherwise

Conflict detection rules:

If either cpuset is CPU exclusive, they must be mutually exclusive
For cgroup-v1:
see cpuset1_cpus_excl_conflict()
For cgroup-v2:

If cs1 is exclusive, cs1 and cs2 must be mutually exclusive

exclusive_cpus masks cannot intersect between cpusets
The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs

If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs

if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset

of another's exclusive CPUs

*/
The revised conflict detection rules are confusing to me. I thought cs1 and cs2 should be treated symmetrically, but that doesn’t seem to be the case here.

Shouldn’t the following rule apply regardless of whether cs1 or cs2 is exclusive: "The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs"?

Certainly, this rule applies regardless of whether cs1 or cs2 is exclusive, and the current implementation already handles it this way. The following two cases cover this rule. "1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive" "3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs"

...

...
static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) {

/* If either cpuset is exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
/* For cgroup-v1 */

if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);
/* If cs1 are exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);
/* The following check applies when either
* both cs1 and cs2 are non-exclusive，or
* only cs2 is exclusive.
*/
/* Exclusive_cpus cannot intersect */ if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) return true;
/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
/* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */ if (!cpumask_empty(cs2->cpus_allowed) && cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;
/* If cs2 is exclusive, check finished here */

if (is_cpu_exclusive(cs2))
return false;
/* The following check applies only if both cs1 and cs2 are non-exclusive. */

/* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
return false;
}
From your commit message, it appears you intend to modify "if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))" to "if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))".

However, I’m having trouble following the change.

The current modification specifically addresses the scenario where one cpuset A1 is exclusive and its sibling cpuset B1 is non-exclusive. The goal is to ensure that when the non-exclusive cpuset B1 modifies its own "cpuset.cpus" or "cpuset.cpus.exclusive", it does not cause A1 to change from exclusive to non-exclusive.

The following three scenarios are not affected by this patch: 1.both A1 and B1 are exclusive. 2.both A1 and B1 are non-exclusive. 3.A1 is exclusive, B1 is non-exclusive, change "cpuset.cpus" or "cpuset.cpus.exclusive" of A1.

...

...
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..b848bc0729cf 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -388,10 +388,11 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

# A non-exclusive cpuset.cpus change will invalidate partition and its siblings

# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.

# But a exclusive cpuset.cpus change will invalidate itself. " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"

Chen Ridong

11:37 a.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

On 2025/11/17 18:00, Sun Shaojie wrote:

...

On 2025/11/17 15:45, Chen Ridong Wrote:

...
On 2025/11/17 9:57, Sun Shaojie wrote:

...
In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: Table 1.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually

B1 (0-3) --> B1(0) ?

Sorry, that was a typo. It should indeed be B1 (0).

...
...
use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn

kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h index 337608f408ce..c53111998432 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); #else static inline void fmeter_init(struct fmeter *fmp) {} static inline void cpuset1_update_task_spread_flags(struct cpuset *cs, @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
		struct cpuset *cs2) {return false; }
#endif /* CONFIG_CPUSETS_V1 */ #endif /* __CPUSET_INTERNAL_H */ diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 12e76774c75b..5c1296bf6a34 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,26 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; } +/*
cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
                           to legacy (v1)
@cs1: first cpuset to check

@cs2: second cpuset to check

Returns: true if CPU exclusivity conflict exists, false otherwise

If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.

*/
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{
if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
if (cpumask_intersects(cs1->cpus_allowed,
		       cs2->cpus_allowed))
	return true;
return false;
+}

#ifdef CONFIG_PROC_PID_CPUSET /*

proc_cpuset_show()

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..0fd803612513 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) /**

cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts

@cs1: first cpuset to check

@cs2: second cpuset to check

@cs1: current cpuset to check

@cs2: cpuset involved in the check

Returns: true if CPU exclusivity conflict exists, false otherwise

Conflict detection rules:

If either cpuset is CPU exclusive, they must be mutually exclusive
For cgroup-v1:
see cpuset1_cpus_excl_conflict()
For cgroup-v2:

If cs1 is exclusive, cs1 and cs2 must be mutually exclusive

exclusive_cpus masks cannot intersect between cpusets
The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs

If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs

if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset

of another's exclusive CPUs

*/
The revised conflict detection rules are confusing to me. I thought cs1 and cs2 should be treated symmetrically, but that doesn’t seem to be the case here.

Shouldn’t the following rule apply regardless of whether cs1 or cs2 is exclusive: "The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs"?
Certainly, this rule applies regardless of whether cs1 or cs2 is exclusive, and the current implementation already handles it this way. The following two cases cover this rule. "1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive" "3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs"

I believe this function should return the same result regardless of whether it is called as cpus_excl_conflict(A1, B1) or cpus_excl_conflict(B1, A1), which means cs1 and cs2 should be treated symmetrically. However, since cs1 and cs2 are handled differently, it is difficult to convince me that this implementation is correct.

...

...
...
static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) {

/* If either cpuset is exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
/* For cgroup-v1 */

if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);
/* If cs1 are exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);
/* The following check applies when either
* both cs1 and cs2 are non-exclusive，or
* only cs2 is exclusive.
*/
/* Exclusive_cpus cannot intersect */ if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) return true;
/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
/* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */ if (!cpumask_empty(cs2->cpus_allowed) && cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;
/* If cs2 is exclusive, check finished here */

if (is_cpu_exclusive(cs2))
return false;
/* The following check applies only if both cs1 and cs2 are non-exclusive. */

/* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
return false;
}
From your commit message, it appears you intend to modify "if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))" to "if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))".

However, I’m having trouble following the change.
The current modification specifically addresses the scenario where one cpuset A1 is exclusive and its sibling cpuset B1 is non-exclusive. The goal is to ensure that when the non-exclusive cpuset B1 modifies its own "cpuset.cpus" or "cpuset.cpus.exclusive", it does not cause A1 to change from exclusive to non-exclusive.

The following three scenarios are not affected by this patch: 1.both A1 and B1 are exclusive. 2.both A1 and B1 are non-exclusive. 3.A1 is exclusive, B1 is non-exclusive, change "cpuset.cpus" or "cpuset.cpus.exclusive" of A1.

...
...
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..b848bc0729cf 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -388,10 +388,11 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

# A non-exclusive cpuset.cpus change will invalidate partition and its siblings

# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.

# But a exclusive cpuset.cpus change will invalidate itself. " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"

-- Best regards, Ridong

Sun Shaojie

19 Nov 19 Nov

10:57 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases：

Assume the machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

All other cases remain unaffected. For example, cgroup-v1.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn --- kernel/cgroup/cpuset.c | 19 +------------------ .../selftests/cgroup/test_cpuset_prs.sh | 7 ++++--- 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..f6a834335ebf 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc struct tmpmasks *tmp) { int retval; - struct cpuset *parent = parent_cs(cs);

retval = validate_change(cs, trialcs);

if ((retval == -EINVAL) && cpuset_v2()) { - struct cgroup_subsys_state *css; - struct cpuset *cp; - /* * The -EINVAL error code indicates that partition sibling * CPU exclusivity rule has been violated. We still allow * the cpumask change to proceed while invalidating the - * partition. However, any conflicting sibling partitions - * have to be marked as invalid too. + * partition. */ trialcs->prs_err = PERR_NOTEXCL; - rcu_read_lock(); - cpuset_for_each_child(cp, css, parent) { - struct cpumask *xcpus = user_xcpus(trialcs); - - if (is_partition_valid(cp) && - cpumask_intersects(xcpus, cp->effective_xcpus)) { - rcu_read_unlock(); - update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp); - rcu_read_lock(); - } - } - rcu_read_unlock(); retval = 0; } return retval; diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..7d8941f65d84 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -388,10 +388,11 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

- # A non-exclusive cpuset.cpus change will invalidate partition and its siblings + # A non-exclusive cpuset.cpus change will not invalidate its siblings partition. + # An exclusive cpuset.cpus change will invalidate itself. " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" - " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1" - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1" + " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P-1|B1:P1" + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"

-- 2.25.1

Michal Koutný

1:20 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases：

Assume the machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

Table 1.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

(Thanks for working this out, Shaojie.)

...

Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served) and so the effective config cannot be derived just from the applied values :-/

Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?

Thanks, Michal

Chen Ridong

20 Nov 20 Nov

12:57 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/19 21:20, Michal Koutný wrote:

...

On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases：

Assume the machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

Table 1.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

(Thanks for working this out, Shaojie.)

...
Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served)

Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we may end up with different results depending on the order of operations.

...

and so the effective config cannot be derived just from the applied values :-/

Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?

Thanks, Michal

-- Best regards, Ridong

Sun Shaojie

1:07 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

On Thu, 20 Nov 2025 08:57:51, Chen Ridong wrote:

...

On 2025/11/19 21:20, Michal Koutný wrote:

...
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served)

Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we may end up with different results depending on the order of operations.

I don't understand the "order of operations" mentioned here. After reviewing the previous email content, are you referring to this?

On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:

...

With the result you expect, would we observe the following behaviors:

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > A1/cpuset.cpus.partition #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > B1/cpuset.cpus.partition #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are as follows:

Moreover, even without applying this patch, the result remains the same, because modifying cpuset.cpus.partition does not disable its siblings' partitions.

So, what are the specific issues that you believe would arise?

Thanks, Sun Shaojie

Chen Ridong

1:25 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/20 21:07, Sun Shaojie wrote:

...

Hi, Ridong,

On Thu, 20 Nov 2025 08:57:51, Chen Ridong wrote:

...
On 2025/11/19 21:20, Michal Koutný wrote:

...
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served)

Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we may end up with different results depending on the order of operations.

I don't understand the "order of operations" mentioned here. After reviewing the previous email content, are you referring to this?

On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:

...
With the result you expect, would we observe the following behaviors:

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > A1/cpuset.cpus.partition #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > B1/cpuset.cpus.partition #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are as follows: #1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "1-2" > B1/cpuset.cpus | member | member | #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root | #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "1-2" > B1/cpuset.cpus | member | member | #5> echo "root" > B1/cpuset.cpus.partition | root | root invalid| #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|

How about the following two sequences of operations:

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "root" > A1/cpuset.cpus.partition #5> echo "1-2" > B1/cpuset.cpus #6> echo "root" > B1/cpuset.cpus.partition

#1> mkdir -p A1 #2> mkdir -p B1 #5> echo "1-2" > B1/cpuset.cpus #6> echo "root" > B1/cpuset.cpus.partition #3> echo "0-1" > A1/cpuset.cpus #4> echo "root" > A1/cpuset.cpus.partition

Will these two sequences yield the same result?

As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states. We need to confirm if this holds true here.

...

Moreover, even without applying this patch, the result remains the same, because modifying cpuset.cpus.partition does not disable its siblings' partitions.

So, what are the specific issues that you believe would arise?

Thanks, Sun Shaojie

-- Best regards, Ridong

Sun Shaojie

1:05 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Michal,

On Wed, 19 Nov 2025 14:20:25, Michal Koutný wrote:

...

On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases：

Assume the machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

Table 1.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

(Thanks for working this out, Shaojie.)

...
Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

Admittedly, I don't like this change because it relies on implicit preference ordering between siblings (here first comes, first served) and so the effective config cannot be derived just from the applied values :-/

Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?

Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.

However, I didn't fully understand what you meant by "implicit preference ordering between siblings (here first comes, first served)." Could you provide an example?

As for your point that "the effective config cannot be derived just from the applied values," even before this patch, we couldn't derive the final effective configuration solely from the applied values.

After step #3, both Table 1 and Table 2 have identical value settings, yet A1's partition state differs between them.

Thanks, Sun Shaojie

Chen Ridong

12:51 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/19 18:57, Sun Shaojie wrote:

...

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases：

Assume the machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

Table 1.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

All other cases remain unaffected. For example, cgroup-v1.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn

kernel/cgroup/cpuset.c | 19 +------------------ .../selftests/cgroup/test_cpuset_prs.sh | 7 ++++--- 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..f6a834335ebf 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc struct tmpmasks *tmp) { int retval;

struct cpuset *parent = parent_cs(cs);

retval = validate_change(cs, trialcs); if ((retval == -EINVAL) && cpuset_v2()) {
struct cgroup_subsys_state *css;
struct cpuset *cp;
/*

The -EINVAL error code indicates that partition sibling

CPU exclusivity rule has been violated. We still allow

the cpumask change to proceed while invalidating the
 * partition. However, any conflicting sibling partitions
 * have to be marked as invalid too.
 * partition.
*/ trialcs->prs_err = PERR_NOTEXCL;
rcu_read_lock();
cpuset_for_each_child(cp, css, parent) {
	struct cpumask *xcpus = user_xcpus(trialcs);
	if (is_partition_valid(cp) &&
	    cpumask_intersects(xcpus, cp->effective_xcpus)) {
		rcu_read_unlock();
		update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
		rcu_read_lock();
	}
}
rcu_read_unlock();
retval = 0; } return retval;

If we remove this logic, there is a scenario where the parent (a partition) could end up with empty effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to disable its siblings' partitions.

-- Best regards, Ridong

Sun Shaojie

1:07 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:

...

On 2025/11/19 18:57, Sun Shaojie wrote:

...
kernel/cgroup/cpuset.c | 19 +------------------ .../selftests/cgroup/test_cpuset_prs.sh | 7 ++++--- 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..f6a834335ebf 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc struct tmpmasks *tmp) { int retval;

struct cpuset *parent = parent_cs(cs);

retval = validate_change(cs, trialcs); if ((retval == -EINVAL) && cpuset_v2()) {
struct cgroup_subsys_state *css;
struct cpuset *cp;
/*

The -EINVAL error code indicates that partition sibling

CPU exclusivity rule has been violated. We still allow

the cpumask change to proceed while invalidating the
 * partition. However, any conflicting sibling partitions
 * have to be marked as invalid too.
 * partition.
*/ trialcs->prs_err = PERR_NOTEXCL;
rcu_read_lock();
cpuset_for_each_child(cp, css, parent) {
	struct cpumask *xcpus = user_xcpus(trialcs);
	if (is_partition_valid(cp) &&
	    cpumask_intersects(xcpus, cp->effective_xcpus)) {
		rcu_read_unlock();
		update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
		rcu_read_lock();
	}
}
rcu_read_unlock();
retval = 0; } return retval;
If we remove this logic, there is a scenario where the parent (a partition) could end up with empty effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to disable its siblings' partitions.

I have carefully considered the scenario where parent effective CPUs are empty, which corresponds to the following two cases. (After apply this patch).

root cgroup | A1 / \ A2 A3

Case 1: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition

After step #4,

| A1 | A2 | A3 | cpus_allowed | 0-1 | 0-1 | | effective_cpus | | 0-1 | | prstate | root | root | member |

After step #4, A3's effective CPUs is empty.

#5> echo "0-1" > A3/cpuset.cpus

After step #5,

| A1 | A2 | A3 | cpus_allowed | 0-1 | 0-1 | 0-1 | effective_cpus | | 0-1 | | prstate | root | root | member |

This patch affects step #5. After step #5, A3's effective CPUs is also empty. Since A3's effective CPUs can be empty before step #5 (setting cpuset.cpus), it is acceptable for them to remain empty after step #5. Moreover, if A3 is aware that its parent's effective CPUs are empty, it should understand that the CPUs it requests may not be granted.

Case 2: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition #5> echo "1" > A3/cpuset.cpus #6> echo "root" > A3/cpuset.cpus.partition

After step #6,

| A1 | A2 | A3 | cpus_allowed | 0-1 | 0 | 1 | effective_cpus | | 0 | 1 | prstate | root | root | root |

#7> echo "0-1" > A3/cpuset.cpus

After step #7,

| A1 | A2 | A3 | cpus_allowed | 0-1 | 0 | 0-1 | effective_cpus | 1 | 0 | 1 | prstate | root | root | root invalid |

This patch affects step #7. After step #7, A3 only affects itself, changing from "root" to "root invalid". However, since its effective CPUs remain 1 both before and after step #7, it doesn't matter even if A2 is not invalidated.

The purpose of this patch is to ensure that modifying cpuset.cpus does not disable its siblings' partitions.

Thanks, Sun Shaojie

Chen Ridong

1:45 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/20 21:07, Sun Shaojie wrote:

...

Hi, Ridong,

On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:

...
On 2025/11/19 18:57, Sun Shaojie wrote:

...
kernel/cgroup/cpuset.c | 19 +------------------ .../selftests/cgroup/test_cpuset_prs.sh | 7 ++++--- 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..f6a834335ebf 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc struct tmpmasks *tmp) { int retval;

struct cpuset *parent = parent_cs(cs);

retval = validate_change(cs, trialcs); if ((retval == -EINVAL) && cpuset_v2()) {
struct cgroup_subsys_state *css;
struct cpuset *cp;
/*

The -EINVAL error code indicates that partition sibling

CPU exclusivity rule has been violated. We still allow

the cpumask change to proceed while invalidating the
 * partition. However, any conflicting sibling partitions
 * have to be marked as invalid too.
 * partition.
*/ trialcs->prs_err = PERR_NOTEXCL;
rcu_read_lock();
cpuset_for_each_child(cp, css, parent) {
	struct cpumask *xcpus = user_xcpus(trialcs);
	if (is_partition_valid(cp) &&
	    cpumask_intersects(xcpus, cp->effective_xcpus)) {
		rcu_read_unlock();
		update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
		rcu_read_lock();
	}
}
rcu_read_unlock();
retval = 0; } return retval;
If we remove this logic, there is a scenario where the parent (a partition) could end up with empty effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to disable its siblings' partitions.
I have carefully considered the scenario where parent effective CPUs are empty, which corresponds to the following two cases. (After apply this patch).

root cgroup | A1 / \ A2 A3

Case 1: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition

After step #4,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0-1 | | effective_cpus | | 0-1 | | prstate | root | root | member |

After step #4, A3's effective CPUs is empty.

That may be a corner case is unexpected.

...

#5> echo "0-1" > A3/cpuset.cpus

If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.

...

After step #5,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0-1 | 0-1 | effective_cpus | | 0-1 | | prstate | root | root | member |

This patch affects step #5. After step #5, A3's effective CPUs is also empty. Since A3's effective CPUs can be empty before step #5 (setting cpuset.cpus), it is acceptable for them to remain empty after step #5. Moreover, if A3 is aware that its parent's effective CPUs are empty, it should understand that the CPUs it requests may not be granted.

Case 2: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition #5> echo "1" > A3/cpuset.cpus #6> echo "root" > A3/cpuset.cpus.partition

After step #6,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0 | 1 | effective_cpus | | 0 | 1 | prstate | root | root | root |

#7> echo "0-1" > A3/cpuset.cpus

After step #7,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0 | 0-1 | effective_cpus | 1 | 0 | 1 | prstate | root | root | root invalid |

This patch affects step #7. After step #7, A3 only affects itself, changing from "root" to "root invalid". However, since its effective CPUs remain 1 both before and after step #7, it doesn't matter even if A2 is not invalidated.

The purpose of this patch is to ensure that modifying cpuset.cpus does not disable its siblings' partitions.

Thanks, Sun Shaojie

-- Best regards, Ridong

Sun Shaojie

19 Nov 19 Nov

11:03 a.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

Hi, Ridong,

On 2025/11/17 19:37, Chen Ridong wrote:

...

On 2025/11/17 18:00, Sun Shaojie wrote:

...
Certainly, this rule applies regardless of whether cs1 or cs2 is exclusive, and the current implementation already handles it this way. The following two cases cover this rule. "1. If cs1 is exclusive, cs1 and cs2 must be mutually exclusive" "3. If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs"

I believe this function should return the same result regardless of whether it is called as cpus_excl_conflict(A1, B1) or cpus_excl_conflict(B1, A1), which means cs1 and cs2 should be treated symmetrically. However, since cs1 and cs2 are handled differently, it is difficult to convince me that this implementation is correct.

In patch v5, modifications to the cpus_excl_conflict interface have been avoided, along with preventing the following ineffective scenario.

Both A1 and B1 are exclusive, change B1's cpuset.cpus, avoid A1 becoming non-exclusive.

Looking forward to your feedback on patch v5. patch v5 : https://lore.kernel.org/cgroups/20251119105749.1385946-1-sunshaojie@kylinos....

Thanks, Sun Shaojie

Michal Koutný

18 Nov 18 Nov

5:52 p.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

On Mon, Nov 17, 2025 at 09:57:08AM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: Table 1.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

OK, this looks fine to me, based on this statement from the docs about cpuset.cpus.effective:

...

subset of "cpuset.cpus" unless none of the CPUs listed in "cpuset.cpus" can be granted. In this case, it will be treated just like an empty "cpuset.cpus".

I was likely confused by the eventual switch of B1 to root in your previous example. (Because if you continue, it should result in (after patch too): #4> echo "root" > B1/cpuset.partition | root invalid | root invalid | and end state should be invariant wrt A1,B1 or B1,A1 config order.)

...

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

(Note, I'm only commenting the concept here, I haven't checked the code change actually achieves that and doesn't break anythine else ;-)

Thanks, Michal

Sun Shaojie

19 Nov 19 Nov

11:04 a.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

Hi, Michal,

On Tue, 18 Nov 2025 18:52:24 +0100, Michal Koutný wrote:

...

On Mon, Nov 17, 2025 at 09:57:08AM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: Table 1.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

OK, this looks fine to me, based on this statement from the docs about cpuset.cpus.effective:

...
subset of "cpuset.cpus" unless none of the CPUs listed in "cpuset.cpus" can be granted. In this case, it will be treated just like an empty "cpuset.cpus".

I was likely confused by the eventual switch of B1 to root in your previous example. (Because if you continue, it should result in (after patch too): #4> echo "root" > B1/cpuset.partition | root invalid | root invalid | and end state should be invariant wrt A1,B1 or B1,A1 config order.)

This patch is based on a version after v6.18.0-rc5. Whether or not this patch is applied, modifications to cpuset.partition do not affect the state of sibling partitions.

If continue, the result should be as follows: #4> echo "root" > B1/cpuset.partition | root | root invalid |

I've updated patch v5 with some new ideas and look forward to your feedback.

patch v5 : https://lore.kernel.org/cgroups/20251119105749.1385946-1-sunshaojie@kylinos....

Thanks, Sun Shaojie

Waiman Long

18 Nov 18 Nov

7:53 p.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

On 11/16/25 8:57 PM, Sun Shaojie wrote:

...

In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).
root cgroup
   /    \
 A1      B1
Case 1: Table 1.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn

kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h index 337608f408ce..c53111998432 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); #else static inline void fmeter_init(struct fmeter *fmp) {} static inline void cpuset1_update_task_spread_flags(struct cpuset *cs, @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
		struct cpuset *cs2) {return false; }
#endif /* CONFIG_CPUSETS_V1 */
#endif /* __CPUSET_INTERNAL_H */ diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 12e76774c75b..5c1296bf6a34 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,26 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; } +/*
cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
                           to legacy (v1)
@cs1: first cpuset to check

@cs2: second cpuset to check

Returns: true if CPU exclusivity conflict exists, false otherwise

If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.

*/
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{
if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
if (cpumask_intersects(cs1->cpus_allowed,
		       cs2->cpus_allowed))
	return true;
return false;
+}

#ifdef CONFIG_PROC_PID_CPUSET /*

proc_cpuset_show()

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..0fd803612513 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) /**

cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts

@cs1: first cpuset to check

@cs2: second cpuset to check

@cs1: current cpuset to check

@cs2: cpuset involved in the check

Returns: true if CPU exclusivity conflict exists, false otherwise

Conflict detection rules:

If either cpuset is CPU exclusive, they must be mutually exclusive
For cgroup-v1:
see cpuset1_cpus_excl_conflict()
For cgroup-v2:

If cs1 is exclusive, cs1 and cs2 must be mutually exclusive

exclusive_cpus masks cannot intersect between cpusets
The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs

If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs

if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset

of another's exclusive CPUs

*/ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)

As cs1 and cs2 is going to be handled differently, their current naming will make it hard to understand why they are treated differently. I will recommended changing the parameter name to "trial, sibling" as the caller call it with "cpus_excl_conflict(trial, c)" where trial is the new cpuset data to be tested and sibling is one of its sibling cpusets. It has to be clearly document what each parameter is for and the fact that swapping the parameters will cause it to return incorrect result.

...

{

/* If either cpuset is exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
/* For cgroup-v1 */

if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);
/* If cs1 are exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);

Code change like the following can eliminate the need to introduce a new cpuset1_cpus_excl_conflict() helper.

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index ec8bebc66469..201c70fb7401 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -599,9 +599,15 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) */ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) { - /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) - return !cpusets_are_exclusive(cs1, cs2); + /* + * If trial is exclusive or sibling is exclusive & in v1, + * check if they are mutually exclusive + */ + if (is_cpu_exclusive(trial) || (!cpuset_v2() && is_cpu_exclusive(sibling))) + return !cpusets_are_exclusive(trial, sibling); + + if (!cpuset_v2()) + return false; /* The checking below is irrelevant to cpuset v1 */

/* Exclusive_cpus cannot intersect */ if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))

...

/* The following check applies when either
* both cs1 and cs2 are non-exclusive，or
* only cs2 is exclusive.
*/
/* Exclusive_cpus cannot intersect */ if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) return true;
/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
/* cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs */ if (!cpumask_empty(cs2->cpus_allowed) && cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;
/* If cs2 is exclusive, check finished here */

if (is_cpu_exclusive(cs2))
return false;
/* The following check applies only if both cs1 and cs2 are non-exclusive. */

/* cs1's allowed CPUs cannot be a subset of cs1's exclusive CPUs */

"sibling's exclusive CPUs"

...

if (!cpumask_empty(cs1->cpus_allowed) &&
   cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;

As said before, we can't fail change to cpuset.cpus by default, but we can fail change to cpuset.cpus.exclusive. So this additional check isn't OK unless this check is under a special mode that is opted in via other means like an additional cgroup control file or a boot command line option and so on.

Cheers, Longman

Sun Shaojie

19 Nov 19 Nov

11:05 a.m.

New subject: [PATCH v4 1/1] cpuset: relax the overlap check for cgroup-v2

Hi, Longman,

On Tue, 18 Nov 2025 14:53:27 -0500, Longman wrote:

...

On 11/16/25 8:57 PM, Sun Shaojie wrote:

...
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..0fd803612513 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -580,35 +580,56 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) /**

cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts

@cs1: first cpuset to check

@cs2: second cpuset to check

@cs1: current cpuset to check

@cs2: cpuset involved in the check

Returns: true if CPU exclusivity conflict exists, false otherwise

Conflict detection rules:

If either cpuset is CPU exclusive, they must be mutually exclusive
For cgroup-v1:
see cpuset1_cpus_excl_conflict()
For cgroup-v2:

If cs1 is exclusive, cs1 and cs2 must be mutually exclusive

exclusive_cpus masks cannot intersect between cpusets
The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs

If cs2 is exclusive, cs2's allowed CPUs cannot be a subset of cs1's exclusive CPUs

if cs1 and cs2 are not exclusive, the allowed CPUs of one cpuset cannot be a subset

of another's exclusive CPUs

*/ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
As cs1 and cs2 is going to be handled differently, their current naming will make it hard to understand why they are treated differently. I will recommended changing the parameter name to "trial, sibling" as the caller call it with "cpus_excl_conflict(trial, c)" where trial is the new cpuset data to be tested and sibling is one of its sibling cpusets. It has to be clearly document what each parameter is for and the fact that swapping the parameters will cause it to return incorrect result.

...
{

/* If either cpuset is exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
/* For cgroup-v1 */

if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);
/* If cs1 are exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1)) return !cpusets_are_exclusive(cs1, cs2);
Code change like the following can eliminate the need to introduce a new cpuset1_cpus_excl_conflict() helper.

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index ec8bebc66469..201c70fb7401 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -599,9 +599,15 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) */ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) {
  /* If either cpuset is exclusive, check if they are mutually 
exclusive */
  if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
          return !cpusets_are_exclusive(cs1, cs2);
  /*
   * If trial is exclusive or sibling is exclusive & in v1,
   * check if they are mutually exclusive
   */
  if (is_cpu_exclusive(trial) || (!cpuset_v2() && 
is_cpu_exclusive(sibling)))
          return !cpusets_are_exclusive(trial, sibling);
  if (!cpuset_v2())
          return false;   /* The checking below is irrelevant to 
cpuset v1 */
    /* Exclusive_cpus cannot intersect */
    if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))

Thank you very much for your guidance and suggestions on the code.

I've updated patch v5 with some new ideas and look forward to your feedback.

patch v5 : https://lore.kernel.org/cgroups/20251119105749.1385946-1-sunshaojie@kylinos....

Thanks, Sun Shaojie

Chen Ridong

17 Nov 17 Nov

3:23 a.m.

On 2025/11/17 9:57, Sun Shaojie wrote:

...

In cgroup v2, a mutual overlap check is required when at least one of two cpusets is exclusive. However, this check should be relaxed and limited to cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1 (non-exclusive), change B1 cannot affect A1's exclusivity.

for example. Assume a machine has 4 CPUs (0-3).

root cgroup / \ A1 B1

Case 1: Table 1.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0-3). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

Case 2: (This situation remains unchanged from before) Table 2.1: Before applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #3> echo "1-2" > B1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member |

Table 2.2: After applying the patch Step | A1's prstate | B1'sprstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #3> echo "1-2" > B1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member |

All other cases remain unaffected. For example, cgroup-v1, both A1 and B1 are exclusive or non-exlusive.

v3 -> v4:

Adjust the test_cpuset_prt.sh test file to align with the current behavior.

v2 -> v3:

Ensure compliance with constraints such as cpuset.cpus.exclusive.

Link: https://lore.kernel.org/cgroups/20251113131434.606961-1-sunshaojie@kylinos.c...

v1 -> v2:

Keeps the current cgroup v1 behavior unchanged

Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat....

kernel/cgroup/cpuset-internal.h | 3 ++ kernel/cgroup/cpuset-v1.c | 20 +++++++++ kernel/cgroup/cpuset.c | 43 ++++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 5 ++- 4 files changed, 58 insertions(+), 13 deletions(-)

Is this a cover letter?

The cover letter is labeled as v3, while the patch itself is v4.

For a single patch, I don’t think a cover letter is necessary.

-- Best regards, Ridong

Sun Shaojie

5:58 a.m.

On 2025/11/17 11:23, Chen Ridong wrote:

...

Is this a cover letter?

The cover letter is labeled as v3, while the patch itself is v4.

For a single patch, I don’t think a cover letter is necessary.

Hi, Ridong,

Thank you so much. I've made a mental note of this point.

Thanks, Sun Shaojie

Waiman Long

23 Dec 23 Dec

6:06 a.m.

New subject: [PING][PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict

On 12/17/25 4:45 AM, Sun Shaojie wrote:

...

Hi, Longman, Just a friendly ping regarding the patch "[PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict" sent on [Mon, 1 Dec 2025 17:38:06 +0800]. Link: https://lore.kernel.org/cgroups/20251201093806.107157-1-sunshaojie@kylinos.c...

Could you please take a look when you have a moment? We'd appreciate any initial feedback or suggestions you might have.

Thank you again for your time and consideration.

I am sorry that I am late in reviewing your patch. I was busy in the last few weeks. Now I will try to review your patch later this week.

Cheers, Longman

Waiman Long

6:03 a.m.

New subject: [PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 12/22/25 10:26 AM, Michal Koutný wrote:

...

Hello Shaojie.

On Mon, Dec 01, 2025 at 05:38:06PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary.

For example: On a machine with 128 CPUs, there are m (m < 128) cpusets under the root cgroup. Each cpuset is used by a single user(user-1 use A1, ... , user-m use Am), and the partition states of these cpusets are configured as follows:
                        root cgroup
     /             /                  \                 \
    A1            A2        ...       An                Am
  (root)        (root)      ...     (root) (root/root invalid/member)
Assume that A1 through Am have not set cpuset.cpus.exclusive. When user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition states from A1 to An to change from root to root invalid, as shown below.
                        root cgroup
     /              /                 \                 \
    A1             A2       ...       An                Am
(root invalid) (root invalid) ... (root invalid) (root invalid/member)

This outcome is entirely undeserved for all users from A1 to An.
s/cpuset.cpus/memory.max/

When the permissions are such that the last (any) sibling can come and claim so much to cause overcommit, then it can set up large limit and (potentially) reclaim from others.

s/cpuset.cpus/memory.min/

Here is the overcommit approached by recalculating effective values of memory.min, again one sibling can skew toward itself and reduce every other's effective value.

Above are not exact analogies because first of them is Limits, the second is Protections and cpusets are Allocations (refering to Resource Distribution Models from Documentation/admin-guide/cgroup-v2.rst).

But the advice to get some guarantees would be same in all cases -- if some guarantees are expected, the permissions (of respective cgroup attributes) should be configured so that it decouples the owner of the cgroup from the owner of the resource (i.e. Ai/cpuset.cpus belongs to root or there's a middle level cgroup that'd cap each of the siblings individually).

From sibling point of view, CPUs in partitions are exclusive. A cpuset either have all the requested CPUs to form a partition (assuming that at least one can be granted from the parent cpuset) or it doesn't have all of them and fails to form a valid partition. It is different from memory that a cgroup can have a reduced amount of memory than requested and can still work fine.

Anyway, I consider using cpuset.cpus to form a partition is legacy and is supported for backward compatibility reason. Now the proper way to form a partition is to use cpuset.cpus.exclusive, the setting of it can fail if it conflicts with siblings.

By using cpuset.cpus only to form partitions, the cpuset.cpus value will be treated the same as cpuset.cpus.exclusive if a valid partition is formed. In that sense, the examples listed in the patch will have the same result if cpuset.cpu.exclusive is used instead of cpuset.cpus. The difference is that writing to the cpuset.cpus.exclusive will fail instead of forming an invalid partition in the case of cpust.cpus.

...

...
After applying this patch, the first party to set "root" will maintain its exclusive validity. As follows:

Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "1-2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root invalid |

Step | A1's prstate | B1's prstate | #1> echo "0-1" > B1/cpuset.cpus | member | member | #2> echo "root" > B1/cpuset.cpus.partition | member | root | #3> echo "1-2" > A1/cpuset.cpus | member | root | #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root |

I'm worried that the ordering dependency would lead to situations where users may not be immediately aware their config is overcommitting the system. Consider that CPUs are vital for A1 but B1 can somehow survive the degraded state, depending on the starting order the system may either run fine (A1 valid) or fail because of A1.

I'm curious about Waiman's take.

That is why I will recommend users to use cpuset.cpus.exclusive to form partition as they can get early feedback if they are overcommitting. Of course, setting cpuset.cpus.exclusive without failure still doesn't guarantee the formation of a valid partition if none of the exclusive CPUs can be granted from the parent.

Cheers, Longman

Michal Koutný

22 Dec 22 Dec

3:26 p.m.

New subject: [PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hello Shaojie.

On Mon, Dec 01, 2025 at 05:38:06PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary.

For example: On a machine with 128 CPUs, there are m (m < 128) cpusets under the root cgroup. Each cpuset is used by a single user(user-1 use A1, ... , user-m use Am), and the partition states of these cpusets are configured as follows:
                       root cgroup
    /             /                  \                 \
   A1            A2        ...       An                Am
 (root)        (root)      ...     (root) (root/root invalid/member)
Assume that A1 through Am have not set cpuset.cpus.exclusive. When user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition states from A1 to An to change from root to root invalid, as shown below.
                       root cgroup
    /              /                 \                 \
   A1             A2       ...       An                Am
(root invalid) (root invalid) ... (root invalid) (root invalid/member)

This outcome is entirely undeserved for all users from A1 to An.

s/cpuset.cpus/memory.max/

When the permissions are such that the last (any) sibling can come and claim so much to cause overcommit, then it can set up large limit and (potentially) reclaim from others.

s/cpuset.cpus/memory.min/

Here is the overcommit approached by recalculating effective values of memory.min, again one sibling can skew toward itself and reduce every other's effective value.

Above are not exact analogies because first of them is Limits, the second is Protections and cpusets are Allocations (refering to Resource Distribution Models from Documentation/admin-guide/cgroup-v2.rst).

But the advice to get some guarantees would be same in all cases -- if some guarantees are expected, the permissions (of respective cgroup attributes) should be configured so that it decouples the owner of the cgroup from the owner of the resource (i.e. Ai/cpuset.cpus belongs to root or there's a middle level cgroup that'd cap each of the siblings individually).

...

After applying this patch, the first party to set "root" will maintain its exclusive validity. As follows:

Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "1-2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root invalid |

Step | A1's prstate | B1's prstate | #1> echo "0-1" > B1/cpuset.cpus | member | member | #2> echo "root" > B1/cpuset.cpus.partition | member | root | #3> echo "1-2" > A1/cpuset.cpus | member | root | #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root |

I'm worried that the ordering dependency would lead to situations where users may not be immediately aware their config is overcommitting the system. Consider that CPUs are vital for A1 but B1 can somehow survive the degraded state, depending on the starting order the system may either run fine (A1 valid) or fail because of A1.

I'm curious about Waiman's take.

Thanks, Michal

Sun Shaojie

17 Dec 17 Dec

9:45 a.m.

New subject: [PING][PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict

Hi, Longman,

Just a friendly ping regarding the patch "[PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict" sent on [Mon, 1 Dec 2025 17:38:06 +0800]. Link: https://lore.kernel.org/cgroups/20251201093806.107157-1-sunshaojie@kylinos.c...

Could you please take a look when you have a moment? We'd appreciate any initial feedback or suggestions you might have.

Thank you again for your time and consideration.

Thanks, Sun Shaojie

Sun Shaojie

21 Nov 21 Nov

10:32 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:

...

On 2025/11/20 21:07, Sun Shaojie wrote:

...
I have carefully considered the scenario where parent effective CPUs are empty, which corresponds to the following two cases. (After apply this patch).

root cgroup | A1 / \ A2 A3

Case 1: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition

After step #4,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0-1 | | effective_cpus | | 0-1 | | prstate | root | root | member |

After step #4, A3's effective CPUs is empty.
That may be a corner case is unexpected.

...
#5> echo "0-1" > A3/cpuset.cpus

If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.

Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?

Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.

So why does applying this patch enable such migration?

...

...
After step #5,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0-1 | 0-1 | effective_cpus | | 0-1 | | prstate | root | root | member |

Thanks, Sun Shaojie

Sun Shaojie

10:33 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:

...

On 2025/11/20 21:07, Sun Shaojie wrote:

...
I don't understand the "order of operations" mentioned here. After reviewing the previous email content, are you referring to this?

On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:

...
With the result you expect, would we observe the following behaviors:

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > A1/cpuset.cpus.partition #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > B1/cpuset.cpus.partition #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are as follows: #1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "1-2" > B1/cpuset.cpus | member | member | #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root | #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "1-2" > B1/cpuset.cpus | member | member | #5> echo "root" > B1/cpuset.cpus.partition | root | root invalid| #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|

How about the following two sequences of operations:

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "root" > A1/cpuset.cpus.partition #5> echo "1-2" > B1/cpuset.cpus #6> echo "root" > B1/cpuset.cpus.partition

#1> mkdir -p A1 #2> mkdir -p B1 #5> echo "1-2" > B1/cpuset.cpus #6> echo "root" > B1/cpuset.cpus.partition #3> echo "0-1" > A1/cpuset.cpus #4> echo "root" > A1/cpuset.cpus.partition

Will these two sequences yield the same result?

...

As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states. We need to confirm if this holds true here.

Is this truly a key requirement? It appears this requirement wasn't met even before applying my patch.

The example below, which does not use this patch, demonstrates how different sequences with identical configurations can still lead to different system states.

Even without this patch, the result can still differ.

Thanks, Sun Shaojie

Chen Ridong

22 Nov 22 Nov

1:19 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/21 18:33, Sun Shaojie wrote:

...

Hi, Ridong,

Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:

...
On 2025/11/20 21:07, Sun Shaojie wrote:

...
I don't understand the "order of operations" mentioned here. After reviewing the previous email content, are you referring to this?

On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:

...
With the result you expect, would we observe the following behaviors:

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > A1/cpuset.cpus.partition #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "1-2" > B1/cpuset.cpus #5> echo "root" > B1/cpuset.cpus.partition #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are as follows: #1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "1-2" > B1/cpuset.cpus | member | member | #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root | #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "1-2" > B1/cpuset.cpus | member | member | #5> echo "root" > B1/cpuset.cpus.partition | root | root invalid| #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|

How about the following two sequences of operations:

#1> mkdir -p A1 #2> mkdir -p B1 #3> echo "0-1" > A1/cpuset.cpus #4> echo "root" > A1/cpuset.cpus.partition #5> echo "1-2" > B1/cpuset.cpus #6> echo "root" > B1/cpuset.cpus.partition

#1> mkdir -p A1 #2> mkdir -p B1 #5> echo "1-2" > B1/cpuset.cpus #6> echo "root" > B1/cpuset.cpus.partition #3> echo "0-1" > A1/cpuset.cpus #4> echo "root" > A1/cpuset.cpus.partition

Will these two sequences yield the same result?

...
As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states. We need to confirm if this holds true here.

Is this truly a key requirement? It appears this requirement wasn't met even before applying my patch.

I believe it requires, it may some corner cases we should fix.

...

The example below, which does not use this patch, demonstrates how different sequences with identical configurations can still lead to different system states.

#1> mkdir -p A1 #2> mkdir -p B1 | A1's prstate | B1's prstate | #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "0-1" > A1/cpuset.cpus.exclusive | member | member | #5> echo "root" > A1/cpuset.cpus.partition | root | member | #6> echo "1-2" > B1/cpuset.cpus | root invalid | member | #7> echo "2-3" > B1/cpuset.cpus.exclusive | root invalid | member | #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root |

IIUC, you've created this example with the expectation that both A1 and B1 should serve as root partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus, cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.

The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8 is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to passively and automatically transition A1 to a valid partition state.

...

#1> mkdir -p A1 #2> mkdir -p B1 | A1's prstate | B1's prstate | #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "0-1" > A1/cpuset.cpus.exclusive | member | member | #5> echo "2-3" > B1/cpuset.cpus.exclusive | member | member | #6> echo "root" > A1/cpuset.cpus.partition | root | member | #7> echo "1-2" > B1/cpuset.cpus | root | member | #8> echo "root" > B1/cpuset.cpus.partition | root | root |

Even without this patch, the result can still differ.

Thanks, Sun Shaojie

-- Best regards, Ridong

Chen Ridong

1:33 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/21 18:32, Sun Shaojie wrote:

...

Hi, Ridong,

On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:

...
On 2025/11/20 21:07, Sun Shaojie wrote:

...
I have carefully considered the scenario where parent effective CPUs are empty, which corresponds to the following two cases. (After apply this patch).

root cgroup | A1 / \ A2 A3

Case 1: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition

After step #4,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0-1 | | effective_cpus | | 0-1 | | prstate | root | root | member |

After step #4, A3's effective CPUs is empty.
That may be a corner case is unexpected.

...
#5> echo "0-1" > A3/cpuset.cpus

If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?

You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.

However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is a typical usage scenario.

...

Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.

So why does applying this patch enable such migration?

-- Best regards, Ridong

Sun Shaojie

24 Nov 24 Nov

10:20 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:

...

On 2025/11/21 18:32, Sun Shaojie wrote:

...
Hi, Ridong,

On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:

...
On 2025/11/20 21:07, Sun Shaojie wrote:

...
I have carefully considered the scenario where parent effective CPUs are empty, which corresponds to the following two cases. (After apply this patch).

root cgroup | A1 / \ A2 A3

Case 1: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition

After step #4,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0-1 | | effective_cpus | | 0-1 | | prstate | root | root | member |

After step #4, A3's effective CPUs is empty.
That may be a corner case is unexpected.

...
#5> echo "0-1" > A3/cpuset.cpus

If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?
#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs

You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.

However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is a typical usage scenario.

I'm sorry, I didn't quite understand the point you were trying to make with this example.

If that's the case

root cgroup | A1 / / \ \ A2 A3 A4 A5

#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs ->This will return an error because A4's effective CPUs are empty. echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs ->This will return an error because A5's effective CPUs are empty.

Even with this patch applied, this result will not change.

...

...
Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.

So why does applying this patch enable such migration?

Thanks, Sun Shaojie

Sun Shaojie

10:21 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

On Sat, 22 Nov 2025 09:19:39, Chen Ridong wrote:

...

On 2025/11/21 18:33, Sun Shaojie wrote:

...
Is this truly a key requirement? It appears this requirement wasn't met even before applying my patch.

I believe it requires, it may some corner cases we should fix.

...
The example below, which does not use this patch, demonstrates how different sequences with identical configurations can still lead to different system states.

#1> mkdir -p A1 #2> mkdir -p B1 | A1's prstate | B1's prstate | #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "0-1" > A1/cpuset.cpus.exclusive | member | member | #5> echo "root" > A1/cpuset.cpus.partition | root | member | #6> echo "1-2" > B1/cpuset.cpus | root invalid | member | #7> echo "2-3" > B1/cpuset.cpus.exclusive | root invalid | member | #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root |

IIUC, you've created this example with the expectation that both A1 and B1 should serve as root partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus, cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.

The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8 is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to passively and automatically transition A1 to a valid partition state.

So, was the original behavior of invalidating sibling partitions driven by this key requirement? (As a key requirement: Regardless of the order in which we apply the configurations, identical final settings should always result in identical system states.)

...

...
#1> mkdir -p A1 #2> mkdir -p B1 | A1's prstate | B1's prstate | #3> echo "0-1" > A1/cpuset.cpus | member | member | #4> echo "0-1" > A1/cpuset.cpus.exclusive | member | member | #5> echo "2-3" > B1/cpuset.cpus.exclusive | member | member | #6> echo "root" > A1/cpuset.cpus.partition | root | member | #7> echo "1-2" > B1/cpuset.cpus | root | member | #8> echo "root" > B1/cpuset.cpus.partition | root | root |

Even without this patch, the result can still differ.

Thanks, Sun Shaojie

Chen Ridong

11:33 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/24 18:20, Sun Shaojie wrote:

...

Hi, Ridong,

On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:

...
On 2025/11/21 18:32, Sun Shaojie wrote:

...
Hi, Ridong,

On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:

...
On 2025/11/20 21:07, Sun Shaojie wrote:

...
I have carefully considered the scenario where parent effective CPUs are empty, which corresponds to the following two cases. (After apply this patch).

root cgroup | A1 / \ A2 A3

Case 1: Step: #1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition

After step #4,
            |      A1      |      A2      |      A3      |
cpus_allowed | 0-1 | 0-1 | | effective_cpus | | 0-1 | | prstate | root | root | member |

After step #4, A3's effective CPUs is empty.
That may be a corner case is unexpected.

...
#5> echo "0-1" > A3/cpuset.cpus

If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4, A5, ...) afterward. However, prior to your patch, this migration was allowed.
Are you referring to creating subdirectories (A4, A5, ...) after step #4? And what parameters should be configured for A1's cpuset.cpus? Could you provide a specific example?
#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs

You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.

However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is a typical usage scenario.
I'm sorry, I didn't quite understand the point you were trying to make with this example.

If that's the case
 root cgroup
      |
      A1
   / /  \ \
 A2 A3  A4 A5
#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus

If we don't apply your patch, A2 will be invalidated.

...

echo $$ > A4/cgroup.procs ->This will return an error because A4's effective CPUs are empty. echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs ->This will return an error because A5's effective CPUs are empty.

Even with this patch applied, this result will not change.

You can have a try, the result I got:

# mkdir A1 # echo "0-1" > A1/cpuset.cpus # echo "root" > A1/cpuset.cpus.partition # cd A1/ # mkdir A2 # mkdir A4 # mkdir A5 # echo "0-1" > A2/cpuset.cpus # echo "root" > A2/cpuset.cpus.partition # # echo "0" > A4/cpuset.cpus # cat A2/cpuset.cpus 0-1 # cat A2/cpuset.cpus.partition root invalid # cat A4/cpuset.cpus.effective 0

...

...
...
Additionally, processes cannot be migrated into a cgroup whose cpuset.cpus.effective is empty. However, this patch does not modify this behavior.

So why does applying this patch enable such migration?

Thanks, Sun Shaojie

-- Best regards, Ridong

Waiman Long

10:30 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 11/19/25 5:57 AM, Sun Shaojie wrote:

...

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases：

Assume the machine has 4 CPUs (0-3).
root cgroup
   /    \
 A1      B1
Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

Table 1.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

All other cases remain unaffected. For example, cgroup-v1.

This patch is relatively simple. As others have pointed out, there are inconsistency depending on the operation ordering.

In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:

# echo "0-1" >A1/cpuset.cpus # echo "2" > B1/cpuset.cpus # echo "1-2" > B1/cpuset.cpus # echo "root" > A1/cpuset.cpus.partition # echo "root" > B1/cpuset.cpus.partition

To follow the "first-come first-serve" rule, A1 should be valid and B1 invalid. That is the inconsistency with your current patch. To fix that, we still need to relax the overlap checking rule similar to your v4 patch.

Cheers, Longman

Sun Shaojie

26 Nov 26 Nov

12:29 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

On Mon, 24 Nov 2025 19:33:54, Chen Ridong wrote:

...

On 2025/11/24 18:20, Sun Shaojie wrote:

...
I'm sorry, I didn't quite understand the point you were trying to make with this example.

If that's the case
 root cgroup
      |
      A1
   / /  \ \
 A2 A3  A4 A5
#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus
If we don't apply your patch, A2 will be invalidated.

...
echo $$ > A4/cgroup.procs ->This will return an error because A4's effective CPUs are empty. echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs ->This will return an error because A5's effective CPUs are empty.

Even with this patch applied, this result will not change.

You can have a try, the result I got:

# mkdir A1 # echo "0-1" > A1/cpuset.cpus # echo "root" > A1/cpuset.cpus.partition # cd A1/ # mkdir A2 # mkdir A4 # mkdir A5 # echo "0-1" > A2/cpuset.cpus # echo "root" > A2/cpuset.cpus.partition # # echo "0" > A4/cpuset.cpus # cat A2/cpuset.cpus 0-1 # cat A2/cpuset.cpus.partition root invalid # cat A4/cpuset.cpus.effective 0

A4's cpuset.cpus.effective is 0 because A2 changed from root to root invalid. However, the purpose of this patch is precisely to keep A2 as "root".

Before 'echo "0" > A4/cpuset.cpus', A4 is aware that its cpuset.cpus.effective is empty and that its parent's cpuset.cpus.effective is also empty. Therefore, after executing 'echo "0" > A4/cpuset.cpus', A4 should anticipate the possibility that it may not be allocated any available CPUs.

Thanks, Sun Shaojie

Sun Shaojie

12:31 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Longman,

On Mon, 24 Nov 2025 17:30:47, Waiman Long wrote:

...

On 11/19/25 5:57 AM, Sun Shaojie wrote:

...
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. If the cpuset being modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases：

Assume the machine has 4 CPUs (0-3).
root cgroup
   /    \
 A1      B1
Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

Table 1.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root invalid | member |

After step #3, A1 changes from "root" to "root invalid" because its CPUs (0-1) overlap with those requested by B1 (0). However, B1 can actually use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to remain as "root."

Table 1.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "0" > B1/cpuset.cpus | root | member |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

Table 2.1: Before applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5, regardless of what conflicting value B1 writes to cpuset.cpus, it will always have at least CPU 2 available. This makes it unnecessary to mark A1 as "root invalid".

Table 2.2: After applying this patch Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root | #5> echo "1-2" > B1/cpuset.cpus | root | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will always be available CPUs in B1's cpuset.cpus.effective. Therefore, there is no need to change A1 from "root" to "root invalid".

All other cases remain unaffected. For example, cgroup-v1.
This patch is relatively simple. As others have pointed out, there are inconsistency depending on the operation ordering.

In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:

# echo "0-1" >A1/cpuset.cpus # echo "2" > B1/cpuset.cpus # echo "1-2" > B1/cpuset.cpus # echo "root" > A1/cpuset.cpus.partition # echo "root" > B1/cpuset.cpus.partition

To follow the "first-come first-serve" rule, A1 should be valid and B1 invalid. That is the inconsistency with your current patch. To fix that, we still need to relax the overlap checking rule similar to your v4 patch.

Thank you for your suggestion! Will update.

Thanks, Sun Shaojie

Michal Koutný

2:13 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...

...
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?

Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.

So you want to achieve a stable [1] set of CPUs for a cgroup that cannot be taken away from you by any sibling, correct? My reasoning is that the siblings should be under one management entity and therefore such overcommitment should be avoided already in the configuration. Invalidating all conflicting siblings is then the most fair result achievable. B1 is a second-class partition _only_ because it starts later or why is it OK to not fulfill its requirement?

[1] Note that A1 should still watch its cpuset.cpus.partition if it takes exclusivity seriously because its cpus may be taken away by hot(un)plug or ancestry reconfiguration.

...

As for your point that "the effective config cannot be derived just from the applied values," even before this patch, we couldn't derive the final effective configuration solely from the applied values.

For example, consider the following scenario: (not apply this patch) Table 1: Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "1-2" > B1/cpuset.cpus | root invalid | member |

Table 2: Step | A1's prstate | B1's prstate | #1> echo "1-2" > B1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member | #3> echo "0-1" > A1/cpuset.cpus | root | member |

After step #3, both Table 1 and Table 2 have identical value settings, yet A1's partition state differs between them.

Aha, I must admit I didn't expect that. IMO, nothing (documented) prevents the latter (Table 2) behavior (here I'm referring to cpuset.cpus, not sure about cpuset.cpus.exclusive). Which of Table 1 or Table do you prefer?

Thanks, Michal

Michal Koutný

2:13 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long llong@redhat.com wrote:

...

In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:

I'm skeptical of the FCFS behavior since I'm afraid it may be subject to race conditions in practice. BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior in this regard?

Thanks, Michal

Waiman Long

7:43 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 11/26/25 9:13 AM, Michal Koutný wrote:

...

On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long llong@redhat.com wrote:

...
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:

I'm skeptical of the FCFS behavior since I'm afraid it may be subject to race conditions in practice. BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior in this regard?

Modification to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can actually vary depending on the actual serialization results of those operations.

One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating remote partition.

OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed and is limited to the creation of local partition only.

Does that answer your question?

Cheers, Longman

Chen Ridong

27 Nov 27 Nov

1:55 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/27 3:43, Waiman Long wrote:

...

On 11/26/25 9:13 AM, Michal Koutný wrote:

...
On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long llong@redhat.com wrote:

...
In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu lists overlap, we can't have both of them as valid partition roots. So either one of A1 or B1 is valid or they are both invalid. The current code makes them both invalid no matter the operation ordering. This patch will

I have to admit that I prefer the current implementation.

At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would make it more difficult for users to understand why the cpuset.cpus they configured do not match the effective CPUs in use, and why different operation orders yield different results.

In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member) created under A1 will end up with empty effective CPUs—and this is not a desired behavior.

root cgroup | A1 / \ A2 A3...

[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its requirement?" --Michal.

...

...
...
make one of them valid given the operation ordering above. To minimize partition invalidation, we will have to live with the fact that it will be first-come first-serve as noted by Michal. I am not against this, we just have to document it. However, the following operation order will still make both of them invalid:

I'm skeptical of the FCFS behavior since I'm afraid it may be subject to race conditions in practice. BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior in this regard?

Modification to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can actually vary depending on the actual serialization results of those operations.

One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating remote partition.

OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed and is limited to the creation of local partition only.

Does that answer your question?

Cheers, Longman

-- Best regards, Ridong

Chen Ridong

1:57 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/11/26 22:13, Michal Koutný wrote:

...

On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
...
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?

Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.

So you want to achieve a stable [1] set of CPUs for a cgroup that cannot be taken away from you by any sibling, correct? My reasoning is that the siblings should be under one management entity and therefore such overcommitment should be avoided already in the configuration. Invalidating all conflicting siblings is then the most fair result achievable. B1 is a second-class partition _only_ because it starts later or why is it OK to not fulfill its requirement?

[1] Note that A1 should still watch its cpuset.cpus.partition if it takes exclusivity seriously because its cpus may be taken away by hot(un)plug or ancestry reconfiguration.

...
As for your point that "the effective config cannot be derived just from the applied values," even before this patch, we couldn't derive the final effective configuration solely from the applied values.

For example, consider the following scenario: (not apply this patch) Table 1: Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "1-2" > B1/cpuset.cpus | root invalid | member |

Table 2: Step | A1's prstate | B1's prstate | #1> echo "1-2" > B1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member | #3> echo "0-1" > A1/cpuset.cpus | root | member |

After step #3, both Table 1 and Table 2 have identical value settings, yet A1's partition state differs between them.

A corner case should be fixed, and I have sent the patch.

https://lore.kernel.org/cgroups/20251115093140.1121329-1-chenridong@huaweicl...

...

Aha, I must admit I didn't expect that. IMO, nothing (documented) prevents the latter (Table 2) behavior (here I'm referring to cpuset.cpus, not sure about cpuset.cpus.exclusive). Which of Table 1 or Table do you prefer?

Thanks, Michal

-- Best regards, Ridong

Sun Shaojie

1 Dec 1 Dec

9:38 a.m.

New subject: [PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary.

For example: On a machine with 128 CPUs, there are m (m < 128) cpusets under the root cgroup. Each cpuset is used by a single user(user-1 use A1, ... , user-m use Am), and the partition states of these cpusets are configured as follows:

root cgroup / / \ \ A1 A2 ... An Am (root) (root) ... (root) (root/root invalid/member)

Assume that A1 through Am have not set cpuset.cpus.exclusive. When user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition states from A1 to An to change from root to root invalid, as shown below.

root cgroup / / \ \ A1 A2 ... An Am (root invalid) (root invalid) ... (root invalid) (root invalid/member)

This outcome is entirely undeserved for all users from A1 to An.

This patch prevents such outcomes by ensuring that modifications to cpuset.cpus do not affect the partition state of other sibling cpusets. Therefore, with this patch applied, when user-m configures Am's cpuset.cpus to "0-127", the result will be as follows.

root cgroup / / \ \ A1 A2 ... An Am (root) (root) ... (root) (root invalid/member)

It is worth noting that, since this patch enforces the exclusivity of sibling cpusets, setting exclusivity now follows a "first-come, first-served" principle.

For example, consider the following four steps: before applying this patch, regardless of the order in which they are executed, the final partition state of both A1 and B1 would always be "root invalid."

After applying this patch, the first party to set "root" will maintain its exclusive validity. As follows:

In summary, if the current cpuset conflicts with its sibling cpusets on exclusive CPUs (If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs will be treated as exclusive CPUs), only the current cpuset should bear the consequences.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn --- kernel/cgroup/cpuset-internal.h | 3 + kernel/cgroup/cpuset-v1.c | 19 ++++++ kernel/cgroup/cpuset.c | 60 ++++++++++++------- .../selftests/cgroup/test_cpuset_prs.sh | 12 ++-- 4 files changed, 65 insertions(+), 29 deletions(-)

#endif /* __CPUSET_INTERNAL_H */ diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 12e76774c75b..5aa0ac092ef6 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; }

+/* + * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts + * to legacy (v1) + * @cs1: first cpuset to check + * @cs2: second cpuset to check + * + * Returns: true if CPU exclusivity conflict exists, false otherwise + * + * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect. + */ +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{ + if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) + return cpumask_intersects(cs1->cpus_allowed, + cs2->cpus_allowed); + + return false; +} + #ifdef CONFIG_PROC_PID_CPUSET /* * proc_cpuset_show() diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..e58dd26e074a 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -586,14 +586,24 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive + * For cgroup-v1: + * see cpuset1_cpus_excl_conflict() + * For cgroup-v2: + * 1. If both cs1 and cs2 are exclusive, cs1 and cs2 must be mutually exclusive * 2. exclusive_cpus masks cannot intersect between cpusets * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs + * 4. If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs + * will be treated as exclusive CPUs; therefore, its allowed CPUs must not + * intersect with another's exclusive CPUs. */ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) { - /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) + /* For cgroup-v1 */ + if (!cpuset_v2()) + return cpuset1_cpus_excl_conflict(cs1, cs2); + + /* If cpusets are exclusive, check if they are mutually exclusive*/ + if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2)) return !cpusets_are_exclusive(cs1, cs2);

/* Exclusive_cpus cannot intersect */ @@ -609,6 +619,20 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;

+ /* + * When a cpuset is exclusive and its exclusive CPUs are empty, + * its cpus_allowed cannot intersect with another cpuset's exclusive_cpus. + */ + if (is_cpu_exclusive(cs1) && + cpumask_empty(cs1->exclusive_cpus) && + cpumask_intersects(cs1->cpus_allowed, cs2->exclusive_cpus)) + return true; + + if (is_cpu_exclusive(cs2) && + cpumask_empty(cs2->exclusive_cpus) && + cpumask_intersects(cs2->cpus_allowed, cs1->exclusive_cpus)) + return true; + return false; }

@@ -2411,34 +2435,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc struct tmpmasks *tmp) { int retval; - struct cpuset *parent = parent_cs(cs);

retval = validate_change(cs, trialcs);

if ((retval == -EINVAL) && cpuset_v2()) { - struct cgroup_subsys_state *css; - struct cpuset *cp; - /* * The -EINVAL error code indicates that partition sibling * CPU exclusivity rule has been violated. We still allow * the cpumask change to proceed while invalidating the - * partition. However, any conflicting sibling partitions - * have to be marked as invalid too. + * partition. */ trialcs->prs_err = PERR_NOTEXCL; - rcu_read_lock(); - cpuset_for_each_child(cp, css, parent) { - struct cpumask *xcpus = user_xcpus(trialcs); - - if (is_partition_valid(cp) && - cpumask_intersects(xcpus, cp->effective_xcpus)) { - rcu_read_unlock(); - update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp); - rcu_read_lock(); - } - } - rcu_read_unlock(); retval = 0; } return retval; @@ -2506,8 +2513,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, if (alloc_tmpmasks(&tmp)) return -ENOMEM;

- compute_trialcs_excpus(trialcs, cs); - trialcs->prs_err = PERR_NONE; + /* + * if there is exclusive CPUs conflict with the siblings, + * we still allow the cpumask change to proceed while + * invalidating the partition. + */ + if (compute_trialcs_excpus(trialcs, cs)) + trialcs->prs_err = PERR_NOTEXCL; + else + trialcs->prs_err = PERR_NONE;

retval = cpus_allowed_validate_change(cs, trialcs, &tmp); if (retval < 0) diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..75154e22c702 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -269,7 +269,7 @@ TEST_MATRIX=( " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" - " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2" + " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2" " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" @@ -318,7 +318,7 @@ TEST_MATRIX=( # Invalid to valid local partition direct transition tests " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3" " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3" - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0" + " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0" " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"

# Local partition invalidation tests @@ -388,10 +388,10 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

- # A non-exclusive cpuset.cpus change will invalidate partition and its siblings - " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" - " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1" - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1" + # A non-exclusive cpuset.cpus change will not invalidate its siblings partition. + " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0" + " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P-1|B1:P1" + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"

-- 2.25.1

Sun Shaojie

9:42 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Michal

On Wed, 26 Nov 2025 15:13:13, Michal Koutný wrote:

...

On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
...
Do you actually want to achieve this or is it an implementation side-effect of the Case 1 scenario that you want to achieve?

Yes, this is indeed the functionality I intended to achieve, as I find it follows the same logic as Case 1.

So you want to achieve a stable [1] set of CPUs for a cgroup that cannot be taken away from you by any sibling, correct? My reasoning is that the siblings should be under one management entity and therefore such overcommitment should be avoided already in the configuration. Invalidating all conflicting siblings is then the most fair result achievable. B1 is a second-class partition _only_ because it starts later or why is it OK to not fulfill its requirement?

If the siblings are under a single management entity, that certainly works. But what if there are multiple administrative users? Should we really violate other users' requirements just to satisfy one user's requirement? Given this, first-come-first-served might be fairer.

...

[1] Note that A1 should still watch its cpuset.cpus.partition if it takes exclusivity seriously because its cpus may be taken away by hot(un)plug or ancestry reconfiguration.

Thanks, Sun Shaojie

Sun Shaojie

9:44 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Ridong,

On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:

...

I have to admit that I prefer the current implementation.

At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would make it more difficult for users to understand why the cpuset.cpus they configured do not match the effective CPUs in use, and why different operation orders yield different results.

As for "different operation orders yield different results", Below is an example that is not a corner case.

root cgroup / \ A1 B1

#1> echo "0" > A1/cpuset.cpus #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error

#1> echo "0-1" > B1/cpuset.cpus.exclusive #2> echo "0" > A1/cpuset.cpus

...

In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member) created under A1 will end up with empty effective CPUs—and this is not a desired behavior.

root cgroup | A1 / \ A2 A3...

#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs

If A2...A5 all belong to the same user, and that user wants both A4 and A5 to have effective CPUs, then the user should also understand that A2 needs to be adjusted to "member" instead of "root".

if A2...A5 belong to different users, must satisfying user A4’s requirement come at the expense of user A2’s requirement? That is not fair.

...

[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its requirement?" --Michal.

Thanks, Sun Shaojie

Michal Koutný

8 Dec 8 Dec

2:31 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hello.

On Mon, Dec 01, 2025 at 05:44:47PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...

As for "different operation orders yield different results", Below is an example that is not a corner case.
root cgroup
  /    \
 A1    B1
#1> echo "0" > A1/cpuset.cpus #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error

#1> echo "0-1" > B1/cpuset.cpus.exclusive #2> echo "0" > A1/cpuset.cpus

Here it is a combination of remote cs local partitions. I'd like to treat the two approaches separately and better not consider their combination.

The idea (and permissions check AFACS) behind remote partitions is to allow "stealing" CPU ownership so cpuset.cpus.exclusive has different behavior.

...

...
root cgroup | A1 //MK: A4 A5 here? / \ A2 A3... //MK: A4 A5 or here?

#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs

If A2...A5 all belong to the same user, and that user wants both A4 and A5 to have effective CPUs, then the user should also understand that A2 needs to be adjusted to "member" instead of "root".

if A2...A5 belong to different users, must satisfying user A4’s requirement come at the expense of user A2’s requirement? That is not fair.

If A4 is a sibling at the level of A1, then A2 must be stripped of its CPUs to honor the hierarchy hence the apparent unfairness.

If A4 is a sibling at the level of A2 and they have different owning users, their respective cpuset.cpus should only be writable by A1's user (the one who distributes the cpus) so that any arbitration between the siblings is avoided.

0.02€, Michal

Michal Koutný

2:32 p.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi Waiman.

On Wed, Nov 26, 2025 at 02:43:50PM -0500, Waiman Long llong@redhat.com wrote:

...

Modification to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can actually vary depending on the actual serialization results of those operations.

I meant the latter when the difference in results when concurrent tasks do the update (e.g. two containers start in parallel), I don't see an issue with the race wrt consistency of in-kernel data. We're on the same page here.

...

One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating remote partition.

OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed and is limited to the creation of local partition only.

Does that answer your question?

It does help my understanding. Do you envision that remote and local partitions should be used together (in one subtree)?

Thanks, Michal

Sun Shaojie

10 Dec 10 Dec

10:11 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Michal,

On Mon, 8 Dec 2025 15:31:52 +0100, Michal Koutný wrote:

...

...
...
root cgroup | A1 //MK: A4 A5 here? / \ A2 A3... //MK: A4 A5 or here?

#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs

If A2...A5 all belong to the same user, and that user wants both A4 and A5 to have effective CPUs, then the user should also understand that A2 needs to be adjusted to "member" instead of "root".

if A2...A5 belong to different users, must satisfying user A4’s requirement come at the expense of user A2’s requirement? That is not fair.

If A4 is a sibling at the level of A1, then A2 must be stripped of its CPUs to honor the hierarchy hence the apparent unfairness.

If A4 is a sibling at the level of A2 and they have different owning users, their respective cpuset.cpus should only be writable by A1's user (the one who distributes the cpus) so that any arbitration between the siblings is avoided.

Regardless of whether A1 through A5 belong to the same user or different users, arbitration conflicts between sibling nodes can still occur (e.g., due to user misconfiguration). The key question is: when such a conflict arises, should all sibling nodes be invalidated, or only the node that triggered the conflict?

Thanks, Sun Shaojie

Michal Koutný

11 Dec 11 Dec

10:59 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On Wed, Dec 10, 2025 at 06:11:08PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...

Regardless of whether A1 through A5 belong to the same user or different users, arbitration conflicts between sibling nodes can still occur (e.g., due to user misconfiguration). The key question is: when such a conflict arises, should all sibling nodes be invalidated, or only the node that triggered the conflict?

Any serious [1] affinity users should watch for cpuset.cpus.partition already (since it can be invalidated by hotplug or IMO more probable ancestor re-configuration). Do you agree?

Then I'd say it's reasonable to invalidate all (same reasoning -- it doesn't matter on the order in which siblings are configured, I consider local partitions). What would you see as the upsides of invalidating only the last offender (under the assumption above about watching)?

Thanks, Michal

[1] The others may make use of the proposed cpu.max.concurrency [2] [2] https://lpc.events/event/18/contributions/1978/

Sun Shaojie

12 Dec 12 Dec

10:10 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Hi, Michal

On Thu, 11 Dec 2025 11:59:27 +0100, Michal Koutný wrote:

...

On Wed, Dec 10, 2025 at 06:11:08PM +0800, Sun Shaojie sunshaojie@kylinos.cn wrote:

...
Regardless of whether A1 through A5 belong to the same user or different users, arbitration conflicts between sibling nodes can still occur (e.g., due to user misconfiguration). The key question is: when such a conflict arises, should all sibling nodes be invalidated, or only the node that triggered the conflict?

Any serious [1] affinity users should watch for cpuset.cpus.partition already (since it can be invalidated by hotplug or IMO more probable ancestor re-configuration). Do you agree?

Then I'd say it's reasonable to invalidate all (same reasoning -- it doesn't matter on the order in which siblings are configured, I consider local partitions). What would you see as the upsides of invalidating only the last offender (under the assumption above about watching)?

I agree that users should watch the state of their cpuset.cpus.partition. Moreover, assuming the user is watching, there is no harm in invalidating only the last conflicting partition.

For example

root cgroup | -------------------------- | | | | | A B ... M N (root) (root) ... (root) (root)

Condition: Node N is the last one configured by the user. After its configuration, it conflicts with all previous nodes (A through M).

When all are invalidated, the user will notice that A-M are all invalidated because they are watching. If the user wants to restore the exclusivity of A-M, they need to reconfigure A-M once more, as well as N.

When only the last conflict is invalidated, the user will notice that N is invalidated, and then they only need to reconfigure N. This seems more convenient for the user.

However, whether watching is in place is not the key to this issue, because watching merely reveals the outcome.

If A through N belong to different users, and when N conflicts with all of A through M, then after the users of A-M observe the invalidation result through watching, they cannot even restore their exclusive state, because they will always conflict with N.

Thanks, Sun Shaojie

Chen Ridong

13 Dec 13 Dec

12:52 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 2025/12/1 17:44, Sun Shaojie wrote:

...

Hi, Ridong,

On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:

...
I have to admit that I prefer the current implementation.

At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would make it more difficult for users to understand why the cpuset.cpus they configured do not match the effective CPUs in use, and why different operation orders yield different results.

As for "different operation orders yield different results", Below is an example that is not a corner case.
root cgroup
  /    \
 A1    B1
#1> echo "0" > A1/cpuset.cpus #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error

#1> echo "0-1" > B1/cpuset.cpus.exclusive #2> echo "0" > A1/cpuset.cpus

You're looking at one rule, but there's another one—Longman pointed out that setting cpuset.cpu should never fail.

...

...
In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member) created under A1 will end up with empty effective CPUs—and this is not a desired behavior.

root cgroup | A1 / \ A2 A3...

#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs

If A2...A5 all belong to the same user, and that user wants both A4 and A5 to have effective CPUs, then the user should also understand that A2 needs to be adjusted to "member" instead of "root".

if A2...A5 belong to different users, must satisfying user A4’s requirement come at the expense of user A2’s requirement? That is not fair.

Regarding cpuset usage with Docker: when binding CPUs at container startup, do you check the sibling CPUs in use? Without this check, A2 will not be invalidated.

Your patch has been discussed for a while. It seems to make the rules more complex.

...

...
[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its requirement?" --Michal.

Thanks, Sun Shaojie

-- Best regards, Ridong

Waiman Long

4:58 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 12/8/25 9:32 AM, Michal Koutný wrote:

...

Hi Waiman.

On Wed, Nov 26, 2025 at 02:43:50PM -0500, Waiman Long llong@redhat.com wrote:

...
Modification to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can actually vary depending on the actual serialization results of those operations.

I meant the latter when the difference in results when concurrent tasks do the update (e.g. two containers start in parallel), I don't see an issue with the race wrt consistency of in-kernel data. We're on the same page here.

...
One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The use of cpuset.cpus.exclusive is required for creating remote partition.

OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed and is limited to the creation of local partition only.

Does that answer your question?

It does help my understanding. Do you envision that remote and local partitions should be used together (in one subtree)?

It should be rare to have both remote and local partition enabled in the same system, though it is not disallowed. The local partition should only be used on system that run a small number of applications with one or just a few that need partition support. For systems that run a large number of containerized applications like a Kubernetes managed system, local partition cannot be used because of the way container management is being done as the actual cgroups associated with a container can be a bit far from the cgroup root. Remote partition was created for such a use case where local partition will be used at all.

Cheers, Longman

Sun Shaojie

17 Dec 17 Dec

9:09 a.m.

New subject: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict

Hi, Ridong

On Sat, 13 Dec 2025 08:52:11 +0800, Chen Ridong wrote:

...

On 2025/12/1 17:44, Sun Shaojie wrote:

...
Hi, Ridong,

On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:

...
I have to admit that I prefer the current implementation.

At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would make it more difficult for users to understand why the cpuset.cpus they configured do not match the effective CPUs in use, and why different operation orders yield different results.

As for "different operation orders yield different results", Below is an example that is not a corner case.
root cgroup
  /    \
 A1    B1
#1> echo "0" > A1/cpuset.cpus #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error

#1> echo "0-1" > B1/cpuset.cpus.exclusive #2> echo "0" > A1/cpuset.cpus
You're looking at one rule, but there's another one—Longman pointed out that setting cpuset.cpu should never fail.

Precisely because I know that setting cpuset.cpus should never fail, I provided this example, which is why it demonstrates that "different operation orders yield different results."

...

...
...
In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member) created under A1 will end up with empty effective CPUs—and this is not a desired behavior.

root cgroup | A1 / \ A2 A3...

#1> echo "0-1" > A1/cpuset.cpus #2> echo "root" > A1/cpuset.cpus.partition #3> echo "0-1" > A2/cpuset.cpus #4> echo "root" > A2/cpuset.cpus.partition mkdir A4 mkdir A5 echo "0" > A4/cpuset.cpus echo $$ > A4/cgroup.procs echo "1" > A5/cpuset.cpus echo $$ > A5/cgroup.procs

If A2...A5 all belong to the same user, and that user wants both A4 and A5 to have effective CPUs, then the user should also understand that A2 needs to be adjusted to "member" instead of "root".

if A2...A5 belong to different users, must satisfying user A4’s requirement come at the expense of user A2’s requirement? That is not fair.

Regarding cpuset usage with Docker: when binding CPUs at container startup, do you check the sibling CPUs in use? Without this check, A2 will not be invalidated.

Your patch has been discussed for a while. It seems to make the rules more complex.

My aim is to safeguard the independence of sibling nodes while adhering to existing rules. I continuously update the patch to uphold these rules, as seen in the recently updated patch v6 (https://lore.kernel.org/cgroups/20251201093806.107157-1-sunshaojie@kylinos.c...).

Thanks, Sun Shaojie

Waiman Long

25 Dec 25 Dec

7:30 a.m.

New subject: [PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

On 12/1/25 4:38 AM, Sun Shaojie wrote:

...

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with its sibling partition, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary.

For example: On a machine with 128 CPUs, there are m (m < 128) cpusets under the root cgroup. Each cpuset is used by a single user(user-1 use A1, ... , user-m use Am), and the partition states of these cpusets are configured as follows:
                        root cgroup
     /             /                  \                 \
    A1            A2        ...       An                Am
  (root)        (root)      ...     (root) (root/root invalid/member)
Assume that A1 through Am have not set cpuset.cpus.exclusive. When user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition states from A1 to An to change from root to root invalid, as shown below.
                        root cgroup
     /              /                 \                 \
    A1             A2       ...       An                Am
(root invalid) (root invalid) ... (root invalid) (root invalid/member)

This outcome is entirely undeserved for all users from A1 to An.

This patch prevents such outcomes by ensuring that modifications to cpuset.cpus do not affect the partition state of other sibling cpusets. Therefore, with this patch applied, when user-m configures Am's cpuset.cpus to "0-127", the result will be as follows.
                        root cgroup
     /             /                  \                 \
    A1            A2        ...       An                Am
  (root)        (root)      ...     (root)     (root invalid/member)
It is worth noting that, since this patch enforces the exclusivity of sibling cpusets, setting exclusivity now follows a "first-come, first-served" principle.

For example, consider the following four steps: before applying this patch, regardless of the order in which they are executed, the final partition state of both A1 and B1 would always be "root invalid."

Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "1-2" > B1/cpuset.cpus | root invalid | member | #4> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |

After applying this patch, the first party to set "root" will maintain its exclusive validity. As follows:

Step | A1's prstate | B1's prstate | #1> echo "0-1" > A1/cpuset.cpus | member | member | #2> echo "root" > A1/cpuset.cpus.partition | root | member | #3> echo "1-2" > B1/cpuset.cpus | root | member | #4> echo "root" > B1/cpuset.cpus.partition | root | root invalid |

Step | A1's prstate | B1's prstate | #1> echo "0-1" > B1/cpuset.cpus | member | member | #2> echo "root" > B1/cpuset.cpus.partition | member | root | #3> echo "1-2" > A1/cpuset.cpus | member | root | #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root |

In summary, if the current cpuset conflicts with its sibling cpusets on exclusive CPUs (If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs will be treated as exclusive CPUs), only the current cpuset should bear the consequences.

Signed-off-by: Sun Shaojie sunshaojie@kylinos.cn

I agree with you that it is probably not a good idea to invalidate partitions whenever there is a conflict. However, I have a different idea of how to do it. I am going to post another patch to show my idea. Let me know what you think about it and whether it can meet your need.

Cheers, Longman

...

kernel/cgroup/cpuset-internal.h | 3 + kernel/cgroup/cpuset-v1.c | 19 ++++++ kernel/cgroup/cpuset.c | 60 ++++++++++++------- .../selftests/cgroup/test_cpuset_prs.sh | 12 ++-- 4 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h index 337608f408ce..c53111998432 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); #else static inline void fmeter_init(struct fmeter *fmp) {} static inline void cpuset1_update_task_spread_flags(struct cpuset *cs, @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
		struct cpuset *cs2) {return false; }
#endif /* CONFIG_CPUSETS_V1 */
#endif /* __CPUSET_INTERNAL_H */ diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 12e76774c75b..5aa0ac092ef6 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; } +/*
cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
                           to legacy (v1)
@cs1: first cpuset to check

@cs2: second cpuset to check

Returns: true if CPU exclusivity conflict exists, false otherwise

If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.

*/
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{
if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
return cpumask_intersects(cs1->cpus_allowed,
			  cs2->cpus_allowed);
return false;
+}

#ifdef CONFIG_PROC_PID_CPUSET /*

proc_cpuset_show()

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..e58dd26e074a 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -586,14 +586,24 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)

Returns: true if CPU exclusivity conflict exists, false otherwise

Conflict detection rules:

If either cpuset is CPU exclusive, they must be mutually exclusive
For cgroup-v1:
see cpuset1_cpus_excl_conflict()
For cgroup-v2:

If both cs1 and cs2 are exclusive, cs1 and cs2 must be mutually exclusive

exclusive_cpus masks cannot intersect between cpusets

The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs

If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs

will be treated as exclusive CPUs; therefore, its allowed CPUs must not

intersect with another's exclusive CPUs.

*/ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) {
/* If either cpuset is exclusive, check if they are mutually exclusive */

if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
/* For cgroup-v1 */

if (!cpuset_v2())
return cpuset1_cpus_excl_conflict(cs1, cs2);
/* If cpusets are exclusive, check if they are mutually exclusive*/

if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2)) return !cpusets_are_exclusive(cs1, cs2);
/* Exclusive_cpus cannot intersect */ @@ -609,6 +619,20 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) return true;
/*
* When a cpuset is exclusive and its exclusive CPUs are empty,
* its cpus_allowed cannot intersect with another cpuset's exclusive_cpus.
*/
if (is_cpu_exclusive(cs1) &&
   cpumask_empty(cs1->exclusive_cpus) &&
   cpumask_intersects(cs1->cpus_allowed, cs2->exclusive_cpus))
return true;
if (is_cpu_exclusive(cs2) &&
   cpumask_empty(cs2->exclusive_cpus) &&
   cpumask_intersects(cs2->cpus_allowed, cs1->exclusive_cpus))
return true;
return false; }
@@ -2411,34 +2435,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc struct tmpmasks *tmp) { int retval;

struct cpuset *parent = parent_cs(cs);

retval = validate_change(cs, trialcs); if ((retval == -EINVAL) && cpuset_v2()) {
struct cgroup_subsys_state *css;
struct cpuset *cp;
/*

The -EINVAL error code indicates that partition sibling

CPU exclusivity rule has been violated. We still allow

the cpumask change to proceed while invalidating the
 * partition. However, any conflicting sibling partitions
 * have to be marked as invalid too.
 * partition.
*/ trialcs->prs_err = PERR_NOTEXCL;
rcu_read_lock();
cpuset_for_each_child(cp, css, parent) {
	struct cpumask *xcpus = user_xcpus(trialcs);
	if (is_partition_valid(cp) &&
	    cpumask_intersects(xcpus, cp->effective_xcpus)) {
		rcu_read_unlock();
		update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
		rcu_read_lock();
	}
}
rcu_read_unlock();
retval = 0; } return retval;
@@ -2506,8 +2513,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, if (alloc_tmpmasks(&tmp)) return -ENOMEM;

compute_trialcs_excpus(trialcs, cs);

trialcs->prs_err = PERR_NONE;
/*
* if there is exclusive CPUs conflict with the siblings,
* we still allow the cpumask change to proceed while
* invalidating the partition.
*/
if (compute_trialcs_excpus(trialcs, cs))
trialcs->prs_err = PERR_NOTEXCL;
else
trialcs->prs_err = PERR_NONE;
retval = cpus_allowed_validate_change(cs, trialcs, &tmp); if (retval < 0) diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..75154e22c702 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -269,7 +269,7 @@ TEST_MATRIX=( " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"

" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"

" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2" " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"

@@ -318,7 +318,7 @@ TEST_MATRIX=( # Invalid to valid local partition direct transition tests " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3" " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"

" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"

" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0" " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"

# Local partition invalidation tests @@ -388,10 +388,10 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"

# A non-exclusive cpuset.cpus change will invalidate partition and its siblings

" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"

" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"

# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.

" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0"

" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P-1|B1:P1"

" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"

# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"

days inactive

days old

linux-kselftest-mirror@lists.linaro.org

51 comments

participants

tags (0)

participants (4)

Chen Ridong
Michal Koutný
Sun Shaojie
Waiman Long