This patch series is inspired by the cpuset patch sent by Sun Shaojie [1]. The idea is to avoid invalidating sibling partitions when there is a cpuset.cpus conflict. However this patch series does it in a slightly different way to make its behavior more consistent with other cpuset properties.
The first 3 patches are just some cleanup and minor bug fixes on issues found during the investigation process. The last one is the major patch that changes the way cpuset.cpus is being handled during the partition creation process. Instead of invalidating sibling partitions when there is a conflict, it will strip out the conflicting exclusive CPUs and assign the remaining non-conflicting exclusive CPUs to the new partition unless there is no more CPU left which will fail the partition creation process. It is similar to the idea that cpuset.cpus.effective may only contain a subset of CPUs specified in cpuset.cpus. So cpuset.cpus.exclusive.effective may contain only a subset of cpuset.cpus when a partition is created without setting cpuset.cpus.exclusive.
Even setting cpuset.cpus.exclusive instead of cpuset.cpus may not guarantee all the requested CPUs can be granted if parent doesn't have access to some of those exclusive CPUs. The difference is that conflicts from siblings is not possible with cpuset.cpus.exclusive as long as it can be set successfully without failure.
[1] https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
Waiman Long (4): cgroup/cpuset: Streamline rm_siblings_excl_cpus() cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier() cgroup/cpuset: Don't fail cpuset.cpus change in v2 cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
kernel/cgroup/cpuset-internal.h | 3 + kernel/cgroup/cpuset-v1.c | 19 +++ kernel/cgroup/cpuset.c | 135 +++++++----------- .../selftests/cgroup/test_cpuset_prs.sh | 26 +++- 4 files changed, 93 insertions(+), 90 deletions(-)
If exclusive_cpus is set, effective_xcpus must be a subset of exclusive_cpus. Currently, rm_siblings_excl_cpus() checks both exclusive_cpus and effective_xcpus connectively. It is simpler to check only exclusive_cpus if non-empty or just effective_xcpus otherwise.
No functional change is expected.
Signed-off-by: Waiman Long longman@redhat.com --- kernel/cgroup/cpuset.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 221da921b4f9..3d2d28f0fd03 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1355,23 +1355,24 @@ static int rm_siblings_excl_cpus(struct cpuset *parent, struct cpuset *cs, int retval = 0;
if (cpumask_empty(excpus)) - return retval; + return 0;
/* * Exclude exclusive CPUs from siblings */ rcu_read_lock(); cpuset_for_each_child(sibling, css, parent) { + struct cpumask *sibling_xcpus; + if (sibling == cs) continue;
- if (cpumask_intersects(excpus, sibling->exclusive_cpus)) { - cpumask_andnot(excpus, excpus, sibling->exclusive_cpus); - retval++; - continue; - } - if (cpumask_intersects(excpus, sibling->effective_xcpus)) { - cpumask_andnot(excpus, excpus, sibling->effective_xcpus); + sibling_xcpus = cpumask_empty(sibling->exclusive_cpus) + ? sibling->effective_xcpus + : sibling->exclusive_cpus; + + if (cpumask_intersects(excpus, sibling_xcpus)) { + cpumask_andnot(excpus, excpus, sibling_xcpus); retval++; } }
Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition") introduced a new check to disallow the setting of a new cpuset.cpus.exclusive value that is a superset of a sibling's cpuset.cpus value so that there will at least be one CPU left in the sibling in case the cpuset becomes a valid partition root. This new check does have the side effect of failing a cpuset.cpus change that make it a subset of a sibling's cpuset.cpus.exclusive value.
With v2, users are supposed to be allowed to set whatever value they want in cpuset.cpus without failure. To maintain this rule, the check is now restricted to only when cpuset.cpus.exclusive is being changed not when cpuset.cpus is changed.
Signed-off-by: Waiman Long longman@redhat.com --- kernel/cgroup/cpuset.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 850334dbc36a..83bf6b588e5f 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
/** * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts - * @cs1: first cpuset to check - * @cs2: second cpuset to check + * @trial: the trial cpuset to be checked + * @sibling: a sibling cpuset to be checked against + * @new_xcpus: new exclusive_cpus in trial cpuset * * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: * 1. If either cpuset is CPU exclusive, they must be mutually exclusive * 2. exclusive_cpus masks cannot intersect between cpusets - * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs + * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs */ -static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling, + bool new_xcpus) { /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) - return !cpusets_are_exclusive(cs1, cs2); + if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling)) + return !cpusets_are_exclusive(trial, sibling);
/* Exclusive_cpus cannot intersect */ - if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) + if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus)) return true;
- /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */ - if (!cpumask_empty(cs1->cpus_allowed) && - cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus)) - return true; - - if (!cpumask_empty(cs2->cpus_allowed) && - cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) + /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */ + if (new_xcpus && !cpumask_empty(sibling->cpus_allowed) && + cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus)) return true;
return false; @@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) { struct cgroup_subsys_state *css; struct cpuset *c, *par; + bool new_xcpus; int ret = 0;
rcu_read_lock(); @@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) * overlap. exclusive_cpus cannot overlap with each other if set. */ ret = -EINVAL; + new_xcpus = !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cpus); cpuset_for_each_child(c, css, par) { if (c == cur) continue; - if (cpus_excl_conflict(trial, c)) + if (cpus_excl_conflict(trial, c, new_xcpus)) goto out; if (mems_excl_conflict(trial, c)) goto out;
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition, the sibling's partition state becomes invalid. This is overly harsh and is probably not necessary.
The cpuset.cpus.exclusive control file, if set, will override the cpuset.cpus of the same cpuset when creating a cpuset partition. So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up a partition. However, it cannot override a conflicting cpuset.cpus file in a sibling cpuset and the partition creation process will fail. This is inconsistent. That will also make using cpuset.cpus.exclusive less valuable as a tool to set up cpuset partitions as the users have to check if such a cpuset.cpus conflict exists or not.
Fix these problems by strictly adhering to the setting of the following control files in descending order of priority when setting up a partition.
1. cpuset.cpus.exclusive.effective of a valid partition 2. cpuset.cpus.exclusive 3. cpuset.cpus
So once a cpuset.cpus.exclusive is set without failure, it will always be allowed to form a valid partition as long as at least one CPU can be granted from its parent irrespective of the state of the siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive will fail if it conflicts with the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value of a sibling.
Partition can still be created by setting only cpuset.cpus without setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will be removed from its cpuset.cpus.exclusive.effective as long as there is still one or more CPUs left and can be granted from its parent. This CPU stripping is currently done in rm_siblings_excl_cpus().
The new code will now try its best to enable the creation of new partitions with only cpuset.cpus set without invalidating existing ones. However it is not guaranteed that all the CPUs requested in cpuset.cpus will be used in the new partition even when all these CPUs can be granted from the parent.
This is similar to the fact that cpuset.cpus.effective may not be able to include all the CPUs requested in cpuset.cpus. In this case, the parent may not able to grant all the exclusive CPUs requested in cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have already been granted to other partitions earlier.
With the creation of multiple sibling partitions by setting only cpuset.cpus, this does have the side effect that their exact cpuset.cpus.exclusive.effective settings will depend on the order of partition creation if there are conflicts.
For example, # echo "0-2" > A1/cpuset.cpus # echo "root" > A1/cpuset.cpus.partition # echo A1/cpuset.cpus.partition root # echo A1/cpuset.cpus.exclusive.effective 0-2 # echo "2-4" > B1/cpuset.cpus # echo "root" > B1/cpuset.cpus.partition # echo B1/cpuset.cpus.partition root # echo B1/cpuset.cpus.exclusive.effective 3-4 # echo B1/cpuset.cpus.effective 3-4
For users who want to be sure that they can get all the CPUs they want, cpuset.cpus.exclusive should be used instead if they can set cpuset.cpus.exclusive without failure.
To make this change, we have to separate out the is_cpu_exclusive() check in cpus_excl_conflict() into a cgroup v1 only cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change() helper is also no longer need and can now be removed.
Some existing tests in test_cpuset_prs.sh are updated and new ones are added to reflect the new behavior.
Reported-by: Sun Shaojie sunshaojie@kylinos.cn Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/ Signed-off-by: Waiman Long longman@redhat.com --- kernel/cgroup/cpuset-internal.h | 3 + kernel/cgroup/cpuset-v1.c | 19 +++++ kernel/cgroup/cpuset.c | 81 +++++++------------ .../selftests/cgroup/test_cpuset_prs.sh | 26 ++++-- 4 files changed, 68 insertions(+), 61 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h index e718a4f54360..e8e2683cb067 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -312,6 +312,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); void cpuset1_init(struct cpuset *cs); void cpuset1_online_css(struct cgroup_subsys_state *css); int cpuset1_generate_sched_domains(cpumask_var_t **domains, @@ -326,6 +327,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, + struct cpuset *cs2) { return false; } static inline void cpuset1_init(struct cpuset *cs) {} static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {} static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains, diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index ecfea7800f0d..04124c38a774 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) return ret; }
+/* + * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts + * to legacy (v1) + * @cs1: first cpuset to check + * @cs2: second cpuset to check + * + * Returns: true if CPU exclusivity conflict exists, false otherwise + * + * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect. + */ +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{ + if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) + return cpumask_intersects(cs1->cpus_allowed, + cs2->cpus_allowed); + + return false; +} + #ifdef CONFIG_PROC_PID_CPUSET /* * proc_cpuset_show() diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 83bf6b588e5f..c7e54d090c7b 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -129,6 +129,18 @@ static bool force_sd_rebuild; * For simplicity, a local partition can be created under a local or remote * partition but a remote partition cannot have any partition root in its * ancestor chain except the cgroup root. + * + * A valid partition can be formed by setting either cpus_allowed or + * exclusive_cpus. If there are exclusive CPU conflicts, the conflicting + * CPUs will be assigned to the effective_xcpus of the partition according + * to the appearance of those CPUs in cpumasks (in descending order of + * priority). + * 1. effective_xcpus of a valid partition + * 2. exclusive_cpus + * 3. cpus_allowed + * + * exclusive_cpus should be used for setting up partition if the users want + * to get as many CPUs as possible. */ #define PRS_MEMBER 0 #define PRS_ROOT 1 @@ -616,27 +628,25 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive - * 2. exclusive_cpus masks cannot intersect between cpusets - * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs + * o cgroup v1 + * See cpuset1_cpus_excl_conflict() + * o cgroup v2 + * - The exclusive_cpus values cannot overlap. + * - New exclusive_cpus cannot be a superset of a sibling's cpus_allowed. */ static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling, bool new_xcpus) { - /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling)) - return !cpusets_are_exclusive(trial, sibling); - - /* Exclusive_cpus cannot intersect */ - if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus)) - return true; + if (!cpuset_v2()) + return cpuset1_cpus_excl_conflict(trial, sibling);
/* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */ if (new_xcpus && !cpumask_empty(sibling->cpus_allowed) && cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus)) return true;
- return false; + /* Exclusive_cpus cannot intersect */ + return cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus); }
static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) @@ -2306,43 +2316,6 @@ static enum prs_errcode validate_partition(struct cpuset *cs, struct cpuset *tri return PERR_NONE; }
-static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialcs, - struct tmpmasks *tmp) -{ - int retval; - struct cpuset *parent = parent_cs(cs); - - retval = validate_change(cs, trialcs); - - if ((retval == -EINVAL) && cpuset_v2()) { - struct cgroup_subsys_state *css; - struct cpuset *cp; - - /* - * The -EINVAL error code indicates that partition sibling - * CPU exclusivity rule has been violated. We still allow - * the cpumask change to proceed while invalidating the - * partition. However, any conflicting sibling partitions - * have to be marked as invalid too. - */ - trialcs->prs_err = PERR_NOTEXCL; - rcu_read_lock(); - cpuset_for_each_child(cp, css, parent) { - struct cpumask *xcpus = user_xcpus(trialcs); - - if (is_partition_valid(cp) && - cpumask_intersects(xcpus, cp->effective_xcpus)) { - rcu_read_unlock(); - update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp); - rcu_read_lock(); - } - } - rcu_read_unlock(); - retval = 0; - } - return retval; -} - /** * partition_cpus_change - Handle partition state changes due to CPU mask updates * @cs: The target cpuset being modified @@ -2402,15 +2375,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed)) return 0;
- if (alloc_tmpmasks(&tmp)) - return -ENOMEM; - compute_trialcs_excpus(trialcs, cs); trialcs->prs_err = PERR_NONE;
- retval = cpus_allowed_validate_change(cs, trialcs, &tmp); + retval = validate_change(cs, trialcs); if (retval < 0) - goto out_free; + return retval; + + if (alloc_tmpmasks(&tmp)) + return -ENOMEM;
/* * Check all the descendants in update_cpumasks_hier() if @@ -2433,7 +2406,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, /* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */ if (cs->partition_root_state) update_partition_sd_lb(cs, old_prs); -out_free: + free_tmpmasks(&tmp); return retval; } diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..ff4540b0490e 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -269,7 +269,7 @@ TEST_MATRIX=( " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" - " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2" + " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2" " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" @@ -318,7 +318,7 @@ TEST_MATRIX=( # Invalid to valid local partition direct transition tests " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3" " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3" - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0" + " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0" " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
# Local partition invalidation tests @@ -388,10 +388,10 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
- # A non-exclusive cpuset.cpus change will invalidate partition and its siblings - " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" - " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1" - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1" + # A non-exclusive cpuset.cpus change will not invalidate its siblings partition. + " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0" + " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|XA1:0-1|B1:2-3 A1:P1|B1:P1" + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5" @@ -417,6 +417,14 @@ TEST_MATRIX=( " CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-4|B1:5-6 \ A1:P0|A2:P2:B1:P-1 2-4"
+ # When multiple partitions with conflicting cpuset.cpus are created, the + # latter created ones will only get what are left of the available exclusive + # CPUs. + " C1-3:P1 . . . . . . C3-5:P1 0 A1:1-3|B1:4-5:XB1:4-5 A1:P1|B1:P1" + + # cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive + " C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:1-2" + # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS # ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ -------- # Failure cases: @@ -427,7 +435,7 @@ TEST_MATRIX=( # Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected " C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:4-5"
- # cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive + # cpuset.cpus.exclusive cannot be set to a superset of sibling's cpuset.cpus " C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5" )
@@ -477,6 +485,10 @@ REMOTE_TEST_MATRIX=( . . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4:c22:5-6 \ p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \ 1-2,4-6|1-2,5-6" + # c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid partition + " C1-5:P1:S+ . C1-4:P1 C2-3 . . \ + . . . P1 . . p1:5|c11:1-4|c12:5 \ + p1:P1|c11:P1|c12:P-1" )
#
Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition"), the compute_effective_exclusive_cpumask() helper was extended to strip exclusive CPUs from siblings when computing effective_xcpus (cpuset.cpus.exclusive.effective). This helper was later renamed to compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive CPU mask computation logic").
This helper is supposed to be used consistently to compute effective_xcpus. However, there is an exception within the callback critical section in update_cpumasks_hier() when exclusive_cpus of a valid partition root is empty. This can cause effective_xcpus value to differ depending on where exactly it is last computed. Fix this by using compute_excpus() in this case to give a consistent result.
Signed-off-by: Waiman Long longman@redhat.com --- kernel/cgroup/cpuset.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 3d2d28f0fd03..850334dbc36a 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2050,6 +2050,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, struct cpuset *parent = parent_cs(cp); bool remote = is_remote_partition(cp); bool update_parent = false; + bool empty_xcpus;
old_prs = new_prs = cp->partition_root_state;
@@ -2160,20 +2161,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, new_prs = cp->partition_root_state; }
+ empty_xcpus = cpumask_empty(cp->exclusive_cpus); spin_lock_irq(&callback_lock); cpumask_copy(cp->effective_cpus, tmp->new_cpus); cp->partition_root_state = new_prs; - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs)) + if (((new_prs > 0) && empty_xcpus) || + ((cp != cs) && !empty_xcpus)) compute_excpus(cp, cp->effective_xcpus); - - /* - * Make sure effective_xcpus is properly set for a valid - * partition root. - */ - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus)) - cpumask_and(cp->effective_xcpus, - cp->cpus_allowed, parent->effective_xcpus); - else if (new_prs < 0) + if (new_prs < 0) reset_partition_data(cp); spin_unlock_irq(&callback_lock);
On 2025/12/25 15:30, Waiman Long wrote:
If exclusive_cpus is set, effective_xcpus must be a subset of exclusive_cpus. Currently, rm_siblings_excl_cpus() checks both exclusive_cpus and effective_xcpus connectively. It is simpler to check only exclusive_cpus if non-empty or just effective_xcpus otherwise.
No functional change is expected.
Signed-off-by: Waiman Long longman@redhat.com
kernel/cgroup/cpuset.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 221da921b4f9..3d2d28f0fd03 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1355,23 +1355,24 @@ static int rm_siblings_excl_cpus(struct cpuset *parent, struct cpuset *cs, int retval = 0; if (cpumask_empty(excpus))
return retval;
return 0;/* * Exclude exclusive CPUs from siblings */ rcu_read_lock(); cpuset_for_each_child(sibling, css, parent) {
struct cpumask *sibling_xcpus;- if (sibling == cs) continue;
if (cpumask_intersects(excpus, sibling->exclusive_cpus)) {cpumask_andnot(excpus, excpus, sibling->exclusive_cpus);retval++;continue;}if (cpumask_intersects(excpus, sibling->effective_xcpus)) {cpumask_andnot(excpus, excpus, sibling->effective_xcpus);
sibling_xcpus = cpumask_empty(sibling->exclusive_cpus)? sibling->effective_xcpus: sibling->exclusive_cpus;
I'm wondering if this is sufficient?
sibling_xcpus = sibling->effective_xcpus
p(exclusive_cpus = 1) / \ a b(root, exclusive_cpus=1-7, effective_xcpus=1)
What the sibling's effective exclusive CPUs actually should be is not CPUs 1-7 but CPU 1. So, do we need to remove CPUs 2-7?
if (cpumask_intersects(excpus, sibling_xcpus)) { } }cpumask_andnot(excpus, excpus, sibling_xcpus); retval++;
On 2025/12/25 15:30, Waiman Long wrote:
Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition") introduced a new check to disallow the setting of a new cpuset.cpus.exclusive value that is a superset of a sibling's cpuset.cpus value so that there will at least be one CPU left in the sibling in case the cpuset becomes a valid partition root. This new check does have the side effect of failing a cpuset.cpus change that make it a subset of a sibling's cpuset.cpus.exclusive value.
With v2, users are supposed to be allowed to set whatever value they want in cpuset.cpus without failure. To maintain this rule, the check is now restricted to only when cpuset.cpus.exclusive is being changed not when cpuset.cpus is changed.
Signed-off-by: Waiman Long longman@redhat.com
kernel/cgroup/cpuset.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 850334dbc36a..83bf6b588e5f 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) /**
- cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
- @cs1: first cpuset to check
- @cs2: second cpuset to check
- @trial: the trial cpuset to be checked
- @sibling: a sibling cpuset to be checked against
- @new_xcpus: new exclusive_cpus in trial cpuset
Can we rename it to xcpus_changed?
The current name new_xcpus gives me the impression that CPUs are being added. For example: if exclusive_cpus is 1, and it changes to 1-7, then new_xcpus would be 2-7.
- Returns: true if CPU exclusivity conflict exists, false otherwise
- Conflict detection rules:
- If either cpuset is CPU exclusive, they must be mutually exclusive
- exclusive_cpus masks cannot intersect between cpusets
- The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
*/
- The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
-static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
bool new_xcpus){ /* If either cpuset is exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
return !cpusets_are_exclusive(cs1, cs2);
- if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
return !cpusets_are_exclusive(trial, sibling);/* Exclusive_cpus cannot intersect */
- if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
- if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus)) return true;
- /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
- if (!cpumask_empty(cs1->cpus_allowed) &&
cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))return true;- if (!cpumask_empty(cs2->cpus_allowed) &&
cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
- /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
- if (new_xcpus && !cpumask_empty(sibling->cpus_allowed) &&
return true;cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))return false; @@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) { struct cgroup_subsys_state *css; struct cpuset *c, *par;
- bool new_xcpus; int ret = 0;
rcu_read_lock(); @@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) * overlap. exclusive_cpus cannot overlap with each other if set. */ ret = -EINVAL;
- new_xcpus = !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cpus); cpuset_for_each_child(c, css, par) { if (c == cur) continue;
if (cpus_excl_conflict(trial, c))
if (mems_excl_conflict(trial, c)) goto out;if (cpus_excl_conflict(trial, c, new_xcpus)) goto out;
On 2025/12/25 15:30, Waiman Long wrote:
Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition") introduced a new check to disallow the setting of a new cpuset.cpus.exclusive value that is a superset of a sibling's cpuset.cpus value so that there will at least be one CPU left in the sibling in case the cpuset becomes a valid partition root. This new check does have the side effect of failing a cpuset.cpus change that make it a subset of a sibling's cpuset.cpus.exclusive value.
With v2, users are supposed to be allowed to set whatever value they want in cpuset.cpus without failure. To maintain this rule, the check is now restricted to only when cpuset.cpus.exclusive is being changed not when cpuset.cpus is changed.
Signed-off-by: Waiman Long longman@redhat.com
kernel/cgroup/cpuset.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 850334dbc36a..83bf6b588e5f 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) /**
- cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
- @cs1: first cpuset to check
- @cs2: second cpuset to check
- @trial: the trial cpuset to be checked
- @sibling: a sibling cpuset to be checked against
- @new_xcpus: new exclusive_cpus in trial cpuset
- Returns: true if CPU exclusivity conflict exists, false otherwise
- Conflict detection rules:
- If either cpuset is CPU exclusive, they must be mutually exclusive
- exclusive_cpus masks cannot intersect between cpusets
- The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
*/
- The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
-static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
bool new_xcpus){ /* If either cpuset is exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
return !cpusets_are_exclusive(cs1, cs2);
- if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
return !cpusets_are_exclusive(trial, sibling);/* Exclusive_cpus cannot intersect */
- if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
- if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus)) return true;
- /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
- if (!cpumask_empty(cs1->cpus_allowed) &&
cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))return true;- if (!cpumask_empty(cs2->cpus_allowed) &&
cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
- /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
- if (new_xcpus && !cpumask_empty(sibling->cpus_allowed) &&
return true;cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))return false; @@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) { struct cgroup_subsys_state *css; struct cpuset *c, *par;
- bool new_xcpus; int ret = 0;
rcu_read_lock(); @@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) * overlap. exclusive_cpus cannot overlap with each other if set. */ ret = -EINVAL;
- new_xcpus = !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cpus); cpuset_for_each_child(c, css, par) { if (c == cur) continue;
if (cpus_excl_conflict(trial, c))
if (mems_excl_conflict(trial, c)) goto out;if (cpus_excl_conflict(trial, c, new_xcpus)) goto out;
validate_change() is also called from cpuset_update_flag(), which may not change any cpus_allowed or exclusive_cpus. This could lead to incorrect checks.
i.e,
# cd /sys/fs/cgroup/ # mkdir a # mkdir b # echo 1-2 > b/cpuset.cpus.exclusive -- no conflict with a # echo 1 > a/cpuset.cpus # echo root > b/cpuset.cpus.partition -- b becomes root partition, conflict with a, but exclusive_cpus unchanged # cat b/cpuset.cpus.partition root
As a result, cpuset a (as a member) contains CPU 1, which is a subset of partition b's exclusive CPUs — a conflict that might be missed.
linux-kselftest-mirror@lists.linaro.org