This is a resend of
https://lore.kernel.org/stable/20230308162207.2886641-1-qyousef@layalina.io/
which was dropped because of build errors in the equivalent 5.10 backport. I extended the testing to make sure this series is not impacted in the same way, and updated the cover letter to clarify that there is no need to take the further backport that removes capacity inversion detection.
A portion of the fixes was already ported to 5.15, but some were missed.
This ports the remainder of the fixes.
Based on 5.15.98.
a2e90611b9f4 ("sched/fair: Remove capacity inversion detection") is not necessary to backport because it depends on e5ed0550c04c ("sched/fair: unlink misfit task from cpu overutilized"), which is nice to have but not strictly required. That patch improves the search for the best CPU under adverse thermal pressure by trying harder, and since the new search effectively replaces capacity inversion detection, the detection is removed afterwards.
Build tested on (cross compiled when necessary; natively on x86_64 otherwise):

1. default ubuntu config which has uclamp + SMP
2. default ubuntu config without uclamp + SMP
3. default ubuntu config without SMP (which automatically disables uclamp)
4. the reported riscv-allnoconfig, mips-randconfig, and x86_64-randconfigs
Boot tested on Android 5.15 GKI with slight modifications due to other conflicts there. I need more time to do full functional testing on 5.15, but since some patches were already taken, I am posting the remainder now.
Apologies: due to a job/email change, I missed the emails when the other backports were partially taken.
Qais Yousef (7):
  sched/uclamp: Fix fits_capacity() check in feec()
  sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
  sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition
  sched/fair: Detect capacity inversion
  sched/fair: Consider capacity inversion in util_fits_cpu()
  sched/uclamp: Fix a uninitialized variable warnings
  sched/fair: Fixes for capacity inversion detection
 kernel/sched/core.c  |  10 ++--
 kernel/sched/fair.c  | 128 +++++++++++++++++++++++++++++++++++++------
 kernel/sched/sched.h |  61 ++++++++++++++++++++-
 3 files changed, 174 insertions(+), 25 deletions(-)
From: Qais Yousef <qais.yousef@arm.com>
commit 244226035a1f9b2b6c326e55ae5188fab4f428cb upstream.
As reported by Yun Hsiang [1], if a task has its uclamp_min >= 0.8 * 1024, it'll always pick the previous CPU because fits_capacity() will always return false in this case.
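To see the arithmetic, here is a standalone userspace sketch, assuming the 5.15 definition of fits_capacity() (cap * 1280 < max * 1024, i.e. a 25% headroom margin); it is illustrative only and not part of the patch:

#include <stdio.h>

/* assumed to mirror kernel/sched/fair.c's fits_capacity() margin */
#define fits_capacity(cap, max) ((cap) * 1280 < (max) * 1024)

int main(void)
{
        unsigned long biggest_cpu_cap = 1024;   /* SCHED_CAPACITY_SCALE */

        /* 819 * 1280 = 1048320 <  1048576: still fits the biggest CPU */
        printf("819 fits: %d\n", fits_capacity(819UL, biggest_cpu_cap));
        /* 820 * 1280 = 1049600 >= 1048576: no CPU can ever fit it */
        printf("820 fits: %d\n", fits_capacity(820UL, biggest_cpu_cap));
        return 0;
}

So any task with uclamp_min above 0.8 * 1024 fails the check on every CPU, and feec() keeps falling back to prev_cpu.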
The new util_fits_cpu() logic should handle this correctly for us, besides handling more corner cases where similar failures could occur, like when using UCLAMP_MAX.

We open code uclamp_rq_util_with() except for the clamp() part because util_fits_cpu() needs the 'raw' values to be passed to it.
Also introduce uclamp_rq_{set, get}() shorthand accessors to read and write the rq's uclamp values. This makes the code more readable and ensures the right rules (use of READ_ONCE/WRITE_ONCE) are respected transparently.
[1] https://lists.linaro.org/pipermail/eas-dev/2020-July/001488.html
Fixes: 1d42509e475c ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
Reported-by: Yun Hsiang <hsiang023167@gmail.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-4-qais.yousef@arm.com
(cherry picked from commit 244226035a1f9b2b6c326e55ae5188fab4f428cb)
[Conflict in kernel/sched/fair.c mainly due to new automatic variables being added on master vs 5.15]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/core.c  | 10 +++++-----
 kernel/sched/fair.c  | 26 ++++++++++++++++++++++++--
 kernel/sched/sched.h | 42 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 68 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c1458fa8beb3..313d09020161 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1335,7 +1335,7 @@ static inline void uclamp_idle_reset(struct rq *rq, enum uclamp_id clamp_id,
 	if (!(rq->uclamp_flags & UCLAMP_FLAG_IDLE))
 		return;
 
-	WRITE_ONCE(rq->uclamp[clamp_id].value, clamp_value);
+	uclamp_rq_set(rq, clamp_id, clamp_value);
 }
 
 static inline
@@ -1513,8 +1513,8 @@ static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
 	if (bucket->tasks == 1 || uc_se->value > bucket->value)
 		bucket->value = uc_se->value;
 
-	if (uc_se->value > READ_ONCE(uc_rq->value))
-		WRITE_ONCE(uc_rq->value, uc_se->value);
+	if (uc_se->value > uclamp_rq_get(rq, clamp_id))
+		uclamp_rq_set(rq, clamp_id, uc_se->value);
 }
 
 /*
@@ -1580,7 +1580,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	if (likely(bucket->tasks))
 		return;
 
-	rq_clamp = READ_ONCE(uc_rq->value);
+	rq_clamp = uclamp_rq_get(rq, clamp_id);
 	/*
 	 * Defensive programming: this should never happen. If it happens,
 	 * e.g. due to future modification, warn and fixup the expected value.
@@ -1588,7 +1588,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	SCHED_WARN_ON(bucket->value > rq_clamp);
 	if (bucket->value >= rq_clamp) {
 		bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
-		WRITE_ONCE(uc_rq->value, bkt_clamp);
+		uclamp_rq_set(rq, clamp_id, bkt_clamp);
 	}
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6648683cd964..1f7db97b531e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6938,6 +6938,8 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
 static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
 	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
+	unsigned long p_util_min = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MIN) : 0;
+	unsigned long p_util_max = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MAX) : 1024;
 	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
 	int cpu, best_energy_cpu = prev_cpu, target = -1;
 	unsigned long cpu_cap, util, base_energy = 0;
@@ -6967,6 +6969,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 
 	for (; pd; pd = pd->next) {
 		unsigned long cur_delta, spare_cap, max_spare_cap = 0;
+		unsigned long rq_util_min, rq_util_max;
+		unsigned long util_min, util_max;
 		bool compute_prev_delta = false;
 		unsigned long base_energy_pd;
 		int max_spare_cap_cpu = -1;
@@ -6987,8 +6991,26 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			 * much capacity we can get out of the CPU; this is
 			 * aligned with sched_cpu_util().
 			 */
-			util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
-			if (!fits_capacity(util, cpu_cap))
+			if (uclamp_is_used()) {
+				if (uclamp_rq_is_idle(cpu_rq(cpu))) {
+					util_min = p_util_min;
+					util_max = p_util_max;
+				} else {
+					/*
+					 * Open code uclamp_rq_util_with() except for
+					 * the clamp() part. Ie: apply max aggregation
+					 * only. util_fits_cpu() logic requires to
+					 * operate on non clamped util but must use the
+					 * max-aggregated uclamp_{min, max}.
+					 */
+					rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
+					rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+
+					util_min = max(rq_util_min, p_util_min);
+					util_max = max(rq_util_max, p_util_max);
+				}
+			}
+			if (!util_fits_cpu(util, util_min, util_max, cpu))
 				continue;
 
 			if (cpu == prev_cpu) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e1f46ed412bc..47d40f8ef1b0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2855,6 +2855,23 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 #ifdef CONFIG_UCLAMP_TASK
 unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
 
+static inline unsigned long uclamp_rq_get(struct rq *rq,
+					  enum uclamp_id clamp_id)
+{
+	return READ_ONCE(rq->uclamp[clamp_id].value);
+}
+
+static inline void uclamp_rq_set(struct rq *rq, enum uclamp_id clamp_id,
+				 unsigned int value)
+{
+	WRITE_ONCE(rq->uclamp[clamp_id].value, value);
+}
+
+static inline bool uclamp_rq_is_idle(struct rq *rq)
+{
+	return rq->uclamp_flags & UCLAMP_FLAG_IDLE;
+}
+
 /**
  * uclamp_rq_util_with - clamp @util with @rq and @p effective uclamp values.
  * @rq: The rq to clamp against. Must not be NULL.
@@ -2890,12 +2907,12 @@ unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 		 * Ignore last runnable task's max clamp, as this task will
 		 * reset it. Similarly, no need to read the rq's min clamp.
 		 */
-		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
+		if (uclamp_rq_is_idle(rq))
 			goto out;
 	}
 
-	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
-	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
+	min_util = max_t(unsigned long, min_util, uclamp_rq_get(rq, UCLAMP_MIN));
+	max_util = max_t(unsigned long, max_util, uclamp_rq_get(rq, UCLAMP_MAX));
 out:
 	/*
 	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
@@ -2941,6 +2958,25 @@ static inline bool uclamp_is_used(void)
 {
 	return false;
 }
+
+static inline unsigned long uclamp_rq_get(struct rq *rq,
+					  enum uclamp_id clamp_id)
+{
+	if (clamp_id == UCLAMP_MIN)
+		return 0;
+
+	return SCHED_CAPACITY_SCALE;
+}
+
+static inline void uclamp_rq_set(struct rq *rq, enum uclamp_id clamp_id,
+				 unsigned int value)
+{
+}
+
+static inline bool uclamp_rq_is_idle(struct rq *rq)
+{
+	return false;
+}
 #endif /* CONFIG_UCLAMP_TASK */
 
 #ifdef arch_scale_freq_capacity
From: Qais Yousef <qais.yousef@arm.com>
commit c56ab1b3506ba0e7a872509964b100912bde165d upstream.
So that it is now uclamp aware.
This fixes a major problem of busy tasks capped with UCLAMP_MAX keeping the system in overutilized state which disables EAS and leads to wasting energy in the long run.
Without this patch running a busy background activity like JIT compilation on Pixel 6 causes the system to be in overutilized state 74.5% of the time.
With this patch this goes down to 9.79%.
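A simplified model of the difference, as an illustrative userspace sketch (util_fits_cpu() is more involved; the straightforward uclamp_max capping shown here is an assumption made for demonstration):

#include <stdio.h>

/* assumed 5.15 fits_capacity() margin */
#define fits_capacity(cap, max) ((cap) * 1280 < (max) * 1024)

int main(void)
{
        /* a busy task capped with UCLAMP_MAX = 512 on a 1024-capacity CPU */
        unsigned long util = 900, cpu_cap = 1024, rq_uclamp_max = 512;
        unsigned long capped = util < rq_uclamp_max ? util : rq_uclamp_max;

        /* old check: raw util trips the margin -> overutilized forever */
        printf("old overutilized: %d\n", !fits_capacity(util, cpu_cap));
        /* new check: the rq's uclamp_max caps the demand -> it fits */
        printf("new overutilized: %d\n", !fits_capacity(capped, cpu_cap));
        return 0;
}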
It also fixes another problem with long running tasks that have their UCLAMP_MIN changed while running, such that they need to upmigrate to honour the new UCLAMP_MIN value. The upmigration doesn't get triggered because the overutilized state never gets set in this case, hence misfit migration never happens at tick until the task wakes up again.
Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-7-qais.yousef@arm.com
(cherry picked from commit c56ab1b3506ba0e7a872509964b100912bde165d)
[Fixed trivial conflict in cpu_overutilized() - use cpu_util() instead of cpu_util_cfs()]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f7db97b531e..cd4eccdf3291 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5692,7 +5692,10 @@ static inline unsigned long cpu_util(int cpu);
 
 static inline bool cpu_overutilized(int cpu)
 {
-	return !fits_capacity(cpu_util(cpu), capacity_of(cpu));
+	unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
+	unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+
+	return !util_fits_cpu(cpu_util(cpu), rq_util_min, rq_util_max, cpu);
 }
 
 static inline void update_overutilized_status(struct rq *rq)
From: Qais Yousef <qais.yousef@arm.com>
commit d81304bc6193554014d4372a01debdf65e1e9a4d upstream.
If the utilization of the woken up task is 0, we skip the energy calculation because it has no impact.
But if the task is boosted (uclamp_min != 0), it will have an impact on task placement and frequency selection. Only skip if the util is truly 0 after applying uclamp values.
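The early exit can be sketched as follows (standalone illustration; clampv() stands in for the kernel's clamp()):

#include <stdio.h>

static unsigned long clampv(unsigned long v, unsigned long lo, unsigned long hi)
{
        return v < lo ? lo : (v > hi ? hi : v);
}

int main(void)
{
        unsigned long util_est = 0;     /* freshly woken task, no history yet */
        unsigned long uclamp_min = 512, uclamp_max = 1024;      /* boosted */

        /* old condition skips the energy scan although the boost matters */
        printf("old skips feec(): %d\n", util_est == 0);
        /* new condition sees the boost: clamp(0, 512, 1024) == 512 != 0 */
        printf("new skips feec(): %d\n",
               clampv(util_est, uclamp_min, uclamp_max) == 0);
        return 0;
}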
Change uclamp_task_util()'s signature to avoid unnecessary additional calls to uclamp_eff_get(). feec() is the only user now.
Fixes: 732cd75b8c920 ("sched/fair: Select an energy-efficient CPU on task wake-up")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-8-qais.yousef@arm.com
(cherry picked from commit d81304bc6193554014d4372a01debdf65e1e9a4d)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd4eccdf3291..66939333bdd1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3974,14 +3974,16 @@ static inline unsigned long task_util_est(struct task_struct *p)
 }
 
 #ifdef CONFIG_UCLAMP_TASK
-static inline unsigned long uclamp_task_util(struct task_struct *p)
+static inline unsigned long uclamp_task_util(struct task_struct *p,
+					     unsigned long uclamp_min,
+					     unsigned long uclamp_max)
 {
-	return clamp(task_util_est(p),
-		     uclamp_eff_value(p, UCLAMP_MIN),
-		     uclamp_eff_value(p, UCLAMP_MAX));
+	return clamp(task_util_est(p), uclamp_min, uclamp_max);
 }
 #else
-static inline unsigned long uclamp_task_util(struct task_struct *p)
+static inline unsigned long uclamp_task_util(struct task_struct *p,
+					     unsigned long uclamp_min,
+					     unsigned long uclamp_max)
 {
 	return task_util_est(p);
 }
@@ -6967,7 +6969,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 		target = prev_cpu;
 
 	sync_entity_load_avg(&p->se);
-	if (!task_util_est(p))
+	if (!uclamp_task_util(p, p_util_min, p_util_max))
 		goto unlock;
 
 	for (; pd; pd = pd->next) {
From: Qais Yousef <qais.yousef@arm.com>
commit 44c7b80bffc3a657a36857098d5d9c49d94e652b upstream.
Check each performance domain to see if thermal pressure is causing its capacity to be lower than another performance domain.
We assume that each performance domain has CPUs with the same capacities, which is similar to an assumption made in energy_model.c
We also assume that thermal pressure impacts all CPUs in a performance domain equally.
If there are multiple performance domains with the same capacity_orig, we will trigger a capacity inversion if the domain is under thermal pressure.
The new cpu_in_capacity_inversion() should help users know when the capacity_orig information is not reliable, so they can opt in to use the inverted capacity as the 'actual' capacity_orig.
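As a worked example with made-up numbers (capacities in SCHED_CAPACITY_SCALE units; standalone illustration only):

#include <stdio.h>

int main(void)
{
        /* illustrative asymmetric system: mid cluster 768, big cluster 1024 */
        unsigned long big_cap_orig = 1024, mid_cap_orig = 768;
        unsigned long big_thermal_pressure = 300;       /* heavy throttling */
        unsigned long big_inv_cap = big_cap_orig - big_thermal_pressure; /* 724 */

        /* a pd with lower capacity_orig now effectively outperforms the big pd */
        if (mid_cap_orig > big_inv_cap)
                printf("big pd inverted, cpu_capacity_inverted = %lu\n",
                       big_inv_cap);
        return 0;
}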
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com
(cherry picked from commit 44c7b80bffc3a657a36857098d5d9c49d94e652b)
[fix trivial conflict in kernel/sched/sched.h due to code shuffling]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c  | 63 +++++++++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h | 19 +++++++++++++
 2 files changed, 79 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 66939333bdd1..e472f1849092 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8574,16 +8574,73 @@ static unsigned long scale_rt_capacity(int cpu)
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
+	unsigned long capacity_orig = arch_scale_cpu_capacity(cpu);
 	unsigned long capacity = scale_rt_capacity(cpu);
 	struct sched_group *sdg = sd->groups;
+	struct rq *rq = cpu_rq(cpu);
 
-	cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(cpu);
+	rq->cpu_capacity_orig = capacity_orig;
 
 	if (!capacity)
 		capacity = 1;
 
-	cpu_rq(cpu)->cpu_capacity = capacity;
-	trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+	rq->cpu_capacity = capacity;
+
+	/*
+	 * Detect if the performance domain is in capacity inversion state.
+	 *
+	 * Capacity inversion happens when another perf domain with equal or
+	 * lower capacity_orig_of() ends up having higher capacity than this
+	 * domain after subtracting thermal pressure.
+	 *
+	 * We only take into account thermal pressure in this detection as it's
+	 * the only metric that actually results in *real* reduction of
+	 * capacity due to performance points (OPPs) being dropped/become
+	 * unreachable due to thermal throttling.
+	 *
+	 * We assume:
+	 *   * That all cpus in a perf domain have the same capacity_orig
+	 *     (same uArch).
+	 *   * Thermal pressure will impact all cpus in this perf domain
+	 *     equally.
+	 */
+	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
+		unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
+		struct perf_domain *pd = rcu_dereference(rq->rd->pd);
+
+		rq->cpu_capacity_inverted = 0;
+
+		for (; pd; pd = pd->next) {
+			struct cpumask *pd_span = perf_domain_span(pd);
+			unsigned long pd_cap_orig, pd_cap;
+
+			cpu = cpumask_any(pd_span);
+			pd_cap_orig = arch_scale_cpu_capacity(cpu);
+
+			if (capacity_orig < pd_cap_orig)
+				continue;
+
+			/*
+			 * handle the case of multiple perf domains have the
+			 * same capacity_orig but one of them is under higher
+			 * thermal pressure. We record it as capacity
+			 * inversion.
+			 */
+			if (capacity_orig == pd_cap_orig) {
+				pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
+
+				if (pd_cap > inv_cap) {
+					rq->cpu_capacity_inverted = inv_cap;
+					break;
+				}
+			} else if (pd_cap_orig > inv_cap) {
+				rq->cpu_capacity_inverted = inv_cap;
+				break;
+			}
+		}
+	}
+
+	trace_sched_cpu_capacity_tp(rq);
 
 	sdg->sgc->capacity = capacity;
 	sdg->sgc->min_capacity = capacity;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 47d40f8ef1b0..6312f1904825 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1003,6 +1003,7 @@ struct rq {
 
 	unsigned long		cpu_capacity;
 	unsigned long		cpu_capacity_orig;
+	unsigned long		cpu_capacity_inverted;
 
 	struct callback_head	*balance_callback;
 
@@ -2993,6 +2994,24 @@ static inline unsigned long capacity_orig_of(int cpu)
 	return cpu_rq(cpu)->cpu_capacity_orig;
 }
 
+/*
+ * Returns inverted capacity if the CPU is in capacity inversion state.
+ * 0 otherwise.
+ *
+ * Capacity inversion detection only considers thermal impact where actual
+ * performance points (OPPs) gets dropped.
+ *
+ * Capacity inversion state happens when another performance domain that has
+ * equal or lower capacity_orig_of() becomes effectively larger than the perf
+ * domain this CPU belongs to due to thermal pressure throttling it hard.
+ *
+ * See comment in update_cpu_capacity().
+ */
+static inline unsigned long cpu_in_capacity_inversion(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_inverted;
+}
+
 /**
  * enum cpu_util_type - CPU utilization type
  * @FREQUENCY_UTIL:	Utilization used to select frequency
From: Qais Yousef <qais.yousef@arm.com>
commit aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8 upstream.
We do consider thermal pressure in util_fits_cpu() for uclamp_min only, with the exception of the biggest cores, which by definition are the max performance point of the system and on which all tasks by definition should fit.

Even under thermal pressure, the capacity of the biggest CPU is the highest in the system and should still fit every task. Except when it reaches the capacity inversion point; then this is no longer true.

We can handle this by using the inverted capacity as capacity_orig in util_fits_cpu(), which not only addresses the problem above but also ensures uclamp_max now considers the inverted capacity. Force fitting a task when a CPU is in this adverse state will contribute to making the thermal throttling last longer.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com
(cherry picked from commit aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 14 +++++++++----
 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e472f1849092..a9ae621d58cb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4159,12 +4159,16 @@ static inline int util_fits_cpu(unsigned long util,
 	 * For uclamp_max, we can tolerate a drop in performance level as the
 	 * goal is to cap the task. So it's okay if it's getting less.
 	 *
-	 * In case of capacity inversion, which is not handled yet, we should
-	 * honour the inverted capacity for both uclamp_min and uclamp_max all
-	 * the time.
+	 * In case of capacity inversion we should honour the inverted capacity
+	 * for both uclamp_min and uclamp_max all the time.
 	 */
-	capacity_orig = capacity_orig_of(cpu);
-	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	capacity_orig = cpu_in_capacity_inversion(cpu);
+	if (capacity_orig) {
+		capacity_orig_thermal = capacity_orig;
+	} else {
+		capacity_orig = capacity_orig_of(cpu);
+		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	}
 
 	/*
 	 * We want to force a task to fit a cpu as implied by uclamp_max.
commit e26fd28db82899be71b4b949527373d0a6be1e65 upstream.
Addresses the following warnings:
config: riscv-randconfig-m031-20221111
compiler: riscv64-linux-gcc (GCC) 12.1.0

smatch warnings:
kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_min'.
kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_max'.
Fixes: 244226035a1f ("sched/uclamp: Fix fits_capacity() check in feec()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-2-qyousef@layalina.io
(cherry picked from commit e26fd28db82899be71b4b949527373d0a6be1e65)
[Conflict in kernel/sched/fair.c due to new automatic variables being added on master vs 5.15]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a9ae621d58cb..1a78a7882868 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6977,14 +6977,16 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 		goto unlock;
 
 	for (; pd; pd = pd->next) {
+		unsigned long util_min = p_util_min, util_max = p_util_max;
 		unsigned long cur_delta, spare_cap, max_spare_cap = 0;
 		unsigned long rq_util_min, rq_util_max;
-		unsigned long util_min, util_max;
 		bool compute_prev_delta = false;
 		unsigned long base_energy_pd;
 		int max_spare_cap_cpu = -1;
 
 		for_each_cpu_and(cpu, perf_domain_span(pd), sched_domain_span(sd)) {
+			struct rq *rq = cpu_rq(cpu);
+
 			if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 				continue;
 
@@ -7000,24 +7002,19 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			 * much capacity we can get out of the CPU; this is
 			 * aligned with sched_cpu_util().
 			 */
-			if (uclamp_is_used()) {
-				if (uclamp_rq_is_idle(cpu_rq(cpu))) {
-					util_min = p_util_min;
-					util_max = p_util_max;
-				} else {
-					/*
-					 * Open code uclamp_rq_util_with() except for
-					 * the clamp() part. Ie: apply max aggregation
-					 * only. util_fits_cpu() logic requires to
-					 * operate on non clamped util but must use the
-					 * max-aggregated uclamp_{min, max}.
-					 */
-					rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
-					rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
-
-					util_min = max(rq_util_min, p_util_min);
-					util_max = max(rq_util_max, p_util_max);
-				}
+			if (uclamp_is_used() && !uclamp_rq_is_idle(rq)) {
+				/*
+				 * Open code uclamp_rq_util_with() except for
+				 * the clamp() part. Ie: apply max aggregation
+				 * only. util_fits_cpu() logic requires to
+				 * operate on non clamped util but must use the
+				 * max-aggregated uclamp_{min, max}.
+				 */
+				rq_util_min = uclamp_rq_get(rq, UCLAMP_MIN);
+				rq_util_max = uclamp_rq_get(rq, UCLAMP_MAX);
+
+				util_min = max(rq_util_min, p_util_min);
+				util_max = max(rq_util_max, p_util_max);
 			}
 			if (!util_fits_cpu(util, util_min, util_max, cpu))
 				continue;
commit da07d2f9c153e457e845d4dcfdd13568d71d18a4 upstream.
Traversing the Perf Domains requires rcu_read_lock() to be held and is conditional on sched_energy_enabled(). Ensure the right protections are applied.
Also skip capacity inversion detection for our own pd; checking it was an error.
Fixes: 44c7b80bffc3 ("sched/fair: Detect capacity inversion")
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-3-qyousef@layalina.io
(cherry picked from commit da07d2f9c153e457e845d4dcfdd13568d71d18a4)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a78a7882868..5a050c827113 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8605,16 +8605,23 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 	 *   * Thermal pressure will impact all cpus in this perf domain
 	 *     equally.
 	 */
-	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
+	if (sched_energy_enabled()) {
 		unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
-		struct perf_domain *pd = rcu_dereference(rq->rd->pd);
+		struct perf_domain *pd;
+
+		rcu_read_lock();
 
+		pd = rcu_dereference(rq->rd->pd);
 		rq->cpu_capacity_inverted = 0;
 
 		for (; pd; pd = pd->next) {
 			struct cpumask *pd_span = perf_domain_span(pd);
 			unsigned long pd_cap_orig, pd_cap;
 
+			/* We can't be inverted against our own pd */
+			if (cpumask_test_cpu(cpu_of(rq), pd_span))
+				continue;
+
 			cpu = cpumask_any(pd_span);
 			pd_cap_orig = arch_scale_cpu_capacity(cpu);
 
@@ -8639,6 +8646,8 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 				break;
 			}
 		}
+
+		rcu_read_unlock();
 	}
 
 	trace_sched_cpu_capacity_tp(rq);