Currently the energy calculation in EAS does not consider RT pressure, so it is quite possible to select a CPU for CFS tasks which already has high RT pressure, and to keep accumulating utilization on it; as a result, the other CPUs with low RT pressure lose the chance to run CFS tasks and to reduce contention between CFS and RT tasks. From a performance view this is not optimal; furthermore, it also harms power, since packing an RT task and a CFS task on a single CPU makes it more likely that the CPU frequency is increased.
We can measure the summed CPU utilization and calculate its standard deviation across CPUs to gauge whether tasks are spread well within the same cluster for a medium workload. Below is the comparison for video playback on Hikey960, before and after applying this patch set (using the schedutil CPUFreq governor):
              Without Patch Set:           |        With Patch Set:
CPU    Min(Util)  Mean(Util)  Max(Util)    |  Min(Util)  Mean(Util)  Max(Util)
0          7          67         205       |      8          52         170
1          4          53         227       |      9          47         188
2          4          57         191       |      8          38         192
3          4          35         165       |     16          47         146
s.d.     1.5        13.3        25.9       |    3.9        5.83        20.9

4          0          35         160       |     10          34         129
5          0          24         129       |      0          30         115
6          0          18         123       |      0          18          95
7          0          12          84       |      0          21          73
s.d.       0         9.8        31.2       |      5         7.5        24.4
The standard deviation of the per-CPU mean utilization decreases after applying this patch set (little cluster: 13.3 vs 5.83, big cluster: 9.8 vs 7.5). This is also confirmed by the average CPU frequency:
                 Without Patch Set:  | With Patch Set:
                 Average Frequency   | Average Frequency
LITTLE Cluster        737MHz         |      646MHz
big Cluster           916MHz         |      922MHz
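For reference, the standard deviations quoted above appear to be sample standard deviations (dividing by n - 1 rather than n): e.g. the with-patch little-cluster Mean(Util) values {52, 47, 38, 47} give 5.83. A minimal sketch of the computation (util_stddev() is a hypothetical helper, not part of this patch set):

#include <math.h>

/* Hypothetical helper: sample standard deviation of n per-CPU utilization values. */
static double util_stddev(const double *util, int n)
{
	double mean = 0.0, var = 0.0;
	int i;

	for (i = 0; i < n; i++)
		mean += util[i];
	mean /= n;

	for (i = 0; i < n; i++)
		var += (util[i] - mean) * (util[i] - mean);

	return sqrt(var / (n - 1));	/* sample (n - 1) variant reproduces the figures above */
}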
Leo Yan (4):
  sched/fair: Select maximum spare capacity for idle candidate CPUs
  sched: Introduce cpu_util_sum()/__cpu_util_sum() functions
  sched/fair: Consider RT pressure for find_best_target()
  sched/fair: Consider RT/DL pressure for energy calculation
 kernel/sched/fair.c  | 22 +++++++++++++++++++---
 kernel/sched/sched.h | 29 +++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 3 deletions(-)
--
1.9.1
The utilization of the idle candidate CPUs may have quite different values; as a result, a high CPU OPP may be selected if the woken task is placed on a candidate CPU which has a high utilization value.
This commit introduces the variable 'best_idle_max_spare_cap' to select the CPU with the maximum spare capacity among the idle CPUs in the same idle state. As a result, the candidate idle CPU has the lowest utilization, and its energy is later compared with that of the candidate active CPU, which likewise has the lowest utilization of all active CPUs.
This commit is highly inspired by Joonwoo Park's patch "sched/fair: take CPU energy cost into consideration for placement". Joonwoo's patch uses cpu_util(), which may return a stale utilization value for the woken task, and the cpu_util() value makes it hard to take RT/DL pressure into account. This commit instead uses the variable 'new_util' to calculate the maximum spare capacity for idle CPUs; it already fixes up the woken task's utilization, and the subsequent patches can build on it to consider RT/DL pressure.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 062a3b7..4b17a08 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6643,6 +6643,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	unsigned long target_capacity = ULONG_MAX;
 	unsigned long min_wake_util = ULONG_MAX;
 	unsigned long target_max_spare_cap = 0;
+	unsigned long best_idle_max_spare_cap = 0;
 	unsigned long target_util = ULONG_MAX;
 	unsigned long best_active_util = ULONG_MAX;
 	int best_idle_cstate = INT_MAX;
@@ -6897,15 +6898,27 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 				 * if they are also less energy efficient.
 				 * IOW, prefer a deep IDLE LITTLE CPU vs a
 				 * shallow idle big CPU.
+				 *
+				 * Within same idle state CPUs, select the
+				 * maximum spare capacity CPU.
 				 */
-				if (sysctl_sched_cstate_aware &&
-				    best_idle_cstate <= idle_idx)
-					continue;
+				if (sysctl_sched_cstate_aware) {
+					if (best_idle_cstate < idle_idx)
+						continue;
+
+					if (best_idle_cstate == idle_idx &&
+					    (capacity_orig - new_util) < best_idle_max_spare_cap)
+						continue;
+				} else {
+					if ((capacity_orig - new_util) < best_idle_max_spare_cap)
+						continue;
+				}

 				/* Keep track of best idle CPU */
 				best_idle_min_cap_orig = capacity_orig;
 				best_idle_cstate = idle_idx;
 				best_idle_cpu = i;
+				best_idle_max_spare_cap = capacity_orig - new_util;
 				continue;
 			}
--
1.9.1
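Summarizing the ordering rule this patch implements as a sketch (a hypothetical helper, not part of the patch; it assumes sysctl_sched_cstate_aware is set - otherwise only the spare-capacity comparison applies):

/*
 * Hypothetical helper summarizing the idle-candidate ordering above:
 * a shallower idle state (lower cstate index) always wins; within the
 * same idle state, the larger spare capacity (capacity_orig - new_util)
 * wins.
 */
static bool idle_candidate_is_better(int cand_cstate, unsigned long cand_spare,
				     int best_cstate, unsigned long best_spare)
{
	if (cand_cstate != best_cstate)
		return cand_cstate < best_cstate;

	return cand_spare > best_spare;
}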
CPU utilization is not only CFS utilization; it also includes RT/DL utilization. The current code tracks utilization with both PELT and WALT signals.
For the PELT signal: CFS has its own utilization tracking, and there is a separate signal which uses the PELT algorithm to track the RT/DL utilization value.
For the WALT signal: WALT accumulates the utilization of all scheduling classes into a single signal, so the value returned by cpu_util() can be used directly.
This commit provides two functions to calculate the summed CPU utilization: cpu_util_sum() returns the summed utilization for the specified CPU; __cpu_util_sum() takes the CFS and RT utilization values as parameters and returns the summed value, so it can be used for estimation, e.g. we can pass the estimated CFS utilization that would result from placing a woken task on the specified CPU.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/sched.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d01a06f..1928d27 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1822,6 +1822,35 @@ static inline unsigned long cpu_util_est(int cpu)
 }

 unsigned long boosted_cpu_util(int cpu);
+
+/**
+ * __cpu_util_sum: get summed utilization for the specified CPU
+ * @cpu: the CPU to get the summed utilization for
+ *
+ * Calculate the summed utilization for the CPU from the passed RT
+ * utilization and CFS utilization parameters. The WALT signal already
+ * includes the CFS/RT/DL class utilization, so directly return the
+ * utilization value; for the PELT signal, accumulate the RT/DL and
+ * CFS utilization.
+ *
+ * Return: the summed utilization for the specified CPU
+ */
+static inline unsigned long __cpu_util_sum(int cpu, unsigned long rt_util,
+					   unsigned long cfs_util)
+{
+#ifdef CONFIG_SCHED_WALT
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util)
+		return cfs_util;
+#endif
+	return (rt_util + cfs_util);
+}
+
+static inline unsigned long cpu_util_sum(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	return __cpu_util_sum(cpu, rq->rt.avg.util_avg, boosted_cpu_util(cpu));
+}
 #endif

 #ifdef CONFIG_CPU_FREQ_GOV_SCHED
--
1.9.1
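For illustration, a minimal usage sketch (hypothetical helper, not part of the patch): the estimated summed utilization of CPU i if a waking task p were placed there, passing an estimated CFS utilization as described in the commit message. This mirrors how the next patch calls __cpu_util_sum() in find_best_target().

/* Hypothetical helper: estimated summed utilization of CPU i with task p on it. */
static inline unsigned long estimated_util_sum(int i, struct task_struct *p)
{
	/* Estimated CFS utilization: current CFS util without p, plus p's util. */
	unsigned long cfs_est = cpu_util_wake(i, p) + task_util_est(p);

	return __cpu_util_sum(i, cpu_rq(i)->rt.avg.util_avg, cfs_est);
}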
Function find_best_target() tries to find the maximum spare capacity CPU by computing max(capacity_orig_of(cpu) - new_util). The value 'new_util' has a different meaning for the PELT and WALT signals: for WALT it includes RT pressure, but for PELT it misses RT pressure.
This commit fixes up 'new_util' to include RT pressure by using __cpu_util_sum() to accumulate the RT utilization for the PELT signal; for the WALT signal that function returns the value unchanged, which avoids accumulating the RT pressure twice.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4b17a08..1ce700a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6700,6 +6700,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu, * accounting. However, the blocked utilization may be zero. */ wake_util = cpu_util_wake(i, p); + wake_util = __cpu_util_sum(i, cpu_rq(i)->rt.avg.util_avg, wake_util); new_util = wake_util + task_util_est(p);
/* -- 1.9.1
The energy calculation misses RT/DL pressure in two places. The first is in function group_max_util(), which calculates the CPU capacity index; if we ignore RT/DL pressure there, we may estimate a wrong capacity index value. The second is in function group_norm_util(), which calculates the CPU busy/idle percentage; this part totally ignores the energy introduced by RT/DL threads.
This commit makes the energy calculation consider RT/DL pressure by using __cpu_util_sum() to accumulate the RT/DL and CFS utilization in both group_max_util() and group_norm_util().
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1ce700a..f7a767d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5440,6 +5440,7 @@ static unsigned long group_max_util(struct energy_env *eenv, int cpu_idx) }
util += schedtune_margin(util, boost); + util = __cpu_util_sum(cpu, cpu_rq(cpu)->rt.avg.util_avg, util); max_util = max(max_util, util); }
@@ -5478,6 +5479,7 @@ long group_norm_util(struct energy_env *eenv, int cpu_idx) if (unlikely(cpu == eenv->cpu[cpu_idx].cpu_id)) util += eenv->util_delta;
+ util = __cpu_util_sum(cpu, cpu_rq(cpu)->rt.avg.util_avg, util); util_sum += __cpu_norm_util(util, capacity); }
-- 1.9.1
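For context, group_norm_util() folds each CPU's utilization into a busy-time fraction via __cpu_norm_util(); a hedged sketch of that helper, modeled on the EAS kernels of this era (details may differ across trees):

/*
 * Sketch of __cpu_norm_util() as found in EAS kernels of this era
 * (assumption - verify against your tree): normalize 'util' running
 * at 'capacity' into a busy-time fraction in [0, SCHED_CAPACITY_SCALE].
 */
static unsigned long __cpu_norm_util(unsigned long util, unsigned long capacity)
{
	if (util >= capacity)
		return SCHED_CAPACITY_SCALE;

	return (util << SCHED_CAPACITY_SHIFT) / capacity;
}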
On Tue, Oct 24, 2017 at 02:20:59PM +0800, Leo Yan wrote:
<snip>
Without Patch Set: With Patch Set:
CPU Min(Util) Mean(Util) Mean(Util) | Min(Util) Mean(Util) Mean(Util)
Sorry for the typo; it should be:
CPU Min(Util) Mean(Util) Max(Util) | Min(Util) Mean(Util) Max(Util)
<snip>
Hi Leo,
On Mon, Oct 23, 2017 at 11:21 PM, Leo Yan leo.yan@linaro.org wrote:
<snip>
+static inline unsigned long __cpu_util_sum(int cpu, unsigned long rt_util,
+					   unsigned long cfs_util)
+{
+#ifdef CONFIG_SCHED_WALT
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util)
+		return cfs_util;
+#endif
+	return (rt_util + cfs_util);
Just thinking out loud - would it be better to simply make rt_util = (capacity_orig_of() - capacity_of()) the RT pressure utilization? That would also account for IRQ pressure and is a bit more generic. Also, maybe add a helper function to calculate the RT utilization?
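A minimal sketch of the helper being suggested here (hypothetical name; it assumes capacity_of() already subtracts RT/IRQ pressure from capacity_orig_of()):

/* Hypothetical helper: RT (and IRQ) pressure as capacity lost from the original. */
static inline unsigned long cpu_rt_pressure(int cpu)
{
	return capacity_orig_of(cpu) - capacity_of(cpu);
}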
thanks,
- Joel
<snip>
Hi Leo,
Nice to see this patch; I like this approach better than the alternate one proposed, since it's cleaner.
Just for fyi, I am experimenting with patches to use mainline's find_idlest_group for Android: https://android-review.googlesource.com/#/q/status:open+project:kernel/commo... My thought is to stop using our out-of-tree find_best_target as much as possible and switch to the mainline slow/fast paths. It has resulted in a decent code reduction as well.
I was wondering if we can stop modifying find_best_target further and continue working on that instead? The patches in the link above only replace Case A (prefer-idle latency-sensitive tasks) with the mainline path, though - so this particular patch of yours is still fine. Some more inline comments below:
On Mon, Oct 23, 2017 at 11:21 PM, Leo Yan leo.yan@linaro.org wrote:
<snip>
-				if (sysctl_sched_cstate_aware &&
-				    best_idle_cstate <= idle_idx)
-					continue;
+				if (sysctl_sched_cstate_aware) {
+					if (best_idle_cstate < idle_idx)
+						continue;
+
+					if (best_idle_cstate == idle_idx &&
+					    (capacity_orig - new_util) < best_idle_max_spare_cap)
+						continue;
+				} else {
+					if ((capacity_orig - new_util) < best_idle_max_spare_cap)
+						continue;
This is quite similar to a patch I am testing that uses capacity_of() in mainline's capacity_spare_wake(). Things look good with my synthetic tests and Rohit's tests. I posted a thread about it to eas-dev here: https://lists.linaro.org/pipermail/eas-dev/2017-October/001005.html
If there's a way we can switch to using that for this case, that would be great as well - although this path also does the energy diff.
thanks,
- Joel
<snip>
Hi Joel,
On Tue, Oct 24, 2017 at 07:37:41AM -0700, Joel Fernandes wrote:
<snip>
Just thinking out loud - would it be better to simply make rt_util = (capacity_orig_of() - capacity_of()) the RT pressure utilization? That would also account for IRQ pressure and is a bit more generic. Also, maybe add a helper function to calculate the RT utilization?
To be honest, before I worked out this patch set I sought advice from Vincent, and I tried to use capacity_of() to represent the RT pressure.
But I finally worked out this patch for the reasons below:
Firstly, capacity_of() does not update the RT pressure utilization every time it is called; the RT pressure utilization is updated periodically in load balance. So we would first need to resolve how capacity_of() can update the RT pressure utilization on every call without introducing much overhead.
If we use (capacity_orig_of() - capacity_of()) to represent RT utilization, I have one more question: why not directly use rt_avg or rt.avg.util_sum (PELT for the RT class)?
Thanks,
Leo Yan
On Tue, Oct 24, 2017 at 07:45:07AM -0700, Joel Fernandes wrote:
Hi Leo,
Nice to see this patch; I like this approach better than the alternate one proposed, since it's cleaner.
Thanks!
Just for fyi, I am experimenting with patches to use mainline's find_idlest_group for Android: https://android-review.googlesource.com/#/q/status:open+project:kernel/commo... My thought is to stop using our out-of-tree find_best_target as much as possible and switch to the mainline slow/fast paths. It has resulted in a decent code reduction as well.
I was wondering if we can stop modifying find_best_target further and continue working on that instead? The patches in the link above only replace Case A (prefer-idle latency-sensitive tasks) with the mainline path, though - so this particular patch of yours is still fine. Some more
I agree it's a good direction for Case A to use mainline paths.
inline comments below:
On Mon, Oct 23, 2017 at 11:21 PM, Leo Yan leo.yan@linaro.org wrote:
<snip>
-				if (sysctl_sched_cstate_aware &&
-				    best_idle_cstate <= idle_idx)
-					continue;
+				if (sysctl_sched_cstate_aware) {
+					if (best_idle_cstate < idle_idx)
+						continue;
+
+					if (best_idle_cstate == idle_idx &&
+					    (capacity_orig - new_util) < best_idle_max_spare_cap)
+						continue;
+				} else {
+					if ((capacity_orig - new_util) < best_idle_max_spare_cap)
+						continue;
This is quite similar to a patch I am testing that uses capacity_of() in mainline's capacity_spare_wake(). Things look good with my synthetic tests and Rohit's tests. I posted a thread about it to eas-dev here: https://lists.linaro.org/pipermail/eas-dev/2017-October/001005.html
I will look more deeply into your and Rohit's patches.
If there's a way we can switch to using that for this case, that would be great as well - although this path also does the energy diff.
I think you are suggesting a modification like the patch below; please note the patch below has two issues: the first issue, which I replied to in another email, is that capacity_of() cannot update the RT pressure on every call; the second is that for the WALT signal the RT pressure is counted twice (once from capacity_of() and once from cpu_util(), which naturally includes RT + CFS utilization).
---8<---
From eb5544d3f8827b196453485b65a4ef72b6748bdc Mon Sep 17 00:00:00 2001
From: Leo Yan <leo.yan@linaro.org>
Date: Tue, 17 Oct 2017 09:26:55 +0800
Subject: [PATCH 3/7] sched/fair: Replace capacity_orig_of() with
 capacity_of() in find_best_target()

Inspired by Vincent: the CFS scheduling class uses capacity_of() to
represent RT/DL pressure, while capacity_orig_of() represents the CPU's
original capacity; furthermore, capacity_of() reflects capacity capping
from thermal throttling as well.

This patch uses capacity_of() to replace capacity_orig_of() for the two
cases below in find_best_target():

- The first case is the check whether a CPU is overutilized; using
  capacity_of() rather than capacity_orig_of() lets find_best_target()
  behave consistently with cpu_overutilized();

- The second case is the calculation of the maximum spare capacity;
  here we need to consider RT/DL pressure, so we can use the
  capacity_of() value to reflect it. As a result, we can select the CPU
  with the lowest summed utilization across the RT, DL and CFS classes.

Change-Id: I4b0bb994f9c28f90f55dd6572c90f11bc0608f70
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9aaf8ee..ae85fa7 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6691,7 +6691,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu, sg = sd->groups; do { for_each_cpu_and(i, tsk_cpus_allowed(p), sched_group_cpus(sg)) { - unsigned long capacity_curr = capacity_curr_of(i); + unsigned long capacity = capacity_of(i); unsigned long capacity_orig = capacity_orig_of(i); unsigned long wake_util, new_util;
@@ -6813,9 +6813,9 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu, * Case A.2: Target ACTIVE CPU * Favor CPUs with max spare capacity. */ - if ((capacity_curr > new_util) && - (capacity_orig - new_util > target_max_spare_cap)) { - target_max_spare_cap = capacity_orig - new_util; + if ((capacity > new_util) && + (capacity - new_util > target_max_spare_cap)) { + target_max_spare_cap = capacity - new_util; target_cpu = i; continue; } @@ -6863,7 +6863,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu, * possible at least for !prefer_idle tasks. */ if ((new_util * capacity_margin) > - (capacity_orig * SCHED_CAPACITY_SCALE)) + (capacity * SCHED_CAPACITY_SCALE)) continue;
/* @@ -6939,10 +6939,10 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu, continue;
/* Favor CPUs with maximum spare capacity */ - if ((capacity_orig - new_util) < target_max_spare_cap) + if ((capacity - new_util) < target_max_spare_cap) continue;
- target_max_spare_cap = capacity_orig - new_util; + target_max_spare_cap = capacity - new_util; target_capacity = capacity_orig; target_util = new_util; target_cpu = i; -- 1.9.1
thanks,
- Joel
<snip>
Hi Leo,
<snip>
If there's a way we can switch to using that for this case, that would be great as well - although this path also does the energy diff.
I think you are suggesting a modification like the patch below; please
No, actually I was just referring to using a unified CPU selection logic in mainline code for Cases B and C as well, instead of doing find_best_target. I guess it is more difficult to do than I think, since mainline tries to spread tasks for perf reasons rather than pack them, so maybe it has to remain separate after all, with its own "energy saving" selection logic.
note the patch below has two issues: the first issue, which I replied to in another email, is that capacity_of() cannot update the RT pressure on every
Ok, I'll reply to this in that thread.
call; the second is that for the WALT signal the RT pressure is counted twice (once from capacity_of() and once from cpu_util(), which naturally includes RT + CFS utilization).
Yes, I see this too. At the moment I was trying to do it from a mainline perspective - and Rohit doesn't use EAS/WALT anyway. For my testing of capacity_of() in capacity_spare_wake(), I turn off WALT for the very reason you mention. With WALT, capacity_spare_wake() already has the RT pressure included.
The patch below looks good in principle to me, but I think it's a good idea to add a new abstraction 'capacity_spare' which prevents the double accounting of spare capacity for the WALT case. For PELT, just do the regular thing, and then make use of that in find_best_target. So for WALT the new capacity_spare() would return (capacity_orig_of - cpu_util) and for PELT it would return (capacity_of - cpu_util). What do you think?
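A minimal sketch of the suggested abstraction, following the WALT/PELT split described above (hypothetical helper; the WALT guard mirrors the one in __cpu_util_sum()):

static inline unsigned long capacity_spare(int cpu)
{
#ifdef CONFIG_SCHED_WALT
	/* WALT's cpu_util() already includes RT pressure; avoid counting it twice. */
	if (!walt_disabled && sysctl_sched_use_walt_cpu_util)
		return capacity_orig_of(cpu) - cpu_util(cpu);
#endif
	/* PELT: capacity_of() already subtracts RT pressure from capacity_orig. */
	return capacity_of(cpu) - cpu_util(cpu);
}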
thanks,
- Joel
<snip>
Hi Leo,
On Wed, Oct 25, 2017 at 7:19 AM, Leo Yan leo.yan@linaro.org wrote:
<snip>
Just thinking out loud - would it be better to simply make rt_util = (capacity_orig_of() - capacity_of()) the RT pressure utilization? That would also account for IRQ pressure and is a bit more generic. Also, maybe add a helper function to calculate the RT utilization?
To be honest, before I worked out this patch set I sought advice from Vincent, and I tried to use capacity_of() to represent the RT pressure.
But I finally worked out this patch for the reasons below:
Firstly, capacity_of() does not update the RT pressure utilization every time it is called; the RT pressure utilization is updated periodically in load balance. So we would first need to resolve how
Did you see any performance improvement when considering 'instantaneous' pressure? Could you provide an example of a case in which having the instantaneous value is better than a more coarsely (time-wise) updated one?
capacity_of() can update the RT pressure utilization on every call without introducing much overhead.
Yes. IMO that's a good thing, because if I recall there were concerns about this signal changing too rapidly and causing some temporary bouncing of tasks. I believe you also tried to address this here: https://lists.linaro.org/pipermail/eas-dev/2016-October/000613.html. If it's a temporary increase in RT pressure, is it better not to consider it?
If we use (capacity_orig_of() - capacity_of()) to represent RT utilization, I have one more question: why not directly use rt_avg or rt.avg.util_sum (PELT for the RT class)?
So rt.avg.util_sum is not upstream, right? I was looking for a solution that does not depend on anything not yet upstream. Also, we cannot use rt_avg directly, since we have to do scaling and averaging before it can be used as capacity. I feel the averaging/scaling computation is expensive to do too often, since it has to divide by the sched_avg_period.
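For reference, the scaling/averaging in question looks roughly like mainline's scale_rt_capacity() of that era (sketched from memory; details may vary across kernel versions):

static unsigned long scale_rt_capacity(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	u64 total, used, age_stamp, avg;
	s64 delta;

	/* Read the unserialized rt_avg/age_stamp values once up front. */
	age_stamp = READ_ONCE(rq->age_stamp);
	avg = READ_ONCE(rq->rt_avg);
	delta = __rq_clock_broken(rq) - age_stamp;

	if (unlikely(delta < 0))
		delta = 0;

	total = sched_avg_period() + delta;

	/* The division Joel refers to: average rt_avg over the period. */
	used = div_u64(avg, total);

	if (likely(used < SCHED_CAPACITY_SCALE))
		return SCHED_CAPACITY_SCALE - used;

	return 1;
}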
thanks,
- Joel
<snip>
On 24-10-17, 14:20, Leo Yan wrote:
<snip>
@Patrick: Can you (ARM) guys please give some feedback on this series ?
Thanks.
-- viresh
On 13-Mar 15:07, Viresh Kumar wrote:
On 24-10-17, 14:20, Leo Yan wrote:
<snip>
@Patrick: Can you (ARM) guys please give some feedback on this series ?
Hi Viresh,

I'll try to go through this series before our next sync-up.
-- #include <best/regards.h>
Patrick Bellasi