In the previous code, energy aware scheduling selects a CPU whose current capacity can meet the waking task's requirement, and prefers a running CPU over an idle one so as to reduce wakeup latency. This has a side effect which can create a vicious circle: after the task is placed onto a CPU, the CPUFreq governor is likely to raise the frequency once it detects that the CPU has higher utilization; when another task wakes up later, it is also placed onto the same CPU, because that CPU has already been raised to a higher frequency and is preferred as a running CPU. As a result, the CPUFreq governor keeps raising the frequency while ever more tasks are packed onto one CPU.
Another observed issue: if the system has one big task and several small tasks, and a LITTLE core can meet the big task's capacity requirement, the previous code is still likely to pack tasks onto one CPU. That CPU then has a higher chance of becoming overutilized, so in the end the big task is migrated to a big core, even though a standalone LITTLE core could have met its capacity requirement.
So this patch is based on the following two rules for power saving:
- If CPUs share voltage and clock, CPUs at a higher OPP consume more power than
  at a lower OPP; so prefer to spread tasks in order to stay at the lowest
  possible OPP.
- Once the lowest OPP is achieved, it is good to pack tasks so as to reduce the
  number of CPU wakeups.
Following these two rules, the most important criterion is to select the CPU with the lowest OPP that meets the task's requirement; beyond that, pack tasks as much as possible while keeping the CPU at the lowest OPP.
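The selection policy can be sketched in plain C outside the kernel. This is an illustrative model only, not kernel code: opp_index() is a made-up stand-in for the kernel's find_new_capacity(), its thresholds are invented, and the utilization/capacity arrays replace the real cpu_util()/capacity_orig_of() accessors:

```c
#include <assert.h>
#include <limits.h>

/*
 * Map a projected utilization onto an OPP index; a lower index means a
 * lower OPP. This stands in for find_new_capacity(); the thresholds are
 * purely illustrative.
 */
static int opp_index(unsigned long util)
{
	if (util <= 256)
		return 0;
	if (util <= 512)
		return 1;
	if (util <= 768)
		return 2;
	return 3;
}

/*
 * Pick a CPU for a waking task: the lowest resulting OPP wins; on a tie,
 * prefer prev_cpu, otherwise the most-utilized candidate (packing).
 * Returns -1 if no CPU can fit the task.
 */
static int select_candidate(const unsigned long *util,
			    const unsigned long *cap_orig,
			    int nr_cpus, unsigned long task_util,
			    int prev_cpu)
{
	int i, cpu = -1, best_idx = INT_MAX;

	for (i = 0; i < nr_cpus; i++) {
		unsigned long new_util = util[i] + task_util;
		int idx;

		/* Skip CPUs that cannot fit the task at all */
		if (new_util > cap_orig[i])
			continue;

		idx = opp_index(new_util);
		if (idx < best_idx) {
			/* Strictly lower OPP: always take it */
			best_idx = idx;
			cpu = i;
		} else if (idx == best_idx) {
			/* Already holding the preferred previous CPU */
			if (cpu == prev_cpu)
				continue;
			/* Keep the previous CPU, else pack onto the busier one */
			if (i == prev_cpu || util[i] > util[cpu])
				cpu = i;
		}
	}
	return cpu;
}
```

With this model, a CPU tie on OPP index resolves first to the task's previous CPU and then to the busier candidate, which matches the packing intent described above.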
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 42a40bf..0486edb 100755
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6321,18 +6321,25 @@ static int energy_aware_select_candidate_cpu(struct task_struct *p,
 					     struct sched_group *sg)
 {
 	int i, cpu = -1;
-	unsigned long task_util_boosted, new_util;
+	int cap_idx = INT_MAX, idx;
+	unsigned long task_util_boosted, new_util, wake_util;
+	struct sched_domain *sd;
+	const struct sched_group_energy *sge;
+	int prev_cpu = task_cpu(p);
 
 	task_util_boosted = boosted_task_util(p);
 
 	/* Find cpu with sufficient capacity */
 	for_each_cpu_and(i, tsk_cpus_allowed(p), sched_group_cpus(sg)) {
+
+		wake_util = cpu_util(i);
+
 		/*
 		 * p's blocked utilization is still accounted for on prev_cpu
 		 * so prev_cpu will receive a negative bias due to the double
 		 * accounting. However, the blocked utilization may be zero.
 		 */
-		new_util = cpu_util(i) + task_util_boosted;
+		new_util = wake_util + task_util_boosted;
 
 		/*
 		 * Ensure minimum capacity to grant the required boost.
@@ -6342,15 +6349,35 @@ static int energy_aware_select_candidate_cpu(struct task_struct *p,
 		if (new_util > capacity_orig_of(i))
 			continue;
 
-		if (new_util < capacity_curr_of(i)) {
-			cpu = i;
-			if (cpu_rq(i)->nr_running)
-				break;
-		}
+		/*
+		 * According to the woken task and the CPU utilization,
+		 * predict the CPU OPP, then select a CPU using two criteria
+		 * from a power saving perspective:
+		 *
+		 * - the CPU can stay at the lowest possible OPP;
+		 * - for the same OPP, the CPU has the highest utilization,
+		 *   so bias towards packing tasks.
+		 */
+		sd = rcu_dereference(per_cpu(sd_ea, i));
+		sge = sd->groups->sge;
+		idx = find_new_capacity(sge, new_util);
+
+		/* Select cpu with possible lower OPP */
+		if (cap_idx > idx) {
-		/* cpu has capacity at higher OPP, keep it as fallback */
-		if (cpu == task_cpu(p))
+			cap_idx = idx;
 			cpu = i;
+
+		/* Optimization for CPUs with same OPP */
+		} else if (cap_idx == idx) {
+
+			if (cpu == prev_cpu)
+				continue;
+
+			/* Keep previous CPU and pack tasks if possible */
+			if (i == prev_cpu || wake_util > cpu_util(cpu))
+				cpu = i;
+		}
 	}
 
 	return cpu;
-- 
1.9.1