In the previous code, energy aware scheduling selected a CPU whose current capacity could meet the waking task's requirement, and preferred a running CPU over an idle one in order to reduce wakeup latency. This has a side effect which can introduce a vicious circle: after the task is placed onto a CPU, the CPUFreq governor is likely to raise that CPU's frequency once it sees the higher utilization; when the next task wakes up, it will also be placed onto the same CPU, because that CPU now runs at a higher frequency (so has more current capacity) and is preferred as a running CPU. As a result, the CPUFreq governor keeps raising the frequency while more and more tasks are packed onto one CPU.
Another observed issue: if the system has one big task and several small tasks, and a LITTLE core can meet the big task's capacity requirement, the previous code is more likely to pack the tasks onto one CPU, which raises the chance of that CPU becoming overutilized; in the end the big task is migrated to a big core, even though a standalone LITTLE core could have met its capacity requirement.
So this patch is based on the two rules below for power saving:
- A higher frequency consumes more power than a lower one, so prefer to spread tasks and stay at a lower frequency rather than raising the frequency;
- Once the lowest achievable frequency is reached, prefer to pack tasks so that fewer CPUs need to be woken up.
Following the two rules above, the most important criterion is to select the CPU that can meet the task's requirement at the lowest capacity (OPP) index; among candidates at the same index, the CPU with the highest utilization that can still stay at that lowest frequency is chosen, so tasks are packed as much as possible while the frequency is kept at its lowest. A simplified sketch of this selection logic is included below.
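For illustration only, here is a standalone, simplified sketch of the selection policy (this is not the kernel code: the CPU count, the capacity table and all names such as cap_index() and select_cpu() are made up for the example). Among the CPUs that can fit the task, it picks the one needing the lowest capacity state; on a tie it picks the most utilized CPU, so tasks get packed while staying at that lowest frequency.

/*
 * Simplified user space illustration of the selection policy described
 * above; the capacity table and helper names are hypothetical.
 */
#include <stdio.h>

#define NR_CPUS		4
#define NR_CAP_STATES	3

static const unsigned long cap_states[NR_CAP_STATES] = { 256, 512, 1024 };

/* Lowest capacity state index whose capacity can hold 'util' */
static int cap_index(unsigned long util)
{
	int idx;

	for (idx = 0; idx < NR_CAP_STATES; idx++)
		if (cap_states[idx] >= util)
			return idx;

	return NR_CAP_STATES - 1;
}

static int select_cpu(const unsigned long cpu_util[], unsigned long task_util)
{
	int target_cpu = -1, target_idx = NR_CAP_STATES, i;

	for (i = 0; i < NR_CPUS; i++) {
		unsigned long new_util = cpu_util[i] + task_util;
		int idx;

		/* Skip CPUs that cannot fit the task at all */
		if (new_util > cap_states[NR_CAP_STATES - 1])
			continue;

		idx = cap_index(new_util);
		if (idx < target_idx) {
			/* Lower capacity state (OPP) wins */
			target_cpu = i;
			target_idx = idx;
		} else if (idx == target_idx && target_cpu >= 0 &&
			   cpu_util[i] > cpu_util[target_cpu]) {
			/* Same OPP: pack onto the busier CPU */
			target_cpu = i;
		}
	}

	return target_cpu;
}

int main(void)
{
	const unsigned long util[NR_CPUS] = { 100, 300, 50, 600 };

	printf("selected CPU: %d\n", select_cpu(util, 120));
	return 0;
}

Built as a normal user space program, the example prints "selected CPU: 0": CPU 0 and CPU 2 both fit the task within the lowest capacity state, and CPU 0 is the busier of the two, so tasks are packed there.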
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 57 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 050e8b9..713b031 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4931,6 +4931,25 @@ static int find_new_capacity(struct energy_env *eenv,
 	return idx;
 }
 
+static int find_cpu_new_capacity(int cpu, unsigned long util)
+{
+	struct sched_domain *sd;
+	const struct sched_group_energy *sge;
+	int idx;
+
+	sd = rcu_dereference(per_cpu(sd_ea, cpu));
+	sge = sd->groups->sge;
+
+	for (idx = 0; idx < sge->nr_cap_states; idx++)
+		if (sge->cap_states[idx].cap >= util)
+			break;
+
+	if (idx == sge->nr_cap_states)
+		idx = idx - 1;
+
+	return idx;
+}
+
 static int group_idle_state(struct sched_group *sg)
 {
 	int i, state = INT_MAX;
@@ -5756,9 +5775,10 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 	struct sched_domain *sd;
 	struct sched_group *sg, *sg_target;
 	int target_max_cap = INT_MAX;
-	int target_cpu = task_cpu(p);
-	unsigned long task_util_boosted, new_util;
-	int i;
+	int target_cap_idx = INT_MAX;
+	int target_cpu = -1;
+	unsigned long task_util_boosted, new_util, wake_util;
+	int cap_idx, i;
 
 	if (sysctl_sched_sync_hint_enable && sync) {
 		int cpu = smp_processor_id();
@@ -5804,12 +5824,15 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 		task_util_boosted = boosted_task_util(p);
 		/* Find cpu with sufficient capacity */
 		for_each_cpu_and(i, tsk_cpus_allowed(p), sched_group_cpus(sg_target)) {
+
+			wake_util = cpu_util(i);
+
 			/*
 			 * p's blocked utilization is still accounted for on prev_cpu
 			 * so prev_cpu will receive a negative bias due to the double
 			 * accounting. However, the blocked utilization may be zero.
 			 */
-			new_util = cpu_util(i) + task_util_boosted;
+			new_util = wake_util + task_util_boosted;
 
 			/*
 			 * Ensure minimum capacity to grant the required boost.
@@ -5819,16 +5842,25 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 			if (new_util > capacity_orig_of(i))
 				continue;
 
-			if (new_util < capacity_curr_of(i)) {
-				target_cpu = i;
-				if (cpu_rq(i)->nr_running)
-					break;
-			}
+			cap_idx = find_cpu_new_capacity(i, new_util);
+			if (target_cap_idx > cap_idx) {
 
-			/* cpu has capacity at higher OPP, keep it as fallback */
-			if (target_cpu == task_cpu(p))
+				/* Select cpu with possible lower OPP */
 				target_cpu = i;
+				target_cap_idx = cap_idx;
+
+			} else if (target_cap_idx == cap_idx) {
+
+				/* Pack tasks if possible */
+				if (wake_util > cpu_util(target_cpu))
+					target_cpu = i;
+			}
 		}
+
+		/* If no cpu has been selected, fall back to the previous cpu */
+		if (target_cpu == -1)
+			return task_cpu(p);
+
	} else {
 		/*
 		 * Find a cpu with sufficient capacity
@@ -5845,7 +5877,8 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 			target_cpu = tmp_target;
 			if ((boosted || prefer_idle) && idle_cpu(target_cpu))
 				return target_cpu;
-		}
+		} else
+			target_cpu = task_cpu(p);
 	}
 
 	if (target_cpu != task_cpu(p)) {
-- 
1.9.1