In the current code, energy-aware scheduling selects a CPU based on that CPU's current capacity. This creates a feedback loop: when the CPUFreq governor raises the frequency, the scheduler tries to put more tasks onto the single CPU; the CPUFreq governor then sees more load on that CPU and raises the frequency further. Step by step, small tasks get packed onto one CPU running at quite a high operating point.

The intent of the current code is to avoid waking up more idle CPUs and to save the power consumed on the CPU wake-up and go-to-sleep paths. But the result can be the opposite: packing small tasks onto a single CPU at a high operating point can cost more power as well.

This patch instead compares the CPU's expected utilization against the CPU's capacity states to predict the operating point the CPU will run at after the task has been placed on it, so we can select the CPU with the lowest predicted operating point. Beyond that, it still packs tasks onto one CPU as long as that CPU can stay at the lowest operating point.
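To make the prediction step concrete, below is a minimal userspace sketch (not the kernel code; the capacity table, values and the predict_cap_idx() name are illustrative assumptions): the predicted operating point is simply the first capacity state whose capacity covers the CPU's expected utilization, clamped to the highest state.

/*
 * Minimal userspace sketch of the capacity-state lookup; the table
 * values are made-up assumptions, not real hardware data.
 */
#include <stdio.h>

/* One entry per operating point (OPP), capacity normalized to 1024 */
static const unsigned long cap_states[] = { 256, 512, 768, 1024 };
#define NR_CAP_STATES	(sizeof(cap_states) / sizeof(cap_states[0]))

/* Index of the lowest OPP whose capacity covers util */
static int predict_cap_idx(unsigned long util)
{
	unsigned int idx;

	for (idx = 0; idx < NR_CAP_STATES; idx++)
		if (cap_states[idx] >= util)
			return idx;

	/* util exceeds even the highest OPP; clamp to it */
	return NR_CAP_STATES - 1;
}

int main(void)
{
	/* 300 units of expected utilization fit at OPP index 1 (512) */
	printf("util 300  -> OPP index %d\n", predict_cap_idx(300));
	/* oversized utilization clamps to the top OPP, index 3 */
	printf("util 1200 -> OPP index %d\n", predict_cap_idx(1200));
	return 0;
}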
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 724b36c..04bb3d9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4893,6 +4893,25 @@ static int find_new_capacity(struct energy_env *eenv,
 	return idx;
 }
 
+static int find_cpu_new_capacity(int cpu, unsigned long util)
+{
+	struct sched_domain *sd;
+	const struct sched_group_energy *sge;
+	int idx;
+
+	sd = rcu_dereference(per_cpu(sd_ea, cpu));
+	sge = sd->groups->sge;
+
+	for (idx = 0; idx < sge->nr_cap_states; idx++)
+		if (sge->cap_states[idx].cap >= util)
+			break;
+
+	if (idx == sge->nr_cap_states)
+		idx = idx - 1;
+
+	return idx;
+}
+
 static int group_idle_state(struct sched_group *sg)
 {
 	int i, state = INT_MAX;
@@ -5718,6 +5737,7 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 	struct sched_domain *sd;
 	struct sched_group *sg, *sg_target;
 	int target_max_cap = INT_MAX;
+	int target_cap_idx = INT_MAX;
 	int target_cpu = -1;
 	unsigned long task_util_boosted, new_util;
 	int i;
@@ -5766,6 +5786,9 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 	task_util_boosted = boosted_task_util(p);
 	/* Find cpu with sufficient capacity */
 	for_each_cpu_and(i, tsk_cpus_allowed(p), sched_group_cpus(sg_target)) {
+
+		int cap_idx;
+
 		/*
 		 * p's blocked utilization is still accounted for on prev_cpu
 		 * so prev_cpu will receive a negative bias due to the double
@@ -5781,18 +5804,24 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 		if (new_util > capacity_orig_of(i))
 			continue;
 
-		if (new_util < capacity_curr_of(i)) {
-			target_cpu = i;
-			if (cpu_rq(i)->nr_running)
-				break;
-		}
+		cap_idx = find_cpu_new_capacity(i, new_util);
+		if (target_cap_idx > cap_idx) {
 
-		/*
-		 * cpu has capacity at higher OPP, keep it as fallback;
-		 * give the previous cpu more chance to run
-		 */
-		if (task_cpu(p) == i || target_cpu == -1)
+			/* Select cpu with possible lower OPP */
 			target_cpu = i;
+			target_cap_idx = cap_idx;
+
+		} else if (target_cap_idx == cap_idx) {
+
+			/* Pack tasks if possible */
+			if (cpu_rq(i)->nr_running) {
+				if (!cpu_rq(target_cpu)->nr_running)
+					target_cpu = i;
+				/* Give the previous cpu more chance to run */
+				else if (task_cpu(p) == i)
+					target_cpu = i;
+			}
+		}
 	}
 
 	/* If have not select any CPU, then to use previous CPU */
--
1.9.1
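For reference, here is a hedged standalone model of the whole selection policy the patch introduces (the per-CPU numbers, struct cpu_model and pick_cpu() below are made up for illustration; the real code walks the sched_group's CPUs and uses PELT-based utilization plus boosted_task_util()):

/*
 * Standalone model of the new selection policy, NOT kernel code:
 * pick the CPU with the lowest predicted OPP after placing the task;
 * on a tie, pack onto an already-busy CPU and prefer prev_cpu.
 */
#include <stdio.h>
#include <limits.h>

/* Illustrative 4-OPP capacity table (assumption, not kernel data) */
static const unsigned long cap_states[] = { 256, 512, 768, 1024 };
#define NR_CAP_STATES	(sizeof(cap_states) / sizeof(cap_states[0]))

struct cpu_model {
	unsigned long cap_orig;	/* max capacity of this CPU */
	unsigned long util;	/* current utilization */
	int nr_running;		/* runnable tasks on this CPU */
};

/* Lowest OPP index whose capacity covers util, clamped to the top */
static int predict_cap_idx(unsigned long util)
{
	unsigned int idx;

	for (idx = 0; idx < NR_CAP_STATES; idx++)
		if (cap_states[idx] >= util)
			return idx;
	return NR_CAP_STATES - 1;
}

static int pick_cpu(const struct cpu_model *cpus, int nr_cpus,
		    unsigned long task_util, int prev_cpu)
{
	int target_cpu = -1, target_cap_idx = INT_MAX;
	int i;

	for (i = 0; i < nr_cpus; i++) {
		unsigned long new_util = cpus[i].util + task_util;
		int cap_idx;

		if (new_util > cpus[i].cap_orig)
			continue;	/* task does not fit at all */

		cap_idx = predict_cap_idx(new_util);
		if (cap_idx < target_cap_idx) {
			/* A strictly lower predicted OPP wins */
			target_cpu = i;
			target_cap_idx = cap_idx;
		} else if (cap_idx == target_cap_idx &&
			   cpus[i].nr_running) {
			/* Tie: pack onto a busy CPU, prefer prev_cpu */
			if (!cpus[target_cpu].nr_running || i == prev_cpu)
				target_cpu = i;
		}
	}
	/* Fall back to the previous CPU if nothing was selected */
	return target_cpu < 0 ? prev_cpu : target_cpu;
}

int main(void)
{
	/*
	 * CPU1 is busier: placing the task there predicts a higher OPP
	 * (900 -> index 3), so the policy picks CPU0, which stays at
	 * OPP index 1 (500 fits under 512).
	 */
	struct cpu_model cpus[] = {
		{ 1024, 200, 1 },	/* cpu0 */
		{ 1024, 600, 2 },	/* cpu1 */
	};

	printf("target cpu = %d\n", pick_cpu(cpus, 2, 300, 1));
	return 0;
}

The tie branch mirrors the patch's packing rule: on an equal predicted OPP index, a CPU only takes over the target if it already has runnable tasks, and the task's previous CPU is preferred to limit migrations.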