This patch series optimizes performance and refines the patches according to review comments.
- Patch 0001 gives the previous CPU more chance to be selected, so the task can benefit from a cache-hot CPU;
- In EAS code, the critical path is task wake-up in energy_aware_wake_cpu(); this function is intended to select the target CPU that saves the most energy. It therefore covers two underlying functionalities: the first selects the most power-efficient CPU for the task within one cluster; the second migrates the task from a big core to a little core if the little core can meet the performance requirement.
For the first functionality, selecting the most power-efficient CPU within a cluster, EAS prefers a non-idle CPU, so it packs tasks onto one CPU as much as possible. This is not optimal for two reasons: first, it introduces long scheduling latency once multiple tasks are queued on the same rq; second, it easily ends up packing small tasks onto one CPU running at a higher operating point. This is the foremost issue we observed: when there are multiple tasks, neither power nor performance reaches an optimal result.
So patch 0002 solves this issue by trying to select a CPU that can be kept at the lowest possible OPP.
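To illustrate the idea behind patch 0002, here is a minimal user-space sketch, not the code in the patch. It assumes all CPUs in a cluster share one frequency domain, so the cluster OPP follows the busiest CPU; the OPP capacity table, the function names and the tie-break rule (smaller resulting util, which favours spreading) are hypothetical.

```c
#define NR_OPPS 4

/* Hypothetical per-cluster OPP table: capacity at each operating point,
 * on the 0..1024 capacity scale, in ascending order. */
static const unsigned long opp_cap[NR_OPPS] = { 256, 512, 768, 1024 };

/* Index of the lowest OPP whose capacity covers util,
 * or NR_OPPS if even the highest OPP is exceeded. */
static int lowest_opp_for(unsigned long util)
{
	for (int i = 0; i < NR_OPPS; i++)
		if (util <= opp_cap[i])
			return i;
	return NR_OPPS;
}

/* Pick the CPU whose placement keeps the shared cluster OPP lowest.
 * Ties go to the CPU with the smaller resulting util. Returns -1 if
 * the task does not fit on any CPU of the cluster. */
static int select_lowest_opp_cpu(const unsigned long *cpu_util, int nr_cpus,
				 unsigned long task_util)
{
	unsigned long cluster_max = 0, best_util = 0;
	int best_cpu = -1, best_opp = NR_OPPS;

	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu_util[cpu] > cluster_max)
			cluster_max = cpu_util[cpu];

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		unsigned long new_util = cpu_util[cpu] + task_util;
		/* The cluster runs at the OPP required by its busiest CPU. */
		unsigned long new_max = new_util > cluster_max ? new_util : cluster_max;
		int opp = lowest_opp_for(new_max);

		if (opp == NR_OPPS)
			continue;
		if (opp < best_opp || (opp == best_opp && new_util < best_util)) {
			best_cpu = cpu;
			best_opp = opp;
			best_util = new_util;
		}
	}
	return best_cpu;
}
```

With per-CPU util {400, 100, 0, 300} and a task of util 150, packing onto the busy CPU 0 would raise the cluster to the 768 OPP, while CPUs 1, 2 and 3 keep it at 512; the sketch picks CPU 2 because it ends with the smallest util.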
- The current code has no mechanism to spread tasks across the little cluster, so tasks are packed onto one CPU as long as that CPU is not “over-utilized”. In this case only one CPU is very busy while the other CPUs in the same cluster stay idle.
Patch 0003 spreads tasks in the lowest scheduling domain (the cluster level) by adding an intermediate state named "half-utilized". This may be a temporary solution; a better solution is likely to unify the flag for "over-utilized".
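A minimal sketch of how a "half-utilized" state could sit between normal and over-utilized, and how it might trigger spreading within the cluster. The cover letter does not define the thresholds, so both are assumptions: over-utilized uses an EAS-style ~20% margin (1024/1280), half-utilized is taken to mean util above 50% of capacity, and an idle CPU is approximated as util == 0.

```c
/* Assumed thresholds, not taken from patch 0003. */
enum cpu_state { CPU_NORMAL, CPU_HALF_UTILIZED, CPU_OVER_UTILIZED };

static enum cpu_state classify(unsigned long util, unsigned long capacity)
{
	if (util * 1280 > capacity * 1024)	/* less than ~20% headroom left */
		return CPU_OVER_UTILIZED;
	if (util * 2 > capacity)		/* assumed: more than half busy */
		return CPU_HALF_UTILIZED;
	return CPU_NORMAL;
}

/* If prev_cpu is half-utilized (or worse), prefer an idle sibling in the
 * same cluster instead of packing; otherwise keep the task on prev_cpu. */
static int spread_in_cluster(const unsigned long *util, int nr_cpus,
			     unsigned long capacity, int prev_cpu)
{
	if (classify(util[prev_cpu], capacity) == CPU_NORMAL)
		return prev_cpu;
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu != prev_cpu && util[cpu] == 0)	/* idle approximated by zero util */
			return cpu;
	return prev_cpu;
}
```

The point of the intermediate state is that spreading starts well before the CPU reaches the over-utilized threshold, at which point EAS would otherwise stop making energy-aware decisions.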
- In CFS, PELT signals take a long time to ramp up to a high value and to decay to a small value; moreover, EAS does not take the load_avg value (which includes runnable time) into account and only looks at the util_avg value (running time). So these issues really depend on the fundamental signals.
So we hope for an advanced method to accelerate PELT signals and to eliminate the issue introduced by long runnable time. Patch 0004 can be taken as a temporary solution: when there is a big difference between load_avg and util_avg, it uses the inflated load value instead of util, which also reflects runnable time.
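To quantify the slow ramp-up, here is a back-of-the-envelope estimate using the standard PELT geometric series (decay factor y per ~1 ms period, with y^32 = 1/2), ignoring frequency and capacity invariance, for a task that starts running continuously from zero utilization:

$$
u(n) \approx 1024\,\bigl(1 - y^{n}\bigr), \qquad y^{32} = \tfrac{1}{2}
$$

After 32 ms the signal is only about 512; reaching 90% of the maximum requires y^n = 0.1, i.e. n = 32 log2(10) ≈ 106 ms. A task that goes to sleep decays symmetrically, halving its util_avg every 32 ms.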
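A minimal sketch of the "use load when it is much bigger than util" idea. It assumes a nice-0 task, so that load_avg and util_avg are on a comparable 0..1024 scale; the gap threshold and the function name are hypothetical, not the values used in patch 0004.

```c
#define SCHED_CAPACITY_SCALE 1024UL
/* Hypothetical threshold: load counts as "much bigger" than util when it
 * exceeds util by more than a quarter of the capacity scale. */
#define LOAD_UTIL_GAP (SCHED_CAPACITY_SCALE / 4)

/* Return the signal EAS would use for the task: util_avg normally, or the
 * inflated load_avg when long runnable (waiting) time makes it much larger.
 * The result is clamped to the capacity scale. */
static unsigned long effective_util(unsigned long util_avg, unsigned long load_avg)
{
	unsigned long val = util_avg;

	if (load_avg > util_avg + LOAD_UTIL_GAP)
		val = load_avg;
	return val > SCHED_CAPACITY_SCALE ? SCHED_CAPACITY_SCALE : val;
}
```

For example, a task with util_avg 300 and load_avg 700 has spent a lot of time waiting on the rq, so the sketch reports 700; with load_avg 400 the gap is small and util_avg is kept.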
Patch 0004 also has a side effect on the misfit flag. If any CPU has a “misfit” task on it, EAS sets the imbalance value to the CPU capacity and migrates that load from the little core to a big core. “misfit” works well when there is only one big task on the little CPU, because task_fits_max(p, rq->cpu) then reports that the CPU cannot meet the task’s performance requirement; but if there are two tasks on the little CPU, each task’s utilization is only about half of the CPU capacity, so EAS concludes that the CPU can meet the task’s requirement. With patch 0004 the misfit flag is set to true more easily: rq->misfit_task = !task_fits_max(p, rq->cpu)
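The two-tasks case can be illustrated with a simplified stand-in for task_fits_max(). This is a sketch, not the kernel function: it assumes an EAS-style ~20% margin (1024/1280) and compares the task signal directly against the capacity of the CPU it runs on; the helper names are hypothetical.

```c
#include <stdbool.h>

#define CAPACITY_MARGIN 1280UL	/* assumed ~20% headroom, 1024/1280 */

/* Simplified stand-in for task_fits_max(): the task fits if the CPU
 * capacity leaves ~20% headroom over the task's signal. */
static bool task_fits_cap(unsigned long task_signal, unsigned long cpu_cap)
{
	return cpu_cap * 1024 > task_signal * CAPACITY_MARGIN;
}

/* Mirrors rq->misfit_task = !task_fits_max(p, rq->cpu). */
static bool misfit(unsigned long task_signal, unsigned long cpu_cap)
{
	return !task_fits_cap(task_signal, cpu_cap);
}
```

On a little CPU of capacity 512 shared by two always-runnable tasks, each task's util_avg is roughly 256, so misfit(256, 512) is false and no up-migration happens; if the signal is instead the inflated load value, which approaches 1024 for an always-runnable nice-0 task, misfit(1024, 512) becomes true.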
- In energy_aware_wake_cpu(), a task can be migrated directly from a little core to a big core, but the conditions are rigid: condition 1 is that the CPU capacity cannot meet the task's requirement; condition 2 is that the source CPU is “over-utilized”. If condition 2 does not hold (the source CPU is not “over-utilized”), then even though the little CPU cannot meet the task's requirement, EAS compares CPU energy and in the end still selects the previous little CPU.
Patch 0005 adds an extra path to migrate the task directly from a little core to a big core.
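The difference between the original rule and the extra path can be reduced to a small decision function. This is a hedged sketch of the logic described above, not the patch itself: the enum and function names are hypothetical, and the energy comparison that follows when neither path triggers is modelled simply as staying on the previous little CPU, which is the outcome the cover letter reports.

```c
#include <stdbool.h>

enum target { STAY_LITTLE, GO_BIG };

/* Original rule as described: both conditions must hold to go big.
 * Otherwise the energy comparison runs, modelled here as staying little. */
static enum target wake_target_orig(bool task_fits_little, bool src_over_utilized)
{
	return (!task_fits_little && src_over_utilized) ? GO_BIG : STAY_LITTLE;
}

/* With the extra path: a task that does not fit the little CPU goes to a
 * big core even when the source CPU is not over-utilized. */
static enum target wake_target_extra_path(bool task_fits_little, bool src_over_utilized)
{
	if (!task_fits_little)
		return GO_BIG;
	return wake_target_orig(task_fits_little, src_over_utilized);
}
```

The interesting case is a task that does not fit while the source CPU is not over-utilized: the original rule keeps it on the little CPU, the extra path moves it to a big core.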
- For very heavy multi-threaded workloads, we observed that tasks are not migrated within the big cluster, and that tasks rarely migrate from the big cluster to the little cluster even when idle CPUs are available in the little cluster. So EAS needs to handle this case, most likely by falling back to CFS behaviour.
Patches 0006 and 0008 fix these related issues.
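A minimal sketch of the idea suggested by the title of patch 0006 ("force idle balance when busiest group is overloaded"). The struct, its field names and the overload test (more runnable tasks than CPUs, loosely following the CFS notion of an overloaded group) are simplifications assumed for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified group statistics. */
struct group_stats {
	unsigned int sum_nr_running;	/* runnable tasks in the group */
	unsigned int group_weight;	/* number of CPUs in the group */
};

/* Treat a group as overloaded when it has more runnable tasks than CPUs. */
static bool group_overloaded(const struct group_stats *g)
{
	return g->sum_nr_running > g->group_weight;
}

/* An idle CPU pulls from an overloaded busiest group, even where the
 * energy-aware path would otherwise skip load balancing. */
static bool force_idle_balance(bool this_cpu_idle, const struct group_stats *busiest)
{
	return this_cpu_idle && group_overloaded(busiest);
}
```

For a big cluster of 4 CPUs running 6 threads, an idle little CPU would be allowed to pull; with exactly 4 threads the group is not considered overloaded.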
- SMP load balancing may migrate a small task onto a big core, but at that point we are usually only looking to migrate big tasks, so this hurts both power and performance. Patch 0007 prevents small tasks from migrating to a higher-capacity CPU, which gives real big tasks more chance to migrate to the higher-capacity CPU.
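A hedged sketch of a migration filter in the spirit of patch 0007. The definition of "small" is an assumption: here a task counts as small if it already fits the source CPU with an EAS-style ~20% margin (1024/1280); the function name is hypothetical.

```c
#include <stdbool.h>

#define CAPACITY_MARGIN 1280UL	/* assumed ~20% headroom, 1024/1280 */

/* When load balancing pulls a task to a higher-capacity CPU, skip tasks
 * that already fit the source CPU, leaving room for real big tasks. */
static bool can_migrate_up(unsigned long task_util, unsigned long src_cap,
			   unsigned long dst_cap)
{
	bool fits_src;

	if (dst_cap <= src_cap)
		return true;	/* not an upward migration: no extra restriction */

	fits_src = src_cap * 1024 > task_util * CAPACITY_MARGIN;
	return !fits_src;	/* only tasks that outgrow the source may move up */
}
```

With a little CPU of capacity 430 and a big CPU of capacity 1024, a task of util 100 stays on the little CPU, while a task of util 400 (too close to 430) is allowed to move up; migrations between equal or towards lower capacity are unaffected.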
Leo Yan (8):
  sched/fair: optimize to more chance to select previous CPU
  sched/fair: select CPU based on using lowest capacity
  sched/fair: support to spread task in lowest schedule domain
  sched/fair: use load metrics to replace util when have big difference
  sched/fair: add path to migrate to higher capacity CPU
  sched/fair: force idle balance when busiest group is overloaded
  sched/fair: avoid small task to migrate to higher capacity CPU
  sched/fair: set imbalance for too many tasks on rq
 kernel/sched/fair.c | 193 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 173 insertions(+), 20 deletions(-)