This patch series optimizes performance and refines the patches according to review comments.
- Patch 0001 gives the previous CPU more chance to be selected, so the task can benefit from a hot cache;
- In the EAS code, the critical path is task wakeup via energy_aware_wake_cpu(); this function is meant to select the candidate CPU that saves the most energy. It has two underlying responsibilities: the first is to select the most power-efficient CPU for the task within a cluster; the second is to migrate the task from a big core to a little core if the little core can meet the performance requirement.
For the first responsibility, selecting the most power-efficient CPU within a cluster, EAS prefers a non-idle CPU, so as a result it packs tasks onto one CPU as much as possible. This is not an optimal solution for two reasons: first, it introduces long scheduling latency once multiple tasks sit on the same rq; second, it easily ends up packing small tasks onto one CPU running at a higher operating point. This is the foremost issue observed when there are multiple tasks: neither power nor performance reaches an optimal result.
So patch 0002 addresses this by trying to select a CPU that can stay at the lowest possible OPP (see the rough sketch below).
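A rough sketch of the idea in patch 0002 (simplified pseudo-code of the wakeup loop; the real change is in energy_aware_wake_cpu(), see the patch below; the candidate mask is a placeholder):

	for_each_cpu(i, <candidate cpus in the target group>) {
		int new_util = cpu_util(i) + boosted_task_util(p);

		if (new_util > capacity_orig_of(i))
			continue;

		/* predicted OPP index after placing the task on cpu i */
		cap_idx = find_cpu_new_capacity(i, new_util);

		/*
		 * Prefer the cpu with the lowest predicted OPP index; on a
		 * tie, keep packing onto a non-idle cpu and give the
		 * previous cpu more chance to run.
		 */
	}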
- The current code has no mechanism to spread tasks throughout the little cluster, so tasks are packed onto one CPU as long as no CPU is “over-utilized”. In this case only one CPU is very busy while the other CPUs in the same cluster stay idle.
Patch 0003 spreads tasks in the lowest scheduling domain (at cluster level) after adding an intermediate state named "half-utilized" (a simplified form of the check is quoted below). This may be a temporary solution; a better solution is likely to unify this with the "over-utilized" flag.
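For reference, the intermediate state added by patch 0003 boils down to a simple check (quoted in simplified form from the patch):

	/* a CPU is "half-utilized" once its utilization passes 50% of its capacity */
	static bool cpu_halfutilized(int cpu)
	{
		return capacity_of(cpu) < (cpu_util(cpu) * 2);
	}

Once the CPU has at least two runnable CFS tasks and some CPU in its lowest scheduling domain is half-utilized, the wakeup and idle balance paths spread tasks within that domain instead of continuing to pack.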
- In CFS, PELT signals take a long time to ramp up to a high value and to decay back to a small value; on the other hand, EAS does not take the load_avg value (runnable time) into account and only looks at the util_avg value (running time). So these issues really come down to the underlying signals.
We hope for a better method to accelerate the PELT signals and get rid of the issue introduced by long runnable time. Patch 0004 can be taken as a temporary solution: when there is a big difference between load_avg and util_avg, switch to the inflated (load) value, which also reflects runnable time.
Patch 0004 also has a side effect on the misfit flag. If any CPU has a “misfit” task on it, EAS sets the imbalance value to the CPU capacity and migrates such load from the little core to a big core. “Misfit” works well when there is only one big task on the little CPU, so the CPU cannot meet the task’s performance requirement according to task_fits_max(p, rq->cpu); but if there are two tasks on the little CPU, each task’s utilization is only about half of the CPU capacity, so EAS considers the CPU able to meet the task requirement. Patch 0004 makes it easier to set misfit to true: rq->misfit_task = !task_fits_max(p, rq->cpu) (a worked example with made-up numbers follows below).
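A worked example with made-up numbers: assume a little CPU with capacity 512 and two always-running nice-0 tasks sharing it. Each task only reaches util_avg of about 256, so the fit check sees

	512 * 1024 (= 524288)  >  256 * capacity_margin 1280 (= 327680)

and reports that the task fits, so misfit is never set even though the CPU is saturated. Both tasks are runnable nearly all of the time, however, so their load_avg keeps growing well beyond 256; judged by the load metric (as patch 0004 does), the tasks no longer fit and rq->misfit_task gets set.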
- In energy_aware_wake_cpu() it is possible to directly migrate a task from a little core to a big core, but the conditions are rigid: condition 1 is that the CPU capacity cannot meet the task’s requirement; condition 2 is that the source CPU is “over-utilized”. If the source CPU is not “over-utilized” (condition 2), then even though the little CPU cannot meet the task’s requirement, EAS still compares CPU energy and in the end selects the previous little CPU.
Patch 0005 adds an extra path to directly migrate the task from the little core to a big core (sketched below).
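In short, the extra path added by patch 0005 reduces to roughly the following (simplified from the patch):

	/*
	 * The previous CPU cannot supply enough capacity: pick the bigger
	 * target CPU directly. But if a schedtune boost margin is set for
	 * the task, leave the decision to the energy-diff (PE) filter as
	 * before.
	 */
	if (capacity_of(target_cpu) > capacity_of(task_cpu(p)) &&
	    !schedtune_task_margin(p))
		return target_cpu;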
- For very heavy multi-threaded workloads, we observed that tasks are not migrated within the big cluster, and it is also hard to migrate tasks from the big cluster to the little cluster even when the little cluster has idle CPUs available to run them. So EAS needs optimizing to handle this case, likely by falling back to the normal CFS behaviour.
Patches 0006 and 0008 fix these related issues.
- SMP load balance may migrate a small task onto a big core, but usually at that point we only want big tasks to migrate, so this hurts both power and performance. Patch 0007 avoids migrating small tasks to a higher capacity CPU, which gives real big tasks more chance to migrate there (the added check is sketched below).
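The check added by patch 0007 in can_migrate_task() is roughly the following (simplified from the patch): a task is treated as "small" when its utilization is below 1/8 of the source CPU capacity, and such a task is not pulled to a bigger CPU.

	if (energy_aware() &&
	    capacity_of(env->dst_cpu) > capacity_of(env->src_cpu) &&
	    task_util(p) * 8 < capacity_of(env->src_cpu))
		return 0;	/* don't migrate a small task to a higher capacity CPU */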
Leo Yan (8):
  sched/fair: optimize to more chance to select previous CPU
  sched/fair: select CPU based on using lowest capacity
  sched/fair: support to spread task in lowest schedule domain
  sched/fair: use load metrics to replace util when have big difference
  sched/fair: add path to migrate to higher capacity CPU
  sched/fair: force idle balance when busiest group is overloaded
  sched/fair: avoid small task to migrate to higher capacity CPU
  sched/fair: set imbalance for too many tasks on rq

 kernel/sched/fair.c | 193 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 173 insertions(+), 20 deletions(-)

--
1.9.1
In the current EAS wakeup path, any possible CPU with higher capacity that can meet the task's requirement may be selected. This patch prefers to fall back to the previous CPU where possible, which avoids unnecessary task migration between clusters.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e5ffe8..4a6190b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5569,7 +5569,7 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target)
 	struct sched_domain *sd;
 	struct sched_group *sg, *sg_target;
 	int target_max_cap = INT_MAX;
-	int target_cpu = task_cpu(p);
+	int target_cpu = -1;
 	int i;
 
 	sd = rcu_dereference(per_cpu(sd_ea, task_cpu(p)));
@@ -5621,11 +5621,18 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target)
 				break;
 		}
 
-		/* cpu has capacity at higher OPP, keep it as fallback */
-		if (target_cpu == task_cpu(p))
+		/*
+		 * cpu has capacity at higher OPP, keep it as fallback;
+		 * give the previous cpu more chance to run
+		 */
+		if (task_cpu(p) == i || target_cpu == -1)
 			target_cpu = i;
 	}
 
+	/* If have not select any CPU, then to use previous CPU */
+	if (target_cpu == -1)
+		return task_cpu(p);
+
 	if (target_cpu != task_cpu(p)) {
 		struct energy_env eenv = {
 			.util_delta	= task_util(p),
--
1.9.1
In the current code, energy-aware scheduling selects a CPU based on the CPU's current capacity. This creates a feedback loop: when the CPUFreq governor raises the frequency, the scheduler tries to put more tasks onto that single CPU; the CPUFreq governor then detects more load on the CPU and raises the frequency further. Step by step, small tasks get packed onto one CPU at quite a high operating point.

The current code wants to avoid waking up more idle CPUs, saving the power consumed in the CPU wakeup and sleep paths. But the result is the contrary: packing small tasks onto a single CPU at a high operating point also worsens power.

This patch changes the selection to compare the CPU's expected utilization with the CPU's capacity states and predict the CPU's operating point after placing the task on it. So we can easily select the CPU with the lowest operating point after placing the task on it. Beyond that, it still packs tasks onto one CPU if the CPU can stay at the lowest operating point.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a6190b..804e8c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4905,6 +4905,25 @@ static int find_new_capacity(struct energy_env *eenv,
 	return idx;
 }
 
+static int find_cpu_new_capacity(int cpu, unsigned long util)
+{
+	struct sched_domain *sd;
+	const struct sched_group_energy *sge;
+	int idx;
+
+	sd = rcu_dereference(per_cpu(sd_ea, cpu));
+	sge = sd->groups->sge;
+
+	for (idx = 0; idx < sge->nr_cap_states; idx++)
+		if (sge->cap_states[idx].cap >= util)
+			break;
+
+	if (idx == sge->nr_cap_states)
+		idx = idx - 1;
+
+	return idx;
+}
+
 static int group_idle_state(struct sched_group *sg)
 {
 	int i, state = INT_MAX;
@@ -5569,6 +5588,7 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target)
 	struct sched_domain *sd;
 	struct sched_group *sg, *sg_target;
 	int target_max_cap = INT_MAX;
+	int target_cap_idx = INT_MAX;
 	int target_cpu = -1;
 	int i;
 
@@ -5611,22 +5631,29 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target)
 		 * accounting. However, the blocked utilization may be zero.
 		 */
 		int new_util = cpu_util(i) + boosted_task_util(p);
+		int cap_idx;
 
 		if (new_util > capacity_orig_of(i))
			continue;
 
-		if (new_util < capacity_curr_of(i)) {
-			target_cpu = i;
-			if (cpu_rq(i)->nr_running)
-				break;
-		}
+		cap_idx = find_cpu_new_capacity(i, new_util);
+		if (target_cap_idx > cap_idx) {
 
-		/*
-		 * cpu has capacity at higher OPP, keep it as fallback;
-		 * give the previous cpu more chance to run
-		 */
-		if (task_cpu(p) == i || target_cpu == -1)
+			/* Select cpu with possible lower OPP */
 			target_cpu = i;
+			target_cap_idx = cap_idx;
+
+		} else if (target_cap_idx == cap_idx) {
+
+			/* Pack tasks if possible */
+			if (cpu_rq(i)->nr_running) {
+				if (!cpu_rq(target_cpu)->nr_running)
+					target_cpu = i;
+				/* Give the previous cpu more chance to run */
+				else if (task_cpu(p) == i)
+					target_cpu = i;
+			}
+		}
 	}
 
 	/* If have not select any CPU, then to use previous CPU */
--
1.9.1
In the current code, tasks are packed onto one CPU when the system is under the tipping point. In this case it is possible that only one CPU is very busy while the other CPUs in the same cluster stay idle. So a performance issue occurs: under the tipping point there is no mechanism to spread tasks within the same cluster.

Relying on "over-utilized" as the tipping point to spread tasks has two issues: first, "over-utilized" is a rigid condition and the CPU needs a long time to reach 80% of its capacity, which delays meeting the task's performance requirement; second, once over the tipping point the scheduler migrates tasks directly to the big cluster rather than spreading tasks within the little cluster.

This patch adds "half-utilized" as an intermediate state: if a CPU is over 50% utilization we consider it "half-utilized" and try to spread tasks within the same cluster; this holds for any scheduling domain (or cluster). Two places need their condition check changed: one is the wakeup path, the other is the idle balance path; once "half-utilized", both of them try to spread tasks in the lowest scheduling domain of the cluster.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 804e8c8..747d27d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4213,6 +4213,8 @@ static void update_capacity_of(int cpu)
 }
 
 static bool cpu_overutilized(int cpu);
+static bool cpu_halfutilized(int cpu);
+static bool need_spread_task(int cpu);
 
 /*
  * The enqueue_task method is called before nr_running is
@@ -5284,6 +5286,32 @@ static bool cpu_overutilized(int cpu)
 	return (capacity_of(cpu) * 1024) < (cpu_util(cpu) * capacity_margin);
 }
 
+static bool cpu_halfutilized(int cpu)
+{
+	return capacity_of(cpu) < (cpu_util(cpu) * 2);
+}
+
+static bool need_spread_task(int cpu)
+{
+	struct sched_domain *sd;
+	int spread = 0, i;
+
+	sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+
+	if (!sd)
+		return 0;
+
+	for_each_cpu(i, sched_domain_span(sd)) {
+		if (cpu_rq(cpu)->cfs.h_nr_running >= 2 &&
+		    cpu_halfutilized(i)) {
+			spread = 1;
+			break;
+		}
+	}
+
+	return spread;
+}
+
 #ifdef CONFIG_SCHED_TUNE
 
 static unsigned long
@@ -5733,7 +5761,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	}
 
 	if (!sd) {
-		if (energy_aware() && !cpu_rq(cpu)->rd->overutilized)
+		if (energy_aware() && !need_spread_task(cpu))
 			new_cpu = energy_aware_wake_cpu(p, prev_cpu);
 		else if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
 			new_cpu = select_idle_sibling(p, new_cpu);
@@ -7683,8 +7711,19 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	trace_sched_sd_lb_stats(sched_group_cpus(env->sd->groups), sds.total_load,
 				sds.total_capacity, sds.avg_load);
 
-	if (energy_aware() && !env->dst_rq->rd->overutilized)
-		goto out_balanced;
+	if (energy_aware() && !env->dst_rq->rd->overutilized) {
+
+		struct sched_domain *sd;
+		int cpu = env->dst_cpu;
+
+		sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+		if (!cpumask_equal(sched_domain_span(sd),
+				   sched_domain_span(env->sd)))
+			goto out_balanced;
+
+		if (!need_spread_task(cpu))
+			goto out_balanced;
+	}
 
 	local = &sds.local_stat;
 	busiest = &sds.busiest_stat;
--
1.9.1
When load_avg is much higher than util_avg, it indicates either that the task has a higher priority and therefore a bigger weight in load_avg, or that the task spends much more time in the "runnable" state than in the "running" state.

This patch changes to use the load metric rather than the util metric if any of the conditions below is met:
- load * capacity_margin > SCHED_CAPACITY_SCALE * SCHED_LOAD_SCALE: the task demands computation beyond ~80% of full CPU capacity;
- util * capacity_margin > capacity_of(cpu) * SCHED_LOAD_SCALE: the task's utilization reaches ~80% of the CPU's capacity;
- util * capacity_margin < load * SCHED_LOAD_SCALE: util is more than ~20% below load, so the task spends significant extra time in the runnable state waiting to run, or the task has a higher priority than nice 0.
In any of these cases, use the load signal rather than the util signal.

Finally, we constrain the util value to the range [0..arch_scale_cpu_capacity(cpu)] so the tweaked value falls into the correct range.

This patch has another side effect on the misfit flag. After applying this patch, a task with considerable runnable time can more easily set misfit to true: rq->misfit_task = !task_fits_max(p, rq->cpu). This benefits the case where there are two tasks on a little CPU: each task's utilization is only about half of the CPU capacity, so EAS wrongly considers the CPU able to meet the task's requirement and does not migrate the task to a higher capacity CPU. After applying this patch, such a task switches to the load metric and the issue is fixed as well.
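As a quick sanity check of the thresholds (assuming the usual SCHED_LOAD_SCALE = SCHED_CAPACITY_SCALE = 1024 and capacity_margin = 1280), the three conditions in task_has_big_load() work out to roughly:

	load * 1280 > 1024 * 1024              <=>  load > ~819, i.e. the task demands
	                                            more than ~80% of full capacity;
	util * 1280 > capacity_of(cpu) * 1024  <=>  util is above ~80% of the CPU's capacity;
	util * 1280 < load * 1024              <=>  util is below ~80% of load, i.e. the task
	                                            spends a noticeable share of its time
	                                            runnable but not running, or its weight
	                                            is above the nice-0 weight.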
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 747d27d..54d80908 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4198,6 +4198,8 @@ static inline void hrtick_update(struct rq *rq)
 #endif
 
 static inline unsigned long boosted_cpu_util(int cpu);
+static inline unsigned long boosted_task_util(struct task_struct *task);
+static inline unsigned long schedtune_task_margin(struct task_struct *task);
 
 static void update_capacity_of(int cpu)
 {
@@ -5249,15 +5251,66 @@ static inline unsigned long task_util(struct task_struct *p)
 	return p->se.avg.util_avg;
 }
 
+static inline unsigned long task_load(struct task_struct *p)
+{
+	return p->se.avg.load_avg;
+}
+
 unsigned int capacity_margin = 1280; /* ~20% margin */
 
-static inline unsigned long boosted_task_util(struct task_struct *task);
+/*
+ * Change to use load metrics if can meet two conditions:
+ *
+ * - load * capacity_margin > SCHED_CAPACITY_SCALE * SCHED_LOAD_SCALE,
+ *   this means tasks require CPU computation reach CPU 80% capacity;
+ * - util * capacity_margin > capacity_of(cpu) * SCHED_LOAD_SCALE,
+ *   this means tasks CPU reach 80% utilization;
+ * - load is 20% higher than util, so task have extra 20% time for
+ *   runnable state and waiting to run; Or the task has higher prioirty
+ *   than nice 0; then consider to use load signal rather than util signal.
+ *
+ */
+static inline bool task_has_big_load(struct task_struct *p)
+{
+	unsigned long util = task_util(p);
+	unsigned long load = task_load(p);
+	int cpu = task_cpu(p);
+
+	if (load * capacity_margin > SCHED_CAPACITY_SCALE * SCHED_LOAD_SCALE)
+		return true;
+
+	if (util * capacity_margin > capacity_of(cpu) * SCHED_LOAD_SCALE)
+		return true;
+
+	if (util * capacity_margin < load * SCHED_LOAD_SCALE)
+		return true;
+
+	return false;
+}
+
+static inline unsigned long task_tweaked_util(struct task_struct *p)
+{
+	int cpu = task_cpu(p);
+	unsigned long util = task_util(p);
+	unsigned long load = task_load(p);
+	unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
+	if (task_has_big_load(p))
+		util = max_t(unsigned long, util, load);
+
+	util = clamp(util, 0UL, (unsigned long)scale_cpu);
+	return util;
+}
 
 static inline bool __task_fits(struct task_struct *p, int cpu, int util)
 {
 	unsigned long capacity = capacity_of(cpu);
+	unsigned long margin = schedtune_task_margin(p);
 
-	util += boosted_task_util(p);
+	if (margin)
+		util += boosted_task_util(p);
+	else
+		util += task_tweaked_util(p);
 
 	return (capacity * 1024) > (util * capacity_margin);
 }
@@ -5393,7 +5446,7 @@ schedtune_cpu_margin(unsigned long util, int cpu)
 	return 0;
 }
 
-static inline unsigned int
+static inline unsigned long
 schedtune_task_margin(struct task_struct *task)
 {
 	return 0;
--
1.9.1
The current code has only one path to directly migrate a task from a lower capacity CPU to a higher capacity CPU in the wakeup balance: task_fits_max() returns false and the previous CPU is over-utilized. So it is hard for tasks to migrate to a higher capacity CPU.

This patch adds a path to directly migrate a task from a lower capacity CPU (LITTLE core) to a higher capacity CPU (big core) when the lower capacity CPU is found unable to meet the performance requirement. In this path, if a boost margin has been set for the task, the decision on whether to migrate is left to the PE filter.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 54d80908..71f020d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5741,6 +5741,19 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target)
 	if (target_cpu == -1)
 		return task_cpu(p);
 
+	/*
+	 * Destination CPU has higher capacity than previous CPU,
+	 * so that means pervious CPU has no enough capacity to meet
+	 * the waken up task. Therefore directly return back and select
+	 * destination CPU.
+	 *
+	 * If has set boost margin for this task, then leave to PE filter
+	 * to decide if can migrate task.
+	 */
+	if (capacity_of(target_cpu) > capacity_of(task_cpu(p)) &&
+	    !schedtune_task_margin(p))
+		return target_cpu;
+
 	if (target_cpu != task_cpu(p)) {
 		struct energy_env eenv = {
 			.util_delta	= task_util(p),
--
1.9.1
When nohz load balance executes, it calls find_busiest_group() to find out which scheduling group is the busiest. There is one case where the busiest group is overloaded but the local group still has spare capacity; the current code skips this situation because of the check below:
	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
	    busiest->group_no_capacity)
		goto force_balance;
This is because env->idle is CPU_IDLE, not CPU_NEWLY_IDLE, for the idle CPUs during nohz load balance, so in the end it does not force balance. In the worse situation, it skips the load balance entirely once it hits the conditions below:
	/*
	 * If the local group is busier than the selected busiest group
	 * don't try and pull any tasks.
	 */
	if (local->avg_load >= busiest->avg_load)
		goto out_balanced;

	/*
	 * Don't pull any tasks if this group is already above the domain
	 * average load.
	 */
	if (local->avg_load >= sds.avg_load)
		goto out_balanced;
So this patch forces balance for an idle CPU in this case.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 71f020d..42b8801 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7843,8 +7843,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
-	    busiest->group_no_capacity)
+	if ((env->idle == CPU_NEWLY_IDLE || env->idle == CPU_IDLE) &&
+	    group_has_capacity(env, local) && busiest->group_no_capacity)
 		goto force_balance;
 
 	/* Misfitting tasks should be dealt with regardless of the avg load */
--
1.9.1
EAS defines “any CPU is over-utilized” as the tipping point criterion; once this criterion is met, EAS considers the system over the tipping point and falls back to SMP load balance. SMP load balance may then migrate small tasks onto big cores, but usually we only want big tasks to migrate, so in the end this hurts both power and performance.

This patch adds one more check in can_migrate_task() to avoid migrating a small task to a higher capacity CPU.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 42b8801..06b0e1b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6551,11 +6551,19 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 
 	/*
 	 * We do not migrate tasks that are:
-	 * 1) throttled_lb_pair, or
-	 * 2) cannot be migrated to this CPU due to cpus_allowed, or
-	 * 3) running (obviously), or
-	 * 4) are cache-hot on their current CPU.
+	 * 1) energy_aware is enable and small task is not migrate to higher
+	 *    capacity CPU
+	 * 2) throttled_lb_pair, or
+	 * 3) cannot be migrated to this CPU due to cpus_allowed, or
+	 * 4) running (obviously), or
+	 * 5) are cache-hot on their current CPU.
 	 */
+
+	if (energy_aware() &&
+	    (capacity_of(env->dst_cpu) > capacity_of(env->src_cpu)) &&
+	    (task_util(p) * 8 < capacity_of(env->src_cpu)))
+		return 0;
+
 	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 		return 0;
--
1.9.1
If the number of runnable tasks in one scheduling group is bigger than the number of CPUs, the sched_group's load_avg signal will be underestimated, because the CPUs' load_avg values cannot accumulate the load of all the runnable tasks.

On the other hand, another sched_group may have fewer tasks than CPUs. As a result, the first sched_group's load_per_task will be much lower than the second group's value.

So this patch takes this situation into account and sets the imbalance to busiest->load_per_task.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 06b0e1b..ab99b6f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7604,6 +7604,12 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	local = &sds->local_stat;
 	busiest = &sds->busiest_stat;
 
+	if (busiest->sum_nr_running >= busiest->group_weight &&
+	    local->sum_nr_running < local->group_weight) {
+		env->imbalance = busiest->load_per_task;
+		return;
+	}
+
 	if (!local->sum_nr_running)
 		local->load_per_task = cpu_avg_load_per_task(env->dst_cpu);
 	else if (busiest->load_per_task > local->load_per_task)
--
1.9.1