On Tue, Jul 05, 2016 at 04:17:03PM +0800, Leo Yan wrote:
On Mon, Jul 04, 2016 at 11:13:50AM +0100, Morten Rasmussen wrote:
On Thu, Jun 23, 2016 at 09:43:06PM +0800, Leo Yan wrote:
When load_avg is much higher than util_avg, it indicates that either the task has a higher priority (so a larger weight feeds into load_avg) or the task spends much more time in the runnable state.
In both cases, replace the util_avg value with load_avg. This inflates the utilization signal and gives a single big task a better chance of migrating to a big CPU.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644c39a..5d6bb25 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1166,6 +1166,7 @@ struct load_weight {
  * for entity, support any load.weight always runnable
  */
 struct sched_avg {
+	u64 last_migrate_time;
 	u64 last_update_time, load_sum;
 	u32 util_sum, period_contrib;
 	unsigned long load_avg, util_avg;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 185efe1..7fbfd41 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -674,6 +674,7 @@ void init_entity_runnable_average(struct sched_entity *se)
 {
 	struct sched_avg *sa = &se->avg;
 
+	sa->last_migrate_time = 0;
 	sa->last_update_time = 0;
 	/*
 	 * sched_avg's period_contrib should be strictly less then 1024, so
@@ -2771,6 +2772,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 skip_aging:
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
+	se->avg.last_migrate_time = cfs_rq->avg.last_update_time;
 	cfs_rq->avg.load_avg += se->avg.load_avg;
 	cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -5228,6 +5230,11 @@ static inline unsigned long task_util(struct task_struct *p)
 	return p->se.avg.util_avg;
 }
 
+static inline unsigned long task_load(struct task_struct *p)
+{
+	return p->se.avg.load_avg;
+}
+
 unsigned int capacity_margin = 1280; /* ~20% margin */
 
 static inline unsigned long boosted_task_util(struct task_struct *task);
@@ -5369,8 +5376,35 @@ static inline unsigned long boosted_task_util(struct task_struct *task)
 {
 	unsigned long util = task_util(task);
+	unsigned long load = task_load(task);
 	unsigned long margin = schedtune_task_margin(task);
+	int cpu = task_cpu(task);
+	struct sched_entity *se = &task->se;
+	u64 delta;
+
+	/*
+	 * change to use load metrics if can meet two conditions:
+	 * - load is 20% higher than util, so that means task have extra
+	 *   20% time for runnable state and waiting to run; Or the task has
+	 *   higher prioirty than nice 0; then consider to use load signal
+	 *   rather than util signal;
+	 * - load reach CPU "over-utilized" criteria.
+	 */
+	if ((load * capacity_margin > capacity_of(cpu) * 1024) &&
+	    (load * 1024 > util * capacity_margin))
+		util = load;
+	else {
+		/*
+		 * Avoid ping-pong issue, so make sure the task can run at
+		 * least once in higher capacity CPU
+		 */
+		delta = se->avg.last_update_time - se->avg.last_migrate_time;
+		if (delta < sysctl_sched_latency &&
+		    capacity_of(cpu) == cpu_rq(cpu)->rd->max_cpu_capacity.val)
+			util = load;
+	}
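For reference, the arithmetic in the if () above can be checked with a trivial standalone program (a sketch only; the CPU capacity and task signal values below are made up, not taken from the patch). With capacity_margin = 1280, the first test means load must exceed roughly 80% of the CPU's capacity (1024/1280 = 0.8), and the second means util must be below roughly 80% of load:

/* Standalone userspace sketch, not kernel code; all numbers are made up. */
#include <stdio.h>

int main(void)
{
	unsigned long capacity_margin = 1280;	/* ~20% margin, as in the patch */
	unsigned long cpu_capacity = 430;	/* hypothetical little-CPU capacity */
	unsigned long util = 300, load = 400;	/* hypothetical task signals */

	/* load * 1280 > capacity * 1024  <=>  load > ~80% of the CPU's capacity */
	int load_over_capacity = load * capacity_margin > cpu_capacity * 1024;

	/* load * 1024 > util * 1280  <=>  util < ~80% of load */
	int load_well_above_util = load * 1024 > util * capacity_margin;

	/* prints 1 for these values, so util would be replaced by load */
	printf("replace util with load: %d\n",
	       load_over_capacity && load_well_above_util);
	return 0;
}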
This extra boost for tasks that have recently migrated isn't mentioned in the cover letter but seems to be a significant part of the actual patch.
Yes.
IIUC, you boost utilization of tasks that have recently migrated. Could you explain a little more about why it is needed?
First, the patch wants to boost utilization if the task has stayed in the runnable state for long enough; second, after the task has migrated from a little core to a big core, it ensures the task keeps running on the big core for a while (at least long enough to run on the big core once). That is why utilization is replaced by the load signal in these two scenarios.
I don't see why a task that has recently migrated little->big should not get to run at least once on the big cpu if the system is not above the tipping point (over-utilized).
The task was enqueued on a big rq recently, and nobody should have pulled it away before it had a chance to run at least once. We don't do load_balance() when below the tipping point. AFAICT, your recently migrated condition only has effect after the first wake-up on a big cpu (i.e. second wake, third wake, and so forth until sched_latency time has passed since the migration). The first wake-up was when the migration happened.
So to me, it looks like a mechanism to make the task keep waking up on a big cpu until sched_latency time has passed after a migration. The first wake-up should already be covered.
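To make that window concrete, here is a minimal standalone sketch (the timestamps and the 6 ms latency value are assumptions for illustration, not taken from any real configuration); wake-ups whose last_update_time is within sysctl_sched_latency of last_migrate_time keep getting util replaced by load:

/* Standalone sketch of the "recently migrated" window; all values made up. */
#include <stdio.h>

int main(void)
{
	unsigned long long sched_latency_ns = 6000000ULL;	/* assume 6 ms */
	unsigned long long last_migrate_time = 100000000ULL;	/* migration at t = 100 ms */
	unsigned long long wakeups[] = { 101000000ULL, 104000000ULL, 108000000ULL };

	for (int i = 0; i < 3; i++) {
		unsigned long long delta = wakeups[i] - last_migrate_time;

		/* boosted while delta < sched_latency: the first two wake-ups here */
		printf("wake at t = %llu ms: boost = %d\n",
		       wakeups[i] / 1000000ULL, delta < sched_latency_ns);
	}
	return 0;
}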
The task will appear bigger each time it migrates, regardless of whether it has migrated little->big or big->little. Doesn't that mean you are likely to send tasks that have recently migrated big->little back to big immediately because of the boost?
Yes. We also want to avoid the ping-pong issue: once the utilization signal has been boosted, the task should be allowed to run on the big cluster for a while.
Actually, this patch wants to achieve a similar effect to HMP's up_threshold and down_threshold: if the task's load goes over up_threshold, the task gets to stay on a big core for a while, and it is only migrated back to a little core once its load drops below down_threshold.
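For reference, the HMP behaviour being described is essentially a two-threshold hysteresis. A rough userspace sketch (not the actual HMP code; the threshold values and load trace are made up for illustration) might look like:

/* Rough sketch of HMP-style up/down thresholds; not actual HMP code. */
#include <stdbool.h>
#include <stdio.h>

static bool on_big;

static void hmp_like_decision(unsigned long load)
{
	const unsigned long up_threshold = 700;		/* made-up value */
	const unsigned long down_threshold = 256;	/* made-up value */

	if (!on_big && load > up_threshold)
		on_big = true;		/* an occasional spike is enough to go up */
	else if (on_big && load < down_threshold)
		on_big = false;		/* only drop back once load is clearly small */

	printf("load = %lu -> %s\n", load, on_big ? "big" : "little");
}

int main(void)
{
	unsigned long trace[] = { 400, 750, 500, 300, 200 };

	for (int i = 0; i < 5; i++)
		hmp_like_decision(trace[i]);
	return 0;
}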
I think I get what you want to achieve, but isn't it more a kind of one-way bias than a hysteresis like HMP has? You only try to keep tasks on big cpus.
So especially in the scenario of a single thread with a big load that does _NOT_ push the system over the EAS tipping point, we can see the task staying in the little cluster with much less chance to migrate to a big core. In the same scenario under HMP, its load_avg only needs to occasionally cross up_threshold for the task to get a chance to stay on a big core. So HMP can achieve a much better performance result than EAS here.
This sounds like a scenario where you want to boost utilization of the task to get it out of the tipping point grey zone to improve latency.