On Tue, Jul 05, 2016 at 04:17:03PM +0800, Leo Yan wrote:
On Mon, Jul 04, 2016 at 11:13:50AM +0100, Morten Rasmussen wrote:
On Thu, Jun 23, 2016 at 09:43:06PM +0800, Leo Yan wrote:
When load_avg is much higher than util_avg, it indicates that either the task has a higher priority (so a larger weight feeds into load_avg) or the task spends much more time in the runnable state.
In both cases, replace the util_avg value with load_avg. This inflates the utilization signal and gives a single big task a better chance of migrating to a big CPU.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644c39a..5d6bb25 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1166,6 +1166,7 @@ struct load_weight {
  * for entity, support any load.weight always runnable
  */
 struct sched_avg {
+	u64 last_migrate_time;
 	u64 last_update_time, load_sum;
 	u32 util_sum, period_contrib;
 	unsigned long load_avg, util_avg;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 185efe1..7fbfd41 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -674,6 +674,7 @@ void init_entity_runnable_average(struct sched_entity *se)
 {
 	struct sched_avg *sa = &se->avg;
 
+	sa->last_migrate_time = 0;
 	sa->last_update_time = 0;
 	/*
 	 * sched_avg's period_contrib should be strictly less then 1024, so
@@ -2771,6 +2772,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 skip_aging:
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
+	se->avg.last_migrate_time = cfs_rq->avg.last_update_time;
 	cfs_rq->avg.load_avg += se->avg.load_avg;
 	cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -5228,6 +5230,11 @@ static inline unsigned long task_util(struct task_struct *p)
 	return p->se.avg.util_avg;
 }
 
+static inline unsigned long task_load(struct task_struct *p)
+{
+	return p->se.avg.load_avg;
+}
+
 unsigned int capacity_margin = 1280; /* ~20% margin */
 
 static inline unsigned long boosted_task_util(struct task_struct *task);
@@ -5369,8 +5376,35 @@ static inline unsigned long boosted_task_util(struct task_struct *task)
 {
 	unsigned long util = task_util(task);
+	unsigned long load = task_load(task);
 	unsigned long margin = schedtune_task_margin(task);
+	int cpu = task_cpu(task);
+	struct sched_entity *se = &task->se;
+	u64 delta;
+
+	/*
+	 * change to use load metrics if can meet two conditions:
+	 * - load is 20% higher than util, so that means task have extra
+	 *   20% time for runnable state and waiting to run; Or the task has
+	 *   higher prioirty than nice 0; then consider to use load signal
+	 *   rather than util signal;
+	 * - load reach CPU "over-utilized" criteria.
+	 */
+	if ((load * capacity_margin > capacity_of(cpu) * 1024) &&
+	    (load * 1024 > util * capacity_margin))
+		util = load;
+	else {
+		/*
+		 * Avoid ping-pong issue, so make sure the task can run at
+		 * least once in higher capacity CPU
+		 */
+		delta = se->avg.last_update_time - se->avg.last_migrate_time;
+		if (delta < sysctl_sched_latency &&
+		    capacity_of(cpu) == cpu_rq(cpu)->rd->max_cpu_capacity.val)
+			util = load;
+	}
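For reference, the arithmetic in the if () above can be checked with a trivial standalone program (a sketch only; the CPU capacity and task signal values below are made up, not taken from the patch). With capacity_margin = 1280, the first test means load must exceed roughly 80% of the CPU's capacity (1024/1280 = 0.8), and the second means util must be below roughly 80% of load:

/* Standalone userspace sketch, not kernel code; all numbers are made up. */
#include <stdio.h>

int main(void)
{
	unsigned long capacity_margin = 1280;	/* ~20% margin, as in the patch */
	unsigned long cpu_capacity = 430;	/* hypothetical little-CPU capacity */
	unsigned long util = 300, load = 400;	/* hypothetical task signals */

	/* load * 1280 > capacity * 1024  <=>  load > ~80% of the CPU's capacity */
	int load_over_capacity = load * capacity_margin > cpu_capacity * 1024;

	/* load * 1024 > util * 1280  <=>  util < ~80% of load */
	int load_well_above_util = load * 1024 > util * capacity_margin;

	/* prints 1 for these values, so util would be replaced by load */
	printf("replace util with load: %d\n",
	       load_over_capacity && load_well_above_util);
	return 0;
}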
This extra boost for tasks that have recently migrated isn't mentioned in the cover letter but seems to be a significant part of the actual patch.
Yes.
IIUC, you boost utilization of tasks that have recently migrated. Could you explain a little more about why it is needed?
First, the patch wants to boost utilization if the task has stayed in the runnable state for long enough; second, after the task has migrated from a little core to a big core, it ensures the task keeps running on the big core for a while (at least long enough to run on the big core once). That is why utilization is replaced by the load signal in these two scenarios.
I don't see why a task that has recently migrated little->big should not get to run at least once on the big cpu if the system is not above the tipping point (over-utilized).
The task was enqueued on a big rq recently, and nobody should have pulled it away before it had a chance to run at least once. We don't do load_balance() when below the tipping point. AFAICT, your recently migrated condition only has effect after the first wake-up on a big cpu (i.e. second wake, third wake, and so forth until sched_latency time has passed since the migration). The first wake-up was when the migration happened.
So to me, it looks like a mechanism to make the task keep waking up on a big cpu until sched_latency time has passed after a migration. The first wake-up should already be covered.
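To make that window concrete, here is a minimal standalone sketch (the timestamps and the 6 ms latency value are assumptions for illustration, not taken from any real configuration); wake-ups whose last_update_time is within sysctl_sched_latency of last_migrate_time keep getting util replaced by load:

/* Standalone sketch of the "recently migrated" window; all values made up. */
#include <stdio.h>

int main(void)
{
	unsigned long long sched_latency_ns = 6000000ULL;	/* assume 6 ms */
	unsigned long long last_migrate_time = 100000000ULL;	/* migration at t = 100 ms */
	unsigned long long wakeups[] = { 101000000ULL, 104000000ULL, 108000000ULL };

	for (int i = 0; i < 3; i++) {
		unsigned long long delta = wakeups[i] - last_migrate_time;

		/* boosted while delta < sched_latency: the first two wake-ups here */
		printf("wake at t = %llu ms: boost = %d\n",
		       wakeups[i] / 1000000ULL, delta < sched_latency_ns);
	}
	return 0;
}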
The task will appear bigger each time it migrates, regardless of whether it has migrated little->big or big->little. Doesn't that mean you are likely to send tasks that have recently migrated big->little back to big immediately because of the boost?
Yes. We also want to avoid the ping-pong issue: once the utilization signal has been boosted, the task should be allowed to run on the big cluster for a while.
Actually, this patch wants to achieve a similar effect to HMP's up_threshold and down_threshold: if the task's load goes over up_threshold, the task gets to stay on a big core for a while, and it is only migrated back to a little core once its load drops below down_threshold.
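For reference, the HMP behaviour being described is essentially a two-threshold hysteresis. A rough userspace sketch (not the actual HMP code; the threshold values and load trace are made up for illustration) might look like:

/* Rough sketch of HMP-style up/down thresholds; not actual HMP code. */
#include <stdbool.h>
#include <stdio.h>

static bool on_big;

static void hmp_like_decision(unsigned long load)
{
	const unsigned long up_threshold = 700;		/* made-up value */
	const unsigned long down_threshold = 256;	/* made-up value */

	if (!on_big && load > up_threshold)
		on_big = true;		/* an occasional spike is enough to go up */
	else if (on_big && load < down_threshold)
		on_big = false;		/* only drop back once load is clearly small */

	printf("load = %lu -> %s\n", load, on_big ? "big" : "little");
}

int main(void)
{
	unsigned long trace[] = { 400, 750, 500, 300, 200 };

	for (int i = 0; i < 5; i++)
		hmp_like_decision(trace[i]);
	return 0;
}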
I think I get what you want to achieve, but isn't it more a kind of one-way bias than a hysteresis like HMP has? You only try to keep tasks on big cpus.
So especially in the scenario of a single thread with a big load that does _NOT_ push the system over the EAS tipping point, we can see the task staying in the little cluster with much less chance to migrate to a big core. In the same scenario under HMP, its load_avg only needs to occasionally cross up_threshold for the task to get a chance to stay on a big core. So HMP can achieve a much better performance result than EAS here.
This sounds like a scenario where you want to boost utilization of the task to get it out of the tipping point grey zone to improve latency.