On 24 June 2016 5:12 PM, "Leo Yan" <leo.yan@linaro.org> wrote:
>
> Hi Vincent,
>
> Thanks for reviewing.
>
> On Fri, Jun 24, 2016 at 04:44:37PM +0200, Vincent Guittot wrote:
> > Hi Leo,
> >
> > On 23 June 2016 at 15:43, Leo Yan <leo.yan@linaro.org> wrote:
> > > After the tipping point is crossed, if two big tasks are packed on a
> > > single CPU, these two tasks can easily meet the condition for
> > > task_hot(). As a result, can_migrate_task() returns false and the
> > > tasks are _NOT_ spread within the cluster.
> > >
> > > This patch adds an extra check: if the source CPU has two or more
> > > runnable tasks and the destination CPU is idle, the task can be
> > > migrated more aggressively.
> > >
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > > kernel/sched/fair.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 7fbfd41..a6eef88 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6541,6 +6541,17 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> > > return 1;
> > > }
> > >
> > > + /*
> > > + * After the tipping point is crossed, aggressively spread tasks
> > > + * to CPUs if the destination CPU is idle and the source CPU has
> > > + * more than one runnable task.
> > > + */
> > > + if (energy_aware() && env->dst_rq->rd->overutilized) {
> > > + if (env->dst_rq->nr_running == 0 &&
> > > + env->src_rq->nr_running >= 2)
> >
> > Your condition seems to say that there are 2 tasks on src_rq, and its
> > place in the function indicates that the task is not the running one.
> > If the scheduler is not allowed to migrate the task, it's just because
> > the task has run on this cpu during the last 500us (default value). So
>
> I totally agree with your analysis, and thanks for helping to point out
> the root cause.
>
> > I still consider that clearing sysctl_sched_migration_cost is a
> > better alternative if the root cause is that the task has just started
> > to run on this cfs_rq and you want to favor task placement more than
> > cache hotness. sysctl_sched_migration_cost is used elsewhere in the
> > scheduler and can prevent other opportunities to balance tasks between
> > CPUs.
>
> I don't insist on using this patch rather than setting
> sysctl_sched_migration_cost. The main benefit of this patch is that,
> once it is applied, the fix stays in the code. Otherwise there is no
> single place to track this issue anymore.
>
> Or, following your suggestion, I could write a patch that defines a
> special macro like CONFIG_ENERGY_AWARE_AGGRESSIVE_TASK_SPREAD; when this
> configuration is enabled, we set sysctl_sched_migration_cost to 0.
>
> If you have any other ideas, feel free to let me know. Thanks.

Such settings are system tuning and don't have to be put in the kernel. They have to be considered like other tunings: cpufreq sampling rate, interrupt affinity ...
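
To make the effect of this tuning concrete, here is a minimal user-space sketch of the cache-hotness test discussed above. It is a simplified model, not the kernel code: the names model_task_is_hot(), model_can_migrate() and migration_cost_ns are hypothetical stand-ins, the special handling of 0 and -1 reflects my understanding of task_hot(), and the real can_migrate_task() has other conditions (for instance repeated balance failures) that are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default of 0.5 ms, matching the "500us (default value)" mentioned above. */
#define DEFAULT_MIGRATION_COST_NS 500000LL

/* Hypothetical stand-in for the sysctl_sched_migration_cost tunable. */
static int64_t migration_cost_ns = DEFAULT_MIGRATION_COST_NS;

/*
 * Simplified model of the hotness test: a task counts as cache hot when it
 * last started executing on the source CPU less than migration_cost_ns ago.
 * Assumption: a cost of 0 means "never hot" and -1 means "always hot".
 */
static bool model_task_is_hot(int64_t now_ns, int64_t exec_start_ns)
{
	if (migration_cost_ns == -1)
		return true;
	if (migration_cost_ns == 0)
		return false;
	return (now_ns - exec_start_ns) < migration_cost_ns;
}

/* Simplified model of the final decision: a hot task is not migrated. */
static bool model_can_migrate(int64_t now_ns, int64_t exec_start_ns)
{
	return !model_task_is_hot(now_ns, exec_start_ns);
}
```

With the default value, a task that started running 100us ago is reported as hot and stays on the source CPU. With the tunable set to 0, the same task becomes a migration candidate, which is the behavior the patch is after, obtained without touching can_migrate_task().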

I agree that we have to track these tunings, which have a major impact on EAS, but the kernel doesn't seem to be the best place IMO. Can we create a document, like a wiki page, in which we can list all settings that impact EAS behavior and what the preferred value is?

I'm pretty sure that other tunings (scheduler but not only) impact EAS, and their default values might not be the best ones for EAS.

So we could track all of them in one single place.

Regards,

>
> > > + return 1;
> > > + }
> > > +
> > > schedstat_inc(p, se.statistics.nr_failed_migrations_hot);
> > > return 0;
> > > }
> > > --
> > > 1.9.1
> > >