On 22 December 2016 at 16:58, Leo Yan <leo.yan@linaro.org> wrote:
When a newly idle CPU runs idle balance, the idle thread (swapper) has not actually been switched in yet. The current thread is a normal task that is about to give up the CPU, so the CPU is looking to pull a task onto itself.
But at this point rq->h_nr_running still accounts for this normal thread. This misleads the scheduler into believing the CPU has one running task on it, and that task ends up counted in the schedule group's sum of running tasks.
Are you sure about the point above? I'm pretty sure that in the mainline scheduler the task has already been dequeued, and cfs->h_nr_running and rq->nr_running have already been decremented, by the time newly idle load balance is called, so they are both zero.
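The ordering in question can be modelled with a small user-space sketch. It is not kernel code: the toy_ names are hypothetical, and it assumes, as the reply above argues, that the blocking task is dequeued and both counters are decremented before the newly idle balance pass reads the runqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a per-CPU runqueue; only the two counters matter here. */
struct toy_rq {
	unsigned int nr_running;	/* models rq->nr_running */
	unsigned int h_nr_running;	/* models rq->cfs.h_nr_running */
};

/* Models dequeuing the task that is about to block. */
static void toy_dequeue_prev(struct toy_rq *rq)
{
	assert(rq->nr_running > 0 && rq->h_nr_running > 0);
	rq->nr_running--;
	rq->h_nr_running--;
}

/*
 * Models the assumed ordering: dequeue first, then run newly idle
 * balance only if nothing else is runnable. Returns true if the
 * balance pass would run, and stores the h_nr_running value it
 * would observe in *seen.
 */
static bool toy_schedule_prev_blocks(struct toy_rq *rq, unsigned int *seen)
{
	toy_dequeue_prev(rq);
	if (rq->nr_running != 0)
		return false;	/* another task is runnable, no idle balance */
	*seen = rq->h_nr_running;
	return true;
}
```

Under that assumption, a runqueue that held a single task reports 0 to the balance pass, not 1.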
In the end, group_has_capacity() compares the number of running tasks with the number of CPUs. Unfortunately, if all the other CPUs have real running tasks, the group is considered to have no spare 'capacity', and migration of any misfit task from another schedule group in the same schedule domain is skipped.
This patch fixes the nr_running accounting for a newly idle CPU: when the newly idle CPU is examined, its running count is not added to the schedule group's total.
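To make the accounting concrete, here is a minimal user-space sketch of the task-count comparison described above. The toy_ names are hypothetical, only the count-versus-CPU-number test is modelled (any other checks the real group_has_capacity() performs are left out), and skip_cpu stands in for the newly idle destination CPU that the patch excludes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-group statistics; models only the two fields compared here. */
struct toy_sg_stats {
	unsigned int sum_nr_running;	/* running tasks summed over the group */
	unsigned int group_weight;	/* number of CPUs in the group */
};

/* True when the group runs fewer tasks than it has CPUs. */
static bool toy_has_spare_slot(const struct toy_sg_stats *sgs)
{
	assert(sgs->group_weight > 0);
	return sgs->sum_nr_running < sgs->group_weight;
}

/*
 * Sum the per-CPU running counts. Pass the destination CPU index as
 * skip_cpu to leave it out of the sum, or -1 to count every CPU.
 */
static unsigned int toy_sum_nr_running(const unsigned int *h_nr_running,
				       unsigned int ncpus, int skip_cpu)
{
	unsigned int sum = 0;

	for (unsigned int i = 0; i < ncpus; i++) {
		if ((int)i == skip_cpu)
			continue;
		sum += h_nr_running[i];
	}
	return sum;
}
```

With four CPUs each reporting one task, the sum is 4 and the group has no spare slot; skipping the destination CPU brings the sum to 3 and the check passes. If the destination CPU already reports 0, the sum is 3 without any skip.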
Signed-off-by: Leo Yan <leo.yan@linaro.org>
 kernel/sched/fair.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f5fb04f..6ebf7c7 100755
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7154,9 +7154,22 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		sgs->group_load += load;
 		sgs->group_util += cpu_util(i);
-		sgs->sum_nr_running += rq->cfs.h_nr_running;
-		nr_running = rq->nr_running;
+		/*
+		 * If destination CPU is one new idle CPU, that means current
+		 * task is occupying CPU so h_nr_running = 1 but in fact this
+		 * task is going to release CPU for idle balance.
+		 *
+		 * Here should not account this task into running number, so
+		 * give more chance for task migration onto this idle CPU.
+		 */
+		if (env->idle == CPU_NEWLY_IDLE && env->dst_cpu == i)
+			nr_running = 0;
+		else {
+			sgs->sum_nr_running += rq->cfs.h_nr_running;
+			nr_running = rq->nr_running;
+		}
 		if (nr_running > 1)
 			*overload = true;
--
2.7.4