When nohz_load_balance runs, it calls find_busiest_group() to find out which sched group is the busiest. One case is that the busiest group is overloaded while the local group still has spare capacity, but the current code skips this situation because of the check below:

	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
	    busiest->group_no_capacity)
		goto force_balance;

This is because env->idle is CPU_IDLE, not CPU_NEWLY_IDLE, for the idle CPUs during nohz_load_balance, so the balance is never forced. Even worse, load balancing is skipped entirely once either of the conditions below is met:

	/*
	 * If the local group is busier than the selected busiest group
	 * don't try and pull any tasks.
	 */
	if (local->avg_load >= busiest->avg_load)
		goto out_balanced;

	/*
	 * Don't pull any tasks if this group is already above the domain
	 * average load.
	 */
	if (local->avg_load >= sds.avg_load)
		goto out_balanced;

So force the balance for CPU_IDLE CPUs as well as CPU_NEWLY_IDLE ones when the local group has spare capacity and the busiest group has none.
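
For illustration only, the decision described above can be modelled outside the kernel. This is a minimal sketch, not the code in fair.c: struct lb_model, enum idle_type and model_should_balance() are hypothetical names, the struct keeps only the fields this message talks about, and the `patched` flag switches between the old and the new idle check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's cpu_idle_type values (assumption). */
enum idle_type { NOT_IDLE, IDLE, NEWLY_IDLE };

/* Hypothetical, reduced view of what find_busiest_group() compares. */
struct lb_model {
	enum idle_type idle;            /* models env->idle */
	bool local_has_capacity;        /* models group_has_capacity(env, local) */
	bool busiest_no_capacity;       /* models busiest->group_no_capacity */
	unsigned long local_avg_load;   /* models local->avg_load */
	unsigned long busiest_avg_load; /* models busiest->avg_load */
	unsigned long domain_avg_load;  /* models sds.avg_load */
};

/*
 * Returns true when the model would go on to balance, false when it
 * would take the out_balanced path.
 */
static bool model_should_balance(const struct lb_model *m, bool patched)
{
	bool idle_ok;

	assert(m != NULL);

	/* Old check accepts only NEWLY_IDLE; patched check also accepts IDLE. */
	idle_ok = m->idle == NEWLY_IDLE || (patched && m->idle == IDLE);

	if (idle_ok && m->local_has_capacity && m->busiest_no_capacity)
		return true;	/* force_balance */

	/* Local group is busier than the selected busiest group. */
	if (m->local_avg_load >= m->busiest_avg_load)
		return false;	/* out_balanced */

	/* Local group is already above the domain average load. */
	if (m->local_avg_load >= m->domain_avg_load)
		return false;	/* out_balanced */

	return true;
}
```

With idle set to IDLE, spare capacity in the local group, no capacity in the busiest group, and a local avg_load of 600 against a busiest avg_load of 500, the model returns false without the change and true with it.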

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 71f020d..42b8801 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7843,8 +7843,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
-	    busiest->group_no_capacity)
+	if ((env->idle == CPU_NEWLY_IDLE || env->idle == CPU_IDLE) &&
+	    group_has_capacity(env, local) && busiest->group_no_capacity)
 		goto force_balance;
 
 	/* Misfitting tasks should be dealt with regardless of the avg load */
-- 
1.9.1