Current code calculates avg_load as below: sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity
Let's see below scenario for cluster level average load calculation:
The little cluster have 4 CPUs and 2 tasks running with 100% utilization (nice = 0); if little cluster capacity is 400, then get little cluster avg_load = (2 * 1024 * 1024) / (400 * 4) = 1310.
On the other hand, if big cluster have 4 CPUs and 4 tasks running with 100% utilization (nice = 0); if big cluster capacity is 1024, then get big cluster avg_load = (4 * 1024 * 1024) / (1024 * 4) = 1024.
So finally scheduler considers little cluster has higher load and this obviously doesn't make sense due big cluster has 4 CPUs been running but little cluster actually have 2 CPUs are idle.
So this is caused by the load value is not scaled by capacity, so it will get wrong average load value by divide capacity value. So just need simply divide CPU number (group->group_weight). So for upper case we can get little cluster avg_load = (2 * 1024) / 4 = 512; big cluster avg_load = (4 * 1024) / 4 = 1024.
Signed-off-by: Leo Yan leo.yan@linaro.org --- kernel/sched/fair.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index a6eef88..353520d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5475,7 +5475,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, }
/* Adjust by relative CPU capacity of the group */ - avg_load = (avg_load * SCHED_CAPACITY_SCALE) / group->sgc->capacity; + avg_load = avg_load / group->group_weight;
if (local_group) { this_load = avg_load; @@ -7250,7 +7250,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
/* Adjust by relative CPU capacity of the group */ sgs->group_capacity = group->sgc->capacity; - sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; + sgs->avg_load = sgs->group_load / group->group_weight;
if (sgs->sum_nr_running) sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running; -- 1.9.1