Re: [Eas-dev] [PATCH RFC 6/8] sched/fair: correct avg_load as CPU average load

28 Jun 2016

On Mon, Jun 27, 2016 at 10:56:57PM +0200, Vincent Guittot wrote:
...
On 23 June 2016 at 15:43, Leo Yan <leo.yan@linaro.org> wrote:
...
Current code calculates avg_load as below:
  sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) /
        sgs->group_capacity
Consider the following scenario for cluster-level average load calculation:
The little cluster has 4 CPUs and 2 tasks running with 100%
utilization (nice = 0); if the little cluster's per-CPU capacity is 400, then
little cluster avg_load = (2 * 1024 * 1024) / (400 * 4) = 1310.
On the other hand, if the big cluster has 4 CPUs and 4 tasks running with
100% utilization (nice = 0) and its per-CPU capacity is 1024, then
big cluster avg_load = (4 * 1024 * 1024) / (1024 * 4) = 1024.
So in the end the scheduler considers the little cluster to have the higher
load, which obviously doesn't make sense: the big cluster has all 4 CPUs
running, while the little cluster actually has 2 idle CPUs.
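To make the arithmetic above easy to check, here is a minimal user-space sketch in C of the capacity-scaled calculation. The helper name avg_load_by_capacity and the UL-typed SCHED_CAPACITY_SCALE define are assumptions made for illustration; only the expression itself comes from the quoted kernel code.

```c
#include <assert.h>

/* Assumed to match the kernel's scale factor of 1024. */
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper mirroring the quoted expression:
 *   avg_load = (group_load * SCHED_CAPACITY_SCALE) / group_capacity
 */
unsigned long avg_load_by_capacity(unsigned long group_load,
				   unsigned long group_capacity)
{
	assert(group_capacity > 0);	/* avoid dividing by zero */
	return (group_load * SCHED_CAPACITY_SCALE) / group_capacity;
}
```

Calling avg_load_by_capacity(2 * 1024, 400 * 4) reproduces the little-cluster value of 1310, and avg_load_by_capacity(4 * 1024, 1024 * 4) reproduces the big-cluster value of 1024.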
This makes perfect sense: it's all about how much compute
capacity has to be shared among the load, so even if 2 little CPUs are
idle, the tasks on the big cluster have more compute capacity than those
on the little cluster.
I'm not sure if I have done enough homework on this :) My
understanding is that the load value is a task's demand for CPU
compute capacity, in the range [0..1024]. So reaching 1024
means all of the CPU's capacity has been consumed. avg_load is the average
value over all CPUs in the sched_group.
So avg_load is a signal of CPU capacity consumption, not a per-task signal.
Please feel free to correct me if I have misunderstood this.
The example I gave is not very typical. Here is a more practical
example:
Big cluster has 4 CPUs with 6 runnable tasks, group_load = 3547; so
big cluster avg_load = (3547 * 1024) / 4096 = 886;
Little cluster has 4 CPUs with 2 runnable tasks, group_load = 1639;
so little cluster avg_load = (1639 * 1024) / 1603 = 1046;
So how do we calculate the load imbalance between these two clusters? In
the current code, even though the big cluster is overloaded, it will not
migrate tasks to the little cluster:
        /*
         * If the local group is busier than the selected busiest group
         * don't try and pull any tasks.
         */
        if (local->avg_load >= busiest->avg_load)
                goto out_balanced;
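As a rough illustration of why this check stops the balance in the practical example, the sketch below plugs the quoted numbers into the same comparison. The struct group_stats type and both function names are hypothetical stand-ins, not the kernel's own definitions, and the little cluster is assumed to be the local group.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's scale factor of 1024. */
#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical stand-in for the per-group statistics used by the check. */
struct group_stats {
	unsigned long group_load;
	unsigned long group_capacity;
};

/* Capacity-scaled average load, as in the current code. */
unsigned long capacity_scaled_avg_load(const struct group_stats *g)
{
	assert(g->group_capacity > 0);	/* avoid dividing by zero */
	return (g->group_load * SCHED_CAPACITY_SCALE) / g->group_capacity;
}

/* Mirrors: if (local->avg_load >= busiest->avg_load) goto out_balanced; */
bool treated_as_balanced(const struct group_stats *local,
			 const struct group_stats *busiest)
{
	return capacity_scaled_avg_load(local) >=
	       capacity_scaled_avg_load(busiest);
}
```

With local = { 1639, 1603 } (little) and busiest = { 3547, 4096 } (big), the averages come out as 1046 and 886, so treated_as_balanced() returns true and nothing would be pulled from the big cluster.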
...
We can't take this avg_load into account only when the big cluster becomes
overloaded whereas the little cluster has some idle CPUs.
Even if we add an extra check for whether the big cluster is overloaded,
the question is then how much imbalance load should be migrated from the big
cluster to the little cluster. For the case above, we can see that the avg_load
value is meaningless. This is why I wrote this patch to change
avg_load to represent the concept of "CPU capacity consuming ratio".
...
...
This happens because the load value is not scaled by capacity, so
dividing it by the capacity value gives a wrong average load. We simply
need to divide by the number of CPUs (group->group_weight) instead. For the
case above we then get little cluster avg_load = (2 * 1024) / 4 = 512 and
big cluster avg_load = (4 * 1024) / 4 = 1024.
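The proposed calculation can be sketched the same way. The helper name avg_load_by_weight is an assumption for illustration; the expression follows the description above (group load divided by group->group_weight).

```c
#include <assert.h>

/*
 * Hypothetical helper for the proposed calculation:
 *   avg_load = group_load / group_weight
 * where group_weight is the number of CPUs in the group.
 */
unsigned long avg_load_by_weight(unsigned long group_load,
				 unsigned int group_weight)
{
	assert(group_weight > 0);	/* avoid dividing by zero */
	return group_load / group_weight;
}
```

avg_load_by_weight(2 * 1024, 4) gives 512 and avg_load_by_weight(4 * 1024, 4) gives 1024, matching the figures above. Applied to the practical example's group_load values (my own arithmetic, not a result stated in the thread), it gives 886 for the big cluster (3547 / 4) and 409 for the little cluster (1639 / 4).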
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a6eef88..353520d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5475,7 +5475,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		}
 
 		/* Adjust by relative CPU capacity of the group */
-		avg_load = (avg_load * SCHED_CAPACITY_SCALE) / group->sgc->capacity;
+		avg_load = avg_load / group->group_weight;
 
 		if (local_group) {
 			this_load = avg_load;
@@ -7250,7 +7250,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	/* Adjust by relative CPU capacity of the group */
 	sgs->group_capacity = group->sgc->capacity;
-	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;
+	sgs->avg_load = sgs->group_load / group->group_weight;
 
 	if (sgs->sum_nr_running)
 		sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running;
--
1.9.1

    
