Hello Morten,
Thanks for the review. Sorry for the delay in reply.
On 12/09/2016 12:06 PM, Morten Rasmussen wrote:
Hi Thara,
Thanks for sharing your proposal.
On Thu, Dec 08, 2016 at 11:42:50AM -0500, Thara Gopinath wrote:
On 12/08/2016 05:24 AM, Leo Yan wrote:
On Wed, Dec 07, 2016 at 05:22:37PM -0500, Thara Gopinath wrote:
@@ -7701,17 +7743,26 @@ next_group:
env->src_grp_nr_running = sds->busiest_stat.sum_nr_running;
- /* Setting overutilized flag might not be necessary here
- Revisit
- */
- if (!lb_sd_parent(env->sd)) {
- 	/* update overload indicator if we are at root domain */
- 	if (env->dst_rq->rd->overload != overload)
- 		env->dst_rq->rd->overload = overload;
- }
/* Update over-utilization (tipping point, U >= 0) indicator */
if (env->dst_rq->rd->overutilized != overutilized)
env->dst_rq->rd->overutilized = overutilized;
- } else {
if (!env->dst_rq->rd->overutilized && overutilized)
env->dst_rq->rd->overutilized = true;
- if (overutilized)
set_sd_overutilized(env->sd, env->dst_rq->rd);
If it's not overutilized, shouldn't we call clear_sd_overutilized() here? The old code clears the root domain flag.
Yes, maybe we should. I am not sure about it, because we clear the flag at a different place in this implementation.
I think Leo is right. If we have just visited all cpus in the sched_domain and not found that any of them are over-utilized, we should be able to clear the flag for the domain. I don't see why we shouldn't if the intention with the flag is to indicate whether the domain is over-utilized or not?
Yes I will clear the flag here.
- /* If the domain util is greater than domain capacity, load balancing
- needs to be done at the next sched domain level as well
- */
- if (sds->total_capacity * 1024 < sds->total_util * capacity_margin) {
/* If already at the highest domain nothing can be done */
if (env->sd->parent)
set_sd_overutilized(env->sd->parent,
env->dst_rq->rd);
So usually this will set the root domain's flag only after the whole sched domain's util is greater than the domain capacity. If a CPU has a single "misfit" task, the scheduler will not reach this condition, so the root domain's flag will not be set and the task's migration will be delayed.
Hmm, yes. You are correct. We may have to handle misfit tasks separately.
I don't quite understand why we have to set the flag on the parent sched_domain here. It should be set anyway when update_sd_lb_stats() is called for the parent sched_domain in the next iteration of for_each_domain() in rebalance_domains(). But I see that we bail out early there, which I don't quite understand either. More on that in a separate reply ;-)
We bail out in rebalance_domains() if no group is overloaded in a sched domain. Setting the flag here ensures that the domain is not skipped in rebalance_domains().
I agree with Leo that total_util doesn't flag some quite important cases as overutilized.
	}
}
@@ -7932,8 +7983,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
	 */
	update_sd_lb_stats(env, &sds);
- if (energy_aware() && !env->dst_rq->rd->overutilized)
goto out_balanced;
/* Is this check really required here?? Revisit */
if (energy_aware()) {
if (!is_sd_overutilized(env->sd, env->dst_rq->rd))
goto out_balanced;
}
local = &sds.local_stat;
busiest = &sds.busiest_stat;
@@ -8000,6 +8054,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
force_balance:
	/* Looks like there is an imbalance. Compute it */
	calculate_imbalance(env, &sds);
- /* Is this the correct place to clear this flag? Should access
- to flag be locked? Revisit.
- */
- clear_sd_overutilized(env->sd, env->dst_rq->rd);
Is there a specific sequence for clearing the overutilized flag for the root domain and SD level 2? For example, first clear the root domain flag and then clear the SD level 2 flag.
We don't need a specific sequence. As and when each sched domain gets balanced, the corresponding flag will get cleared.
IIUC, it isn't quite right to clear the flag here. The sched_domain may have more than two groups; for example, a 4+4 big.LITTLE topology has four groups at the lowest level. Load-balance may resolve over-utilization in one sched_group, but one or more other groups might still be over-utilized.
I don't understand this. IIUC, load balance is between the sched groups of a domain. Irrespective of how many groups there are, a call to load_balance() will balance the load between all the groups in the domain (affinity being an exception). Am I missing something here?
They won't be balanced this time anyway, so the flag might be set again at the next load-balance. It may not be a problem, but it makes the code a bit harder to understand.
Even in the case where we only have two sched_groups, load-balance isn't guaranteed to result in no groups being over-utilized. The source group may be so heavily loaded that it over-utilizes the destination group too after the load has been balanced.
I agree with this. This is not the correct place. I can move this under out_balanced in the load_balance() function. Any other recommendation on where this flag can be conclusively cleared?
Morten
--
Regards
Thara