Re: [Eas-dev] [RFC] Per Sched domain over utilization

20 Dec 2016


On Mon, Dec 19, 2016 at 09:16:58PM -0500, Thara Gopinath wrote:
...
On 12/19/2016 07:42 PM, Leo Yan wrote:
...
On Mon, Dec 19, 2016 at 10:17:29AM -0500, Thara Gopinath wrote:
[...]
...
But what is missing is handling of a misfit task. Can we not handle a misfit
task as a separate condition in update_sd_lb? i.e. in the above example,
if either CPU A or CPU B has a misfit task, set the overutilization flag
for the next-level SD, which is equivalent to setting the flag in RD in
this case.
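A minimal C sketch of the misfit check proposed above, assuming a simplified two-level topology in which the next-level SD flag plays the role of rd->overutilized. The structs `cpu_state` and `parent_flags` and the function `check_misfit()` are hypothetical illustrations, not the kernel's update_sd_lb code or data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU state; not the kernel's struct rq. */
struct cpu_state {
	bool misfit_task;	/* CPU is running a task too big for its capacity */
};

/* Hypothetical flag for the next-level (parent) SD. In a two-level
 * system this is equivalent to rd->overutilized. */
struct parent_flags {
	bool overutilized;
};

/*
 * Hypothetical check run while gathering level-1 stats for the CPUs
 * [first, first + nr): a misfit task on any of them marks the parent
 * SD overutilized, as a condition separate from the utilization check.
 */
static void check_misfit(const struct cpu_state *cpus, int first, int nr,
			 struct parent_flags *pf)
{
	assert(first >= 0 && nr >= 0 && first + nr <= NR_CPUS);
	for (int i = first; i < first + nr; i++) {
		if (cpus[i].misfit_task) {
			pf->overutilized = true;
			return;
		}
	}
}
```

Here CPUs 0 and 1 would be "CPU A" and "CPU B" of one level-1 domain; a misfit task on either sets the parent flag directly.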
Agree, we can do this for misfit task :)
IIUC, the idea of your patch is first to use the SD level 2 flag to
represent "inner" overutilized, and then later in the load-balance flow
check whether the rd->overutilized flag needs to be set for outer
'overutilized'. So for the 'misfit' case, we need to wait until the
load-balance flow to check it and set the rd->overutilized flag.
rd->overutilized is like the overutilized flag at any sched group level,
but for the highest sched_domain, which does not have a parent. I am not
sure I understand inner and outer overutilized properly. Say in a
system a CPU has four levels of sched domain: level1, level2, level3
and level4. What my patch proposes is as follows. If a load balance has
to happen for this CPU at level1, the flag will be set at the first sched
group in level2. Similarly, if a load balance has to happen at level2, the
flag will be set at the first sched group in level3. Following this, if a
load balance has to happen at the highest level, i.e. level4, the flag
will be set at rd.
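A minimal sketch of the per-level scheme just described, assuming four levels and modelling "the flag on the first sched group of level L" as one boolean per level. `struct ou_flags`, `mark_lb_needed()` and `lb_needed()` are hypothetical names for illustration, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LEVELS 4	/* level1 .. level4, as in the example above */

/*
 * Hypothetical flag storage. first_group_ou[L] stands for the flag on the
 * first sched group of level L (only L = 2..NR_LEVELS is used); the top
 * level has no parent, so its flag lives in rd_ou (rd->overutilized).
 */
struct ou_flags {
	bool first_group_ou[NR_LEVELS + 1];
	bool rd_ou;
};

/* Record that a load balance has to happen at 'level' (1-based). */
static void mark_lb_needed(struct ou_flags *f, int level)
{
	assert(level >= 1 && level <= NR_LEVELS);
	if (level == NR_LEVELS)
		f->rd_ou = true;			/* no parent: use rd */
	else
		f->first_group_ou[level + 1] = true;	/* flag lives one level up */
}

/* A load balance is pending at 'level' if the flag one level up is set. */
static bool lb_needed(const struct ou_flags *f, int level)
{
	assert(level >= 1 && level <= NR_LEVELS);
	return level == NR_LEVELS ? f->rd_ou : f->first_group_ou[level + 1];
}
```

Note that each level carries a single bit: it records that balancing is needed at that level, not which group below raised it.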
E.g. in the above case, after the rd->overutilized flag is set, the scheduler
cannot tell which specific schedule group the load-balance requirement is
_coming_ from. The rd->overutilized flag is an overall flag indicating that
load balancing should happen within level 4, but we lose information such as
which schedule group in level 4 has a performance issue and needs the
scheduler's help.
I see we have quite different understandings of how to use the
'overutilized' flag. One method is to use the "overutilized" flag to
indicate that a specific schedule domain is over-utilized and needs load
balancing, but from these flags we cannot tell which schedule groups
within the SD have a performance bottleneck.
Another method is to use the "overutilized" flag to indicate that a specific
schedule group has a performance bottleneck, so any schedule group can
set the "overutilized" flag for itself. The scheduler can then easily know
which schedule groups have bottlenecks (i.e. 'who' the LB requirement comes
from) and should migrate tasks out of them. I personally think this gives us
more room for subtle optimizations using this information; e.g. if we know
"overutilized" happens in the LITTLE cluster, we can use a different
strategy than when "overutilized" happens in the big cluster.
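A minimal sketch of the per-group alternative described above, assuming each sched group owns its own flag and is tagged as LITTLE or big. All names (`struct group`, `find_overutilized_group()`, `pick_strategy()`, the `LB_*` strategies) are hypothetical; the point is only that the balancer can see which group raised the flag and branch on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cluster type, used only to illustrate per-cluster policy. */
enum cluster_type { CLUSTER_LITTLE, CLUSTER_BIG };

/* Hypothetical per-group state: each group sets its own flag. */
struct group {
	enum cluster_type type;
	bool overutilized;
};

/* Return the index of the first overutilized group, or -1 if none. */
static int find_overutilized_group(const struct group *groups, size_t nr)
{
	assert(groups != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (groups[i].overutilized)
			return (int)i;
	return -1;
}

/* Hypothetical strategies, distinguished only by the source cluster. */
enum lb_strategy { LB_NONE, LB_FROM_LITTLE, LB_FROM_BIG };

/* Choose a strategy based on *which* group reported the bottleneck. */
static enum lb_strategy pick_strategy(const struct group *groups, size_t nr)
{
	int idx = find_overutilized_group(groups, nr);

	if (idx < 0)
		return LB_NONE;
	return groups[idx].type == CLUSTER_LITTLE ? LB_FROM_LITTLE : LB_FROM_BIG;
}
```

Compared with a single flag per domain, this keeps the "who" information at the cost of scanning the groups in the domain.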
...
...
This is why I suggest using 'discrete' flags at the corresponding SD
level to represent outer 'overutilized', so we can set the flag at the
point where it first occurs rather than delaying it until the load-balance flow.
Instead of directly setting the flag at the highest level, should we not
try to balance the load out at a lower level, if possible?
For a 'misfit' task, we don't need to do load balancing at SD level 1; for
other cases, we can first do load balancing at SD level 1.
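A minimal sketch of the starting-level choice above, assuming (as in a typical big.LITTLE topology) that SD level 1 spans CPUs of equal capacity, so moving a misfit task within it cannot give the task more capacity. `first_lb_level()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical: return the lowest SD level (1-based) at which to start
 * balancing. Assumes level 1 contains only same-capacity CPUs, so a
 * misfit task skips it; everything else tries level 1 first.
 */
static int first_lb_level(bool misfit, int nr_levels)
{
	assert(nr_levels >= 1);
	if (misfit && nr_levels >= 2)
		return 2;
	return 1;	/* no higher level exists, or not a misfit case */
}
```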
Thanks,
Leo Yan
