On 12/19/2016 07:42 PM, Leo Yan wrote:
Hi Thara,
On Mon, Dec 19, 2016 at 10:17:29AM -0500, Thara Gopinath wrote:
[...]
But what is missing is handling of misfit tasks. Can we not handle a misfit task as a separate condition in update_sd_lb? I.e., in the above example, if either CPU A or CPU B has a misfit task, set the overutilization flag for the next-level SD, which is equivalent to setting the flag in the RD in this case.
Agreed, we can do this for misfit tasks :)
IIUC, the idea of your patch is first to use the SD level 2 flag to represent "inner" overutilized, and then later, in the load balance flow, to check whether the rd->overutilized flag needs to be set for "outer" overutilized. So for the 'misfit' case, we need to wait until the load balance flow to check it and set the rd->overutilized flag.
rd->overutilized is like the overutilized flag at any sched group level, but for the highest sched_domain, which does not have a parent. I am not sure I understand inner and outer overutilized properly. Say a CPU in a system has four levels of sched domain: level1, level2, level3 and level4. What my patch proposes is as follows. If a load balance has to happen for this CPU at level1, the flag will be set at the first sched group in level2. Similarly, if a load balance has to happen at level2, the flag will be set at the first sched group in level3. Following this, if a load balance has to happen at the highest level, i.e. level4, the flag will be set at the rd.
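To make the four-level example concrete, here is a minimal sketch in plain C of where the flag would land. It is only an illustration, not the patch itself: the sketch_* types and functions are hypothetical stand-ins, not kernel structures, and the single first_group member merely stands in for the first sched group of a domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_LEVELS 4	/* level1..level4, as in the example above */

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct sketch_group {
	bool overutilized;
};

struct sketch_domain {
	struct sketch_domain *parent;		/* NULL at the highest level */
	struct sketch_group first_group;	/* first sched group of this domain */
};

struct sketch_rd {
	bool overutilized;
};

/* Link level1 -> level2 -> level3 -> level4 and clear all flags. */
static void sketch_init(struct sketch_domain sd[NR_LEVELS], struct sketch_rd *rd)
{
	for (size_t i = 0; i < NR_LEVELS; i++) {
		sd[i].parent = (i + 1 < NR_LEVELS) ? &sd[i + 1] : NULL;
		sd[i].first_group.overutilized = false;
	}
	rd->overutilized = false;
}

/*
 * A load balance is needed at 'sd'. Record that in the first group of
 * the parent domain or, when 'sd' has no parent, in the root domain.
 */
static void sketch_set_overutilized(struct sketch_domain *sd, struct sketch_rd *rd)
{
	assert(sd != NULL && rd != NULL);

	if (sd->parent)
		sd->parent->first_group.overutilized = true;
	else
		rd->overutilized = true;
}
```

With sd[0] as level1 and sd[3] as level4, calling sketch_set_overutilized(&sd[0], &rd) marks only the first group of level2, while calling it on &sd[3] marks the rd.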
This is why I suggest using 'discrete' flags at the corresponding SD level to represent outer 'overutilized', so that we can set the flag for outer 'overutilized' right away rather than delaying it until the load balance flow.
Instead of directly setting the flag at the highest level, should we not try to balance the load out at a lower level, if possible?
Thanks, Leo Yan
-- Regards Thara