Re: [Eas-dev] [RFC] Per Sched domain over utilization

16 Dec 2016

      On Tue, Dec 13, 2016 at 01:20:07AM +0800, Leo Yan wrote:
...
On Mon, Dec 12, 2016 at 03:28:06PM +0000, Morten Rasmussen wrote:
[...]
...
...
...
For 4 middle-sized tasks the EAS-code should hopefully spread the tasks
at wake-up.
If we want to support spreading 4 middle-size tasks, the question is
what's the criteria to set "overutilized" flag for LITTLE cluster's
sched domain?
The EAS-code at wake-up should be fine as long as tasks wake up
regularly, i.e. no task has a utilization approaching or exceeding the
compute capacity available. Otherwise, we may need help from the big
cluster.
Ideally, the criteria for calling for help (setting the overutilized
flag at the root_domain for this RFC patch) should be:
    It is not possible to balance the tasks within the cluster such
    that every cpu has a minimum of spare cycles.

Figuring out whether it is possible to balance the tasks such that
none of them are constrained in terms of available cpu cycles isn't easy.
Until now we have taken a much more conservative approach by making the
criteria: any cpu is over-utilized. This is very easy to determine and
should cover all the cases covered by the ideal criteria above, although
we will call for help in many cases where it isn't necessary.
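As a rough illustration of the conservative criteria, here is a minimal
sketch in plain C. The struct, the function names and the ~80% margin
(1280/1024) are assumptions made for this example only; it is not the
code from the RFC patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu snapshot; not a kernel structure. */
struct cpu_stat {
	unsigned long capacity;	/* compute capacity available on this cpu */
	unsigned long util;	/* summed utilization of the tasks on it */
};

/* Assumed headroom: over-utilized once util exceeds ~80% of capacity. */
#define MARGIN_NUM	1280UL
#define MARGIN_DEN	1024UL

static bool cpu_is_overutilized(const struct cpu_stat *c)
{
	return c->capacity * MARGIN_DEN < c->util * MARGIN_NUM;
}

/* Conservative criteria: call for help as soon as any cpu is over-utilized. */
static bool any_cpu_overutilized(const struct cpu_stat *cpus, size_t nr_cpus)
{
	assert(cpus != NULL || nr_cpus == 0);

	for (size_t i = 0; i < nr_cpus; i++)
		if (cpu_is_overutilized(&cpus[i]))
			return true;

	return false;
}
```

The check is a single pass over the cpus, which is why it is cheap, but
it says nothing about whether the tasks could have been rebalanced
within the cluster instead.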
A half-baked thought:
If we assume that the wake-up EAS-code does a good job, could we set the
flag if the wake target cpu ends up being over-utilized? If EAS failed
to find enough capacity for the task it must be due to one of the
following reasons:
        1. Because the task is too big to fit the raw capacity offered
           by the little cpus.
        2. The utilization of other tasks leaves too little spare
           capacity for the task, and it is not possible to
           reorganize the task distribution to get sufficient
           non-fragmented spare capacity.
        3. Spare capacity is fragmented, but it would be possible to
           reorganize the tasks to provide the necessary spare capacity.

1. and 2. should be fine as those are cases where we do need help from
the big cluster. 3. is more difficult: while it is theoretically
possible to sort things out, it might take a long time to do so, and in
the meantime one or more tasks will suffer.
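To make the three cases concrete, here is a hedged sketch in plain C
that classifies a failed placement. The enum, the function name and the
flat array of per-cpu utilization are hypothetical. It also ignores any
capacity margin, and it only approximates the split between 2. and 3.:
total spare capacity covering the task is necessary for a
reorganization to work, not sufficient.

```c
#include <assert.h>
#include <stddef.h>

enum placement_failure {
	FAIL_NONE,		/* some cpu has room for the task as-is */
	FAIL_TASK_TOO_BIG,	/* reason 1 */
	FAIL_NO_SPARE,		/* reason 2 */
	FAIL_FRAGMENTED,	/* reason 3 (might be fixable by reorganizing) */
};

/* All little cpus are assumed to share the same raw capacity. */
static enum placement_failure
classify_failure(const unsigned long *cpu_util, size_t nr_cpus,
		 unsigned long cpu_capacity, unsigned long task_util)
{
	unsigned long total_spare = 0;

	assert(nr_cpus > 0);

	if (task_util > cpu_capacity)
		return FAIL_TASK_TOO_BIG;

	for (size_t i = 0; i < nr_cpus; i++) {
		unsigned long spare = cpu_util[i] < cpu_capacity ?
				      cpu_capacity - cpu_util[i] : 0;

		if (spare >= task_util)
			return FAIL_NONE;
		total_spare += spare;
	}

	/* No single cpu fits the task; is there enough spare in total? */
	return total_spare < task_util ? FAIL_NO_SPARE : FAIL_FRAGMENTED;
}
```

Only FAIL_FRAGMENTED is a candidate for being sorted out inside the
cluster; the other two failures need help from the big cluster.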
Your good summary actually reminds me of another important thing: we
have given multiple meanings to this single "overutilized" flag; as a
result, the single flag (a bool value) has to handle multiple cases.
So I try to summarize the semantics of the "overutilized" flag as below:

Inner "overutilized": This kind of "overutilized" is an issue internal
to the schedule domain and can be resolved within it; the scheduler
should find the best combination of tasks and CPUs in the schedule
domain. This closely matches the case you mentioned in item 3 above;

Outer "overutilized": This kind of "overutilized" is one the schedule
domain cannot resolve by itself, so it asks other schedule domains to
help pull tasks. Usually we expect a higher capacity schedule domain to
pull the tasks, which can improve performance. This closely matches
items 1/2 above;

Global "overutilized": This kind of "overutilized" means schedule
domains should spread tasks as much as possible; this may involve task
migration from a higher capacity schedule domain to a lower capacity
schedule domain;

IIUC, in the original code the "rd->overutilized" flag is used to
indicate all three semantics; in Thara's patch the "sd->overutilized"
flag is used for the inner "overutilized" case, and "rd->overutilized"
indicates the outer "overutilized" and global "overutilized" cases.
So it is difficult for Thara's patch to handle the 'misfit' situation.
I would prefer that we distinguish the three semantics above properly,
with macros defined something like below:
#define SCHED_INNER_OVERUTILIZED        0x1
#define SCHED_OUTER_OVERUTILIZED        0x2
#define SCHED_GLOBAL_OVERUTILIZED       0x4
So we use all three macros for "sd->overutilized" and only use
SCHED_GLOBAL_OVERUTILIZED for "rd->overutilized". For example, if
any schedule domain has set SCHED_OUTER_OVERUTILIZED, that means we
could check whether the local schedule group has higher capacity than
the busiest schedule group and, if so, execute load balance.
Please feel free to correct me if I'm wrong.
I should have read your email before I wrote the reply I just sent to
Vincent and given you credit for your proposal. I fully agree with you
that for asymmetric cpu capacity systems we have additional meanings of
being 'over-utilized' that can be addressed in different ways.
I'm not sure exactly how we would determine when we are 'global
over-utilized' and distinguish it from 'outer over-utilized'. That
requires a bit more pondering.
Regarding the flags in Thara's proposal: sd->overutilization can be a
parent flag as well if you have more than two sched_domain levels. We
need to consider more levels to have a scalable solution.
Thanks,
Morten