On Tue, Dec 13, 2016 at 01:20:07AM +0800, Leo Yan wrote:
On Mon, Dec 12, 2016 at 03:28:06PM +0000, Morten Rasmussen wrote:
[...]
For 4 middle-sized tasks the EAS-code should hopefully spread the tasks at wake-up.
If we want to support spreading 4 middle-sized tasks, the question is: what are the criteria for setting the "overutilized" flag for the LITTLE cluster's sched domain?
The EAS-code at wake-up should be fine as long as tasks wake up regularly, i.e. no task has a utilization approaching or exceeding the compute capacity available. Otherwise, we may need help from the big cluster.
Ideally, the criteria for calling for help (setting the overutilized flag at the root_domain for this RFC patch) should be:
It is not possible to balance the tasks within the cluster such that every cpu has a minimum of spare cycles.
Figuring out whether it is possible to balance the tasks such that none of them are constrained in terms of available cpu cycles isn't easy. Until now we have taken a much more conservative approach by making the criterion: if any cpu is over-utilized. This is very easy to determine and should cover all the cases covered by the ideal criterion above, although we will call for help in many cases where it isn't necessary.
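For illustration, the conservative "any cpu is over-utilized" criterion could look roughly like the user-space sketch below. The function names are hypothetical, and the 1280/1024 margin (a cpu counts as over-utilized above roughly 80% of its capacity) is modelled on the capacity_margin used in the EAS patch sets; treat the exact value as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed margin: over-utilized once util exceeds ~80% (1024/1280) of capacity. */
#define CAPACITY_MARGIN 1280UL
#define SCALE 1024UL

/* Hypothetical helper: is a single cpu over-utilized? */
static bool cpu_overutilized(unsigned long util, unsigned long capacity)
{
	assert(capacity > 0);
	return capacity * SCALE < util * CAPACITY_MARGIN;
}

/* Conservative criterion: call for help as soon as any cpu is over-utilized. */
static bool any_cpu_overutilized(const unsigned long *util,
				 const unsigned long *capacity, int nr_cpus)
{
	for (int i = 0; i < nr_cpus; i++)
		if (cpu_overutilized(util[i], capacity[i]))
			return true;
	return false;
}
```

With four LITTLE cpus of capacity 430 the threshold works out to 344, so a single cpu at 360 flags the whole cluster even if the other three are nearly idle. That is exactly the "call for help when it isn't necessary" case.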
A half-baked thought:
If we assume that the wake-up EAS-code does a good job, could we set the flag if the wake target cpu ends up being over-utilized? If EAS failed to find enough capacity for the task, it must be due to one of the following reasons:
1. The task is too big to fit the raw capacity offered by the little cpus.
2. The utilization of other tasks leaves too little spare capacity for the task, and it is not possible to reorganize the task distribution to get sufficient non-fragmented spare capacity.
3. Spare capacity is fragmented, but it would be possible to reorganize the tasks to provide the necessary spare capacity.
Cases 1 and 2 should be fine, as those are cases where we do need help from the big cluster. Case 3 is more difficult: while it is theoretically possible to sort things out, it might take a long time to do so, and in the meantime one or more tasks will suffer.
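A rough sketch of how a wake-up path might tell the three reasons apart, assuming EAS would pick any cpu the task still fits on. All names are hypothetical and the margin is the same assumed 80% threshold as above. Note the simplification: reasons 2 and 3 are separated only by comparing the total spare capacity with the task's utilization. Whether a reorganization actually exists is a bin-packing question, which is exactly why case 3 is hard.

```c
#include <assert.h>

/* Assumed ~80% usable threshold (1280/1024 margin), as in the EAS patches. */
#define CAPACITY_MARGIN 1280UL
#define SCALE 1024UL

enum help_reason {
	NO_HELP_NEEDED,   /* some cpu still has room for the task */
	TASK_TOO_BIG,     /* reason 1: exceeds the largest cpu's usable capacity */
	NOT_ENOUGH_SPARE, /* reason 2: total spare capacity is too small */
	SPARE_FRAGMENTED, /* reason 3: enough spare in total, but split up */
};

/* Capacity a cpu can use before it counts as over-utilized. */
static unsigned long usable(unsigned long capacity)
{
	return capacity * SCALE / CAPACITY_MARGIN;
}

/* Hypothetical classifier for a waking task of utilization task_util. */
static enum help_reason classify_wakeup(unsigned long task_util,
					const unsigned long *util,
					const unsigned long *capacity,
					int nr_cpus)
{
	unsigned long max_usable = 0, total_spare = 0;

	assert(nr_cpus > 0);
	for (int i = 0; i < nr_cpus; i++) {
		unsigned long room = usable(capacity[i]);

		if (room > max_usable)
			max_usable = room;
		/* Assume EAS would place the task on any cpu it fits on. */
		if (util[i] + task_util <= room)
			return NO_HELP_NEEDED;
		if (util[i] < room)
			total_spare += room - util[i];
	}
	if (task_util > max_usable)
		return TASK_TOO_BIG;
	if (total_spare < task_util)
		return NOT_ENOUGH_SPARE;
	return SPARE_FRAGMENTED;
}
```

For example, with four LITTLE cpus of capacity 430 each already at utilization 200, a waking task of 200 fits nowhere (400 > 344), yet 576 units of spare capacity exist in total, so the sketch reports fragmentation rather than calling for the big cluster.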
Your good summary actually reminds me of another important thing: we have given multiple meanings to the single "overutilized" flag, so this one flag (a bool value) has to handle multiple cases.
So I have tried to summarize the semantics of the "overutilized" flag below:
Inner "overutilized": this kind of "overutilized" is an issue internal to the sched domain and can be resolved within it, so the scheduler should find the best combination of tasks and CPUs in the sched domain. This matches case 3 you mentioned above;
Outer "overutilized": this kind of "overutilized" means the sched domain cannot resolve the situation by itself, so it asks other sched domains to help by pulling tasks. Usually we expect this to be a higher-capacity sched domain pulling tasks, which improves performance. This matches cases 1 and 2 above;
Global "overutilized": this kind of "overutilized" means sched domains should spread tasks as much as possible, which may involve migrating tasks from a higher-capacity sched domain to a lower-capacity sched domain;
IIUC, in the original code the "rd->overutilized" flag is used to indicate all three semantics, while in Thara's patch the "sd->overutilized" flag is used for the inner "overutilized" case and "rd->overutilized" indicates both the outer and the global "overutilized" cases. As a result, Thara's patch has difficulty handling the 'misfit' situation.
I would prefer that we distinguish the three semantics above properly, for example with macros like these:
#define SCHED_INNER_OVERUTILIZED  0x1
#define SCHED_OUTER_OVERUTILIZED  0x2
#define SCHED_GLOBAL_OVERUTILIZED 0x4
We would then use all three macros for "sd->overutilized" and only SCHED_GLOBAL_OVERUTILIZED for "rd->overutilized". For example, if any sched domain has set SCHED_OUTER_OVERUTILIZED, we could check whether the local sched group has a higher capacity than the busiest sched group and, if so, perform load balancing.
Please feel free to correct me if I am wrong.
I should have read your email before I wrote the reply I just sent to Vincent and given you credit for your proposal. I fully agree with you that for asymmetric cpu capacity systems we have additional meanings of being 'over-utilized' that can be addressed in different ways.
I'm not sure exactly how we would determine when we are 'global over-utilized' and distinguish it from 'outer over-utilized'. That requires a bit more pondering.
Regarding the flags in Thara's proposal: sd->overutilized can be a parent flag as well if you have more than two sched_domain levels. We need to consider more levels to have a scalable solution.
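To make the multi-level point concrete, here is one hypothetical way a per-domain flag might propagate up a hierarchy with more than two levels (e.g. LITTLE, mid and big cpus). The struct, function name and propagation rule are all assumptions for illustration only: each level with no cpu bigger than the over-utilized one is marked outer, and the first level that does contain a bigger cpu is marked inner, since it can resolve the problem by balancing among its own groups.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_INNER_OVERUTILIZED 0x1
#define SCHED_OUTER_OVERUTILIZED 0x2

/* Hypothetical trimmed sched domain level. */
struct sd_level {
	unsigned int overutilized;
	unsigned long max_capacity; /* highest cpu capacity in this domain's span */
	struct sd_level *parent;    /* NULL at the top level */
};

/*
 * Walk up from the lowest domain containing the over-utilized cpu.
 * Returns the level that can resolve it internally, or NULL if none can.
 */
static struct sd_level *propagate_overutilized(struct sd_level *sd,
					       unsigned long cpu_capacity)
{
	for (; sd; sd = sd->parent) {
		/* Parents span their children, so capacity cannot shrink. */
		assert(!sd->parent || sd->parent->max_capacity >= sd->max_capacity);
		if (sd->max_capacity > cpu_capacity) {
			sd->overutilized |= SCHED_INNER_OVERUTILIZED;
			return sd;
		}
		sd->overutilized |= SCHED_OUTER_OVERUTILIZED;
	}
	return NULL;
}
```

With a LITTLE domain (430) under a LITTLE+mid domain (700) under a top domain (1024), an over-utilized LITTLE cpu marks the LITTLE level outer and the middle level inner, and leaves the top level untouched.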
Thanks,
Morten