On Mon, Dec 19, 2016 at 08:24:18AM +0100, Vincent Guittot wrote:
[...]
A half-baked thought:
If we assume that the wake-up EAS code does a good job, could we set the flag if the wake target cpu ends up being over-utilized? If EAS failed to find enough capacity for the task, it must be due to one of the following reasons:

1. The task is too big to fit the raw capacity offered by the little cpus.
2. The utilization of other tasks leaves too little spare capacity for the task, and it is not possible to reorganize the task distribution to get sufficient non-fragmented spare capacity.
3. Spare capacity is fragmented, but it would be possible to reorganize the tasks to provide the necessary spare capacity.
Cases 1 and 2 should be fine, as those are cases where we do need help from the big cluster. Case 3 is more difficult: while it is theoretically possible to sort things out, it might take a long time to do so, and in the meantime one or more tasks will suffer.
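To make the three reasons concrete, here is a minimal sketch, not kernel code. The names (`classify_no_fit`, `enum no_fit_reason`) and the flat utilization array are hypothetical, made up for illustration. It also assumes that "total spare capacity >= task utilization" is a good enough stand-in for "the tasks could be reorganized", which is only a necessary condition in general.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three reasons above; not kernel names. */
enum no_fit_reason {
	FITS = 0,		/* some little cpu has enough spare capacity */
	TASK_TOO_BIG,		/* reason 1: task exceeds raw little-cpu capacity */
	NO_SPARE_CAPACITY,	/* reason 2: total spare capacity is too small */
	SPARE_FRAGMENTED,	/* reason 3: enough spare in total, but scattered */
};

/*
 * cpu_util[i]: current utilization of little cpu i.
 * cpu_cap:     raw capacity of one little cpu (all assumed identical).
 * task_util:   utilization of the waking task.
 * All values use the same arbitrary unit.
 */
static enum no_fit_reason classify_no_fit(const unsigned long *cpu_util,
					  size_t nr_cpus,
					  unsigned long cpu_cap,
					  unsigned long task_util)
{
	unsigned long total_spare = 0;
	size_t i;

	assert(nr_cpus > 0);

	if (task_util > cpu_cap)
		return TASK_TOO_BIG;

	for (i = 0; i < nr_cpus; i++) {
		unsigned long spare =
			cpu_util[i] < cpu_cap ? cpu_cap - cpu_util[i] : 0;

		if (spare >= task_util)
			return FITS;
		total_spare += spare;
	}

	/* Assumption: summed spare capacity approximates "can be reorganized". */
	return total_spare >= task_util ? SPARE_FRAGMENTED : NO_SPARE_CAPACITY;
}
```

With two little cpus of capacity 400 that are each at utilization 300, a task of utilization 150 lands in the fragmented case: no single cpu has 150 spare, but the two together have 200.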
Your good summary actually reminds me of another important thing: we have given multiple meanings to this single "overutilized" flag, and as a result the single flag (a bool value) is required to handle multiple cases.
So I try to summarize the semantics of the "overutilized" flag as below:
Inner "overutilized": This kind of "overutilized" is an issue internal to the schedule domain and can be resolved within it, so the scheduler should find the best combination of tasks and CPUs in the schedule domain. This closely matches the case you mentioned in item 3 above.
Outer "overutilized": This kind of "overutilized" is one that the schedule domain cannot resolve by itself, so it asks other schedule domains to help pull tasks. Usually we expect a higher capacity schedule domain to pull the tasks so that performance can improve. This closely matches items 1 and 2 above.
Global "overutilized": This kind of "overutilized" means the schedule domains should spread tasks as much as possible; this may involve task migration from a higher capacity schedule domain to a lower capacity schedule domain.
I agree that we should make the distinction between overutilization that can be handled by the current sched_domain (inner) and overutilization that must be addressed by upper levels (outer), but I don't see the need for a third global state, which is redundant with outer.
For the global state, one benefit I can think of is for benchmarks: if the whole system is quite busy, we can set the global state and completely roll back to traditional SMP load balance. The outer "overutilized" flag only means there are tasks that should be moved out of a specific cluster to other clusters. Do you agree with this?
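As a rough illustration of the proposal (a sketch only, not a patch), the single bool could become a small enum, with each state selecting a different balancing behaviour. All names below (`enum overutil_level`, `enum balance_action`, `pick_balance_action`) are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical states replacing the single bool "overutilized" flag. */
enum overutil_level {
	OU_NONE = 0,	/* not overutilized */
	OU_INNER,	/* can be fixed inside the schedule domain */
	OU_OUTER,	/* another (usually bigger) domain must pull tasks */
	OU_GLOBAL,	/* whole system busy: spread tasks everywhere */
};

/* Hypothetical balancing behaviours matching the descriptions above. */
enum balance_action {
	BALANCE_EAS_ONLY,	/* leave placement to the EAS wake-up path */
	BALANCE_WITHIN_DOMAIN,	/* re-pack tasks and cpus inside the domain */
	BALANCE_ACROSS_DOMAINS,	/* move tasks out of this cluster */
	BALANCE_SMP_SPREAD,	/* traditional SMP load balance */
};

static enum balance_action pick_balance_action(enum overutil_level level)
{
	assert(level >= OU_NONE && level <= OU_GLOBAL);

	switch (level) {
	case OU_INNER:
		return BALANCE_WITHIN_DOMAIN;
	case OU_OUTER:
		return BALANCE_ACROSS_DOMAINS;
	case OU_GLOBAL:
		return BALANCE_SMP_SPREAD;
	case OU_NONE:
	default:
		return BALANCE_EAS_ONLY;
	}
}
```

If the global state turns out to be redundant with outer, as suggested above, dropping `OU_GLOBAL` would leave a two-level scheme in which outer also covers the spread-everywhere case.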
Thanks, Leo Yan