On 27 May 2014 14:48, Peter Zijlstra peterz@infradead.org wrote:
On Fri, May 23, 2014 at 05:52:56PM +0200, Vincent Guittot wrote:
I have tried to understand the meaning of the condition : (this_load <= load && this_load + target_load(prev_cpu, idx) <= tl_per_task) but i failed to find a use case that can take advantage of it and i haven't found description of it in the previous commits' log.
commit 2dd73a4f09beacadde827a032cf15fd8b1fa3d48
int try_to_wake_up(): in this function the value SCHED_LOAD_BALANCE is used to represent the load contribution of a single task in various calculations in the code that decides which CPU to put the waking task on. While this would be a valid on a system where the nice values for the runnable tasks were distributed evenly around zero it will lead to anomalous load balancing if the distribution is skewed in either direction. To overcome this problem SCHED_LOAD_SCALE has been replaced by the load_weight for the relevant task or by the average load_weight per task for the queue in question (as appropriate). if ((tl <= load &&
tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) {
tl + target_load(cpu, idx) <= tl_per_task) ||
100*(tl + p->load_weight) <= imbalance*load) {
The oldest patch i had found was: https://lkml.org/lkml/2005/2/24/34 where task_hot had been replaced by + if ((tl <= load && + tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) || + 100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) {
but as explained, i haven't found a clear explanation of this condition
commit a3f21bce1fefdf92a4d1705e888d390b10f3ac6f
if ((tl <= load &&
tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) {
So back when the code got introduced, it read:
target_load(prev_cpu, idx) - sync*SCHED_LOAD_SCALE < source_load(this_cpu, idx) && target_load(prev_cpu, idx) - sync*SCHED_LOAD_SCALE + target_load(this_cpu, idx) < SCHED_LOAD_SCALE
So while the first line makes some sense, the second line is still somewhat challenging.
I read the second line something like: if there's less than one full task running on the combined cpus.
ok. your explanation makes sense
Now for idx==0 this is hard, because even when sync=1 you can only make it true if both cpus are completely idle, in which case you really want to move to the waking cpu I suppose.
This use case is already taken into account by
if (this_load > 0) .. else balance = true
One task running will have it == SCHED_LOAD_SCALE.
But for idx>0 this can trigger in all kinds of situations of light load.
target_load is the max between load for idx == 0 and load for the selected idx so we have even less chance to match the condition : both cpu are completely idle