On Mon, Dec 19, 2016 at 08:27:15AM +0100, Vincent Guittot wrote:
[...]
> No, in Thara's patch, sd->overutilized and rd->overutilized have exactly the same meaning. We rely on the parent to share the over-utilization state with the other CPUs at the same level, and rd->overutilized is used as the parent of the last sd level, but there is no difference in usage.
I think sd->overutilized and rd->overutilized have different visibility across CPUs. Please see the example below:
CPU A: SD level 1 - SG1 (CPU A), SG2 (CPU B)
       SD level 2 - SG5 (CPU A, CPU B), SG6 (CPU C, CPU D)
       RD
CPU B: SD level 1 - SG2 (CPU B), SG1 (CPU A)
       SD level 2 - SG5 (CPU A, CPU B), SG6 (CPU C, CPU D)
       RD
CPU C: SD level 1 - SG3 (CPU C), SG4 (CPU D)
       SD level 2 - SG6 (CPU C, CPU D), SG5 (CPU A, CPU B)
       RD
CPU D: SD level 1 - SG4 (CPU D), SG3 (CPU C)
       SD level 2 - SG6 (CPU C, CPU D), SG5 (CPU A, CPU B)
       RD
If CPU A sets the sd->overutilized flag in SG5 and CPU C later checks sd->overutilized, CPU C will only check the flag in SG6. So the flag set by CPU A can be observed by CPU B, but CPU C cannot observe it.
The rd->overutilized flag, however, is visible to all CPUs. This is why I think is_sd_overutilized() should change as below: CPU C iterates over the "overutilized" flags of all groups in the same sched domain and finally finds that SG5's "overutilized" flag has been set by CPU A.
static bool is_sd_overutilized(struct sched_domain *sd)
{
	struct sched_group *group = sd->groups;
	int cpu = smp_processor_id();

	if (cpu_rq(cpu)->rd->overutilized)
		return true;

	do {
		if (group->overutilized)
			return true;
	} while (group = group->next, group != sd->groups);

	return false;
}
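To make the visibility difference concrete, here is a rough user-space model of SD level 2 from the example above. It is only a sketch, not kernel code: the struct layouts are simplified stand-ins, and the helper names local_group_overutilized(), any_group_overutilized() and cpu_c_view() are made up for illustration. It assumes the flag is stored per group and that each CPU's sd->groups starts at its own local group.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures (assumption, not the real layout). */
struct sched_group {
	struct sched_group *next;	/* circular list of the groups in one domain */
	bool overutilized;
};

struct sched_domain {
	struct sched_group *groups;	/* starts at the local CPU's own group */
};

/* Hypothetical helper: look only at the local (first) group. */
static bool local_group_overutilized(const struct sched_domain *sd)
{
	return sd->groups->overutilized;
}

/* Hypothetical helper: walk every group in the domain, as proposed above. */
static bool any_group_overutilized(const struct sched_domain *sd)
{
	const struct sched_group *group = sd->groups;

	do {
		if (group->overutilized)
			return true;
		group = group->next;
	} while (group != sd->groups);

	return false;
}

/* Report what CPU C observes after CPU A marks SG5 as overutilized. */
static void cpu_c_view(bool *local_only, bool *full_walk)
{
	struct sched_group sg5 = { .overutilized = false };	/* CPU A, CPU B */
	struct sched_group sg6 = { .overutilized = false };	/* CPU C, CPU D */
	struct sched_domain sd2_cpu_c;

	/* SG5 and SG6 form the circular group list of SD level 2. */
	sg5.next = &sg6;
	sg6.next = &sg5;
	assert(sg5.next->next == &sg5);

	/* From CPU C's point of view the list starts at SG6. */
	sd2_cpu_c.groups = &sg6;

	/* CPU A sets the flag in its local group, SG5. */
	sg5.overutilized = true;

	*local_only = local_group_overutilized(&sd2_cpu_c);
	*full_walk = any_group_overutilized(&sd2_cpu_c);
}
```

In this model, calling cpu_c_view() gives false for the local-only check and true for the full walk, which is the gap I am describing: CPU C misses the flag unless it iterates over all groups (or the state is propagated to rd->overutilized).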
Thanks,
Leo Yan