The comment inside cpu_util_wake() clearly says that task_util() isn't subtracted from cpu_util(), as WALT doesn't decay idle tasks the way PELT does. That probably works fine in most cases, but there is at least one case (find_best_target()) where we end up accounting for task_util() twice.
There are significant side effects of this; the most visible one is that it makes tasks move to a big CPU instead of the LITTLE ones, which in turn inflates the results of various benchmarks, like PCMark.
Fix that.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Hi,
I don't have in-depth knowledge of either WALT or PELT, but we noticed something that looks incorrect and wanted to check with others whether the finding is correct. If others agree that this is indeed the right fix, I will send it to the Android gerrit.
This was tested as part of my PELT vs WALT work, and I noticed a significant performance difference with and without this patch. With this patch, the time the big cluster spends at the highest OPP is reduced significantly, and we therefore get somewhat lower numbers with PCMark, for example.
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f5925cc541f..06246e02ea09 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6812,6 +6812,14 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 		 * accounting. However, the blocked utilization may be zero.
 		 */
 		wake_util = cpu_util_wake(i, p);
+
+#ifdef CONFIG_SCHED_WALT
+		if (!walt_disabled && sysctl_sched_use_walt_cpu_util &&
+		    i == task_cpu(p)) {
+			wake_util -= task_util(p);
+		}
+#endif
+
 		new_util = wake_util + task_util(p);
 		/*
--
2.15.0.194.g9af6a3dea062