The comment inside cpu_util_wake() clearly says that the task_util() isn't subtracted from cpu_util() as WALT doesn't decay idle tasks like PELT does. That probably works fine in most of the cases, but there is at least one case (find_best_target()) where we will account for task_util() twice.
There are significant side effects of this; the most visible one is that it makes tasks move to a big CPU instead of the LITTLE ones, which in turn produces better results with various benchmarks, like PCMark.
Fix that.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Hi,
I don't have in-depth knowledge of either WALT or PELT, but we noticed something that looked incorrect and wanted to check with others whether the finding is right. If others agree that this is indeed the right fix, then I will send it to the Android gerrit.
This was tested as part of my PELT vs WALT work, and I noticed a significant performance difference with and without this patch. With this patch, the amount of time the big cluster spends at the highest OPP is reduced significantly, and so we get somewhat lower numbers with PCMark, for example.
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f5925cc541f..06246e02ea09 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6812,6 +6812,14 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	 * accounting. However, the blocked utilization may be zero.
 	 */
 	wake_util = cpu_util_wake(i, p);
+
+#ifdef CONFIG_SCHED_WALT
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util &&
+	    i == task_cpu(p)) {
+		wake_util -= task_util(p);
+	}
+#endif
+
 	new_util = wake_util + task_util(p);
 	/*
--
2.15.0.194.g9af6a3dea062
Hi Viresh,
On Thursday 18 Jan 2018 at 12:00:54 (+0530), Viresh Kumar wrote:
The comment inside cpu_util_wake() clearly says that the task_util() isn't subtracted from cpu_util() as WALT doesn't decay idle tasks like PELT does. That probably works fine in most of the cases, but there is at least one case (find_best_target()) where we will account for task_util() twice.
So, I don't have great knowledge of WALT either, but I think what this comment really means is that, since WALT only accounts for runnable tasks, a waking task is not accounted in cpu_util(), so there is no need to remove it.
There are significant side effects of this; the most visible one is that it makes tasks move to a big CPU instead of the LITTLE ones, which in turn produces better results with various benchmarks, like PCMark.
Fix that.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Hi,
I don't have in-depth knowledge of either WALT or PELT, but we noticed something that looked incorrect and wanted to check with others whether the finding is right. If others agree that this is indeed the right fix, then I will send it to the Android gerrit.
This was tested as part of my PELT vs WALT work, and I noticed a significant performance difference with and without this patch. With this patch, the amount of time the big cluster spends at the highest OPP is reduced significantly, and so we get somewhat lower numbers with PCMark, for example.
I would say that our goal is to improve the PCMark scores with PELT, not make them worse with WALT ... ;)
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f5925cc541f..06246e02ea09 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6812,6 +6812,14 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	 * accounting. However, the blocked utilization may be zero.
 	 */
 	wake_util = cpu_util_wake(i, p);
+
+#ifdef CONFIG_SCHED_WALT
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util &&
+	    i == task_cpu(p)) {
+		wake_util -= task_util(p);
Since task_util(p) is not accounted in cpu_util_wake(), you shouldn't do this.
+	}
+#endif
+
 	new_util = wake_util + task_util(p);
 	/*
--
2.15.0.194.g9af6a3dea062
eas-dev mailing list
eas-dev@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/eas-dev
On 18-01-18, 10:11, Quentin Perret wrote:
On Thursday 18 Jan 2018 at 12:00:54 (+0530), Viresh Kumar wrote:
The comment inside cpu_util_wake() clearly says that the task_util() isn't subtracted from cpu_util() as WALT doesn't decay idle tasks like PELT does. That probably works fine in most of the cases, but there is at least one case (find_best_target()) where we will account for task_util() twice.
So, I don't have great knowledge of WALT either, but I think what this comment really means is that, since WALT only accounts for runnable tasks, a waking task is not accounted in cpu_util(), so there is no need to remove it.
Yeah, I was talking about this with Vincent a few minutes back and noticed that this patch surely isn't correct.
I would say that our goal is to improve the PCMark scores with PELT, not make them worse with WALT ... ;)
We need to make them equal, strategy is all yours :)
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f5925cc541f..06246e02ea09 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6812,6 +6812,14 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	 * accounting. However, the blocked utilization may be zero.
 	 */
 	wake_util = cpu_util_wake(i, p);
+
+#ifdef CONFIG_SCHED_WALT
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util &&
+	    i == task_cpu(p)) {
+		wake_util -= task_util(p);
Since task_util(p) is not accounted in cpu_util_wake(), you shouldn't do this.
So aren't there two cases:
- The task's utilization is already part of cpu_util(), if the task was running recently.
- The task's utilization isn't part of cpu_util() as it hasn't run for a long time, maybe last window or whatever ?
If the above is true, then we should still need the above diff, though maybe with a few more 'if' conditionals, i.e. a way to check if the task's util is already part of cpu_util()?
--
viresh
On Thursday 18 Jan 2018 at 15:45:43 (+0530), Viresh Kumar wrote:
On 18-01-18, 10:11, Quentin Perret wrote:
On Thursday 18 Jan 2018 at 12:00:54 (+0530), Viresh Kumar wrote:
The comment inside cpu_util_wake() clearly says that the task_util() isn't subtracted from cpu_util() as WALT doesn't decay idle tasks like PELT does. That probably works fine in most of the cases, but there is at least one case (find_best_target()) where we will account for task_util() twice.
So, I don't have great knowledge of WALT either, but I think what this comment really means is that, since WALT only accounts for runnable tasks, a waking task is not accounted in cpu_util(), so there is no need to remove it.
Yeah, I was talking about this with Vincent a few minutes back and noticed that this patch surely isn't correct.
I would say that our goal is to improve the PCMark scores with PELT, not make them worse with WALT ... ;)
We need to make them equal, strategy is all yours :)
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f5925cc541f..06246e02ea09 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6812,6 +6812,14 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	 * accounting. However, the blocked utilization may be zero.
 	 */
 	wake_util = cpu_util_wake(i, p);
+
+#ifdef CONFIG_SCHED_WALT
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util &&
+	    i == task_cpu(p)) {
+		wake_util -= task_util(p);
Since task_util(p) is not accounted in cpu_util_wake(), you shouldn't do this.
So aren't there two cases:
- The task's utilization is already part of cpu_util(), if the task was running recently.
- The task's utilization isn't part of cpu_util() as it hasn't run for a long time, maybe last window or whatever?
If the above is true, then we should still need the above diff, though maybe with a few more 'if' conditionals, i.e. a way to check if the task's util is already part of cpu_util()?
I think this is precisely what is done in cpu_util_wake() no ?
--
viresh
On 18-01-18, 10:17, Quentin Perret wrote:
I think this is precisely what is done in cpu_util_wake() no ?
Looks like I completely misread it, damn.
+static int cpu_util_wake(int cpu, struct task_struct *p)
+{
+	unsigned long util, capacity;
+
+#ifdef CONFIG_SCHED_WALT
+	/*
+	 * WALT does not decay idle tasks in the same manner
+	 * as PELT, so it makes little sense to subtract task
+	 * utilization from cpu utilization. Instead just use
+	 * cpu_util for this case.
+	 */
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util &&
+	    p->state == TASK_WAKING)
+		return cpu_util(cpu);
I somehow read it as if we would always return from here and never execute the code below for WALT.
+#endif
+
+	/* Task has no contribution or is new */
+	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
+		return cpu_util(cpu);
+
+	capacity = capacity_orig_of(cpu);
+	util = max_t(long, cpu_util(cpu) - task_util(p), 0);
+
+	return (util >= capacity) ? capacity : util;
+}
Sorry for the noise, there shouldn't be double accounting here.
--
viresh
On Thu, Jan 18, 2018 at 03:54:12PM +0530, Viresh Kumar wrote:
On 18-01-18, 10:17, Quentin Perret wrote:
I think this is precisely what is done in cpu_util_wake() no ?
Looks like I completely misread it, damn.
+static int cpu_util_wake(int cpu, struct task_struct *p)
+{
+	unsigned long util, capacity;
+
+#ifdef CONFIG_SCHED_WALT
+	/*
+	 * WALT does not decay idle tasks in the same manner
+	 * as PELT, so it makes little sense to subtract task
+	 * utilization from cpu utilization. Instead just use
+	 * cpu_util for this case.
+	 */
+	if (!walt_disabled && sysctl_sched_use_walt_cpu_util &&
+	    p->state == TASK_WAKING)
+		return cpu_util(cpu);
I somehow read it as if we would always return from here and never execute the code below for WALT.
+#endif
+	/* Task has no contribution or is new */
+	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
+		return cpu_util(cpu);
+
+	capacity = capacity_orig_of(cpu);
+	util = max_t(long, cpu_util(cpu) - task_util(p), 0);
+
+	return (util >= capacity) ? capacity : util;
+}
Sorry for the noise, there shouldn't be double accounting here.
Right, there is no double accounting here. cpu_util(), i.e. WALT's cumulative_runnable_avg, represents the sum of the demands of the tasks queued on a given CPU, so we don't have to discount anything for waking tasks. find_best_target() is also called from the tick path to see if the current task needs to be migrated to the BIG cluster; in that case we do have to discount the task's utilization, which is what is being done here.
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.