Hi, I'm replying to the first message since that makes it easier to comment on the code.
Long story short: the explanations from Dietmar and Leo are correct; let me try to clarify further here below...
On 28-Feb 12:18, Viresh Kumar wrote:
find_best_target() tries to find the target CPU where the task should be placed, based on how much would be the utilization of the CPU after the task is placed on it. This is represented by 'new_util' in the routine.
Currently it adds task_util(p) to the wake_util of the CPU to find that out, while it should really be adding the boosted task utilization to wake_util, as that's how much the CPU utilization would be.
This is how we used to do it before commit 3bfde3b4f848 ("ANDROID: sched/fair: Change cpu iteration order in find_best_target()") was merged, and that commit doesn't describe the rationale behind this change.
That's kind of a guess but, I think this patch is the outcome of multiple rebase/update/cleanup cycles which likely ended up squashing different patches.
AFAIR the original patch by Dietmar to change the CPU order was not changing behavior. That's something Chris, Joel and I mainly played with in the past... otherwise we would have noticed the change and properly documented it in the changelog.
Dietmar: maybe we can do a bit of archaeology and try to track down your original patch in our internal gerrit...
This patch reverts to the earlier formula for calculating new_util.
Fixes: 3bfde3b4f848 ("ANDROID: sched/fair: Change cpu iteration order in find_best_target()")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Just wanted to get some review done over the list before posting it to gerrit. I'm not sure this commit is doing the right thing, but I couldn't understand why it should be done the way it currently is.
This is for the Android 4.9 EAS dev kernel.
 kernel/sched/fair.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 88abd5de69ce..1c33a2ddd39c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6860,14 +6860,13 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
I'm quite confident, instead, that this last formulation is what we considered the most appropriate aggregation of the signals. The fundamental idea is:
- for CPU selection: we want to use the actual (not boosted) utilization, thus we add task_util() to cpu_util_wake(). That's because things like the "maximum spare capacity" have to be computed from actual utilization.
- for FREQUENCY selection: we "just" want to ensure that the selected CPU can provide _at least_ that capacity.
This second point is quite important. As correctly observed by Viresh, if a CPU is not relatively empty (by which I think he means one with a relatively small blocked utilization), then we don't see the "boosted utilization" effects on the OPP selection.
But that's actually intentional, since the "boosted utilization" from a task's perspective defines the minimum OPP we want to run that task at. For example, if you have a 10% task boosted by 50%, its boosted utilization will be 10 + (100 - 10) * 0.5 ~= 55%. But if that task is co-scheduled on a CPU which is already running at, let's say, 70%... then the task already gets the required boost value.
From a different perspective, this means that we _do not_ use an "additive aggregation" for task boosted utilizations, but a "capping aggregation" instead. They are just two different policies. The first one is what Viresh seems to have in mind, and it's also the one we started with, because it was expected to give better guarantees to tasks. However, according to our experiments, it turned out to be much more aggressive: it increases the chances of running only one task per CPU, thus actually degrading performance (e.g. you cannot easily have two TOP_APP tasks co-scheduled on the same big CPU reserved for those tasks).
As a final remark, in SchedTune there is a slightly different definition of "boosted utilization" for tasks and for CPUs. While tasks use a "capping policy", CPUs use an "additive policy", since boosted_cpu_util() applies the boost value to the aggregated PELT signal related to all RUNNABLE and BLOCKED tasks on that CPU. That's something we are aware of, and it's also one of the main reasons behind the new boosting strategy proposed by UtilClamp (the future version of SchedTune), where we more consistently use a clamping policy for both tasks and CPUs.
 	 * accounting. However, the blocked utilization may be zero.
 	 */
 	wake_util = cpu_util_wake(i, p);
Note that here the new cpu_util_wake() is used, while the patch you referenced used cpu_util() instead... which backs up my thesis of a rebase artifact.
-	new_util = wake_util + task_util(p);
 	/*
 	 * Ensure minimum capacity to grant the required boost.
 	 * The target CPU can be already at a capacity level higher
 	 * than the one required to boost the task.
 	 */
-	new_util = max(min_util, new_util);
Here we just compute the minimum capacity which should be granted to this task. It's used mainly to skip CPUs which cannot provide it (e.g. LITTLE cores when scheduling a sufficiently boosted task).
+	new_util = wake_util + min_util;
 	/*
 	 * Include minimum capacity constraint:
-- 2.15.0.194.g9af6a3dea062
eas-dev mailing list
eas-dev@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/eas-dev
-- #include <best/regards.h>
Patrick Bellasi