From: Leo Yan leo.yan@linaro.org
The homescreen test-case shows unwanted disturbance on the big CPUs on the Hikey620 platform with Android 4.9. There are multiple reasons for that though.
By default boost and prefer_idle are enabled for both top-app and foreground tasks and find_best_target() always (intentionally) prefers the big CPUs if prefer_idle is enabled. And some of the foreground tasks (like: DispSync, PowerManagerSer and PhotonicModulat) for the homescreen test-case get placed on the big CPUs eventually because of that.
Even if prefer_idle is disabled for such foreground tasks, they don't end up on the little CPUs. The reason being that find_best_target() still prefers big CPUs if the task is boosted, though some of the comments in find_best_target() routine say the exact opposite of that. It eventually depends on the order in which CPUs are processed, which is from big to little for boosted tasks.
This patch updates the find_best_target() routine to select low capacity idle-CPU if the task is boosted but doesn't have prefer_idle enabled. This would be the same for non-boosted tasks as well.
Signed-off-by: Leo Yan leo.yan@linaro.org
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 kernel/sched/fair.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e45047bdd245..4534d8620989 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6998,8 +6998,11 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			int idle_idx = idle_get_state_idx(cpu_rq(i));
 
 			/* Select idle CPU with lower cap_orig */
-			if (capacity_orig > best_idle_min_cap_orig)
+			if (capacity_orig < best_idle_min_cap_orig)
+				goto found_best_idle_cpu;
+			else if (capacity_orig > best_idle_min_cap_orig)
 				continue;
+
 			/* Favor CPUs that won't end up running at a
 			 * high OPP.
 			 */
@@ -7017,6 +7020,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			    best_idle_cstate <= idle_idx)
 				continue;
 
+found_best_idle_cpu:
 			/* Keep track of best idle CPU */
 			best_idle_min_cap_orig = capacity_orig;
 			target_idle_max_spare_cap = capacity_orig -
-- 
2.15.0.194.g9af6a3dea062
Hi,
On 05/03/18 10:27, Viresh Kumar wrote:
From: Leo Yan leo.yan@linaro.org
The homescreen test-case shows unwanted disturbance on the big CPUs on the Hikey620 platform with Android 4.9. There are multiple reasons for that though.
I think you mean Hikey960 here, as Hikey620 has only cores with the same micro-architecture. Are you able to share the test setup and results? I was under the impression that in the current configuration, all boosted tasks are also prefer_idle, so I would not expect any disturbance in that case. I might be wrong, but I'm curious if you had to change the schedtune configuration in order to observe an issue.
By default boost and prefer_idle are enabled for both top-app and foreground tasks and find_best_target() always (intentionally) prefers the big CPUs if prefer_idle is enabled. And some of the foreground tasks (like: DispSync, PowerManagerSer and PhotonicModulat) for the homescreen test-case get placed on the big CPUs eventually because of that.
Even if prefer_idle is disabled for such foreground tasks, they don't end up on the little CPUs. The reason being that find_best_target() still prefers big CPUs if the task is boosted, though some of the comments in find_best_target() routine say the exact opposite of that. It eventually depends on the order in which CPUs are processed, which is from big to little for boosted tasks.
I was looking into the same issue and wondering if this logic shouldn't be applied to both non-latency-sensitive scenarios: selection of the backup idle CPU (as you've done here) and selection of an active CPU, where we would select the lowest capacity one, despite it not having the highest spare capacity. But, I think the scenario involving choosing an active CPU is more complicated as we might want to spread tasks when boosted, rather than pack them.
But this is a simpler case and it would benefit from a wider audience on android gerrit, especially in light of other changes that are being considered for find_best_target: https://android-review.googlesource.com/q/strf
Ionela.
This patch updates the find_best_target() routine to select low capacity idle-CPU if the task is boosted but doesn't have prefer_idle enabled. This would be the same for non-boosted tasks as well.
Signed-off-by: Leo Yan leo.yan@linaro.org
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 kernel/sched/fair.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e45047bdd245..4534d8620989 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6998,8 +6998,11 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			int idle_idx = idle_get_state_idx(cpu_rq(i));
 
 			/* Select idle CPU with lower cap_orig */
-			if (capacity_orig > best_idle_min_cap_orig)
+			if (capacity_orig < best_idle_min_cap_orig)
+				goto found_best_idle_cpu;
+			else if (capacity_orig > best_idle_min_cap_orig)
 				continue;
+
 			/* Favor CPUs that won't end up running at a
 			 * high OPP.
 			 */
@@ -7017,6 +7020,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			    best_idle_cstate <= idle_idx)
 				continue;
 
+found_best_idle_cpu:
 			/* Keep track of best idle CPU */
 			best_idle_min_cap_orig = capacity_orig;
 			target_idle_max_spare_cap = capacity_orig -
On 5 March 2018 at 17:52, Ionela Voinescu ionela.voinescu@arm.com wrote:
On 05/03/18 10:27, Viresh Kumar wrote:
From: Leo Yan leo.yan@linaro.org
The homescreen test-case shows unwanted disturbance on the big CPUs on the Hikey620 platform with Android 4.9. There are multiple reasons for that though.
I think you mean Hikey960 here, as Hikey620 has only cores with the same micro-architecture.
Sure you are correct here.
Are you able to share the test setup and results?
Nothing special in the test case really:
- I merged hikey 4.9 branch with eas-dev branch and was testing it with WALT disabled.
- The homescreen testcase is defined here:
http://git.linaro.org/power/product-codeline.git/tree/wlauto/agendas/homescr...
- enabled traces and energy measurements as well.
- Traces clearly show that some tasks go for the big CPUs.
I was under the impression that in the current configuration, all boosted tasks are also prefer_idle,
We have separate tunables for them. So yes we can control prefer_idle separately I think. The same is done in the above yaml.
so I would not expect any disturbance in that case. I might be wrong, but I'm curious if you had to change the schedtune configuration in order to observe an issue.
Yeah, so the yaml disables prefer_idle in one of the tests.
I was looking into the same issue and wondering if this logic shouldn't be applied to both non-latency-sensitive scenarios: selection of the backup idle CPU (as you've done here) and selection of an active CPU, where we would select the lowest capacity one, despite it not having the highest spare capacity. But, I think the scenario involving choosing an active CPU is more complicated as we might want to spread tasks when boosted, rather than pack them.
Well, maybe. I will give that more thought and see if it makes sense as well. But I didn't try it yet, as that was part of the prefer_idle path, and we always prefer the big CPU there.
But this is a simpler case and it would benefit from a wider audience on android gerrit, especially in light of other changes that are being considered for find_best_target: https://android-review.googlesource.com/q/strf
Sure, I just wanted to get basic feedback here first (reviews are much easier on email) and then post to gerrit.
-- viresh
On 05/03/18 15:03, Viresh Kumar wrote:
On 5 March 2018 at 17:52, Ionela Voinescu ionela.voinescu@arm.com wrote:
On 05/03/18 10:27, Viresh Kumar wrote:
From: Leo Yan leo.yan@linaro.org
The homescreen test-case shows unwanted disturbance on the big CPUs on the Hikey620 platform with Android 4.9. There are multiple reasons for that though.
I think you mean Hikey960 here, as Hikey620 has only cores with the same micro-architecture.
Sure you are correct here.
Are you able to share the test setup and results?
Nothing special in the test case really:
- I merged hikey 4.9 branch with eas-dev branch and was testing it with WALT
disabled.
- The homescreen testcase is defined here:
http://git.linaro.org/power/product-codeline.git/tree/wlauto/agendas/homescr...
- enabled traces and energy measurements as well.
- Traces clearly show that some tasks go for the big CPUs.
I was under the impression that in the current configuration, all boosted tasks are also prefer_idle,
We have separate tunables for them. So yes we can control prefer_idle separately I think. The same is done in the above yaml.
so I would not expect any disturbance in that case. I might be wrong, but I'm curious if you had to change the schedtune configuration in order to observe an issue.
Yeah, so the yaml disables prefer_idle in one of the tests.
I was looking into the same issue and wondering if this logic shouldn't be applied to both non-latency-sensitive scenarios: selection of the backup idle CPU (as you've done here) and selection of an active CPU, where we would select the lowest capacity one, despite it not having the highest spare capacity. But, I think the scenario involving choosing an active CPU is more complicated as we might want to spread tasks when boosted, rather than pack them.
Well, maybe. I will give that more thought and see if it makes sense as well. But I didn't try it yet, as that was part of the prefer_idle path, and we always prefer the big CPU there.
We have three cases in there: case A is the latency-sensitive prefer_idle case, case B is the non-latency-sensitive selection of a backup idle CPU, and case C is the selection of an active CPU for a non-latency-sensitive task. We prefer the biggest CPU only for prefer_idle, and only if the task is boosted as well. I was referring to case C in the comments above about the selection of an active CPU, especially the boosted, non-prefer_idle case. I would appreciate your thoughts.
In any case, all of this is highly dependent on order (bigs to littles, littles to bigs, assumptions on reserved CPUs) and although separate tunables are used to set boost and the prefer_idle flag, the odd cases of non-boosted prefer_idle tasks and non-prefer_idle boosted tasks do not always result in optimal placements. As Patrick mentioned, we're trying to find a solution in which order does not matter at all.
Ionela.
But this is a simpler case and it would benefit from a wider audience on android gerrit, especially in light of other changes that are being considered for find_best_target: https://android-review.googlesource.com/q/strf
Sure, I just wanted to get basic feedback here first (reviews are much easier on email) and then post to gerrit.
-- viresh
On Monday 05 Mar 2018 at 15:57:54 (+0530), Viresh Kumar wrote:
From: Leo Yan leo.yan@linaro.org
The homescreen test-case shows unwanted disturbance on the big CPUs on the Hikey620 platform with Android 4.9. There are multiple reasons for that though.
By default boost and prefer_idle are enabled for both top-app and foreground tasks and find_best_target() always (intentionally) prefers the big CPUs if prefer_idle is enabled. And some of the foreground tasks (like: DispSync, PowerManagerSer and PhotonicModulat) for the homescreen test-case get placed on the big CPUs eventually because of that.
Even if prefer_idle is disabled for such foreground tasks, they don't end up on the little CPUs. The reason being that find_best_target() still prefers big CPUs if the task is boosted, though some of the comments in find_best_target() routine say the exact opposite of that. It eventually depends on the order in which CPUs are processed, which is from big to little for boosted tasks.
This patch updates the find_best_target() routine to select low capacity idle-CPU if the task is boosted but doesn't have prefer_idle enabled. This would be the same for non-boosted tasks as well.
So, if we want to start from the little CPUs for boosted, non-prefer-idle tasks, what about just changing:

    cpu = start_cpu(boosted);

to:

    cpu = start_cpu(prefer_idle);

?
Signed-off-by: Leo Yan leo.yan@linaro.org
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 kernel/sched/fair.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e45047bdd245..4534d8620989 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6998,8 +6998,11 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			int idle_idx = idle_get_state_idx(cpu_rq(i));
 
 			/* Select idle CPU with lower cap_orig */
-			if (capacity_orig > best_idle_min_cap_orig)
+			if (capacity_orig < best_idle_min_cap_orig)
+				goto found_best_idle_cpu;
+			else if (capacity_orig > best_idle_min_cap_orig)
 				continue;
+
 			/* Favor CPUs that won't end up running at a
 			 * high OPP.
 			 */
@@ -7017,6 +7020,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			    best_idle_cstate <= idle_idx)
 				continue;
 
+found_best_idle_cpu:
 			/* Keep track of best idle CPU */
 			best_idle_min_cap_orig = capacity_orig;
 			target_idle_max_spare_cap = capacity_orig -
-- 
2.15.0.194.g9af6a3dea062
eas-dev mailing list eas-dev@lists.linaro.org https://lists.linaro.org/mailman/listinfo/eas-dev
On 5 March 2018 at 18:01, Quentin Perret quentin.perret@arm.com wrote:
So, if we want to start from little CPUs for boosted and non-prefer-idle tasks, what about just changing: cpu = start_cpu(boosted); by cpu = start_cpu(prefer_idle);
Well, in the prefer-idle case we really want to pick a big idle CPU instead of a little idle CPU (I am not very sure why) and so the order plays a role.
In the non-prefer-idle case I think the code was designed to work irrespective of the order in which CPUs are parsed, and it's just that we have a bug there which wouldn't let us select a little idle CPU. So, order shouldn't matter in that case and that's what this patch is trying to do.
-- viresh
On Monday 05 Mar 2018 at 20:36:18 (+0530), Viresh Kumar wrote:
On 5 March 2018 at 18:01, Quentin Perret quentin.perret@arm.com wrote:
So, if we want to start from little CPUs for boosted and non-prefer-idle tasks, what about just changing: cpu = start_cpu(boosted); by cpu = start_cpu(prefer_idle);
Well, in the prefer-idle case we really want to pick a big idle CPU instead of a little idle CPU (I am not very sure why) and so the order plays a role.
It's because the prefer-idle flag basically means "this task is latency sensitive". What we want is to wake up the task ASAP (that's why we pick an idle CPU) and to complete it quickly (that's why we favor big CPUs, and why all prefer-idle tasks are boosted).
In the non-prefer-idle case I think the code was designed to work irrespective of the order in which CPUs are parsed and just that we have a bug there which wouldn't let us select a little idle CPU. So, order shouldn't matter in that case and that's what this patch is trying to do.
A regular android system doesn't have such non-prefer-idle boosted tasks for now AFAIK, so the right thing to do isn't clear yet. Ionela is currently looking into those things, so I'd like to see her results before we decide what to do :) Maybe the simplest fix will be to simply start the iteration from the littlest CPU for all non-prefer-idle tasks, even if they're boosted. I think that should also fix the issue Leo noticed. Let's see :)
Thanks, Quentin
On Mon, Mar 05, 2018 at 03:59:09PM +0000, Quentin Perret wrote:
On Monday 05 Mar 2018 at 20:36:18 (+0530), Viresh Kumar wrote:
On 5 March 2018 at 18:01, Quentin Perret quentin.perret@arm.com wrote:
So, if we want to start from little CPUs for boosted and non-prefer-idle tasks, what about just changing: cpu = start_cpu(boosted); by cpu = start_cpu(prefer_idle);
Well, in the prefer-idle case we really want to pick a big idle CPU instead of a little idle CPU (I am not very sure why) and so the order plays a role.
It's because the prefer-idle flag basically means "this task is latency sensitive". What we want is to wakeup the task ASAP (that's why we pick an idle CPU) and to complete it quickly (that's why we favor big CPUs, and why all prefer idle tasks are boosted).
In the non-prefer-idle case I think the code was designed to work irrespective of the order in which CPUs are parsed and just that we have a bug there which wouldn't let us select a little idle CPU. So, order shouldn't matter in that case and that's what this patch is trying to do.
A regular android system doesn't have such non-prefer-idle boosted tasks for now AFAIK, so the right thing to do isn't clear yet. Ionela is currently looking into those things, so I'd like to see her results before we decide what to do :) Maybe the simplest fix will be to simply start the iteration from the littlest CPU for all non-prefer-idle tasks, even if they're boosted. I think that should also fix the issue Leo noticed. Let's see :)
Just want to remind: for boosted tasks, it's quite tricky to say whether we should place them on big or LITTLE CPUs. The reason is, if we place one boosted task on a big CPU, and this big CPU has its separate clock from the LITTLE CPUs, then this boosted task doesn't interfere with other tasks on the LITTLE CPUs; on the other hand, if we place this boosted task on the LITTLE CPUs, it easily increases the frequency of the LITTLE CPUs, and all other tasks on the LITTLE CPUs also run at higher frequency and consume more power. So with a certain boost margin, we can observe a power regression after we place a boosted task on a LITTLE CPU.
I still want to sell my suggestion to use fbt() to choose candidates for every cluster (every cluster can have active and idle candidates), and compare energy for these candidates to select the final target.

If so, fbt() doesn't consider different CPU variants and only focuses on one CPU variant at a time. In the end, all the discussion around this patch would go away :D
Thanks, Leo Yan
On 09/03/2018 09:42, Leo Yan wrote:
On Mon, Mar 05, 2018 at 03:59:09PM +0000, Quentin Perret wrote:
On Monday 05 Mar 2018 at 20:36:18 (+0530), Viresh Kumar wrote:
On 5 March 2018 at 18:01, Quentin Perret quentin.perret@arm.com wrote:
So, if we want to start from little CPUs for boosted and non-prefer-idle tasks, what about just changing: cpu = start_cpu(boosted); by cpu = start_cpu(prefer_idle);
Well, in the prefer-idle case we really want to pick a big idle CPU instead of a little idle CPU (I am not very sure why) and so the order plays a role.
It's because the prefer-idle flag basically means "this task is latency sensitive". What we want is to wakeup the task ASAP (that's why we pick an idle CPU) and to complete it quickly (that's why we favor big CPUs, and why all prefer idle tasks are boosted).
In the non-prefer-idle case I think the code was designed to work irrespective of the order in which CPUs are parsed and just that we have a bug there which wouldn't let us select a little idle CPU. So, order shouldn't matter in that case and that's what this patch is trying to do.
A regular android system doesn't have such non-prefer-idle boosted tasks for now AFAIK, so the right thing to do isn't clear yet. Ionela is currently looking into those things, so I'd like to see her results before we decide what to do :) Maybe the simplest fix will be to simply start the iteration from the littlest CPU for all non-prefer-idle tasks, even if they're boosted. I think that should also fix the issue Leo noticed. Let's see :)
Just want to remind: for boosted tasks, it's quite tricky to say whether we should place them on big or LITTLE CPUs. The reason is, if we place one boosted task on a big CPU, and this big CPU has its separate clock from the LITTLE CPUs, then this boosted task doesn't interfere with other tasks on the LITTLE CPUs; on the other hand, if we place this boosted task on the LITTLE CPUs, it easily increases the frequency of the LITTLE CPUs, and all other tasks on the LITTLE CPUs also run at higher frequency and consume more power. So with a certain boost margin, we can observe a power regression after we place a boosted task on a LITTLE CPU.
Hi Leo,
I'm missing the point about a separate clock line for bL. If the task is placed on a big CPU, the other big CPUs will also be impacted and we will see a frequency increase on the big cluster and the same power regression, no?
I still want to sell my suggestion to use fbt() to choose candidates for every cluster (every cluster can have active and idle candidates), and compare energy for these candidates to select the final target.

If so, fbt() doesn't consider different CPU variants and only focuses on one CPU variant at a time. In the end, all the discussion around this patch would go away :D
Thanks, Leo Yan
-- http://www.linaro.org/ Linaro.org │ Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro Facebook | http://twitter.com/#!/linaroorg Twitter | http://www.linaro.org/linaro-blog/ Blog
Hi Daniel,
On Fri, Mar 09, 2018 at 09:59:48AM +0100, Daniel Lezcano wrote:
[...]
Just want to remind: for boosted tasks, it's quite tricky to say whether we should place them on big or LITTLE CPUs. The reason is, if we place one boosted task on a big CPU, and this big CPU has its separate clock from the LITTLE CPUs, then this boosted task doesn't interfere with other tasks on the LITTLE CPUs; on the other hand, if we place this boosted task on the LITTLE CPUs, it easily increases the frequency of the LITTLE CPUs, and all other tasks on the LITTLE CPUs also run at higher frequency and consume more power. So with a certain boost margin, we can observe a power regression after we place a boosted task on a LITTLE CPU.
Hi Leo,
I'm missing the point about a separate clock line for bL. If the task is placed on a big CPU, the other big CPUs will also be impacted and we will see a frequency increase on the big cluster and the same power regression, no?
Yeah, you are right. But I should mention two extra things for different cases:

- Usually the big CPU OPPs start from a higher capacity than the LITTLE CPUs', so the bigs are hard to impact with a small boost margin; e.g. taking Hikey960 as an example, with a boost margin of 20% the boosted util = 1024 * 20% = 205; this easily pushes the CA53 CPUs from 533MHz (cap=133) to 990MHz (cap=250), but the CA73s can stay at the lowest frequency of 903MHz (cap=390);

- IIRC, e.g. the video use-case has many background tasks, so these tasks will be placed on the LITTLE CPUs with a middling workload; but if boosted tasks (foreground tasks, like surfaceflinger, etc.) are placed on the LITTLE CPUs, they easily push the LITTLE CPUs to their two highest OPPs (1.4GHz and 1.7GHz); as a result the power data might be worse than placing the boosted tasks on big CPUs at 903MHz.
Thanks, Leo Yan
On 09/03/2018 10:24, Leo Yan wrote:
Hi Daniel,
On Fri, Mar 09, 2018 at 09:59:48AM +0100, Daniel Lezcano wrote:
[...]
Just want to remind: for boosted tasks, it's quite tricky to say whether we should place them on big or LITTLE CPUs. The reason is, if we place one boosted task on a big CPU, and this big CPU has its separate clock from the LITTLE CPUs, then this boosted task doesn't interfere with other tasks on the LITTLE CPUs; on the other hand, if we place this boosted task on the LITTLE CPUs, it easily increases the frequency of the LITTLE CPUs, and all other tasks on the LITTLE CPUs also run at higher frequency and consume more power. So with a certain boost margin, we can observe a power regression after we place a boosted task on a LITTLE CPU.
Hi Leo,
I'm missing the point about a separate clock line for bL. If the task is placed on a big CPU, the other big CPUs will also be impacted and we will see a frequency increase on the big cluster and the same power regression, no?
Yeah, you are right. But I should mention two extra things for different cases:

- Usually the big CPU OPPs start from a higher capacity than the LITTLE CPUs', so the bigs are hard to impact with a small boost margin; e.g. taking Hikey960 as an example, with a boost margin of 20% the boosted util = 1024 * 20% = 205; this easily pushes the CA53 CPUs from 533MHz (cap=133) to 990MHz (cap=250), but the CA73s can stay at the lowest frequency of 903MHz (cap=390);

- IIRC, e.g. the video use-case has many background tasks, so these tasks will be placed on the LITTLE CPUs with a middling workload; but if boosted tasks (foreground tasks, like surfaceflinger, etc.) are placed on the LITTLE CPUs, they easily push the LITTLE CPUs to their two highest OPPs (1.4GHz and 1.7GHz); as a result the power data might be worse than placing the boosted tasks on big CPUs at 903MHz.
Ok, got it.
Thanks for the clarification.
On 09-Mar 16:42, Leo Yan wrote:
On Mon, Mar 05, 2018 at 03:59:09PM +0000, Quentin Perret wrote:
On Monday 05 Mar 2018 at 20:36:18 (+0530), Viresh Kumar wrote:
On 5 March 2018 at 18:01, Quentin Perret quentin.perret@arm.com wrote:
So, if we want to start from little CPUs for boosted and non-prefer-idle tasks, what about just changing: cpu = start_cpu(boosted); by cpu = start_cpu(prefer_idle);
Well, in the prefer-idle case we really want to pick a big idle CPU instead of a little idle CPU (I am not very sure why) and so the order plays a role.
It's because the prefer-idle flag basically means "this task is latency sensitive". What we want is to wakeup the task ASAP (that's why we pick an idle CPU) and to complete it quickly (that's why we favor big CPUs, and why all prefer idle tasks are boosted).
In the non-prefer-idle case I think the code was designed to work irrespective of the order in which CPUs are parsed and just that we have a bug there which wouldn't let us select a little idle CPU. So, order shouldn't matter in that case and that's what this patch is trying to do.
A regular android system doesn't have such non-prefer-idle boosted tasks for now AFAIK, so the right thing to do isn't clear yet. Ionela is currently looking into those things, so I'd like to see her results before we decide what to do :) Maybe the simplest fix will be to simply start the iteration from the littlest CPU for all non-prefer-idle tasks, even if they're boosted. I think that should also fix the issue Leo noticed. Let's see :)
Just want to remind: for boosted tasks, it's quite tricky to say whether we should place them on big or LITTLE CPUs. The reason is, if we place one boosted task on a big CPU, and this big CPU has its separate clock from the LITTLE CPUs, then this boosted task doesn't interfere with other tasks on the LITTLE CPUs; on the other hand, if we place this boosted task on the LITTLE CPUs, it easily increases the frequency of the LITTLE CPUs, and all other tasks on the LITTLE CPUs also run at higher frequency and consume more power.
More power, likely yes... but regarding energy it's kind of difficult to say. Within certain ranges you could end up benefiting from a race-to-idle.
So with a certain boost margin, we can observe a power regression after we place a boosted task on a LITTLE CPU.
I still want to sell my suggestion to use fbt() to choose candidates for every cluster (every cluster can have active and idle candidates), and compare energy for these candidates to select final target.
That's an interesting idea... did you already post patches for it?
If so, fbt() doesn't consider different CPU variants and only focus on the same CPU variant. At the end all discussion for this patch will dismiss :D
At a certain level you should still plug in a heuristic to prefer the candidate target from one cluster or another... considering both the user-space hints (e.g. prefer_idle) as well as, potentially, a properly defined PESpace filtering... isn't it?

I mean, we cannot just blindly go for the most energy-efficient target...
-- #include <best/regards.h>
Patrick Bellasi
On 05-Mar 20:36, Viresh Kumar wrote:
On 5 March 2018 at 18:01, Quentin Perret quentin.perret@arm.com wrote:
So, if we want to start from little CPUs for boosted and non-prefer-idle tasks, what about just changing: cpu = start_cpu(boosted); by cpu = start_cpu(prefer_idle);
Well, in the prefer-idle case we really want to pick a big idle CPU instead of a little idle CPU (I am not very sure why) and so the order plays a role.
In the non-prefer-idle case I think the code was designed to work irrespective of the order in which CPUs are parsed and just that we have a bug there which wouldn't let us select a little idle CPU. So, order shouldn't matter in that case and that's what this patch is trying to do.
I personally would really like a solution which does not depend on ordering at all. Meaning, find_best_target() should work at its best independently of which CPU we start from... maybe it can be more or less optimal starting from one CPU or another, but from a functional standpoint it should be the same.

Otherwise we will always risk having a corner case not covered, as well as making it more difficult to get rid of the start_cpu() at a certain point.
-- #include <best/regards.h>
Patrick Bellasi
On 05-03-18, 15:59, Patrick Bellasi wrote:
I personally would really like a solution which does not depend on ordering at all. Meaning, find_best_target() should work at its best independently of which CPU we start from... maybe it can be more or less optimal starting from one CPU or another, but from a functional standpoint it should be the same.

Otherwise we will always risk having a corner case not covered, as well as making it more difficult to get rid of the start_cpu() at a certain point.
+1
-- viresh
On 05-03-18, 15:59, Patrick Bellasi wrote:
I personally would really like a solution which does not depend on ordering at all. Meaning, find_best_target() should work at its best independently of which CPU we start from... maybe it can be more or less optimal starting from one CPU or another, but from a functional standpoint it should be the same.

Otherwise we will always risk having a corner case not covered, as well as making it more difficult to get rid of the start_cpu() at a certain point.
There is another thing that I wanted to ask.
With the current state of things in the EAS-dev kernel + android userspace (cpuset configuration), we want all top-app and foreground tasks to run on the big CPUs, however small they might be. That of course causes the big CPUs to wake up unnecessarily for the homescreen and audio playback usecases. And this is by design and not really a bug.
Is it correct to assume that we don't care about being power-efficient for these cases at all, as we intentionally place these small tasks on the big CPUs? I am asking as I am not sure if I should invest time in making the EAS-dev kernel power-efficient for these small foreground tasks, which is the exact opposite of what our design is.
-- viresh
Hi Viresh,
On 06-Mar 15:29, Viresh Kumar wrote:
On 05-03-18, 15:59, Patrick Bellasi wrote:
I personally would really like a solution which does not depend on ordering at all. Meaning, find_best_target() should work at its best independently of which CPU we start from... maybe it can be more or less optimal starting from one CPU or another, but from a functional standpoint it should be the same.

Otherwise we will always risk having a corner case not covered, as well as making it more difficult to get rid of the start_cpu() at a certain point.
There is another thing that I wanted to ask.
With the current state of things in the EAS-dev kernel + android userspace (cpuset configuration), we want all top-app and foreground tasks to run on the big CPUs, however small they might be. That
Right, more precisely I would say that we use "prefer_idle" to flag certain tasks as being "latency sensitive". That's because they directly affect the user experience, e.g. the UI thread.
Scheduling "latency sensitive" tasks is quite similar to the mainline CFS policy, which always tries to look for an IDLE CPU when a task wakes up.
of course causes the big CPUs to wake up unnecessarily for the homescreen and audio playback usecases. And this is by design and not really a bug.
Yes, that's the mainline policy to a certain extent. That's also why we are evaluating a simple patch which replaces the case A in FBT with the mainline wakeup slow-path, which does essentially the same, although without the encoded knowledge of the reserved CPUs we have in FBT.
Is it correct to assume that we don't care about being power-efficient for these cases at all, as we intentionally place these small tasks on the big CPUs?
Good point: and the answer is yes and no.
Yes because in principle we wanna always go for an IDLE CPU to reduce the wakeup latency, and possibly for one of the CPUs reserved to "prefer_idle" tasks to reduce the chances for a prefer_idle task to be preempted in the future by another non latency sensitive task.
No, because we know that in certain specific use-cases (e.g. homescreen and audio playback) the system is so lightly utilized on the LITTLE side that we can still grant reasonable interactive performance by running just on the LITTLE side. The main problem is to "detect" these conditions in a use-case independent way...
I am asking as I am not sure if I should invest time in making the EAS-dev kernel power efficient for these small foreground tasks, which is the exact opposite of what our design is.
On that topic we are already developing a pretty simple solution named "low-util mode". The fundamental idea is to be able to detect at wakeup time if the system is lightly utilized and in that case relax the constraint to run on reserved CPUs to just go for the most energy efficient IDLE CPU.
Here is the implementation we have so far: https://android-review.googlesource.com/c/kernel/common/+/560340/3
which, based on our current wltests evaluation is not yet convincing on the power/performance side. Moreover, as you can see from the review comments on Gerrit, some corner cases still need to be fixed.
Some possible extensions/improvements identified so far are:
1) playing with the threshold value
2) taking into account the boost value to detect if we're in low util mode
As a final remark, we should also consider that the above series has quite a big overlap with the series to use the mainline slow path at wakeup (i.e. removing case A from FBT): https://android-review.googlesource.com/c/kernel/common/+/510757
Among the two series, we are more keen on giving higher priority to the latter, since it will contribute to reducing the delta between the mainline and the Android scheduler. Thus, whatever is related to the "low utilization mode" should probably be defined once we are confident with that series, and based on top of it.
Hope this can shed some light on the overall design to improve the energy efficiency of prefer_idle tasks.
Of course, any contribution of ideas / testing in the above described areas is more than welcome.
Cheers Patrick
-- #include <best/regards.h>
Patrick Bellasi
On 06-03-18, 11:00, Patrick Bellasi wrote:
Right, more precisely I would say that we use "prefer_idle" to flag certain tasks as being "latency sensitive". That's because they directly affect the user experience, e.g. the UI thread.
How did we conclude that the tasks that end up on the big CPUs for homescreen and audio playback are *really* latency sensitive? Is it correct to enable prefer-idle and boost for every foreground task on the system? Maybe the algorithm (in kernel) is all fine, and we need to categorize the tasks in a finer way?
Even if the system is not in low-util mode (as explained by you earlier), why should such tasks ever hit a big CPU? What user experience are they taking care of?
What about the state of the system where the LCD screen is off (not displaying anything) but the platform isn't suspended, like maybe playing audio, software-updates, downloads, etc.? Should all threads in such cases land on the big CPUs even if there is no display on the LCD (and no UI)?
-- viresh
On Wed, Mar 07, 2018 at 11:37:30AM +0530, Viresh Kumar wrote:
On 06-03-18, 11:00, Patrick Bellasi wrote:
Right, more precisely I would say that we use "prefer_idle" to flag certain tasks as being "latency sensitive". That's because they directly affect the user experience, e.g. the UI thread.
How did we conclude that the tasks that end up on the big CPUs for homescreen and audio playback are *really* latency sensitive? Is it correct to enable prefer-idle and boost for every foreground task on the system? Maybe the algorithm (in kernel) is all fine, and we need to categorize the tasks in a finer way?
Even if the system is not in low-util mode (as explained by you earlier), why should such tasks ever hit a big CPU? What user experience are they taking care of?
What about the state of the system where the LCD screen is off (not displaying anything) but the platform isn't suspended, like maybe playing audio, software-updates, downloads, etc.? Should all threads in such cases land on the big CPUs even if there is no display on the LCD (and no UI)?
Just to ask one follow-up question based on Viresh's:
Android now divides tasks into several cgroups based on UI activities (top-app, foreground, background and system background). I just wonder if we can set cgroup SchedTune parameters based on different scenarios, rather than based on different cgroups?
For example, when Android has hints for 'interactive'/camera/video, we can set schedtune.boost/prefer_idle for the top-app & foreground cgroups; otherwise all schedtune.boost/prefer_idle can be cleared. To put it simply, we clear all schedtune flags most of the time, and only set them for performance boosting scenarios (touching the screen); I even think that for the gaming scenario we don't need these tunable parameters, as the WALT/PELT signals have enough time to reflect the heavy workload of gaming threads.
What's your suggestion for this?
Thanks, Leo Yan
Hi Leo,
On 09/03/18 08:18, Leo Yan wrote:
On Wed, Mar 07, 2018 at 11:37:30AM +0530, Viresh Kumar wrote:
What about the state of the system where the LCD screen is off (not displaying anything) but the platform isn't suspended, like maybe playing audio, software-updates, downloads, etc.? Should all threads in such cases land on the big CPUs even if there is no display on the LCD (and no UI)?
Just to ask one follow-up question based on Viresh's:
Android now divides tasks into several cgroups based on UI activities (top-app, foreground, background and system background). I just wonder if we can set cgroup SchedTune parameters based on different scenarios, rather than based on different cgroups?
For example, when Android has hints for 'interactive'/camera/video, we can set schedtune.boost/prefer_idle for the top-app & foreground cgroups; otherwise all schedtune.boost/prefer_idle can be cleared. To put it simply, we clear all schedtune flags most of the time, and only set them for performance boosting scenarios (touching the screen); I even think that for the gaming scenario we don't need these tunable parameters, as the WALT/PELT signals have enough time to reflect the heavy workload of gaming threads.
What's your suggestion for this?
Absolutely - this falls squarely into the domain of the Power HAL. I would only say that the current setup matches the heuristics present in fbt well, so when we change the assumptions about task classification and schedtune flags, we will have a new optimisation scenario to think about, which might work well or might not - I suspect it will largely depend on the device what is suitable or not.
--Chris
Thanks, Leo Yan
Hi Chris,
On Fri, Mar 09, 2018 at 10:07:11AM +0000, Chris Redpath wrote:
Hi Leo,
On 09/03/18 08:18, Leo Yan wrote:
On Wed, Mar 07, 2018 at 11:37:30AM +0530, Viresh Kumar wrote:
What about the state of the system where the LCD screen is off (not displaying anything) but the platform isn't suspended, like maybe playing audio, software-updates, downloads, etc.? Should all threads in such cases land on the big CPUs even if there is no display on the LCD (and no UI)?
Just to ask one follow-up question based on Viresh's:
Android now divides tasks into several cgroups based on UI activities (top-app, foreground, background and system background). I just wonder if we can set cgroup SchedTune parameters based on different scenarios, rather than based on different cgroups?
For example, when Android has hints for 'interactive'/camera/video, we can set schedtune.boost/prefer_idle for the top-app & foreground cgroups; otherwise all schedtune.boost/prefer_idle can be cleared. To put it simply, we clear all schedtune flags most of the time, and only set them for performance boosting scenarios (touching the screen); I even think that for the gaming scenario we don't need these tunable parameters, as the WALT/PELT signals have enough time to reflect the heavy workload of gaming threads.
What's your suggestion for this?
Absolutely - this falls squarely into the domain of Power HAL. I would only say that the current setup matches well the heuristics present in fbt.
I am a bit confused by this. As Viresh mentioned, for a scenario like audio playback with the LCD off, the SoC usually has a very well optimized audio island, so even a minor wrong task placement or OPP increase shows up (the absolute power increase is not big, but the relative percentage is quite high).
So currently we usually set the default boost margin to 10% or 20%, which is not a good match for audio playback with the LCD off?
So when we change the assumptions about task classification and schedtune flags, we will have a new optimisation scenario to think about which might work well and might not - I suspect it will largely depend on the device what is suitable or not.
Just to clarify: rather than proposing a new task classification method, I want to know what the better schedtune flag settings are with the current task classification method.
And for example, if we only set the boost margin after the Android PowerHAL receives the 'interactive' hint, how much degradation of UI performance would this introduce?
Thanks, Leo Yan
On 09-Mar 18:31, Leo Yan wrote:
Hi Chris,
On Fri, Mar 09, 2018 at 10:07:11AM +0000, Chris Redpath wrote:
Hi Leo,
On 09/03/18 08:18, Leo Yan wrote:
On Wed, Mar 07, 2018 at 11:37:30AM +0530, Viresh Kumar wrote:
What about the state of the system where the LCD screen is off (not displaying anything) but the platform isn't suspended, like maybe playing audio, software-updates, downloads, etc.? Should all threads in such cases land on the big CPUs even if there is no display on the LCD (and no UI)?
Just to ask one follow-up question based on Viresh's:
Android now divides tasks into several cgroups based on UI activities (top-app, foreground, background and system background). I just wonder if we can set cgroup SchedTune parameters based on different scenarios, rather than based on different cgroups?
For example, when Android has hints for 'interactive'/camera/video, we can set schedtune.boost/prefer_idle for the top-app & foreground cgroups; otherwise all schedtune.boost/prefer_idle can be cleared. To put it simply, we clear all schedtune flags most of the time, and only set them for performance boosting scenarios (touching the screen); I even think that for the gaming scenario we don't need these tunable parameters, as the WALT/PELT signals have enough time to reflect the heavy workload of gaming threads.
What's your suggestion for this?
Absolutely - this falls squarely into the domain of Power HAL. I would only say that the current setup matches well the heuristics present in fbt.
IMO, from a FBT standpoint nothing really changes. The ultimate goal of SchedTune (like) APIs is to allow the "propagation" of user-space knowledge into kernel space... where properly defined mechanisms are in charge of doing their best to satisfy user-space requirements.
Thus...
I am a bit confused by this. As Viresh mentioned, for a scenario like audio playback with the LCD off, the SoC usually has a very well optimized audio island, so even a minor wrong task placement or OPP increase shows up (the absolute power increase is not big, but the relative percentage is quite high).
... if the user-space is smart enough to know that in a certain scenario an app no longer needs user interaction and can relax the prefer_idle hint, then I don't see any downside for the overall quality of the result.
Of course, the user-space has to be smart enough to know that a certain application with the screen off does not require big CPUs... while there can maybe be some other scenarios where that's not true.
This just to say that I'm not convinced that the PowerHAL can take these decisions by itself, without any specific knowledge coming directly from the apps.
So currently we usually set the default boost margin to 10% or 20%, which is not a good match for audio playback with the LCD off?
So when we change the assumptions about task classification and schedtune flags, we will have a new optimisation scenario to think about which might work well and might not - I suspect it will largely depend on the device what is suitable or not.
I would say that we should instead be able to define a proper interface where hints can be translated into optimal heuristics implemented as mechanisms in kernel space. If the user-space "propagates" the correct hints... then we should be fine.
Just to clarify: rather than proposing a new task classification method, I want to know what the better schedtune flag settings are with the current task classification method.
We are working on a different model of "task classification". In this new model there will not be pre-defined groups of tasks and tasks moving among groups. Instead, every app will have its own group and a set of attributes which will be tuned at run-time depending on the application role and/or desired behaviors (to a certain extent).
This new model will definitely match what Leo is proposing.
And for example, if we only set the boost margin after the Android PowerHAL receives the 'interactive' hint, how much degradation of UI performance would this introduce?
That should be relatively simple to profile on a target like a Hikey960, where we have full access to the PowerHAL...
-- #include <best/regards.h>
Patrick Bellasi
On 09/03/18 12:05, Patrick Bellasi wrote:
On 09-Mar 18:31, Leo Yan wrote:
Hi Chris,
On Fri, Mar 09, 2018 at 10:07:11AM +0000, Chris Redpath wrote:
Hi Leo,
On 09/03/18 08:18, Leo Yan wrote:
On Wed, Mar 07, 2018 at 11:37:30AM +0530, Viresh Kumar wrote:
What about the state of the system where the LCD screen is off (not displaying anything) but the platform isn't suspended, like maybe playing audio, software-updates, downloads, etc.? Should all threads in such cases land on the big CPUs even if there is no display on the LCD (and no UI)?
Just to ask one follow-up question based on Viresh's:
Android now divides tasks into several cgroups based on UI activities (top-app, foreground, background and system background). I just wonder if we can set cgroup SchedTune parameters based on different scenarios, rather than based on different cgroups?
For example, when Android has hints for 'interactive'/camera/video, we can set schedtune.boost/prefer_idle for the top-app & foreground cgroups; otherwise all schedtune.boost/prefer_idle can be cleared. To put it simply, we clear all schedtune flags most of the time, and only set them for performance boosting scenarios (touching the screen); I even think that for the gaming scenario we don't need these tunable parameters, as the WALT/PELT signals have enough time to reflect the heavy workload of gaming threads.
What's your suggestion for this?
Absolutely - this falls squarely into the domain of Power HAL. I would only say that the current setup matches well the heuristics present in fbt.
IMO, from a FBT standpoint nothing really changes. The ultimate goal of SchedTune (like) APIs is to allow the "propagation" of user-space knowledge into kernel space... where properly defined mechanisms are in charge of doing their best to satisfy user-space requirements.
Thus...
I am a bit confused by this. As Viresh mentioned, for a scenario like audio playback with the LCD off, the SoC usually has a very well optimized audio island, so even a minor wrong task placement or OPP increase shows up (the absolute power increase is not big, but the relative percentage is quite high).
I might not have expressed what I meant clearly - all I was trying to say was that the heuristics in fbt are tested only with configurations where all top-app and foreground tasks have prefer_idle, all others do not. Similarly, they are tested with touch-boost increasing the schedtune boost level from the powerHAL, and no boost changes when interaction is not happening.
I was trying to make the point that if we change these situations, we might discover parts of the CPU pre-selection in fbt which do not match our existing assumptions.
... if the user-space is smart enough to know that in a certain scenario an app no longer needs user interaction and can relax the prefer_idle hint, then I don't see any downside for the overall quality of the result.
Of course, the user-space has to be smart enough to know that a certain application with the screen off does not require big CPUs... while there can maybe be some other scenarios where that's not true.
This just to say that I'm not convinced that the PowerHAL can take these decisions by itself, without any specific knowledge coming directly from the apps.
So currently we usually set the default boost margin to 10% or 20%, which is not a good match for audio playback with the LCD off?
So when we change the assumptions about task classification and schedtune flags, we will have a new optimisation scenario to think about which might work well and might not - I suspect it will largely depend on the device what is suitable or not.
I'm trying to say here that I don't know whether we could guarantee (for any particular device) that removing the boost and prefer_idle flags from the top_app group when the screen is off will result in a different task placement than leaving them in place, although we should be able to say confidently that it will result in a lower OPP being selected.
You just have a new configuration to test and tune and I think that the ultimate result is hard to reason about with any level of confidence. It likely depends on the device, how the audio pipeline is constituted etc. etc. etc.
It occurs to me that we don't currently know how long it takes for a screen-on event to translate into an event delivered to the power HAL and then to translate into a cgroup parameter change. It seems that we should try to measure that somehow to figure out if we dare to tune-down the system when the screen is off.
I would say that we should instead be able to define a proper interface where hints can be translated into optimal heuristics implemented as mechanisms in kernel space. If the user-space "propagates" the correct hints... then we should be fine.
Just to clarify: rather than proposing a new task classification method, I want to know what the better schedtune flag settings are with the current task classification method.
We are working on a different model of "task classification". In this new model there will not be pre-defined groups of tasks and tasks moving among groups. Instead, every app will have its own group and a set of attributes which will be tuned at run-time depending on the application role and/or desired behaviors (to a certain extent).
This new model will definitely match what Leo is proposing.
And for example, if we only set the boost margin after the Android PowerHAL receives the 'interactive' hint, how much degradation of UI performance would this introduce?
That should be relatively simple to profile on a target like a Hikey960, where we have full access to the PowerHAL...
Yep, this is going to be device specific.
Hi Leo,
On 05-Mar 15:57, Viresh Kumar wrote:
From: Leo Yan leo.yan@linaro.org
The homescreen test-case shows unwanted disturbance on the big CPUs on the Hikey620 platform with Android 4.9. There are multiple reasons for that though.
By default boost and prefer_idle are enabled for both top-app and foreground tasks and find_best_target() always (intentionally) prefers the big CPUs if prefer_idle is enabled. And some of the foreground tasks (like: DispSync, PowerManagerSer and PhotonicModulat) for the homescreen test-case get placed on the big CPUs eventually because of that.
Even if prefer_idle is disabled for such foreground tasks, they don't end up on the little CPUs. The reason being that find_best_target() still prefers big CPUs if the task is boosted, though some of the
^^^^^^^
Would say we don't prefer, but "just" start from big CPUs...
comments in find_best_target() routine say the exact opposite of that.
I guess you are referring to this comment:
 * Case B) Non latency sensitive tasks on IDLE CPUs.
 *
 * Find an optimal backup IDLE CPU for non latency
 * sensitive tasks.
 *
 * Looking for:
 * - minimizing the capacity_orig,
 *   i.e. preferring LITTLE CPUs
 * - favoring shallowest idle states
 *   i.e. avoid to wakeup deep-idle CPUs
 *
 * The following code path is used by non latency
 * sensitive tasks if IDLE CPUs are available. If at
 * least one of such CPUs are available it sets the
 * best_idle_cpu to the most suitable idle CPU to be
 * selected.
 *
 * If idle CPUs are available, favour these CPUs to
 * improve performances by spreading tasks.
 * Indeed, the energy_diff() computed by the caller
 * will take care to ensure the minimization of energy
 * consumptions without affecting performance.
This comment does not assume any visiting ordering... which is intentional, since we wanna, potentially, go through all the CPUs and find the best target by just doing min/max aggregations... more on that later.
It eventually depends on the order in which CPUs are processed, which is from big to little for boosted tasks.
Order matters, yes, but in case B we still do a minimization of the capacity_orig... thus eventually preferring LITTLE over big IDLE CPUs.
The only exception to this is when "sysctl_sched_cstate_aware" is set... which could end up selecting a big CPU over a LITTLE just because it's in a shallower IDLE state.
B being the case of non latency sensitive tasks, I'm wondering how much sense that condition still makes... since it's definitely affecting our min/max aggregation in a potentially bad way...
Do you have this sysctl set in your use-cases?
This patch updates the find_best_target() routine to select low capacity idle-CPU if the task is boosted but doesn't have prefer_idle enabled. This would be the same for non-boosted tasks as well.
Without cstate awareness, I would say we always end up preferring a LITTLE over a big CPU... provided they are both IDLE. Isn't it?
Signed-off-by: Leo Yan leo.yan@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
kernel/sched/fair.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e45047bdd245..4534d8620989 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6998,8 +6998,11 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			int idle_idx = idle_get_state_idx(cpu_rq(i));

 			/* Select idle CPU with lower cap_orig */
-			if (capacity_orig > best_idle_min_cap_orig)
+			if (capacity_orig < best_idle_min_cap_orig)
+				goto found_best_idle_cpu;
+			else if (capacity_orig > best_idle_min_cap_orig)
 				continue;
+
 			/* Favor CPUs that won't end up running at a
 			 * high OPP.
 			 */
@@ -7017,6 +7020,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 				best_idle_cstate <= idle_idx)
 				continue;

+found_best_idle_cpu:
 			/* Keep track of best idle CPU */
 			best_idle_min_cap_orig = capacity_orig;
 			target_idle_max_spare_cap = capacity_orig -
-- 
2.15.0.194.g9af6a3dea062
-- #include <best/regards.h>
Patrick Bellasi
On 5 March 2018 at 18:22, Patrick Bellasi patrick.bellasi@arm.com wrote:
On 05-Mar 15:57, Viresh Kumar wrote:
Even if prefer_idle is disabled for such foreground tasks, they don't end up on the little CPUs. The reason being that find_best_target() still prefers big CPUs if the task is boosted, though some of the
^^^^^^^
Would say we don't prefer, but "just" start from big CPUs...
I wrote "prefer" here because that's where the current routine will take us if at least one little and one big CPU are idle, probably because of a bug.
comments in find_best_target() routine say the exact opposite of that.
I guess you are referring to this comment:
Yes.
This comment does not assume any visiting ordering... which is intentional, since we wanna, potentially, go through all the CPUs and find the best target by just doing min/max aggregations... more on that later.
Right, so it's a bug that I am fixing now :)
It eventually depends on the order in which CPUs are processed, which is from big to little for boosted tasks.
Order matters, yes, but in case B we still do a minimization of the capacity_orig... thus eventually preferring LITTLE over big IDLE CPUs.
Well, that's what we want to do, but we never actually do it. The very next statement after the lowest-capacity check is this:
	if ((capacity_orig - min_capped_util) < target_idle_max_spare_cap)
		continue;
If we start from a big CPU, then the above expression will *always* evaluate to "true" for a little CPU, and so we never reach the rest of the code for little CPUs. And that's a bug, as we probably don't want that behavior.
The only exception to this is when "sysctl_sched_cstate_aware" is set... which could end up selecting a big CPU over a LITTLE just because it's in a shallower IDLE state.
We don't reach here for little CPUs currently.
B being the case of non latency sensitive tasks, I'm wondering how much sense that condition still makes... since it's definitely affecting our min/max aggregation in a potentially bad way...
Do you have this sysctl set in your use-cases?
Not sure. Is it enabled by default? I haven't changed anything related to that.
Without cstate awareness, I would say we always end up preferring a LITTLE over a big CPU... provided they are both IDLE. Isn't it?
We should, but we don't :)
-- viresh
On 05-Mar 20:44, Viresh Kumar wrote:
On 5 March 2018 at 18:22, Patrick Bellasi patrick.bellasi@arm.com wrote:
On 05-Mar 15:57, Viresh Kumar wrote:
Even if prefer_idle is disabled for such foreground tasks, they don't end up on the little CPUs. The reason being that find_best_target() still prefers big CPUs if the task is boosted, though some of the
^^^^^^^
Would say we don't prefer, but "just" start from big CPUs...
I wrote "prefer" here because that's where the current routine will take us if at least one little and one big CPU are idle, probably because of a bug.
comments in find_best_target() routine say the exact opposite of that.
I guess you are referring to this comment:
Yes.
This comment does not assume any visiting ordering... which is intentional, since we wanna, potentially, go through all the CPUs and find the best target by just doing min/max aggregations... more on that later.
Right, so it's a bug that I am fixing now :)
It eventually depends on the order in which CPUs are processed, which is from big to little for boosted tasks.
Order matters, yes, but in case B we still do a minimization of the capacity_orig... thus eventually preferring LITTLE over big IDLE CPUs.
Well, that's what we want to do, but we never actually do it. The very next statement after the lowest-capacity check is this:
	if ((capacity_orig - min_capped_util) < target_idle_max_spare_cap)
		continue;
Oops, wait... got it... you are looking at the android-4.9-eas-dev branch!
... I was considering the "mainline" android-4.9 ACK
If we start from a big CPU, then the above expression will *always* evaluate to "true" for a little CPU, and so we never reach the rest of the code for little CPUs. And that's a bug, as we probably don't want that behavior.
Yes, looking at that code... it seems that we end up favouring a big CPU over a LITTLE... which is a behavior change wrt the original code.
Let's see what we can do with the snippet above... since it's not yet merged in ACK...
The only exception to this is when "sysctl_sched_cstate_aware" is set... which could end up selecting a big CPU over a LITTLE just because it's in a shallower IDLE state.
We don't reach here for little CPUs currently.
B being the case of non latency sensitive tasks, I'm wondering how much sense that condition still makes... since it's definitely affecting our min/max aggregation in a potentially bad way...
Do you have this sysctl set in your use-cases?
Not sure. Is it enabled by default? I haven't changed anything related to that.
Without cstate awareness, I would say we always end up preferring a LITTLE over a big CPU... provided they are both IDLE. Isn't it?
We should, but we don't :)
-- viresh
-- #include <best/regards.h>
Patrick Bellasi
On 05-Mar 15:53, Patrick Bellasi wrote:
On 05-Mar 20:44, Viresh Kumar wrote:
On 5 March 2018 at 18:22, Patrick Bellasi patrick.bellasi@arm.com wrote:
On 05-Mar 15:57, Viresh Kumar wrote:
[...]
Well, that's what we want to do, but we never actually do it. The very next statement after the lowest-capacity check is this:
	if ((capacity_orig - min_capped_util) < target_idle_max_spare_cap)
		continue;
Oops, wait... got it... you are looking at the android-4.9-eas-dev branch!
... I was considering the "mainline" android-4.9 ACK
If we start from a big CPU, then the above expression will *always* evaluate to "true" for a little CPU, and so we never reach the rest of the code for little CPUs. And that's a bug, as we probably don't want that behavior.
Yes, looking at that code... it seems that we end up favouring a big CPU over a LITTLE... which is a behavior change wrt the original code.
Let's see what we can do with the snippet above... since it's not yet merged in ACK...
Attached is a patch which should fix the issue Viresh/Leo reported. Comments / tomatoes?
Cheers Patrick
-- #include <best/regards.h>
Patrick Bellasi
On 06-03-18, 11:26, Patrick Bellasi wrote:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f1a73c8b7c9..9d2780c8c80b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6743,7 +6743,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	unsigned long target_max_spare_cap = 0;
 	unsigned long target_util = ULONG_MAX;
 	unsigned long best_active_util = ULONG_MAX;
-	unsigned long target_idle_max_spare_cap = 0;
+	unsigned long best_idle_max_spare_cap = 0;
 	int best_idle_cstate = INT_MAX;
 	struct sched_domain *sd;
 	struct sched_group *sg;
@@ -6929,15 +6929,25 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 		 */
 		if (idle_cpu(i)) {
 			int idle_idx = idle_get_state_idx(cpu_rq(i));
+			int max_spare_cap;

 			/* Select idle CPU with lower cap_orig */
 			if (capacity_orig > best_idle_min_cap_orig)
 				continue;

-			/* Favor CPUs that won't end up running at a
+			/*
+			 * A more energy efficient CPU has been found,
+			 * let's reset the conditions for the
+			 * following min_capped_utilization check
+			 */
+			best_idle_max_spare_cap = 0;
That looks wrong (I haven't tested it though). You shouldn't be doing this unconditionally here as this makes the below comparison useless (which will always fail now).
The best you can do here is:
	if (capacity_orig < best_idle_min_cap_orig)
		best_idle_max_spare_cap = 0;
In which case I feel the original patch I posted looks better somehow.
+			/*
+			 * Favor CPUs that won't end up running at a
+			 * high OPP.
+			 */
-			if ((capacity_orig - min_capped_util) <
-					target_idle_max_spare_cap)
+			max_spare_cap = capacity_orig - min_capped_util;
+			if (max_spare_cap < best_idle_max_spare_cap)
 				continue;

 			/*
@@ -6951,9 +6961,8 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 				continue;

 			/* Keep track of best idle CPU */
+			best_idle_max_spare_cap = max_spare_cap;
 			best_idle_min_cap_orig = capacity_orig;
-			target_idle_max_spare_cap = capacity_orig -
-				min_capped_util;
 			best_idle_cstate = idle_idx;
 			best_idle_cpu = i;
 			continue;
-- 
2.15.1
-- viresh
On 07-Mar 11:04, Viresh Kumar wrote:
On 06-03-18, 11:26, Patrick Bellasi wrote:
[...]
That looks wrong (I haven't tested it though). You shouldn't be doing this unconditionally here as this makes the below comparison useless (which will always fail now).
Good catch... indeed, this is a min/max problem...
The best you can do here is:
if (capacity_orig < best_idle_min_cap_orig) best_idle_max_spare_cap = 0;
Right, that should work, and I like that it makes clear that we want to maximize the spare capacity within the most energy efficient cluster.
In which I case I feel the original patch I posted looks better somehow.
The main thing I don't completely like about your proposal is the goto. Not for code style... but just because this version of FBT is a cleanup of a version which started with some gotos and in the end turned out to be almost impossible to understand.
Thus, although a single goto does not make it that much worse, I would prefer to keep a streamlined approach where:
- we go through the CPUs
- (possibly) independently from the starting point
- just do min (or max) aggregations
- continue with the next CPU as soon as a more restrictive condition is not met
The fix you propose above seems to match this code structuring.
[...]
-- #include <best/regards.h>
Patrick Bellasi
Hi,
Throwing my patch in the ring as well!
What do you guys think of:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6742,7 +6742,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	unsigned long target_max_spare_cap = 0;
 	unsigned long target_util = ULONG_MAX;
 	unsigned long best_active_util = ULONG_MAX;
-	unsigned long target_idle_max_spare_cap = 0;
+	unsigned long backup_idle_max_spare_cap = 0;
 	int best_idle_cstate = INT_MAX;
 	struct sched_domain *sd;
 	struct sched_group *sg;
@@ -6928,15 +6928,22 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 		 */
 		if (idle_cpu(i)) {
 			int idle_idx = idle_get_state_idx(cpu_rq(i));
+			int is_min_cap_orig;

 			/* Select idle CPU with lower cap_orig */
 			if (capacity_orig > best_idle_min_cap_orig)
 				continue;
+
+			is_min_cap_orig = capacity_orig ==
+					  best_idle_min_cap_orig;
+
 			/* Favor CPUs that won't end up running at a
-			 * high OPP.
+			 * high OPP, but only if their capacity is equal
+			 * to the efficient CPUs considered.
 			 */
-			if ((capacity_orig - min_capped_util) <
-			    target_idle_max_spare_cap)
+			if (is_min_cap_orig &&
+			    (capacity_orig - min_capped_util) <
+			    backup_idle_max_spare_cap)
 				continue;

 			/*
@@ -6945,13 +6952,14 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 			 * IOW, prefer a deep IDLE LITTLE CPU vs a
 			 * shallow idle big CPU.
 			 */
-			if (sysctl_sched_cstate_aware &&
-			    best_idle_cstate <= idle_idx)
+			if (is_min_cap_orig &&
+			    sysctl_sched_cstate_aware &&
+			    (best_idle_cstate <= idle_idx))
 				continue;

 			/* Keep track of best idle CPU */
 			best_idle_min_cap_orig = capacity_orig;
-			target_idle_max_spare_cap = capacity_orig -
+			backup_idle_max_spare_cap = capacity_orig -
 						    min_capped_util;
 			best_idle_cstate = idle_idx;
 			best_idle_cpu = i;
I've fixed the naming of target_idle_max_spare_cap as Patrick correctly pointed out in his patch. Also, I only evaluate the minimum capping condition and the best idle state for CPUs with the same capacity_orig as the most efficient CPU found so far. Therefore, the efficiency of the CPU is always the primary condition.
Regards, Ionela.
On 07-03-18, 12:16, Ionela Voinescu wrote:
[...]
This surely fixes the issue I pointed out in Patrick's patch, but it adds more comparisons to this loop, which makes it less efficient. And so I really wonder now if the first patch I proposed was better or this one :)
I know, you guys didn't like the goto thing, and that's fine, but we should be efficient as well.
-- viresh
On 08-Mar 11:06, Viresh Kumar wrote:
On 07-03-18, 12:16, Ionela Voinescu wrote:
[...]
This surely fixes the issue I pointed in Patrick's patch, but this adds more comparisons to this loop which makes it less efficient. And so I really wonder now if the first patch I proposed was better or this one :)
I know, you guys didn't like the goto thing, and that's fine, but we should be efficient as well.
We are actually considering dropping this fix completely and working in a different direction, which is changing start_cpu to consider prefer_idle instead of boosted.
This will allow us to simplify the heuristic and make it easy to remove case A in favour of the mainline slow path... but there is currently one downside, which is the case of foreground apps.
Ionela is currently testing a patch on P2 with wltests; she will follow up if it is OK from the power/performance standpoint.
-- #include <best/regards.h>
Patrick Bellasi
On 07-03-18, 11:13, Patrick Bellasi wrote:
Right, that should work, and I like that it makes clear that we want to miximize the spare capacity within the most energy efficient cluster.
Okay, I am fine with the "goto" reasoning you gave. But I see more problems now. Here is your patch again (with the fix included).
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f1a73c8b7c9..9d2780c8c80b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6743,7 +6743,7 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 	unsigned long target_max_spare_cap = 0;
 	unsigned long target_util = ULONG_MAX;
 	unsigned long best_active_util = ULONG_MAX;
-	unsigned long target_idle_max_spare_cap = 0;
+	unsigned long best_idle_max_spare_cap = 0;
 	int best_idle_cstate = INT_MAX;
 	struct sched_domain *sd;
 	struct sched_group *sg;
@@ -6929,15 +6929,25 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 		 */
 		if (idle_cpu(i)) {
 			int idle_idx = idle_get_state_idx(cpu_rq(i));
+			int max_spare_cap;

 			/* Select idle CPU with lower cap_orig */
 			if (capacity_orig > best_idle_min_cap_orig)
 				continue;

-			/* Favor CPUs that won't end up running at a
+			/*
+			 * A more energy efficient CPU has been found,
+			 * let's reset the conditions for the
+			 * following min_capped_utilization check
+			 */
+			if (capacity_orig < best_idle_min_cap_orig)
+				best_idle_max_spare_cap = 0;
+
+			/*
+			 * Favor CPUs that won't end up running at a
 			 * high OPP.
 			 */
-			if ((capacity_orig - min_capped_util) <
-			    target_idle_max_spare_cap)
+			max_spare_cap = capacity_orig - min_capped_util;
+			if (max_spare_cap < best_idle_max_spare_cap)
 				continue;

 			/*
@@ -6951,9 +6961,8 @@ static inline int find_best_target(struct task_struct *p, int *backup_cpu,
 				continue;

 			/* Keep track of best idle CPU */
+			best_idle_max_spare_cap = max_spare_cap;
 			best_idle_min_cap_orig = capacity_orig;
-			target_idle_max_spare_cap = capacity_orig -
-						    min_capped_util;
 			best_idle_cstate = idle_idx;
 			best_idle_cpu = i;
 			continue;
Because we don't use "goto" and end the loop here, we will reach this part of the code now for an idle CPU (traversing from big to little):
	if (sysctl_sched_cstate_aware &&
	    best_idle_cstate <= idle_idx)
		continue;
So if the little CPU is at an idle state >= that of the big CPU, we skip it and that is bad as well :)
-- viresh
On 08-03-18, 10:45, Viresh Kumar wrote:
On 07-03-18, 11:13, Patrick Bellasi wrote:
[...]
Because we don't use "goto" and end the loop here, we will reach this part of the code now for an idle CPU (traversing from big to little):
if (sysctl_sched_cstate_aware && best_idle_cstate <= idle_idx) continue;
So if the little CPU is at an idle state >= that of the big CPU, we skip it and that is bad as well :)
So the next fix I expect would be something like this :)
+	if (capacity_orig < best_idle_min_cap_orig) {
+		best_idle_max_spare_cap = 0;
+		best_idle_cstate = INT_MAX;
+	}
-- viresh
On 08-Mar 11:10, Viresh Kumar wrote:
On 08-03-18, 10:45, Viresh Kumar wrote:
On 07-03-18, 11:13, Patrick Bellasi wrote:
[...]
Because we don't use "goto" and end the loop here, we will reach this part of the code now for an idle CPU (traversing from big to little):
if (sysctl_sched_cstate_aware && best_idle_cstate <= idle_idx) continue;
So if the little CPU is at an idle state >= that of the big CPU, we skip it and that is bad as well :)
Yes, I think I've pointed out this corner case in a mail before...
So the next fix I expect would be something like this :)
	if (capacity_orig < best_idle_min_cap_orig) {
		best_idle_max_spare_cap = 0;
		best_idle_cstate = INT_MAX;
	}
... and I was actually wondering if we really need to keep this condition in.
I would like to run some wltests without this and see if we can really see benefits from it... especially considering that here we are scheduling non latency sensitive tasks, so wakeup latencies are not a main goal.
The other goal is saving energy... but I'm wondering if maximization of the spare capacity alone is good enough to pick the best CPU in almost all cases.
Another reason to keep it in, properly fixed, is that we have something similar in mainline...
-- #include <best/regards.h>
Patrick Bellasi
On 05/03/18 15:14, Viresh Kumar wrote:
On 5 March 2018 at 18:22, Patrick Bellasi patrick.bellasi@arm.com wrote:
On 05-Mar 15:57, Viresh Kumar wrote:
Even if prefer_idle is disabled for such foreground tasks, they don't end up on the little CPUs. The reason being that find_best_target() still prefers big CPUs if the task is boosted, though some of the
^^^^^^^
Would say we don't prefer, but "just" start from big CPUs...
I wrote "prefer" here because that's where the current routine will take us to if at least one little and one big CPUs are idle, probably because of a bug.
comments in find_best_target() routine say the exact opposite of that.
I guess you are referring to this comment:
Yes.
This comment does not assume any visiting ordering... which is intentional since we wanna, potentially, go thought all the CPUs and find the best target by just doing min/max aggregations... more on that later.
Right, so its a bug that I am fixing now :)
It eventually depends on the order in which CPUs are processed, which is from big to little for boosted tasks.
Order matter yes, but in case B we still do a minimization of the capacity_orig... thus eventually preferring LITTLE over big IDLE CPUs.
Well, that's what we want to do but we will never do that. The very next statement after looking for lowest capacity is this:
	if ((capacity_orig - min_capped_util) <
	    target_idle_max_spare_cap)
		continue;
If we start from a big CPU, then the above expression will *always* evaluate to "true" for a little CPU and so we never go to the rest of the code for little CPUs. And that's a bug as we probably don't want that behavior.
Yes, this condition was introduced by me with the minimum capacity capping patches to ensure that we don't choose a CPU that is forced to run at an unusually high OPP by external actors. But I agree it does cause issues here for cases in which the min cap is 0, and the bigs will (almost) always offer more spare capacity. Probably a better decision is to consider the minimum cap on its own and skip CPUs where this is high (still pondering whether this should be relative to the new util or not).
Ionela.
The only exception to this is when "sysctl_sched_cstate_aware" is set... which could end up selecting a big CPU over a LITTLE just because it's in a shallower IDLE state.
We don't reach here for little CPUs currently.
Case B being for non latency sensitive tasks, I'm wondering how much sense that condition still makes... since it's definitely affecting our min/max aggregation in a potentially bad way...
Do you have this sysctl set in your use-cases?
Not sure. Is it enabled by default? I haven't changed anything related to that.
Without cstate awareness, I would say we always end up preferring a LITTLE over an big CPU... provided they are both IDLE. Isn't it?
We should, but we don't :)
-- viresh
_______________________________________________
eas-dev mailing list
eas-dev@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/eas-dev