Hi,
Joel was looking at something else in android-4.14 wakeup path and he noticed that we have a difference in behavior for prefer_idle tasks when we're using find_best_target, so he asked if we could discuss that here.
Behavior in android-4.9 and earlier
-----------------------------------
When we find an idle CPU for a task which has the prefer_idle attribute, we immediately return that CPU from the energy-based wakeup CPU selection. This happens in slightly different places over the EAS and android versions but it is always true that we take the recommended idle CPU for tasks in this class.
How that changed in android-4.14
--------------------------------
android-4.14 has two different wakeup paths, selected with a sched_feature FIND_BEST_TARGET. This defaults to true with the intent of preserving the previous behavior. The two paths differ, so I'll describe them separately below.
The two paths however share some common code - the way we integrated EAS with the regular wakeup code is different in android-4.14.
There were two reasons for doing this:
1. minimize the differences in select_task_rq_fair wrt mainline code
2. make better use of the per-sd overutilization flags
Since we have per-sd overutilised flags, we attempt to perform an EAS wakeup at the highest non-overutilised sched_domain - meaning that we can still perform an energy-aware wakeup for small tasks inside a non-overutilized group of small CPUs while potentially other groups of CPUs are overutilized.
The decision about whether to attempt an energy-aware wakeup is taken in the wake_energy function at the top of select_task_rq_fair (strf). All the cases where we can't use energy awareness are ruled out there, and the choice between find_idlest_cpu and EAS for prefer_idle tasks is also made there.
If we are using energy aware wakeups, then we will find the highest non-overutilised SD to wake in.
In all cases where we do an energy aware wakeup but don't find any suitable candidate CPUs we will go on to use find_idlest_cpu.
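To make the domain selection above concrete, here is a minimal sketch, assuming a simplified stand-in for the scheduler domain hierarchy. The struct toy_sd type and the function name are hypothetical and are not the kernel's own definitions; the sketch only shows the idea of keeping the highest level whose overutilised flag is clear, and returning nothing when every level is overutilised (in which case the caller would go on to the find_idlest_cpu-style path).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a sched_domain level. */
struct toy_sd {
	struct toy_sd *parent;	/* next level up, NULL at the top */
	bool overutilized;	/* per-sd overutilised flag */
	int level;		/* 0 = lowest level */
};

/*
 * Walk from the lowest domain to the top and remember the highest
 * level that is not overutilised. Returns NULL when every level is
 * overutilised, meaning no energy-aware wakeup should be attempted.
 */
static struct toy_sd *highest_non_overutilized(struct toy_sd *lowest)
{
	struct toy_sd *sd, *energy_sd = NULL;

	for (sd = lowest; sd != NULL; sd = sd->parent) {
		if (!sd->overutilized)
			energy_sd = sd;
	}
	return energy_sd;
}
```

With a small-CPU cluster that is fine and a system-wide level that is overutilised, this returns the cluster level, which matches the "small tasks inside a non-overutilized group of small CPUs" case described above.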
android-4.14 sched_feat(FIND_BEST_TARGET) true wakeup path ----------------------------------------------------------
When FIND_BEST_TARGET sched feature is on (the default), we call find_best_target to populate the energy_env structure. This takes note of the prefer_idle flag and the task boost to change which task placement strategy will be used - the algorithm is the same as in previous versions of android.
However in android-4.14 (unintentionally) the prefer_idle task placement is not immediately acted upon - when a prefer_idle task is placed, we will select the first idle CPU we see *but* this will become the target CPU and we will still perform an energy diff and select between prev/target based upon energy requirement.
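The difference between the two behaviors can be sketched as follows. This is only an illustration under stated assumptions: struct toy_wakeup, its fields, and the three functions are hypothetical names, and the energy values stand in for whatever the real energy diff computes.

```c
#include <stdbool.h>

/* Hypothetical description of one wakeup; none of these names are kernel API. */
struct toy_wakeup {
	int prev_cpu;
	int target_cpu;		/* CPU suggested by the placement step, -1 if none */
	bool target_is_idle;
	bool prefer_idle;
	long energy_prev;	/* estimated energy if the task stays on prev_cpu */
	long energy_target;	/* estimated energy if the task moves to target_cpu */
};

/* Energy diff step: move only if the target is estimated to be cheaper. */
static int pick_by_energy(const struct toy_wakeup *w)
{
	if (w->target_cpu < 0)
		return w->prev_cpu;
	return (w->energy_target < w->energy_prev) ? w->target_cpu : w->prev_cpu;
}

/* android-4.9 style: an idle target for a prefer_idle task is taken at once. */
static int select_cpu_49(const struct toy_wakeup *w)
{
	if (w->prefer_idle && w->target_cpu >= 0 && w->target_is_idle)
		return w->target_cpu;
	return pick_by_energy(w);
}

/* android-4.14 as described: the idle target still goes through the energy diff. */
static int select_cpu_414(const struct toy_wakeup *w)
{
	return pick_by_energy(w);
}
```

For a prefer_idle task whose idle target costs more energy than prev_cpu, the first variant returns the idle target and the second keeps the task on prev_cpu.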
The open question is - now that we have realized that there is a different strategy in place, should we change it to be the same as the old version? I think that we should - it will be a simple change to use the idle cpu selected immediately without the energy diff.
All we would need is to add something like this:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7383,7 +7383,7 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		}
 	} else {
 		int boosted = (schedtune_task_boost(p) > 0);
-		int prefer_idle;
+		int prefer_idle, target_cpu;
 
 		/*
 		 * give compiler a hint that if sched_features
@@ -7396,10 +7396,18 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		eenv->max_cpu_count = EAS_CPU_BKP + 1;
 
 		/* Find a cpu with sufficient capacity */
-		eenv->cpu[EAS_CPU_NXT].cpu_id = find_best_target(p,
+		target_cpu = find_best_target(p,
 				&eenv->cpu[EAS_CPU_BKP].cpu_id,
 				boosted, prefer_idle);
 
+		/* immediately select idle CPUs for prefer-idle tasks */
+		if (prefer_idle && target_cpu >= 0 &&
+		    idle_cpu(target_cpu))
+			return target_cpu;
+
+		/* place target into NEXT slot */
+		eenv->cpu[EAS_CPU_NXT].cpu_id = target_cpu;
+
 		/* take note if no backup was found */
 		if (eenv->cpu[EAS_CPU_BKP].cpu_id < 0)
 			eenv->max_cpu_count = EAS_CPU_BKP;
There could be energy advantages to performing an energy diff for prefer_idle tasks. We've tried a few times to come up with a way to use lower-energy CPUs for these tasks where possible (for cases like full-screen video playback or screen-off audio playback) without harming the interactive response, but everything we came up with needs per-platform tuning, which isn't really suitable. The underlying requirement for these prefer_idle tasks is that they should ideally not be preempted, as they are likely to be part of the critical frame drawing path.
We should also note that the EAS_PREFER_IDLE sched_feature potentially has an effect here. If we are using find_best_target and EAS_PREFER_IDLE is not set to true (true is the default), then we don't pass the prefer_idle status of the task to find_best_target. However, if EAS_PREFER_IDLE is not true, we should have avoided entering find_energy_efficient_cpu in the first place.
To be complete, there is also a further sched_feat for task placement which controls how we evaluate the energy requirement for a task. If FBT_STRICT_ORDER is on (the default), we will select the first CPU we find which saves energy compared to the prev_cpu. This matches previous versions of android.
We have been experimenting with selecting the most efficient of the target/backup CPUs provided by find_best_target; this is what the scheduler does if FBT_STRICT_ORDER is false. In testing there isn't a clear winner between these options, so I added the sched feature so users can try both easily on their platforms without code changes.
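The two evaluation orders can be illustrated with a small sketch. The struct toy_candidate type and pick_candidate function are hypothetical; energy_delta is assumed to be the estimated energy change relative to prev_cpu, with negative values meaning a saving.

```c
#include <stdbool.h>

/* Hypothetical candidate CPU with its estimated energy change vs prev_cpu. */
struct toy_candidate {
	int cpu;
	long energy_delta;	/* negative means it saves energy */
};

/*
 * strict_order == true:  return the first candidate that saves energy.
 * strict_order == false: return the candidate with the largest saving.
 * If no candidate saves energy, stay on prev_cpu.
 */
static int pick_candidate(const struct toy_candidate *c, int n,
			  int prev_cpu, bool strict_order)
{
	int i, best = prev_cpu;
	long best_delta = 0;

	for (i = 0; i < n; i++) {
		if (c[i].energy_delta >= best_delta)
			continue;
		best = c[i].cpu;
		best_delta = c[i].energy_delta;
		if (strict_order)
			break;	/* first saving wins */
	}
	return best;
}
```

With candidates (CPU 4, delta -5) followed by (CPU 2, delta -20), strict ordering picks CPU 4 while the non-strict mode picks CPU 2.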
android-4.14 sched_feat(FIND_BEST_TARGET) false wakeup path
-----------------------------------------------------------
This uses the brute-force placement strategy from the pre-simplified-EM mainline EAS patch set, inside the android energy diff algorithm.
In android, we collect all the potential CPU placement options in a data structure, and then evaluate the energy for all of them at each sched_domain traversal level. This is done to avoid visiting each domain & group more than once.
Other than the switch to the new energy calculation method, the energy required is still calculated for every allowable CPU and then the option which has the lowest overall consumption is selected.
When it comes to dealing with prefer_idle tasks, there is another optional behaviour in this mode. If you have sched_feat(EAS_PREFER_IDLE) set to true (the default), then prefer_idle tasks have no special treatment in the energy-aware wakeup path. This means they will get the lowest-energy option with no regard to idleness.
However, if EAS_PREFER_IDLE is false (i.e. don't use EAS for prefer-idle tasks) then we route these tasks through the slow path wakeup code - the decision is made inside wake_energy, which propagates through to calling find_idlest_cpu instead of find_energy_efficient_cpu.
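As a rough summary of how the feature flags route a wakeup, here is a sketch under the assumption that the decision reduces to three inputs. The enum, struct, and function names are hypothetical; can_use_energy stands for the outcome of all the other checks described for wake_energy.

```c
#include <stdbool.h>

/* Hypothetical labels for the three placement paths discussed here. */
enum toy_path {
	PATH_FIND_IDLEST_CPU,	/* slow path wakeup */
	PATH_FIND_BEST_TARGET,	/* FIND_BEST_TARGET true */
	PATH_BRUTE_FORCE	/* FIND_BEST_TARGET false */
};

/* Hypothetical snapshot of the two sched features involved. */
struct toy_feats {
	bool find_best_target;
	bool eas_prefer_idle;
};

static enum toy_path wakeup_path(struct toy_feats f, bool can_use_energy,
				 bool task_prefer_idle)
{
	if (!can_use_energy)
		return PATH_FIND_IDLEST_CPU;
	/* EAS_PREFER_IDLE false: prefer_idle tasks skip the energy-aware path */
	if (task_prefer_idle && !f.eas_prefer_idle)
		return PATH_FIND_IDLEST_CPU;
	return f.find_best_target ? PATH_FIND_BEST_TARGET : PATH_BRUTE_FORCE;
}
```

With both features at their defaults a prefer_idle task takes the find_best_target path; with both set to false it takes the slow path, while other tasks use the brute-force placement.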
If the find_best_target feature FBT_STRICT_ORDER is true (the default) then the first CPU which saves energy compared to default will be selected rather than the most efficient. When combined with the brute force selection algorithm, this will result in the lowest-numbered cpu which saves energy over the prev_cpu being selected.
This is not the way that the brute force (mainline-alike) placement is intended to work, so when using brute force (FIND_BEST_TARGET=false) one should also turn off strict ordering (FBT_STRICT_ORDER=false) *and* opt out of using EAS for prefer_idle tasks (EAS_PREFER_IDLE=false).
--Chris
Hi Chris,
On Tuesday 29 May 2018 at 13:16:17 (+0100), Chris Redpath wrote:
[...]
I'm not against the idea but we do need to be careful with changes like that to avoid migrating tasks for no reason ... If prev_cpu=5 and target_cpu=4 and both CPUs are idle, we might end up migrating the task from CPU 5 to 4 for nothing. The energy margin should prevent that, but at the cost of an expensive energy diff. Or we can have extra conditions in fbt()/find_energy_efficient_cpu() or something ...
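One possible shape for such an extra condition, purely as a sketch: keep the task on prev_cpu when prev_cpu is itself idle, and only shortcut to the target otherwise. The function name and parameters are hypothetical, and the sketch deliberately ignores other placement criteria such as reserved CPUs or idle-state depth.

```c
#include <stdbool.h>

/*
 * Hypothetical prefer_idle shortcut that avoids a pointless migration.
 * Returns the chosen CPU, or -1 when no shortcut applies and the
 * caller should fall back to the energy diff.
 */
static int prefer_idle_shortcut(int prev_cpu, bool prev_is_idle,
				int target_cpu, bool target_is_idle)
{
	if (prev_is_idle)
		return prev_cpu;	/* both idle: stay where we are */
	if (target_cpu >= 0 && target_is_idle)
		return target_cpu;
	return -1;
}
```

In the prev_cpu=5, target_cpu=4 example with both CPUs idle, this returns 5 without running an energy diff.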
What's the main obstacle for just using the mainline slow path for prefer_idle tasks (!EAS_PREFER_IDLE I think) in your opinion ?
Thanks, Quentin
Hi,
On 29/05/18 17:14, Quentin Perret wrote:
> I'm not against the idea but we do need to be careful with changes like that to avoid migrating tasks for no reason ... If prev_cpu=5 and target_cpu=4 and both CPUs are idle, we might end up migrating the task from CPU 5 to 4 for nothing. The energy margin should prevent that, but at the cost of an expensive energy diff. Or we can have extra conditions in fbt()/find_energy_efficient_cpu() or something ...
There could be cases where we might want to migrate the task from 5 to 4: to take advantage of reserved CPUs, to take advantage of a cpu in a more shallow idle state. I think as long as we have these QoS parameters to consider in find_best_target, we can't always allow energy_diff to change the cpu selection done in find_best_target.
> What's the main obstacle for just using the mainline slow path for prefer_idle tasks (!EAS_PREFER_IDLE I think) in your opinion ?
In my opinion, that would be the fact that we'd no longer consider reserved CPUs, so currently prefer_idle and boosted tasks (top app) could risk competing and being preempted by tasks in other classes. But this is definitely worth evaluating so that's where the sched feature is very useful. Unfortunately, this decision cannot be taken quickly, and we lack relevant platforms to evaluate it on.
Regards, Ionela.
On Wed, May 30, 2018 at 08:16:51PM +0100, Ionela Voinescu wrote:
> In my opinion, that would be the fact that we'd no longer consider reserved CPUs, so currently prefer_idle and boosted tasks (top app) could risk competing and being preempted by tasks in other classes. But this is definitely worth evaluating so that's where the sched feature is very useful. Unfortunately, this decision cannot be taken quickly, and we lack relevant platforms to evaluate it on.
When I experimented with it last year (using mainline slow-path for prefer idle), I do remember that there wasn't an issue with choosing a reserved CPU since the mainline slow-path also tries to maximize spare capacity, and the BIG reserved CPUs had both more capacity, and were mostly idle (because of the cpuset reservations).
I also don't see why we wouldn't have any platforms to evaluate this on. A simple hack would be to force the find_idlest_cpu path for all prefer_idle tasks on older platforms. I agree the kernel version may change certain behaviors, but I think such a change can be evaluated on most or all of the platforms we currently have, since the mainline slow path hasn't changed much. Maybe I missed your point; in that case, sorry :)
thanks,
- Joel
Hi
On 30/05/18 21:57, Joel Fernandes wrote:
> When I experimented with it last year (using mainline slow-path for prefer idle), I do remember that there wasn't an issue with choosing a reserved CPU since the mainline slow-path also tries to maximize spare capacity, and the BIG reserved CPUs had both more capacity, and were mostly idle (because of the cpuset reservations).
I saw good results as well in an initial run I was trying for find_best_target changes (https://gist.github.com/ionela-voinescu/f89815591c7f50864188094bb8c53ec4), and I had the same reasoning in my head to explain the results, but I did not get the chance to do more than to run wltests so far.
I've started today to put together some kernels for experimentation (prefer_idle through slow-path, fbt removed + few tweaks) on a 4.4 Pixel 2 kernel to get more data on this.
> I also don't see why we wouldn't have any platforms to evaluate this on. A simple hack would be to force the find_idlest_cpu path for all prefer_idle tasks on older platforms. I agree the kernel version may change certain behaviors, but I think such a change can be evaluated on most or all of the platforms we currently have, since the mainline slow path hasn't changed much. Maybe I missed your point; in that case, sorry :)
No, you are right. I had 4.14 in mind and was referring more to the fact that even if we have good results on older kernels and platforms, it might be worth keeping this functionality under a sched feature for now to allow more experimentation the closer we get to mainline.
Hi Chris,
On 29/05/18 13:16, Chris Redpath wrote:
Hi,
Joel was looking at something else in android-4.14 wakeup path and he noticed that we have a difference in behavior for prefer_idle tasks when we're using find_best_target, so he asked if we could discuss that here.
Behavior in android-4.9 and earlier
When we find an idle CPU for a task which has the prefer_idle attribute, we immediately return that CPU from the energy-based wakeup CPU selection. This happens in slightly different places over the EAS and android versions but it is always true that we take the recommended idle CPU for tasks in this class.
How that changed in android-4.14
android-4.14 has two different wakeup paths, selected with a sched_feature FIND_BEST_TARGET. This defaults to true with the intent of preserving the previous behavior. Both paths are different, so I'll describe them below separately.
The two paths however share some common code - the way we integrated EAS with the regular wakeup code is different in android-4.14.
There were two reasons for doing this.
- minimize the differences in select_task_rq_fair wrt mainline code
- make better use of the per-sd overutilization flags
Since we have per-sd overutilised flags, we attempt to perform an EAS wakeup at the highest non-overutilised sched_domain - meaning that we can still perform an energy-aware wakeup for small tasks inside a non-overutilized group of small CPUs while potentially other groups of CPUs are overutilized.
The decision about attempting to use energy awareness is taken in wake_energy function at the top of strf - all the cases where we can't use energy-awareness are ruled out, and the decision about using find_idlest_cpu/EAS for prefer_idle tasks is also done here.
If we are using energy aware wakeups, then we will find the highest non-overutilised SD to wake in.
In all cases where we do an energy aware wakeup but don't find any suitable candidate CPUs we will go on to use find_idlest_cpu.
android-4.14 sched_feat(FIND_BEST_TARGET) true wakeup path
When FIND_BEST_TARGET sched feature is on (the default), we call find_best_target to populate the energy_env structure. This takes note of the prefer_idle flag and the task boost to change which task placement strategy will be used - the algorithm is the same as in previous versions of android.
However in android-4.14 (unintentionally) the prefer_idle task placement is not immediately acted upon - when a prefer_idle task is placed, we will select the first idle CPU we see *but* this will become the target CPU and we will still perform an energy diff and select between prev/target based upon energy requirement.
The open question is - now that we have realized that there is a different strategy in place, should we change it to be the same as the old version? I think that we should - it will be a simple change to use the idle cpu selected immediately without the energy diff.
All we would need is to add something like this:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7383,7 +7383,7 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		}
 	} else {
 		int boosted = (schedtune_task_boost(p) > 0);
-		int prefer_idle;
+		int prefer_idle, target_cpu;
 		/*
 		 * give compiler a hint that if sched_features
@@ -7396,10 +7396,18 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		eenv->max_cpu_count = EAS_CPU_BKP + 1;
 		/* Find a cpu with sufficient capacity */
-		eenv->cpu[EAS_CPU_NXT].cpu_id = find_best_target(p,
+		target_cpu = find_best_target(p,
 				&eenv->cpu[EAS_CPU_BKP].cpu_id,
 				boosted, prefer_idle);
+		/* immediately select idle CPUs for prefer-idle tasks */
+		if (prefer_idle && target_cpu >= 0 &&
+		    idle_cpu(target_cpu))
+			return target_cpu;
In 4.9 we also returned target_cpu if the task is boosted. So if the task is boosted or prefer_idle and the target_cpu is an idle cpu we would return immediately. Given that currently all boosted tasks are also prefer_idle, there might not be a point to add this as well, but if we want to maintain the same behavior for now and if we don't want to make assumptions about userspace schedtune configuration, this condition should be added. The reason behind it is that we'd want boosted tasks to benefit from low latency wakeups as well, if possible (but this logic was inherited from older platforms and probably older task boost implementation so I'm not sure how much sense or difference it makes today).
+		/* place target into NEXT slot */
+		eenv->cpu[EAS_CPU_NXT].cpu_id = target_cpu;
 		/* take note if no backup was found */
 		if (eenv->cpu[EAS_CPU_BKP].cpu_id < 0)
 			eenv->max_cpu_count = EAS_CPU_BKP;
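For reference, the early-return condition being discussed can be written as a small standalone C sketch. This is not kernel code: struct fbt_result, its target_is_idle field (standing in for idle_cpu(target_cpu)) and take_target_immediately() are names made up for illustration, and the also_boosted switch only models the android-4.9 variant in which boosted tasks get the same treatment as prefer_idle tasks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for what find_best_target() reports back. */
struct fbt_result {
	int target_cpu;		/* < 0 means no target was found */
	bool target_is_idle;	/* stands in for idle_cpu(target_cpu) */
};

/*
 * Return true when the target should be used straight away, skipping
 * the energy diff. 'also_boosted' selects the 4.9-style variant where
 * boosted tasks are treated like prefer_idle tasks.
 */
static bool take_target_immediately(const struct fbt_result *r,
				    bool prefer_idle, bool boosted,
				    bool also_boosted)
{
	assert(r);

	/* Only an idle, valid target qualifies for the shortcut. */
	if (r->target_cpu < 0 || !r->target_is_idle)
		return false;

	return prefer_idle || (also_boosted && boosted);
}
```

With also_boosted false this matches the condition in the proposed patch; with it true, a boosted but non-prefer_idle task that gets an idle target also skips the energy diff.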
It could be the case that there are energy advantages to performing an energy diff for prefer_idle tasks. We've tried a few times to come up with a way to use lower-energy CPUs for these tasks where possible (for cases like full-screen video playback or screen-off audio playback) without harming the interactive response, but we haven't managed to come up with anything which doesn't need per-platform tuning, which isn't really suitable. The underlying requirement for these prefer_idle tasks is that they should ideally not be preempted as they are likely to be part of the critical frame drawing path.
We should also note that the EAS_PREFER_IDLE sched_feature potentially also has an effect on this - if we are using find_best_target and we do not have EAS_PREFER_IDLE set to true (true is the default), then we don't pass the prefer_idle status of the task to find_best_target - however if EAS_PREFER_IDLE is not true, we should have avoided entering find_energy_efficient_cpu in the first place.
There is a chance (but I'm not sure I know enough about this to rule it out) for us to get here with !sched_feat(EAS_PREFER_IDLE): if !sched_feat(EAS_PREFER_IDLE) and sync is true, wake_energy will return true so we take the find_energy_efficient_cpu route. Here, a few things can happen:
1. sysctl_sched_sync_hint is enabled and the waker cpu is allowed so we return it, or..
2. if use_fbt is set (sched_feat(FIND_BEST_TARGET)): the prefer_idle flag of the task is cleared if it was ever set, so the task is treated as a non prefer_idle task in find_best_target. I'm not sure if this is what we want here. What do you think?
3. To be found below..
To be complete, there is also a further sched_feat for task placement which controls how we evaluate the energy requirement for a task. If FBT_STRICT_ORDER is on (the default), we will select the first CPU we find which saves energy compared to the prev_cpu. This matches previous versions of android.
We have been experimenting with selecting the most efficient out of the target/backup CPUs provided by find_best_target, if FBT_STRICT_ORDER is false this is what the scheduler will do. In testing there isn't really a clear winner out of these options, so I added the sched feature so users could play with the options easily on their platforms without code changes.
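The difference between the two FBT_STRICT_ORDER settings can be illustrated with a short standalone C sketch. It is a simplified model, not the android-4.14 implementation: pick_candidate() and the layout of the energy[] array (index 0 for prev_cpu, then the candidates in the order they were found) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * energy[0] is the estimated energy with the task left on prev_cpu;
 * energy[1..n-1] are the candidates in the order they were found
 * (e.g. target, then backup). Returns the index of the chosen entry.
 */
static size_t pick_candidate(const long *energy, size_t n, int strict_order)
{
	size_t best = 0;
	size_t i;

	assert(energy && n > 0);

	for (i = 1; i < n; i++) {
		if (energy[i] >= energy[best])
			continue;	/* no saving over the current choice */
		best = i;
		if (strict_order)
			break;		/* first candidate that saves energy wins */
	}
	return best;
}
```

For example, with energies {100, 90, 70} strict ordering picks index 1 (the first saving over prev_cpu), while the non-strict variant picks index 2 (the lowest energy). If no candidate saves energy, both keep index 0, i.e. prev_cpu.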
android-4.14 sched_feat(FIND_BEST_TARGET) false wakeup path
This uses the brute-force placement strategy from the pre-simplified-EM mainline EAS patch set, inside the android energy diff algorithm.
In android, we collect all the potential CPU placement options in a data structure, and then evaluate the energy for all of them at each sched_domain traversal level. This is done to avoid visiting each domain & group more than once.
Other than the switch to the new energy calculation method, the energy required is still calculated for every allowable CPU and then the option which has the lowest overall consumption is selected.
When it comes to dealing with prefer_idle tasks, there is another optional behaviour in this mode. If you have sched_feat(EAS_PREFER_IDLE) set to true (the default), then prefer_idle tasks have no special treatment in the energy-aware wakeup path. This means they will get the lowest-energy option with no regard to idleness.
However, if EAS_PREFER_IDLE is false (i.e. don't use EAS for prefer-idle tasks) then we route these tasks through the slow path wakeup code - the decision is made inside wake_energy, which propagates through to calling find_idlest_cpu instead of find_energy_efficient_cpu.
3. As said above, we could get here even with EAS_PREFER_IDLE false, in which case we ignore prefer_idle. So in this case the prefer_idle flag of the task is not considered no matter the value of sched_feat(EAS_PREFER_IDLE).
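The interaction between EAS_PREFER_IDLE, the task's prefer_idle flag and sync described in this thread can be summarised in a toy C model. It only encodes the reading given above (including the unverified sync case) and ignores every other reason wake_energy might refuse an energy-aware wakeup, such as overutilization. The names route_wakeup(), struct wake_decision and enum wake_path are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum wake_path {
	WAKE_FIND_IDLEST,	/* slow path: find_idlest_cpu */
	WAKE_ENERGY_AWARE,	/* find_energy_efficient_cpu */
};

struct wake_decision {
	enum wake_path path;
	bool fbt_prefer_idle;	/* what find_best_target would be told */
};

static struct wake_decision route_wakeup(bool feat_eas_prefer_idle,
					 bool task_prefer_idle, bool sync)
{
	struct wake_decision d;

	/* With the feature off, prefer_idle is never passed on to EAS. */
	d.fbt_prefer_idle = feat_eas_prefer_idle && task_prefer_idle;

	/* Assumed from the thread: a sync wakeup still takes the EAS route. */
	if (!feat_eas_prefer_idle && task_prefer_idle && !sync)
		d.path = WAKE_FIND_IDLEST;
	else
		d.path = WAKE_ENERGY_AWARE;

	/* The slow path never carries a prefer_idle hint into EAS. */
	assert(d.path == WAKE_ENERGY_AWARE || !d.fbt_prefer_idle);

	return d;
}
```

In this model, route_wakeup(false, true, true) takes the energy-aware path with fbt_prefer_idle false, which is the case where the task's prefer_idle flag ends up being ignored.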
Regards, Ionela.
If the find_best_target feature FBT_STRICT_ORDER is true (the default) then the first CPU which saves energy compared to default will be selected rather than the most efficient. When combined with the brute force selection algorithm, this will result in the lowest-numbered cpu which saves energy over the prev_cpu being selected.
This is not the way that the brute force (mainline-alike) placement is intended to work, so when using brute force (FIND_BEST_TARGET=false) one should also turn off strict ordering (FBT_STRICT_ORDER=false) *and* opt out of using EAS for prefer_idle tasks (EAS_PREFER_IDLE=false).
--Chris
On Tue, May 29, 2018 at 01:16:17PM +0100, Chris Redpath wrote:
Hi,
Joel was looking at something else in android-4.14 wakeup path and he noticed that we have a difference in behavior for prefer_idle tasks when we're using find_best_target, so he asked if we could discuss that here.
Behavior in android-4.9 and earlier
When we find an idle CPU for a task which has the prefer_idle attribute, we immediately return that CPU from the energy-based wakeup CPU selection. This happens in slightly different places over the EAS and android versions but it is always true that we take the recommended idle CPU for tasks in this class.
How that changed in android-4.14
Thanks Chris and ARM team for the changes, the strf path looks much more mainline friendly now. :)
android-4.14 has two different wakeup paths, selected with a sched_feature FIND_BEST_TARGET. This defaults to true with the intent of preserving the previous behavior. Both paths are different, so I'll describe them below separately.
The two paths however share some common code - the way we integrated EAS with the regular wakeup code is different in android-4.14.
There were two reasons for doing this.
- minimize the differences in select_task_rq_fair wrt mainline code
- make better use of the per-sd overutilization flags
Since we have per-sd overutilised flags, we attempt to perform an EAS wakeup at the highest non-overutilised sched_domain - meaning that we can still perform an energy-aware wakeup for small tasks inside a non-overutilized group of small CPUs while potentially other groups of CPUs are overutilized.
The decision about attempting to use energy awareness is taken in wake_energy function at the top of strf - all the cases where we can't use energy-awareness are ruled out, and the decision about using find_idlest_cpu/EAS for prefer_idle tasks is also done here.
If we are using energy aware wakeups, then we will find the highest non-overutilised SD to wake in.
In all cases where we do an energy aware wakeup but don't find any suitable candidate CPUs we will go on to use find_idlest_cpu.
One thing I wanted to mention is that there are some cases where even if want_affine is set to 0 because want_energy = 1, we can still enter the select_idle_sibling path instead of energy-aware wake ups. I discovered this when I was playing with cpusets. If sched_load_balance is set to 0 in the root cpuset, then it's possible that the main for_each_domain loop can be turned off. This is because all the domains would be detached from the rq. To trigger this, you could just do:
  mkdir /cpuset
  mount -t cpuset none /cpuset
  echo 0 > sched_load_balance
So in other words, I believe these cases shouldn't also end up in turning off the find_energy_efficient_cpu. Does that make sense or did I miss something?
android-4.14 sched_feat(FIND_BEST_TARGET) true wakeup path
When FIND_BEST_TARGET sched feature is on (the default), we call find_best_target to populate the energy_env structure. This takes note of the prefer_idle flag and the task boost to change which task placement strategy will be used - the algorithm is the same as in previous versions of android.
However in android-4.14 (unintentionally) the prefer_idle task placement is not immediately acted upon - when a prefer_idle task is placed, we will select the first idle CPU we see *but* this will become the target CPU and we will still perform an energy diff and select between prev/target based upon energy requirement.
The open question is - now that we have realized that there is a different strategy in place, should we change it to be the same as the old version? I think that we should - it will be a simple change to use the idle cpu selected immediately without the energy diff.
I guess it's safer to keep the older version behavior since that's been well tested over the years and perhaps hasn't caused any issues that need fixing?
Also, if we are changing the behavior, I guess we could also make it such that if EAS_PREFER_IDLE is set and the task is prefer-idle, then we just run find_idlest_cpu unconditionally to find an idle CPU? That will also make sure that we are using the mainline find_idlest_cpu path for prefer-idle CPU. I think that would be a worthwhile change to make so we are even more aligned with mainline slow-path in hunting for an idle CPU. What do you think?
thanks,
- Joel
On Wednesday 30 May 2018 at 12:19:11 (-0700), Joel Fernandes wrote:
On Tue, May 29, 2018 at 01:16:17PM +0100, Chris Redpath wrote:
[...]
One thing I wanted to mention is that there are some cases where even if want_affine is set to 0 because want_energy = 1, we can still enter the select_idle_sibling path instead of energy-aware wake ups. I discovered this when I was playing with cpusets. If sched_load_balance is set to 0 in the root cpuset, then it's possible that the main for_each_domain loop can be turned off. This is because all the domains would be detached from the rq. To trigger this, you could just do:
  mkdir /cpuset
  mount -t cpuset none /cpuset
  echo 0 > sched_load_balance
So in other words, I believe these cases shouldn't also end up in turning off the find_energy_efficient_cpu. Does that make sense or did I miss something?
Ah, right, we rely on the SD_BALANCE_WAKE flag to be set in the for loop of fbt(). It should be set by default on all domains below the domain that has SD_ASYM_CPUCAPACITY set (typically DIE in our case). I guess that "echo 0 > sched_load_balance" clears the SD_BALANCE_WAKE flag or something like that which causes the issue you see.
The v3 of the energy model that is on LKML right now is slightly different in this area. We avoid the loop in strf() entirely actually. Could you have a look and let me know what you think ? Hopefully we could do something similar in android-4.14.
Thanks, Quentin
eas-dev mailing list eas-dev@lists.linaro.org https://lists.linaro.org/mailman/listinfo/eas-dev
On 05/31/2018 10:33 AM, Quentin Perret wrote:
On Wednesday 30 May 2018 at 12:19:11 (-0700), Joel Fernandes wrote:
On Tue, May 29, 2018 at 01:16:17PM +0100, Chris Redpath wrote:
[...]
One thing I wanted to mention is that there are some cases where even if want_affine is set to 0 because want_energy = 1, we can still enter the select_idle_sibling path instead of energy-aware wake ups. I discovered this when I was playing with cpusets. If sched_load_balance is set to 0 in the root cpuset, then it's possible that the main for_each_domain loop can be turned off. This is because all the domains would be detached from the rq. To trigger this, you could just do:
  mkdir /cpuset
  mount -t cpuset none /cpuset
  echo 0 > sched_load_balance
So in other words, I believe these cases shouldn't also end up in turning off the find_energy_efficient_cpu. Does that make sense or did I miss something?
Ah, right, we rely on the SD_BALANCE_WAKE flag to be set in the for loop of fbt(). It should be set by default on all domains below the domain that has SD_ASYM_CPUCAPACITY set (typically DIE in our case). I guess that "echo 0 > sched_load_balance" clears the SD_BALANCE_WAKE flag or something like that which causes the issue you see.
The v3 of the energy model that is on LKML right now is slightly different in this area. We avoid the loop in strf() entirely actually. Could you have a look and let me know what you think ? Hopefully we could do something similar in android-4.14.
I'm asking myself if this would be a legitimate use case for android (destroying all sched domains)?
(1) That's my Juno (Arm64, A53 A57 A57 A53 A53 A53) after it came up:
# cat /proc/schedstat | grep "^cpu\|^domain" | awk '{print $1 " " $2}'
cpu0 0
domain0 39
domain1 3f
cpu1 0
domain0 06
domain1 3f
cpu2 0
domain0 06
domain1 3f
cpu3 0
domain0 39
domain1 3f
cpu4 0
domain0 39
domain1 3f
cpu5 0
domain0 39
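The second column of the domain lines is a hexadecimal CPU mask. As a minimal sketch of how to read those values, the C helpers below decode such a mask. The names cpumask_has_cpu() and cpumask_weight() are made up for this example, and the code assumes the mask fits in a single unsigned long, which holds for a 6-CPU Juno but not for large systems.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true if 'cpu' is set in the hex cpumask string, e.g. "39". */
static bool cpumask_has_cpu(const char *hex, unsigned int cpu)
{
	unsigned long mask;

	assert(hex);
	assert(cpu < sizeof(mask) * CHAR_BIT);

	mask = strtoul(hex, NULL, 16);
	return (mask >> cpu) & 1UL;
}

/* Count how many CPUs are set in the hex cpumask string. */
static unsigned int cpumask_weight(const char *hex)
{
	unsigned long mask;
	unsigned int weight = 0;

	assert(hex);

	for (mask = strtoul(hex, NULL, 16); mask; mask >>= 1)
		weight += mask & 1UL;
	return weight;
}
```

Decoded this way, 39 covers CPUs 0, 3, 4 and 5 (the A53s), 06 covers CPUs 1 and 2 (the A57s), and 3f covers all six CPUs.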
(2) Now I disable load balance system-wide (root cpuset).
# echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_load_balance
[ 409.467018] CPU0 attaching NULL sched-domain.
[ 409.473826] CPU1 attaching NULL sched-domain.
[ 409.480500] CPU2 attaching NULL sched-domain.
[ 409.485284] CPU3 attaching NULL sched-domain.
[ 409.490063] CPU4 attaching NULL sched-domain.
[ 409.494746] CPU5 attaching NULL sched-domain.
(3) My sched domains (sd's) are completely gone:
# cat /proc/schedstat | grep "^cpu\|^domain" | awk '{print $1 " " $2}'
cpu0 0
cpu1 0
cpu2 0
cpu3 0
cpu4 0
cpu5 0
Now if I create cpusets foo and bar with exclusive CPUs, I would have 2 distinct sd hierarchies on which strf() and lb() could operate. (If I choose all the little CPUs for foo and all the big CPUs for bar, then EAS wouldn't make much sense. We haven't tested systems with multiple root domains.) But is this really a use case for android either?
[...]