Hi Chris,
On 29/05/18 13:16, Chris Redpath wrote:
Hi,
Joel was looking at something else in the android-4.14 wakeup path and he noticed that we have a difference in behavior for prefer_idle tasks when we're using find_best_target, so he asked if we could discuss that here.
Behavior in android-4.9 and earlier
When we find an idle CPU for a task which has the prefer_idle attribute, we immediately return that CPU from the energy-based wakeup CPU selection. This happens in slightly different places over the EAS and android versions but it is always true that we take the recommended idle CPU for tasks in this class.
How that changed in android-4.14
android-4.14 has two different wakeup paths, selected with a sched_feature FIND_BEST_TARGET. This defaults to true with the intent of preserving the previous behavior. The two paths differ, so I'll describe them separately below.
The two paths however share some common code - the way we integrated EAS with the regular wakeup code is different in android-4.14.
There were two reasons for doing this.
- minimize the differences in select_task_rq_fair wrt mainline code
- make better use of the per-sd overutilization flags
Since we have per-sd overutilised flags, we attempt to perform an EAS wakeup at the highest non-overutilised sched_domain - meaning that we can still perform an energy-aware wakeup for small tasks inside a non-overutilized group of small CPUs while potentially other groups of CPUs are overutilized.
The decision about attempting to use energy awareness is taken in the wake_energy function at the top of strf (select_task_rq_fair) - all the cases where we can't use energy-awareness are ruled out, and the decision about using find_idlest_cpu/EAS for prefer_idle tasks is also made here.
If we are using energy aware wakeups, then we will find the highest non-overutilised SD to wake in.
In all cases where we do an energy aware wakeup but don't find any suitable candidate CPUs we will go on to use find_idlest_cpu.
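To make the "highest non-overutilised SD" rule concrete, here is a minimal userspace sketch. It is not the kernel code: struct sd_level, highest_non_overutilized() and the level names used with it are hypothetical stand-ins, and the model assumes the domain levels for a CPU are simply listed from lowest to highest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one sched_domain level; not the kernel struct. */
struct sd_level {
	const char *name;	/* illustrative label only */
	bool overutilized;	/* models the per-sd overutilized flag */
};

/*
 * levels[0] is the lowest domain containing the CPU and levels[nr - 1]
 * the highest. Returns the index of the highest level that is not
 * overutilized, or -1 if every level is overutilized, in which case the
 * caller would fall back to the find_idlest_cpu path.
 */
int highest_non_overutilized(const struct sd_level *levels, size_t nr)
{
	int found = -1;

	assert(levels != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (!levels[i].overutilized)
			found = (int)i;
	}

	return found;
}
```

With a non-overutilized lower level and an overutilized top level, the function returns index 0, which corresponds to the case described above: an energy-aware wakeup inside a group of small CPUs while other groups are overutilized.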
android-4.14 sched_feat(FIND_BEST_TARGET) true wakeup path
When the FIND_BEST_TARGET sched feature is on (the default), we call find_best_target to populate the energy_env structure. This takes note of the prefer_idle flag and the task boost to change which task placement strategy will be used - the algorithm is the same as in previous versions of android.
However in android-4.14 (unintentionally) the prefer_idle task placement is not immediately acted upon - when a prefer_idle task is placed, we will select the first idle CPU we see *but* this will become the target CPU and we will still perform an energy diff and select between prev/target based upon energy requirement.
The open question is - now that we have realized that there is a different strategy in place, should we change it to be the same as the old version? I think that we should - it will be a simple change to use the idle cpu selected immediately without the energy diff.
All we would need is to add something like this:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7383,7 +7383,7 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		}
 	} else {
 		int boosted = (schedtune_task_boost(p) > 0);
-		int prefer_idle;
+		int prefer_idle, target_cpu;
 
 		/*
 		 * give compiler a hint that if sched_features
@@ -7396,10 +7396,18 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		eenv->max_cpu_count = EAS_CPU_BKP + 1;
 
 		/* Find a cpu with sufficient capacity */
-		eenv->cpu[EAS_CPU_NXT].cpu_id = find_best_target(p,
+		target_cpu = find_best_target(p,
 				&eenv->cpu[EAS_CPU_BKP].cpu_id,
 				boosted, prefer_idle);
 
+		/* immediately select idle CPUs for prefer-idle tasks */
+		if (prefer_idle && target_cpu >= 0 &&
+		    idle_cpu(target_cpu))
+			return target_cpu;
In 4.9 we also returned target_cpu if the task was boosted: if the task was boosted or prefer_idle and target_cpu was an idle cpu, we returned immediately. Given that currently all boosted tasks are also prefer_idle, there might be no point in adding this as well. But if we want to maintain the same behavior for now, and we don't want to make assumptions about the userspace schedtune configuration, this condition should be added. The reasoning behind it is that we'd want boosted tasks to benefit from low-latency wakeups as well, if possible (but this logic was inherited from older platforms, and probably an older task boost implementation, so I'm not sure how much sense or difference it makes today).
+
+		/* place target into NEXT slot */
+		eenv->cpu[EAS_CPU_NXT].cpu_id = target_cpu;
+
 		/* take note if no backup was found */
 		if (eenv->cpu[EAS_CPU_BKP].cpu_id < 0)
 			eenv->max_cpu_count = EAS_CPU_BKP;
There could be energy advantages to performing an energy diff for prefer_idle tasks. We've tried a few times to come up with a way to use lower-energy CPUs for these tasks where possible (for cases like full-screen video playback or screen-off audio playback) without harming the interactive response, but we haven't managed to come up with anything that doesn't need per-platform tuning, and per-platform tuning isn't really suitable. The underlying requirement for these prefer_idle tasks is that they should ideally not be preempted, as they are likely to be part of the critical frame drawing path.
We should also note that the EAS_PREFER_IDLE sched_feature potentially has an effect on this. If we are using find_best_target and we do not have EAS_PREFER_IDLE set to true (true is the default), then we don't pass the prefer_idle status of the task to find_best_target. However, if EAS_PREFER_IDLE is not true, we should have avoided entering find_energy_efficient_cpu in the first place.
There is a chance (though I'm not sure I know enough about this to rule it out) for us to get here with !sched_feat(EAS_PREFER_IDLE): if !sched_feat(EAS_PREFER_IDLE) and sync is true, wake_energy will return true, so we take the find_energy_efficient_cpu route. Here, a few things can happen:
1. sysctl_sched_sync_hint is enabled and the waker cpu is allowed, so we return it, or..
2. use_fbt is set (sched_feat(FIND_BEST_TARGET)): the prefer_idle flag of the task is cleared if it was ever set, so the task is treated as a non-prefer_idle task in find_best_target. I'm not sure if this is what we want here. What do you think?
3. To be found below..
To be complete, there is also a further sched_feat for task placement which controls how we evaluate the energy requirement for a task. If FBT_STRICT_ORDER is on (the default), we will select the first CPU we find which saves energy compared to the prev_cpu. This matches previous versions of android.
We have been experimenting with selecting the most efficient out of the target/backup CPUs provided by find_best_target; if FBT_STRICT_ORDER is false, this is what the scheduler will do. In testing there isn't really a clear winner out of these options, so I added the sched feature so users could play with the options easily on their platforms without code changes.
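The difference between the two FBT_STRICT_ORDER settings can be illustrated with a small userspace model. This is a sketch under assumptions: struct candidate and pick_cpu() are hypothetical, the real energy_env layout and energy calculation differ, and the energy values are simply supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate record; the real energy_env layout differs. */
struct candidate {
	int cpu_id;	/* negative means the slot is empty */
	long energy;	/* estimated energy if the task is placed here */
};

/*
 * cand[0] is prev_cpu; the remaining entries are in the order the
 * candidates were produced (target, then backup).
 *
 * strict_order == true:  pick the first candidate that saves energy
 *                        compared to prev_cpu.
 * strict_order == false: pick the candidate with the lowest energy.
 *
 * prev_cpu is kept when no candidate saves energy.
 */
int pick_cpu(const struct candidate *cand, size_t nr, bool strict_order)
{
	size_t best = 0;

	assert(cand != NULL && nr >= 1);

	for (size_t i = 1; i < nr; i++) {
		if (cand[i].cpu_id < 0)
			continue;
		if (cand[i].energy >= cand[best].energy)
			continue;

		best = i;
		if (strict_order)
			break;	/* first saving wins */
	}

	return cand[best].cpu_id;
}
```

For example, with prev_cpu 0 at energy 100, target CPU 4 at 90 and backup CPU 1 at 80, strict ordering selects CPU 4 while the non-strict mode selects CPU 1.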
android-4.14 sched_feat(FIND_BEST_TARGET) false wakeup path
This uses the brute-force placement strategy from the pre-simplified-EM mainline EAS patch set, inside the android energy diff algorithm.
In android, we collect all the potential CPU placement options in a data structure, and then evaluate the energy for all of them at each sched_domain traversal level. This is done to avoid visiting each domain & group more than once.
Other than the switch to the new energy calculation method, the energy required is still calculated for every allowable CPU and then the option which has the lowest overall consumption is selected.
When it comes to dealing with prefer_idle tasks, there is another optional behaviour in this mode. If you have sched_feat(EAS_PREFER_IDLE) set to true (the default), then prefer_idle tasks have no special treatment in the energy-aware wakeup path. This means they will get the lowest-energy option with no regard to idleness.
However, if EAS_PREFER_IDLE is false (i.e. don't use EAS for prefer-idle tasks) then we route these tasks through the slow path wakeup code - the decision is made inside wake_energy, which propagates through to calling find_idlest_cpu instead of find_energy_efficient_cpu.
3. As said above, we could get here even with EAS_PREFER_IDLE false, in which case we ignore prefer_idle. So on this path the prefer_idle flag of the task is not considered, whatever the value of sched_feat(EAS_PREFER_IDLE).
Regards, Ionela.
If the find_best_target feature FBT_STRICT_ORDER is true (the default) then the first CPU which saves energy compared to default will be selected rather than the most efficient. When combined with the brute force selection algorithm, this will result in the lowest-numbered cpu which saves energy over the prev_cpu being selected.
This is not the way that the brute force (mainline-alike) placement is intended to work, so when using brute force (FIND_BEST_TARGET=false) one should also turn off strict ordering (FBT_STRICT_ORDER=false) *and* opt out of using EAS for prefer_idle tasks (EAS_PREFER_IDLE=false).
--Chris