Hi Chris,
On Tuesday 29 May 2018 at 13:16:17 (+0100), Chris Redpath wrote:
Hi,
Joel was looking at something else in the android-4.14 wakeup path and noticed that prefer_idle tasks behave differently when we're using find_best_target, so he asked if we could discuss that here.
Behavior in android-4.9 and earlier
When we find an idle CPU for a task which has the prefer_idle attribute, we immediately return that CPU from the energy-based wakeup CPU selection. This happens in slightly different places across the EAS and android versions, but it is always true that we take the recommended idle CPU for tasks in this class.
How that changed in android-4.14
android-4.14 has two different wakeup paths, selected with the sched_feature FIND_BEST_TARGET. This defaults to true with the intent of preserving the previous behavior. The two paths differ, so I'll describe them separately below.
The two paths do, however, share some common code - the way we integrated EAS with the regular wakeup code is different in android-4.14.
There were two reasons for doing this.
- minimize the differences in select_task_rq_fair wrt mainline code
- make better use of the per-sd overutilization flags
Since we have per-sd overutilized flags, we attempt to perform an EAS wakeup at the highest non-overutilized sched_domain - meaning that we can still perform an energy-aware wakeup for small tasks inside a non-overutilized group of small CPUs while other groups of CPUs are potentially overutilized.
The decision about attempting to use energy awareness is taken in the wake_energy function at the top of select_task_rq_fair (strf) - all the cases where we can't use energy awareness are ruled out, and the decision about using find_idlest_cpu/EAS for prefer_idle tasks is also made here.
If we are using energy-aware wakeups, then we will find the highest non-overutilized SD to wake in.
In all cases where we do an energy-aware wakeup but don't find any suitable candidate CPUs, we will go on to use find_idlest_cpu.
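As a rough illustration of the "highest non-overutilized SD" selection described above, here is a minimal userspace sketch. It is not the android-4.14 code: struct toy_sd and highest_non_overutilized() are hypothetical names, and it assumes that once a domain is overutilized all of its parents are too, so the walk can stop at the first overutilized level.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct sched_domain. */
struct toy_sd {
	const char *name;
	bool overutilized;	/* per-sd overutilized flag */
	struct toy_sd *parent;	/* next wider domain, NULL at the top */
};

/*
 * Walk up from the lowest domain and remember the widest one that is
 * still not overutilized. Returns NULL if even the lowest domain is
 * overutilized, in which case the caller would fall back to the
 * find_idlest_cpu-style path.
 */
static struct toy_sd *highest_non_overutilized(struct toy_sd *lowest)
{
	struct toy_sd *sd, *best = NULL;

	for (sd = lowest; sd; sd = sd->parent) {
		/* Assumption: parents of an overutilized sd are overutilized too. */
		if (sd->overutilized)
			break;
		best = sd;
	}
	return best;
}
```

With a little-CPU domain that is fine and a system-wide domain that is overutilized, the sketch returns the little-CPU domain, which matches the "small tasks inside a non-overutilized group of small CPUs" case.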
android-4.14 sched_feat(FIND_BEST_TARGET) true wakeup path
When the FIND_BEST_TARGET sched feature is on (the default), we call find_best_target to populate the energy_env structure. This takes note of the prefer_idle flag and the task boost to change which task placement strategy will be used - the algorithm is the same as in previous versions of android.
However in android-4.14 (unintentionally) the prefer_idle task placement is not immediately acted upon - when a prefer_idle task is placed, we will select the first idle CPU we see *but* this will become the target CPU and we will still perform an energy diff and select between prev/target based upon energy requirement.
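The difference between the two behaviors can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: struct toy_wakeup and both select functions are hypothetical, the energy diff is reduced to comparing two precomputed numbers (no energy margin), and "no candidate" simply returns prev_cpu instead of going on to find_idlest_cpu.

```c
#include <stdbool.h>

/* Hypothetical description of one wakeup; not a kernel structure. */
struct toy_wakeup {
	int prev_cpu;		/* CPU the task last ran on */
	int target_cpu;		/* idle CPU picked by find_best_target, or -1 */
	bool prefer_idle;
	int energy_prev;	/* estimated energy if the task stays on prev_cpu */
	int energy_target;	/* estimated energy if it moves to target_cpu */
};

/* Simplified energy diff: move only if the target is strictly cheaper. */
static int pick_by_energy(const struct toy_wakeup *w)
{
	return w->energy_target < w->energy_prev ? w->target_cpu : w->prev_cpu;
}

/* android-4.9 style: an idle target is returned straight away for prefer_idle. */
static int select_cpu_old(const struct toy_wakeup *w)
{
	if (w->target_cpu < 0)
		return w->prev_cpu;
	if (w->prefer_idle)
		return w->target_cpu;
	return pick_by_energy(w);
}

/* android-4.14 FIND_BEST_TARGET style: prefer_idle still goes through the diff. */
static int select_cpu_new(const struct toy_wakeup *w)
{
	if (w->target_cpu < 0)
		return w->prev_cpu;
	return pick_by_energy(w);
}
```

For a prefer_idle task where the idle target costs more energy than prev, the old model returns the target while the new model keeps prev, which is the behavioral change being discussed.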
The open question is - now that we have realized that there is a different strategy in place, should we change it to be the same as the old version? I think that we should - it will be a simple change to use the idle cpu selected immediately without the energy diff.
All we would need is to add something like this:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7383,7 +7383,7 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		}
 	} else {
 		int boosted = (schedtune_task_boost(p) > 0);
-		int prefer_idle;
+		int prefer_idle, target_cpu;
 
 		/*
 		 * give compiler a hint that if sched_features
@@ -7396,10 +7396,18 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
 		eenv->max_cpu_count = EAS_CPU_BKP + 1;
 
 		/* Find a cpu with sufficient capacity */
-		eenv->cpu[EAS_CPU_NXT].cpu_id = find_best_target(p,
+		target_cpu = find_best_target(p,
 				&eenv->cpu[EAS_CPU_BKP].cpu_id,
 				boosted, prefer_idle);
 
+		/* immediately select idle CPUs for prefer-idle tasks */
+		if (prefer_idle && target_cpu >= 0 &&
+		    idle_cpu(target_cpu))
+			return target_cpu;
+
+		/* place target into NEXT slot */
+		eenv->cpu[EAS_CPU_NXT].cpu_id = target_cpu;
+
 		/* take note if no backup was found */
 		if (eenv->cpu[EAS_CPU_BKP].cpu_id < 0)
 			eenv->max_cpu_count = EAS_CPU_BKP;
I'm not against the idea, but we do need to be careful with changes like that to avoid migrating tasks for no reason. If prev_cpu=5 and target_cpu=4 and both CPUs are idle, we might end up migrating the task from CPU 5 to 4 for nothing. The energy margin should prevent that, but at the cost of an expensive energy diff. Or we can have extra conditions in find_best_target()/find_energy_efficient_cpu() or something ...
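To make the "extra conditions" idea a bit more concrete, something along these lines could work. This is only a userspace sketch: toy_cpu_idle[], toy_idle_cpu() and prefer_idle_shortcut() are hypothetical stand-ins (the real check would use idle_cpu()), and it deliberately ignores capacity differences between prev_cpu and target_cpu.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 8

/* Hypothetical idle map standing in for the kernel's idle_cpu(). */
static bool toy_cpu_idle[TOY_NR_CPUS];

static bool toy_idle_cpu(int cpu)
{
	return cpu >= 0 && cpu < TOY_NR_CPUS && toy_cpu_idle[cpu];
}

/*
 * Returns the CPU to use without running an energy diff, or -1 to tell
 * the caller to carry on with the normal energy-diff path.
 */
static int prefer_idle_shortcut(int prev_cpu, int target_cpu, bool prefer_idle)
{
	if (!prefer_idle)
		return -1;
	/* prev_cpu is already idle: staying put avoids a pointless migration */
	if (toy_idle_cpu(prev_cpu))
		return prev_cpu;
	/* otherwise take the idle CPU that find_best_target suggested */
	if (toy_idle_cpu(target_cpu))
		return target_cpu;
	return -1;
}
```

In the prev_cpu=5/target_cpu=4 case with both CPUs idle, this keeps the task on CPU 5 and skips the energy diff entirely. Whether staying on prev is always the right call (for example for boosted tasks on a small CPU) would need a separate check.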
What's the main obstacle to just using the mainline slow path for prefer_idle tasks (!EAS_PREFER_IDLE I think) in your opinion?
Thanks, Quentin