On 11/13/2015 01:37 AM, Patrick Bellasi wrote:
- Big task runs then goes away on CPU X.
- Capacity vote of CPU X is reset to 0 but OPP is not changed, so we
  remain at fmax.
- Tiny tasks wake and sleep on other CPUs in the same cluster. They are
  so small that the capacity request stays at 0 for those CPUs and we
  don't bother going into cpufreq_sched to re-evaluate the cluster.
You mean that a 'vote' for a 0 capacity no longer triggers an aggregation at the scheduling class level? AFAIR this was the main (only) mechanism that allowed the OPP to be reduced as soon as the CPU running the highest load goes to sleep.
Well, even in the EASv5 posting, when a CPU went idle it set its capacity vote to 0, but did so passively via the special cpufreq_sched_reset_cap() API. This call did not initiate an immediate re-evaluation/aggregation for the cluster or set a new OPP. It was up to the next event in the cluster to cause that to happen when it set a capacity vote with the normal API, cpufreq_sched_set_capacity().
Because of a desire to speed up the fast paths of wakeup/sleep as much as possible, we now avoid going into cpufreq_sched if the capacity being requested does not represent a change for that CPU/sched_class. That's making things worse than they already were w.r.t. getting stuck at elevated OPPs.
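
To make the failure mode concrete, here is a rough userspace model of the sequence described above. It is only a sketch and not the actual cpufreq_sched code: the model_* helpers, NR_CPUS, FMAX_CAP and the skip_unchanged flag are all made up for illustration, the per-sched_class dimension is dropped, and "OPP" is reduced to the aggregated (max) capacity vote of a single cluster.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  4      /* hypothetical cluster size */
#define FMAX_CAP 1024   /* hypothetical capacity that maps to fmax */

static unsigned long cap_vote[NR_CPUS]; /* per-CPU capacity votes */
static unsigned long cluster_opp_cap;   /* capacity the current OPP was picked for */

/* Re-evaluate the cluster: the OPP follows the largest vote. */
static void model_aggregate_cluster(void)
{
	unsigned long max = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (cap_vote[i] > max)
			max = cap_vote[i];
	cluster_opp_cap = max;
}

/* Passive reset on idle: drops the vote, does NOT re-evaluate the cluster. */
static void model_reset_cap(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cap_vote[cpu] = 0;
}

/*
 * Normal vote. With skip_unchanged set, an unchanged request returns
 * early, which models the fast-path optimization discussed above.
 */
static void model_set_capacity(int cpu, unsigned long cap, bool skip_unchanged)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (skip_unchanged && cap_vote[cpu] == cap)
		return;
	cap_vote[cpu] = cap;
	model_aggregate_cluster();
}

/* Replay the scenario and return the capacity the cluster ends up at. */
static unsigned long model_run_scenario(bool skip_unchanged)
{
	for (int i = 0; i < NR_CPUS; i++)
		cap_vote[i] = 0;
	cluster_opp_cap = 0;

	model_set_capacity(0, FMAX_CAP, skip_unchanged); /* big task on CPU 0 */
	model_reset_cap(0);                              /* big task goes away */
	model_set_capacity(1, 0, skip_unchanged);        /* tiny task, request stays 0 */
	model_set_capacity(2, 0, skip_unchanged);        /* another tiny task */

	return cluster_opp_cap;
}
```

In this model, model_run_scenario(false) ends with the cluster back at 0 because the first tiny-task vote forces an aggregation, whereas model_run_scenario(true) leaves it at FMAX_CAP: no vote ever changes, so nothing re-evaluates the cluster after the passive reset.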