On 11/12/2015 01:42 AM, Juri Lelli wrote:
> CPU1's tasks sleeps, since utilization is low. Don't we re-evaluate the request as soon as someone goes to sleep/wakes up?
We used to, but there are now optimizations that cause the re-evaluation to get skipped if the wakeup/sleep doesn't result in a change in the sched class capacity request. I've run into this during limited testing:
- Big task runs then goes away on CPU X.
- Capacity vote of CPU X is reset to 0 but OPP is not changed, we remain at fmax.
- Tiny tasks wake and sleep on other CPUs in the same cluster - they are so small that the capacity request stays at 0 for those CPUs and we don't bother going into cpufreq_sched to re-evaluate the cluster.
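To make the sequence above concrete, here is a minimal toy model in Python. It is not kernel code: the Cluster class, its methods and the FMAX value are all hypothetical, and the "skip if the request is unchanged" check is only an assumption about how the optimization behaves, based on the description above.

```python
# Toy model of the stale-OPP sequence; every name here is hypothetical.
FMAX = 1024  # capacity request that maps to the highest OPP


class Cluster:
    def __init__(self, ncpus):
        self.votes = [0] * ncpus  # per-CPU capacity votes
        self.opp = 0              # capacity the cluster is currently set to
        self.evaluations = 0      # how often the cluster was re-evaluated

    def reevaluate(self):
        # The cluster follows the largest per-CPU vote.
        self.opp = max(self.votes)
        self.evaluations += 1

    def wake(self, cpu, request):
        # Assumed optimization: skip the re-evaluation when the
        # capacity request of this CPU does not change.
        if request == self.votes[cpu]:
            return
        self.votes[cpu] = request
        self.reevaluate()

    def sleep(self, cpu):
        # The vote is reset on idle, but the OPP is left untouched.
        self.votes[cpu] = 0


def stale_opp_demo():
    cluster = Cluster(ncpus=4)
    cluster.wake(0, FMAX)       # big task runs on CPU 0
    cluster.sleep(0)            # ... and goes away
    for _ in range(100):        # tiny tasks on the other CPUs
        for cpu in (1, 2, 3):
            cluster.wake(cpu, 0)
            cluster.sleep(cpu)
    return cluster
```

In this model stale_opp_demo() returns a cluster whose votes are all 0 while opp is still FMAX, and the cluster was only ever re-evaluated once, when the big task first woke.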
I think we should go ahead with the ideas that were raised in the meeting earlier today to try and fix this, since the problem will otherwise prevent meaningful profiling:
- Ensure idle CPUs have their utilization decayed promptly/regularly via load balance or some other periodic mechanism.
- Leave capacity vote in place when a CPU goes idle. As it is decayed, reduce its capacity vote and the cluster OPP as necessary.
- If platforms wish to be more aggressive about dropping the frequency during idle, this can be done in the idle loop/driver where a fully informed decision for that instance can be made.
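The first two points can be sketched with a similar toy model in Python. Again, this is only an illustration under stated assumptions: DecayingCluster and its methods are hypothetical names, the periodic mechanism is reduced to an explicit idle_tick() call, and the decay is modelled as a plain geometric series that halves every 32 periods (roughly PELT's half-life with ~1 ms periods). The third point, a more aggressive drop in the idle loop/driver, is not modelled.

```python
# Toy model of "keep the vote on idle and decay it"; names are hypothetical.
HALF_LIFE_PERIODS = 32                  # assumed PELT-like half-life
DECAY = 0.5 ** (1 / HALF_LIFE_PERIODS)  # per-period decay factor


class DecayingCluster:
    def __init__(self, ncpus):
        self.util = [0.0] * ncpus  # per-CPU utilization estimate
        self.votes = [0] * ncpus   # per-CPU capacity votes
        self.opp = 0               # capacity the cluster is currently set to

    def _set_vote(self, cpu, vote):
        # Re-evaluate the cluster only when this CPU's vote changes.
        if vote != self.votes[cpu]:
            self.votes[cpu] = vote
            self.opp = max(self.votes)

    def run(self, cpu, util):
        # A task with the given utilization runs on this CPU.
        self.util[cpu] = float(util)
        self._set_vote(cpu, round(util))

    def sleep(self, cpu):
        # The CPU goes idle: the capacity vote is left in place.
        pass

    def idle_tick(self, cpu, periods=1):
        # Periodic mechanism (e.g. load balance) decays the idle CPU's
        # utilization and lowers its vote, and the OPP, to match.
        self.util[cpu] *= DECAY ** periods
        self._set_vote(cpu, round(self.util[cpu]))
```

With this model, a CPU that ran at 1024 and then went idle keeps the cluster at 1024 until the first idle_tick(), drops it to about half after 32 periods, and reaches roughly zero after a few hundred periods.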
I think we'll still have a major issue in that PELT's decay is too slow, but that's nothing new and needs to be addressed anyway.
Thoughts? If no one has concerns or wants to pick this up then I'll start working on it.