Hi Steve,
On 11/11/2015 5:02 PM, Steve Muckle wrote:
> I haven't been able to reconcile this behavior with the code. There are
> hooks in dequeue_task_fair and the migration paths that should update
> the CFS capacity vote if the CPU CFS runqueue is empty.
>
> Since the dequeue_task_fair path calls cpufreq_sched_reset_cap, it will
> zero out the CPU's capacity vote, but this API does not trigger a
> re-evaluation of the overall required cluster capacity and set a new
> OPP. Another event (on any CPU in the cluster) will need to occur which
> will cause the cluster capacity to be re-evaluated. Any chance this is
> what you are seeing?
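To make the split concrete, here is a minimal userspace model of the two halves: a per-CPU reset that touches only one vote, and a separate cluster re-evaluation that only runs on some later event. The data layout and the cpufreq_sched_update_cluster() helper are illustrative guesses, not the patchset's actual code:

```c
#include <assert.h>

#define NR_CPUS 4
#define CAP_MAX 1024

/* Hypothetical model of the per-CPU capacity votes; the real
 * sched-freq code keeps these in per_cpu data. */
static unsigned long cpu_capacity[NR_CPUS];
static unsigned long cluster_capacity; /* capacity the current OPP was chosen for */

/* Model of cpufreq_sched_reset_cap: zeroes this CPU's vote only.
 * Note it does NOT recompute the cluster maximum or pick a new OPP. */
static void cpufreq_sched_reset_cap(int cpu)
{
	cpu_capacity[cpu] = 0;
}

/* Model of the re-evaluation that only happens on a later capacity
 * event: take the max vote across the cluster and "set" that OPP. */
static void cpufreq_sched_update_cluster(void)
{
	unsigned long max = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_capacity[cpu] > max)
			max = cpu_capacity[cpu];
	cluster_capacity = max;
}
```

In this model a reset on one CPU leaves cluster_capacity stale until some other event runs the update path, which is the gap being discussed.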
I took a second look - the problem is that the reset in dequeue_task_fair will only kick in if the task being dequeued is going to sleep, which may not be true in the case of forced preemption. So the dequeue before switching to the migration thread doesn't actually reset the capacity vote.
In the particular case that I was seeing, the max request was dropped by the cpufreq_sched throttling scheme, but the per-CPU capacity vote it left behind then caused future requests to be ignored:
1) There is a single task, TaskA, on CPU0.
2) CPU0 makes a max frequency request as part of enqueue_task_fair. The request is ignored because of throttling, but per_cpu(0, capacity) is set to max (in fact it's 1278 after the capacity margin is added).
3) TaskA is forcefully preempted. dequeue_task_fair is invoked but does not reset capacity, since task_sleep=false.
4) migration/0 moves TaskA off of CPU0 and onto CPU1.
5) CPU0 switches to the swapper.
6) CPU1 now attempts to raise its frequency request (after some decay of TaskA's util, so not fmax), but all of its requests are ignored since CPU0 still holds the max request.
7) CPU0 remains idle for a long time.
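The stuck state above can be reproduced with a small model of the throttling behavior; the function names, the 2-CPU layout, and the way throttling is modeled are illustrative assumptions, not the real sched-freq code:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2
#define CAP_MAX 1278 /* max capacity plus margin, as in step 2 above */

static unsigned long per_cpu_capacity[NR_CPUS];
static unsigned long requested_freq_cap;
static bool throttled;

/* Model of the enqueue-path request: the per-CPU vote is always
 * recorded, but the actual frequency request is dropped while the
 * throttle window is active -- the source of the stuck state. */
static void cpufreq_sched_set_cap(int cpu, unsigned long cap)
{
	unsigned long max = 0;
	int i;

	per_cpu_capacity[cpu] = cap;
	if (throttled)
		return; /* request ignored, but the vote is already recorded */
	for (i = 0; i < NR_CPUS; i++)
		if (per_cpu_capacity[i] > max)
			max = per_cpu_capacity[i];
	requested_freq_cap = max;
}

/* Step 3: forced preemption -- dequeue runs with task_sleep == false,
 * so the vote on the preempted CPU is never reset (the bug). */
static void dequeue_task_fair_model(int cpu, bool task_sleep)
{
	if (task_sleep)
		per_cpu_capacity[cpu] = 0;
}
```

Walking the model through steps 1-6 leaves CPU0's stale max vote pinning the cluster even though CPU0 is idle.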
It seems that pick_next_task_idle would be a better place to reset the capacity unconditionally, but pulling the reset_cap call out of the if (task_sleep) block would also work, I think?
Thanks,
Vikram