Hi Vikram,
On 11/11/15, Vikram Mulukutla wrote:
On 11/11/2015 7:42 PM, Vikram Mulukutla wrote:
Hi Steve,
On 11/11/2015 5:02 PM, Steve Muckle wrote:
I haven't been able to reconcile this behavior with the code. There are hooks in dequeue_task_fair and the migration paths that should update the CFS capacity vote if the CPU CFS runqueue is empty.
Since the dequeue_task_fair path calls cpufreq_sched_reset_cap, it will zero out the CPU's capacity vote, but this API does not trigger a re-evaluation of the overall required cluster capacity or set a new OPP. Another event (on any CPU in the cluster) has to occur before the cluster capacity is re-evaluated. Any chance this is what you are seeing?
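A minimal sketch of that gap, in plain C rather than the actual cpufreq_sched code. The names set_cap, reset_cap, cpu_cap and cluster_cap, and the two-CPU cluster, are assumptions made for illustration; only the behaviour (the reset zeroes the vote without re-evaluating the cluster) is taken from the description above.

```c
#include <assert.h>

#define NR_CPUS 2	/* hypothetical two-CPU cluster */

static unsigned long cpu_cap[NR_CPUS];	/* per-CPU capacity votes */
static unsigned long cluster_cap;	/* capacity the OPP was last chosen for */

/* Re-evaluate the cluster requirement as the max over all CPU votes. */
static void cluster_reevaluate(void)
{
	unsigned long max = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (cpu_cap[i] > max)
			max = cpu_cap[i];
	cluster_cap = max;
}

/* Models a request: update the vote, then re-evaluate the cluster. */
static void set_cap(int cpu, unsigned long cap)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_cap[cpu] = cap;
	cluster_reevaluate();
}

/* Models the reset: zero the vote only, no re-evaluation. */
static void reset_cap(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_cap[cpu] = 0;
}
```

In this model, after set_cap(0, 1024) followed by reset_cap(0), cluster_cap stays at 1024 until a later set_cap() on either CPU recomputes it.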
I took a second look - the problem is that the capacity reset in dequeue_task_fair only kicks in if the task being dequeued is going to sleep, which may not be true in the case of forced preemption. So the dequeue before switching to the migration thread doesn't actually reset the capacity.
In the particular case that I was seeing, the max request was actually ignored because of the cpufreq_sched throttling scheme, which resulted in future requests being ignored:
- There is a single task TaskA on CPU0.
- CPU0 makes a max frequency request as part of enqueue_task_fair
that is ignored because of throttling, but per_cpu(0,capacity) is set to max (in fact it's 1278 after the capacity margin is added).
To be precise, the max request in (2) is made because TaskA was enqueued on CPU0.
- TaskA is forcefully preempted. dequeue_task_fair is invoked but
does not reset capacity since task_sleep=false.
So, why is TaskA preempted? Activation of some other task?
Also, we check for task_sleep=false because we have triggering points for load_balancing operations (as you might move more than one task at a time). I'm wondering why those points don't cover this case.
Thanks,
- Juri
- migration/0 moves TaskA off of CPU0 and onto CPU1.
- CPU0 switches to the swapper
- Now CPU1 attempts to raise its frequency request (after some decay of TaskA's util, so not fmax), but all its requests are ignored since CPU0 has the max request.
- CPU0 remains idle for a long time.
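The sequence above can be replayed with a small model of how a stale vote on CPU0 masks CPU1's requests. This is only a sketch under assumptions: vote[], applied, request_cap() and dequeue() are hypothetical names, and the throttling behaviour (the vote is recorded but the OPP change is skipped) is modelled on the description in this thread, not copied from the cpufreq_sched code. The value 600 used for CPU1 below is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2	/* hypothetical cluster: CPU0 and CPU1 */

static unsigned long vote[NR_CPUS];	/* stands in for per_cpu(cpu, capacity) */
static unsigned long applied;		/* capacity the cluster OPP was last set for */

/* Cluster requirement: the largest vote of any CPU in the cluster. */
static unsigned long cluster_max(void)
{
	unsigned long max = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (vote[i] > max)
			max = vote[i];
	return max;
}

/*
 * Assumed request path: the vote is always recorded, but the OPP change
 * is skipped while throttled. Returns true if the OPP was updated.
 */
static bool request_cap(int cpu, unsigned long cap, bool throttled)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	vote[cpu] = cap;
	if (throttled)
		return false;
	applied = cluster_max();
	return true;
}

/* Dequeue path: the vote is cleared only when the task is going to sleep. */
static void dequeue(int cpu, bool task_sleep)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (task_sleep)
		vote[cpu] = 0;
}
```

Replaying the steps: request_cap(0, 1278, true) records the vote without changing the OPP, dequeue(0, false) leaves the vote in place, and a later request_cap(1, 600, false) still ends with the cluster set for 1278, because CPU0's vote dominates the max.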
It seems that pick_next_task_idle is a better place to reset the capacity unconditionally, but pulling the reset_cap call out of the if (task_sleep) block would also work, I think?
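To compare the two options, here is a rough sketch in plain C. Everything in it is hypothetical (struct toy_rq, dequeue_current, dequeue_proposed, pick_idle); it also assumes the reset is only meant to fire once the CPU's CFS runqueue is empty, as mentioned earlier in the thread. It is not a patch against the real hooks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CPU's CFS runqueue plus its capacity vote. */
struct toy_rq {
	unsigned int nr_running;	/* runnable CFS tasks on this CPU */
	unsigned long vote;		/* this CPU's capacity vote */
};

/* Behaviour described in the thread: reset only if the task sleeps. */
static void dequeue_current(struct toy_rq *rq, bool task_sleep)
{
	assert(rq->nr_running > 0);
	rq->nr_running--;
	if (task_sleep && rq->nr_running == 0)
		rq->vote = 0;
}

/* Option A: reset when the idle task is picked, however the CPU got empty. */
static void pick_idle(struct toy_rq *rq)
{
	rq->vote = 0;
}

/* Option B: reset outside the task_sleep check, whenever the rq is empty. */
static void dequeue_proposed(struct toy_rq *rq, bool task_sleep)
{
	(void)task_sleep;	/* no longer gates the reset */
	assert(rq->nr_running > 0);
	rq->nr_running--;
	if (rq->nr_running == 0)
		rq->vote = 0;
}
```

In this model both options clear the stale vote after a forced preemption. Option B still leaves the vote alone while other tasks remain on the runqueue, which is the case that matters when load balancing moves several tasks at once.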
Thanks, Vikram