When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
Add Daniel and remove Diane, sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
On Thu, Nov 21, 2013 at 12:22:06PM +0000, Alex Shi wrote:
Add Daniel and remove Diane, sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
It is correct that the task load_avg_contrib is updated at enqueue/dequeue and when it happens to be 'curr' at the sched_tick. Additionally, it is updated every time the task is descheduled, as part of put_prev_entity(), which is called from __schedule() (through other functions).
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It is updated at least once per sched period.
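To make those update points concrete, here is a toy standalone model. This is not the kernel code (that lives in kernel/sched/fair.c and uses fixed-point arithmetic); it only sketches the idea of one update routine shared by the call sites mentioned above, using the k^32 == 1/2 decay rule discussed later in this thread:

#include <stdio.h>
#include <math.h>

struct entity {
	double contrib;		/* models se->avg.load_avg_contrib */
	long last_us;		/* time of the last update */
};

/* Decay the old contribution over the elapsed time, then credit the
 * elapsed window if the task was runnable.  Simplified: the new
 * window is credited at full weight instead of decayed sub-period. */
static void update_load_avg(struct entity *se, long now_us, int runnable)
{
	double k = pow(0.5, (now_us - se->last_us) / (32.0 * 1024.0));

	se->contrib *= k;
	if (runnable)
		se->contrib += (now_us - se->last_us) / 1024.0;
	se->last_us = now_us;
}

int main(void)
{
	struct entity se = { 0.0, 0 };

	update_load_avg(&se, 1024, 1);	/* enqueue at wakeup */
	update_load_avg(&se, 2048, 1);	/* sched_tick while 'curr' */
	update_load_avg(&se, 3072, 1);	/* put_prev_entity() on deschedule */
	printf("contrib: %.3f\n", se.contrib);
	return 0;
}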
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved the last time it ran, no matter how long it has been sleeping. That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often, so its 'true' load_avg_contrib would be 0. But since it isn't updated while the task sleeps, we can see that it ran for a long time the last time it was scheduled, and we can place it on an appropriate cpu instead of assuming that it is a small task.
Hope it makes sense and answers your question.
Morten
-- Thanks Alex
On Fri, Nov 22, 2013 at 1:57 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Thu, Nov 21, 2013 at 12:22:06PM +0000, Alex Shi wrote:
Add Daniel and remove Diane, sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
No -- This isn't quite how it works.
It is correct that the task load_avg_contrib is updated at enqueue/dequeue and when it happens to be 'curr' at the sched_tick. Additionally, it is updated every time the task is descheduled, as part of put_prev_entity(), which is called from __schedule() (through other functions).
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It is updated at least once per sched period.
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved the last time it ran, no matter how long it has been sleeping.
A task does not maintain any non-decayed values. While a task is sleeping, its value is decaying. We amortize the computation cost for this as described below, but when the task rewakes we fully account for the decay.
That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often, so its 'true' load_avg_contrib would be 0. But since it isn't updated while the task sleeps, we can see that it ran for a long time the last time it was scheduled, and we can place it on an appropriate cpu instead of assuming that it is a small task.
We do track the load average while tasks are sleeping. However, much care must be taken in doing this.
Suppose we iterated over all tasks in the system, updating their load averages regardless of whether they were running. That would be O(n) in the number of tasks and tremendously expensive as many tasks enter the system. Instead we do something more subtle.
A task's load average is: L(t) = \Sum_i (r_i/1024) * k^i, where k^32 == 1/2.
Here u_i is the usage in the i-th most recent 1024 us window; when we add a new observation, we relabel: u0 becomes u1, etc.
This has the nice property that, given the most recent observation: L(t) = <recent> + k * L(t)' [ Where L(t)' is L(t) before the most recent observation. ]
Now, suppose a task is blocked for the entire most recent period; then r_0 (the time it was runnable) == 0, and so u_0 == 0.
Thus, L(t) = k * L(t)'
Now, we can exploit this.
Let B(n) = \Sum of L(t) over all blocked tasks t on cpu n.
Then, we can discount every task accumulated into B(n) simply by multiplying by k. [ Note: B(n) is cfs_rq->blocked_load_avg ]
When the task t finally does wake up, we can compute how much it has decayed: L(t) = k^n * L(t)' (n here being the number of periods it was blocked).
Then we remove it from B(n): B(n) -= L(t).
And add it back into the runnable average.
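To check the arithmetic, here is a small standalone program (illustrative floating point only; the kernel uses fixed point) showing that the incremental form, the series form, and the k^n shortcut for blocked periods all agree:

#include <stdio.h>
#include <math.h>

int main(void)
{
	double k = pow(0.5, 1.0 / 32.0);	/* k^32 == 1/2 */
	double u[8] = { 1024, 1024, 512, 0, 0, 0, 0, 1024 };	/* oldest first */
	double L = 0.0, direct = 0.0;
	int i;

	/* Incremental form: each new period relabels u0 -> u1 etc.,
	 * which is just L = u_new + k * L. */
	for (i = 0; i < 8; i++)
		L = u[i] + k * L;

	/* Direct form: L(t) = \Sum u_i * k^i, newest has index 0. */
	for (i = 0; i < 8; i++)
		direct += u[7 - i] * pow(k, i);

	printf("incremental %.6f  direct %.6f\n", L, direct);

	/* Three blocked periods (u == 0) collapse to one multiply. */
	printf("k*k*k*L %.6f  k^3*L %.6f\n", k * k * k * L, pow(k, 3) * L);
	return 0;
}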
- Paul
Hope it makes sense and answers your question.
Morten
-- Thanks Alex
On Fri, Nov 22, 2013 at 11:45:58AM +0000, Paul Turner wrote:
On Fri, Nov 22, 2013 at 1:57 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Thu, Nov 21, 2013 at 12:22:06PM +0000, Alex Shi wrote:
Add Daniel and remove Diane, sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
No -- This isn't quite how it works.
It is correct that the task load_avg_contrib is updated at enqueue/dequeue and when it happens to be 'curr' at the sched_tick. Additionally, it is updated every time the task is descheduled, as part of put_prev_entity(), which is called from __schedule() (through other functions).
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It is updated at least once per sched period.
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved the last time it ran, no matter how long it has been sleeping.
A task does not maintain any non-decayed values. While a task is sleeping, its value is decaying. We amortize the computation cost for this as described below, but when the task rewakes we fully account for the decay.
Yes, I agree that everything is accounted for as you explain below. What I'm referring to is the way it is implemented. I should have made that more clear in my response.
If you read the load_avg_contrib value during wake up [in select_task_rq_fair()] you get the non-decayed value. The decay is not accounted for until the task is inserted into a runqueue. The load_avg_contrib is not used until it is updated, so it is not a problem in any way.
I just wanted to point out that this implementation detail is very useful for making energy-aware decisions in the wake-up load-balancing [select_task_rq_fair()].
That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often, so its 'true' load_avg_contrib would be 0. But since it isn't updated while the task sleeps, we can see that it ran for a long time the last time it was scheduled, and we can place it on an appropriate cpu instead of assuming that it is a small task.
We do track the load average while tasks are sleeping. However, much care must be taken in doing this.
Suppose we iterated over all tasks in the system, updating their load averages regardless of whether they were running. That would be O(n) in the number of tasks and tremendously expensive as many tasks enter the system. Instead we do something more subtle.
A task's load average is: L(t) = \Sum_i (r_i/1024) * k^i, where k^32 == 1/2.
Here u_i is the usage in the i-th most recent 1024 us window; when we add a new observation, we relabel: u0 becomes u1, etc.
u_i = r_i/1024
Correct me if I'm wrong.
This has the nice property that, given the most recent observation: L(t) = <recent> + k * L(t)' [ Where L(t)' is L(t) before the most recent observation. ]
<recent> = u0
Now, suppose a task is blocked for the entire most recent period; then r_0 (the time it was runnable) == 0, and so u_0 == 0.
Thus, L(t) = k * L(t)'
Now, we can exploit this.
Let B(n) = \Sum of L(t) over all blocked tasks t on cpu n.
Then, we can discount every task accumulated into B(n) simply by multiplying by k. [ Note: B(n) is cfs_rq->blocked_load_avg ]
When the task t finally does wake up, we can compute how much it has decayed: L(t) = k^n * L(t)' (n here being the number of periods it was blocked).
In select_task_rq_fair():
load_avg_contrib = L(t)'
but it is updated immediately after to L(t).
Then we remove it from B(n): B(n) -= L(t).
And add it back into the runnable average.
Thanks, Morten
On 11/22/2013 07:45 PM, Paul Turner wrote:
On Fri, Nov 22, 2013 at 1:57 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Thu, Nov 21, 2013 at 12:22:06PM +0000, Alex Shi wrote:
Add Daniel and remove Diane, sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
No -- This isn't quite how it works.
It is correct that the task load_avg_contrib is updated at enqueue/dequeue and when it happens to be 'curr' at the sched_tick. Additionally, it is updated every time the task is descheduled, as part of put_prev_entity(), which is called from __schedule() (through other functions).
Thanks a lot for the kind explanations!
On 11/22/2013 05:57 PM, Morten Rasmussen wrote:
On Thu, Nov 21, 2013 at 12:22:06PM +0000, Alex Shi wrote:
Add Daniel and remove Diane, sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
It is correct that the task load_avg_contrib is updated at enqueue/dequeue and when it happens to be 'curr' at the sched_tick. Additionally, it is updated every time the task is descheduled, as part of put_prev_entity(), which is called from __schedule() (through other functions).
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It is updated at least once per sched period.
I read the code again. Yes, if the task was scheduled (sum_exec_runtime increased), the load_avg_period will be updated. But if the task is just sitting in the runqueue without a chance to take the cpu, the load_avg_contrib still keeps the old value it had when it was inserted into the runqueue. That seems not a big deal if the task gets a chance to run. But if there are many tasks in the system, and lots of such tasks (with a small load_avg_contrib) are waiting on one cpu, the cpu_load will not be correct -- smaller than expected. That makes the load balancer make a wrong decision: give this cpu more tasks. That would be worse. :)
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved the last time it ran, no matter how long it has been sleeping. That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often, so its 'true' load_avg_contrib would be 0. But since it isn't updated while the task sleeps, we can see that it ran for a long time the last time it was scheduled, and we can place it on an appropriate cpu instead of assuming that it is a small task.
Yes. If such scenarios happen often, it may be worth adding a new variable to store the old load_avg_contrib value that you want.
On Tue, Nov 26, 2013 at 10:04 PM, Alex Shi alex.shi@linaro.org wrote:
On 11/22/2013 05:57 PM, Morten Rasmussen wrote:
On Thu, Nov 21, 2013 at 12:22:06PM +0000, Alex Shi wrote:
Add Daniel and remove Diane, sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken and added to a cpu, but never becomes the 'curr' task at a sched_tick, then its load contrib will never be updated and stays small.
What did I miss? Or is it really so?
It is correct that the task load_avg_contrib is updated at enqueue/dequeue and when it happens to be 'curr' at the sched_tick. Additionally, it is updated every time the task is descheduled, as part of put_prev_entity(), which is called from __schedule() (through other functions).
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It is updated at least once per sched period.
I read the code again. Yes, if the task was scheduled (sum_exec_runtime increased), the load_avg_period will be updated. But if the task is just sitting in the runqueue without a chance to take the cpu, the load_avg_contrib still keeps the old value it had when it was inserted into the runqueue. That seems not a big deal if the task gets a chance to run. But if there are many tasks in the system, and lots of such tasks (with a small load_avg_contrib) are waiting on one cpu, the cpu_load will not be correct -- smaller than expected. That makes the load balancer make a wrong decision: give this cpu more tasks. That would be worse. :)
I'm confused by what you mean when you say it "keeps" the old value.
When a task blocks, its load_contrib is removed from runnable_load_avg. This quantity is moved to blocked_load_avg, and continues to be updated (specifically: decayed, since the task is not contributing load while blocked). When the task wakes back up, its load_contrib is updated to match the decay charged against it.
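A condensed standalone sketch of that bookkeeping (the names mirror cfs_rq->runnable_load_avg, cfs_rq->blocked_load_avg and the per-task load_contrib, but the arithmetic is simplified floating point, not the kernel implementation):

#include <stdio.h>
#include <math.h>

static double runnable_load_avg, blocked_load_avg;

/* k^32 == 1/2, so decaying by n periods is a multiply by 0.5^(n/32). */
static double decay(double v, int n)
{
	return v * pow(0.5, n / 32.0);
}

static void task_blocks(double *contrib)
{
	/* Move the contribution from the runnable to the blocked sum. */
	runnable_load_avg -= *contrib;
	blocked_load_avg += *contrib;
}

static void task_wakes(double *contrib, int periods_blocked)
{
	/* Charge the task the decay it accrued while blocked, then
	 * move it back into the runnable sum. */
	*contrib = decay(*contrib, periods_blocked);
	blocked_load_avg -= *contrib;
	runnable_load_avg += *contrib;
}

int main(void)
{
	double contrib = 512.0;

	runnable_load_avg = contrib;
	task_blocks(&contrib);
	/* The whole blocked sum decays with one multiply per update;
	 * no sleeping task is visited individually. */
	blocked_load_avg = decay(blocked_load_avg, 32);
	task_wakes(&contrib, 32);	/* blocked for one half-life */
	printf("contrib %.1f  runnable %.1f  blocked %.1f\n",
	       contrib, runnable_load_avg, blocked_load_avg);
	return 0;
}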
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved the last time it ran, no matter how long it has been sleeping. That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often, so its 'true' load_avg_contrib would be 0. But since it isn't updated while the task sleeps, we can see that it ran for a long time the last time it was scheduled, and we can place it on an appropriate cpu instead of assuming that it is a small task.
Yes. If such scenarios happen often, it may be worth adding a new variable to store the old load_avg_contrib value that you want.
I think you are misunderstanding Morten here. He is only highlighting that for highly aperiodic tasks we may (note: we do not currently) take advantage of the fact that we can examine them _before_ the load_contrib is updated to reflect their new value. One can imagine such a heuristic being enabled when we see that the blocked period exceeded a certain threshold, for instance.
This does not mean that this value participates in any runnable load average, though.
-- Thanks Alex
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It is updated at least once per sched period.
I read the code again. Yes, if the task was scheduled (sum_exec_runtime increased), the load_avg_period will be updated. But if the task is just sitting in the runqueue without a chance to take the cpu, the load_avg_contrib still keeps the old value it had when it was inserted into the runqueue. That seems not a big deal if the task gets a chance to run. But if there are many tasks in the system, and lots of such tasks (with a small load_avg_contrib) are waiting on one cpu, the cpu_load will not be correct -- smaller than expected. That makes the load balancer make a wrong decision: give this cpu more tasks. That would be worse. :)
I'm confused by what you mean when you say it "keeps" the old value.
When a task blocks, its load_contrib is removed from runnable_load_avg. This quantity is moved to blocked_load_avg, and continues to be updated (specifically: decayed, since the task is not contributing load while blocked). When the task wakes back up, its load_contrib is updated to match the decay charged against it.
Yes. When the task is not in the runqueue, its load is in blocked_load and is decayed correctly. I am just concerned about a very rare scenario: a task that slept for a long time is added to the runqueue, so its load_contrib is nearly 0, but it doesn't get the cpu for a long time. Then I am afraid its load_contrib has no chance to be updated.
On Wed, Nov 27, 2013 at 06:52:22AM +0000, Alex Shi wrote:
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It is updated at least once per sched period.
I read the code again. Yes, if the task was scheduled (sum_exec_runtime increased), the load_avg_period will be updated. But if the task is just sitting in the runqueue without a chance to take the cpu, the load_avg_contrib still keeps the old value it had when it was inserted into the runqueue. That seems not a big deal if the task gets a chance to run. But if there are many tasks in the system, and lots of such tasks (with a small load_avg_contrib) are waiting on one cpu, the cpu_load will not be correct -- smaller than expected. That makes the load balancer make a wrong decision: give this cpu more tasks. That would be worse. :)
I'm confused by what you mean when you say it "keeps" the old value.
When a task blocks, its load_contrib is removed from runnable_load_avg. This quantity is moved to blocked_load_avg, and continues to be updated (specifically: decayed, since the task is not contributing load while blocked). When the task wakes back up, its load_contrib is updated to match the decay charged against it.
Yes. When the task is not in the runqueue, its load is in blocked_load and is decayed correctly. I am just concerned about a very rare scenario: a task that slept for a long time is added to the runqueue, so its load_contrib is nearly 0, but it doesn't get the cpu for a long time. Then I am afraid its load_contrib has no chance to be updated.
All tasks get a chance to run at least once per sched_period. The period is dynamically extended depending on the number of running tasks, so it may take a while until the task is scheduled if you have many tasks running. But if your task has been sleeping for a long time, its vruntime is quite likely to place it near the head of the runqueue, and the waiting time is much shorter than a sched_period.
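For reference, here is a standalone rendering of that period-stretching logic, with the default tunables assumed (6 ms sched_latency, 0.75 ms sched_min_granularity, so stretching starts beyond 8 running tasks); the kernel's version is __sched_period() in kernel/sched/fair.c:

#include <stdio.h>

static unsigned long long sched_period(unsigned long nr_running)
{
	unsigned long long latency = 6000000ULL;	/* 6 ms */
	unsigned long long min_gran = 750000ULL;	/* 0.75 ms */
	unsigned long nr_latency = latency / min_gran;	/* 8 */

	/* With few tasks the period is fixed; with many it grows so
	 * that every task still gets at least min_gran of cpu time. */
	if (nr_running <= nr_latency)
		return latency;
	return nr_running * min_gran;
}

int main(void)
{
	printf("4 tasks:   %llu ns\n", sched_period(4));	/* 6 ms */
	printf("100 tasks: %llu ns\n", sched_period(100));	/* 75 ms */
	return 0;
}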
If you have that many tasks, small errors in the load_contrib are probably not your biggest concern.
On 11/27/2013 07:39 PM, Morten Rasmussen wrote:
Yes. When the task is not in the runqueue, its load is in blocked_load and is decayed correctly. I am just concerned about a very rare scenario: a task that slept for a long time is added to the runqueue, so its load_contrib is nearly 0, but it doesn't get the cpu for a long time. Then I am afraid its load_contrib has no chance to be updated.
All tasks get a chance to run at least once per sched_period. The period is dynamically extended depending on the number of running tasks, so it may take a while until the task is scheduled if you have many tasks running. But if your task has been sleeping for a long time, its vruntime is quite likely to place it near the head of the runqueue, and the waiting time is much shorter than a sched_period.
If you have that many tasks, small errors in the load_contrib are probably not your biggest concern.
Thanks for the explanation, Morten! Assume there are 2 cpus in the system: one has 100 normal tasks, the other has 10.
cpu0: 100 tasks, each with load_contrib 1 (all tasks just woke up from sleep), so the load of cpu0 is 100 * 1 = 100.
cpu1: 10 tasks, each with load 1000, so the load of cpu1 is 10 * 1000 = 10000.
The typical LB interval is a few ms; assume 1 ms. The min_granularity is 0.75 ms; assume 1 ms. So after 1 ms, LB happens:
cpu0 load = (100-1) * 1 + L(t), where L(t) = 1024 (task load) * 1002 / 47742 ≈ 21, so cpu0 load = 99 + 21 = 120.
cpu1 load = 10000.
Then LB wants to move half of the tasks to cpu0. That is a wrong decision. After 2 ms: cpu0 load = 120 + 500 + 21, cpu1 load = 500, and LB will still make an incorrect decision.
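A tiny standalone check of the figures above (47742 is LOAD_AVG_MAX, the saturated sum of the geometric series):

#include <stdio.h>

int main(void)
{
	/* A freshly woken task that then runs for ~1 ms contributes
	 * roughly 1024 * 1002/47742 to the runnable load. */
	double contrib = 1024.0 * 1002.0 / 47742.0;

	printf("one ~1ms task: %.1f\n", contrib);	/* ~21.5 */
	printf("cpu0 load: 99 + %.0f = %.0f\n", contrib, 99.0 + contrib);
	return 0;
}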
The above scenario rarely happens, but it shows that in this corner case (wrong prediction) the system needs more time/context switches to rebalance well.
But I agree it is not a big deal, since this scenario is hard to hit.
On Wed, Nov 27, 2013 at 01:35:39PM +0000, Alex Shi wrote:
On 11/27/2013 07:39 PM, Morten Rasmussen wrote:
Yes. When the task is not in the runqueue, its load is in blocked_load and is decayed correctly. I am just concerned about a very rare scenario: a task that slept for a long time is added to the runqueue, so its load_contrib is nearly 0, but it doesn't get the cpu for a long time. Then I am afraid its load_contrib has no chance to be updated.
All tasks get a chance to run at least once per sched_period. The period is dynamically extended depending on the number of running tasks, so it may take a while until the task is scheduled if you have many tasks running. But if your task has been sleeping for a long time, its vruntime is quite likely to place it near the head of the runqueue, and the waiting time is much shorter than a sched_period.
If you have that many tasks, small errors in the load_contrib are probably not your biggest concern.
Thanks for the explanation, Morten! Assume there are 2 cpus in the system: one has 100 normal tasks, the other has 10.
cpu0: 100 tasks, each with load_contrib 1 (all tasks just woke up from sleep), so the load of cpu0 is 100 * 1 = 100.
cpu1: 10 tasks, each with load 1000, so the load of cpu1 is 10 * 1000 = 10000.
The typical LB interval is a few ms; assume 1 ms. The min_granularity is 0.75 ms; assume 1 ms. So after 1 ms, LB happens:
cpu0 load = (100-1) * 1 + L(t), where L(t) = 1024 (task load) * 1002 / 47742 ≈ 21, so cpu0 load = 99 + 21 = 120.
cpu1 load = 10000.
Then LB wants to move half of the tasks to cpu0. That is a wrong decision. After 2 ms: cpu0 load = 120 + 500 + 21, cpu1 load = 500, and LB will still make an incorrect decision.
I'm not sure it is a wrong decision. The load-balancer cannot do better with the information it has available. We would need a better way of predicting future task behaviour to do better.
In the above scenario the balance will eventually sort itself out, but it may take a while as the load is quite extreme.
The above scenario rarely happens, but it shows that in this corner case (wrong prediction) the system needs more time/context switches to rebalance well.
But I agree it is not a big deal, since this scenario is hard to hit.
There will always be cases that defeat the load-balancing algorithm. You could probably fix that particular case by doing tricks similar to one I described in the other email for big.LITTLE. However, you risk introducing new cases of undesirable behaviour.
An even easier fix for this particular problem is to just go back and use the static load.weight for load-balancing. :)
But I agree it is not a big deal, since this scenario is hard to hit.
There will always be cases that defeat the load-balancing algorithm. You could probably fix that particular case by doing tricks similar to one I described in the other email for big.LITTLE. However, you risk introducing new cases of undesirable behaviour.
An even easier fix for this particular problem is to just go back and use the static load.weight for load-balancing. :)
That's impossible.
And I have no idea how to make a task update its load_contrib instantly. Maybe closing our eyes and waiting 2*sched_period is the best thing we can do. ;)
On 11/27/2013 02:26 PM, Paul Turner wrote:
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved the last time it ran, no matter how long it has been sleeping. That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often, so its 'true' load_avg_contrib would be 0. But since it isn't updated while the task sleeps, we can see that it ran for a long time the last time it was scheduled, and we can place it on an appropriate cpu instead of assuming that it is a small task.
Yes. If such scenarios happen often, it may be worth adding a new variable to store the old load_avg_contrib value that you want.
I think you are misunderstanding Morten here. He is only highlighting that for highly aperiodic tasks we may (note: we do not currently) take advantage of the fact that we can examine them _before_ the load_contrib is updated to reflect their new value. One can imagine such a heuristic being enabled when we see that the blocked period exceeded a certain threshold, for instance.
Paul, many thanks for your time and the reminder! I missed Morten's explanation in his second email.
Yes, we can check other indicators if we don't want to use the unchanged load_contrib in select_task_rq()->wake_affine(). But wake_affine is already somewhat over-tuned as it is. :)
This does not mean that this value participates in any runnable load average, though.
On Wed, Nov 27, 2013 at 06:26:49AM +0000, Paul Turner wrote:
On Tue, Nov 26, 2013 at 10:04 PM, Alex Shi alex.shi@linaro.org wrote:
On 11/22/2013 05:57 PM, Morten Rasmussen wrote:
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved the last time it ran, no matter how long it has been sleeping. That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often, so its 'true' load_avg_contrib would be 0. But since it isn't updated while the task sleeps, we can see that it ran for a long time the last time it was scheduled, and we can place it on an appropriate cpu instead of assuming that it is a small task.
Yes. If such scenarios happen often, it may be worth adding a new variable to store the old load_avg_contrib value that you want.
I think you are misunderstanding Morten here. He is only highlighting that for highly aperiodic tasks we may (note: we do not currently) take advantage of the fact that we can examine them _before_ the load_contrib is updated to reflect their new value. One can imagine such a heuristic being enabled when we see that the blocked period exceeded a certain threshold, for instance.
Yes, exactly.
For big.LITTLE we do have heuristics based on this in some of the Linaro kernel trees, and they work quite well for this particular purpose. We basically control wake-up affinity based on the task load_contrib. Examining the load_contrib before it is updated ensures that aperiodic bursty tasks don't wake up on a little cpu because their load has decayed to 0.
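For illustration, here is a hypothetical sketch of such a heuristic; the identifiers and the threshold are invented for this example and are not the Linaro implementation:

#include <stdbool.h>
#include <stdio.h>

#define SMALL_TASK_THRESHOLD 100	/* made-up load_contrib cutoff */

/* Decide using the wakee's load_avg_contrib *before* wakeup decay is
 * applied, so a bursty task that slept for a long time still looks
 * "big" and is not sent to a little cpu. */
static bool prefer_big_cpu(unsigned long pre_decay_contrib)
{
	return pre_decay_contrib > SMALL_TASK_THRESHOLD;
}

int main(void)
{
	printf("contrib 900 -> big cpu? %d\n", prefer_big_cpu(900));
	printf("contrib  30 -> big cpu? %d\n", prefer_big_cpu(30));
	return 0;
}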
We would like to have similar big.LITTLE support (in a more generic form) upstream, but that is somewhere down the list of things to do for energy-aware scheduling.