On Fri, Nov 22, 2013 at 11:45:58AM +0000, Paul Turner wrote:
On Fri, Nov 22, 2013 at 1:57 AM, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
On Thu, Nov 21, 2013 at 12:22:06PM +0000, Alex Shi wrote:
Add Daniel and remove Diane. Sorry.
On 11/21/2013 04:34 PM, Alex Shi wrote:
When I read the runnable load avg code, I found that a task's avg.load_avg_contrib is mainly updated at enqueue/dequeue, and for the 'curr' task in sched_tick. So, if a task that slept for a long time is woken, added to a cpu and kept there, but is never the 'curr' task in sched_tick, then the task's load contrib will never be updated and will stay small.
What did I miss? Or is that really the case?
No -- This isn't quite how it works.
It is correct that the task load_avg_contrib is updated at enqueue/dequeue and when it happens to be 'curr' at the sched_tick. Additionally it is updated every time the task is descheduled as part of put_prev_entity() which is called from __schedule() (through other functions).
load_avg_contrib should always be quite close to the 'true' value as long as the task is running. It would be updated at least once per sched period.
load_avg_contrib is not updated while the task is sleeping. It will be updated when the task is reinserted into a runqueue at wakeup. The fact that it retains its old (non-decayed) value is a very useful feature, as it allows us to see how the task behaved last time it ran no matter how long it has been sleeping.
A task does not maintain any non-decayed values. While a task is sleeping its value is decaying. We amortize the computation cost for this as below, but when the task wakes again we will fully account for the decay.
Yes, I agree that everything is accounted for as you explain below. What I'm referring to is the way it is implemented. I should have made that more clear in my response.
If you read the load_avg_contrib value during wake up [in select_task_rq_fair()] you get the non-decayed value. The decay is not accounted for until the task is inserted into a runqueue. The load_avg_contrib is not used until it is updated, so it is not a problem in any way.
I just wanted to point out that this implementation detail is very useful for making energy-aware decisions in the wake-up load-balancing [select_task_rq_fair()].
That is not currently exploited in the mainline scheduler, but it is very important for energy-aware scheduling (and big.LITTLE scheduler support).
For example, web browser rendering is quite cpu intensive but doesn't happen very often. So its 'true' load_avg_contrib would be 0, but since it isn't updated we can see that it ran for a long time last time it was scheduled and schedule it on an appropriate cpu instead of assuming that it is a small task.
We do track the load average while tasks are sleeping. However, much care must be taken in doing this.
Suppose we iterated over all tasks in the system, updating their load average regardless of whether they were running. This would be O(n) in the number of tasks and tremendously expensive as many tasks enter the system. Instead we do something more subtle.
A task's load average is: L(t) = \Sum_i (r_i/1024) * k^i, where k^32 == 1/2
Where u_i is the usage in the i-th most recent 1024 us period (u_0 being the most recent). When we add a new observation, we relabel: u_0 becomes u_1, etc.
u_i = r_i/1024
Correct me if I'm wrong.
This has the nice property that, given the most recent observation: L(t) = <recent> + k * L(t)' [ Where L(t)' is L(t) before the most recent observation. ]
<recent> = u0
Now, supposing a task is blocked for the entire recent period, then r_i (the time it was runnable) == 0.
Thus, L(t) = k * L(t)'
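The recurrence above can be checked numerically. What follows is a minimal sketch in Python, not kernel code: it uses floating point where the scheduler uses fixed-point arithmetic, and the function names and the sample history are hypothetical. It shows that the direct sum and the incremental update give the same value, and that a fully blocked period only multiplies the old value by k.

```python
# Decay factor chosen so that k**32 == 1/2, as stated in the thread.
K = 0.5 ** (1.0 / 32.0)

def load_direct(r):
    """L(t) = sum_i (r_i / 1024) * k**i, with r[0] the most recent period."""
    return sum((r_i / 1024.0) * K ** i for i, r_i in enumerate(r))

def load_incremental(r):
    """Same value, built oldest-first with L = u_0 + k * L'."""
    load = 0.0
    for r_i in reversed(r):  # feed the oldest observation first
        load = r_i / 1024.0 + K * load
    return load

# Hypothetical runnable time (us) per 1024 us period, newest first.
history = [1024, 512, 0, 0, 256]
assert abs(load_direct(history) - load_incremental(history)) < 1e-9

# One more period in which the task was blocked (r_0 == 0): L = k * L'.
after_blocked_period = load_incremental([0] + history)
assert abs(after_blocked_period - K * load_direct(history)) < 1e-9
```

The incremental form is what makes the scheme cheap: only the previous sum has to be stored, not the per-period history.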
Now, we can exploit this.
Let B(n) = \Sum L(t) over all blocked tasks t on cpu n
Then, we can discount every task accumulated into B(n) simply by multiplying by k. [ Note: B(n) is cfs_rq->blocked_load_avg ]
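Why a single multiplication is enough follows from linearity: decaying each blocked task by k and summing gives the same result as decaying the sum. A small Python sketch, with hypothetical helper names and made-up load values (floating point, not the kernel's fixed-point code):

```python
K = 0.5 ** (1.0 / 32.0)  # k**32 == 1/2

def decay_each(blocked_loads):
    """O(n): decay every blocked task individually, then sum."""
    return sum(K * load for load in blocked_loads)

def decay_sum(blocked_sum):
    """O(1): decay only the aggregate B(n)."""
    return K * blocked_sum

# Hypothetical L(t) values of three blocked tasks on one cpu.
blocked = [0.75, 0.10, 0.42]
b = sum(blocked)
assert abs(decay_each(blocked) - decay_sum(b)) < 1e-9
```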
When the task t finally does wake up, we can compute how much it has decayed: L(t) = k^n * L(t)' [ Here n is the number of periods t was blocked, not the cpu. ]
In select_task_rq_fair():
load_avg_contrib = L(t)'
but it is updated immediately after to L(t).
Then remove it from B(n), B(n) -= L(t)
And add it back into the runnable average.
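The whole sequence (block, decay the aggregate, wake up, apply k^n, move the contribution back) can be put together in a toy model. This is only a sketch under stated assumptions: the classes, fields and functions below are illustrative stand-ins rather than the kernel's data structures, periods are counted explicitly, and floating point replaces fixed-point arithmetic.

```python
K = 0.5 ** (1.0 / 32.0)  # k**32 == 1/2

class CpuLoad:
    """Toy per-cpu sums; a stand-in, not the kernel's cfs_rq."""
    def __init__(self):
        self.runnable = 0.0     # sum of L(t) over runnable tasks
        self.blocked = 0.0      # B(n): sum of L(t) over blocked tasks
        self.decay_counter = 0  # periods elapsed on this cpu

    def tick_period(self):
        # One period passes: decay the whole blocked sum in O(1).
        self.blocked *= K
        self.decay_counter += 1

class Task:
    def __init__(self, contrib):
        self.contrib = contrib  # load_avg_contrib in this model
        self.decay_count = 0    # cpu.decay_counter when the task blocked

def block(cpu, task):
    cpu.runnable -= task.contrib
    cpu.blocked += task.contrib
    task.decay_count = cpu.decay_counter

def wake(cpu, task):
    stale = task.contrib  # L(t)': what a wake-up path would read first
    n = cpu.decay_counter - task.decay_count
    task.contrib *= K ** n        # L(t) = k^n * L(t)'
    cpu.blocked -= task.contrib   # B(n) -= L(t)
    cpu.runnable += task.contrib  # back into the runnable average
    return stale

cpu = CpuLoad()
task = Task(0.8)
cpu.runnable += task.contrib
block(cpu, task)
for _ in range(32):               # task sleeps for 32 periods
    cpu.tick_period()
stale = wake(cpu, task)
assert stale == 0.8                       # non-decayed value seen at wake-up
assert abs(task.contrib - 0.4) < 1e-9     # halved after 32 periods
assert abs(cpu.blocked) < 1e-9            # its share of B(n) is gone
```

In this model the task itself is never touched while it sleeps; only the aggregate is decayed, and the task catches up in one step at wake-up.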
Thanks, Morten