Hi Leo,
On 01/04/16 07:32, Leo Yan wrote:
Hi Dietmar,
On Thu, Mar 31, 2016 at 12:32:32PM +0200, Dietmar Eggemann wrote:
On 03/30/2016 12:11 PM, Leo Yan wrote:
When logging a CPU's load and utilization, we should use the CPU's cfs_rq directly for tracking. Using the task's cfs_rq can introduce wrong values, since that may be a task_group's cfs_rq rather than the real CPU's cfs_rq.
[...]
commit 6e83d44d8844d8ff21a965a913818e078a54b1b6
Author: Dietmar Eggemann <dietmar.eggemann@arm.com>
Date:   Fri Jan 8 10:55:03 2016 +0000

    Fix/DEBUG: sched: Move trace_sched_load_avg_cpu() to
    update_cfs_rq_load_avg()

    In a system with periodic timer ticks (constant rate, no dynticks), in
    case a cpu becomes idle, the cfs_rq.avg.load_avg/util_avg signals are
    still updated via:

    scheduler_tick()
    -> trigger_load_balance(rq)
       -> run_rebalance_domains()
          -> rebalance_domains()
             -> update_blocked_averages()

    The function update_blocked_averages() calls update_cfs_rq_load_avg()
    and not update_load_avg(), so in this situation
    cfs_rq.avg.load_avg/util_avg signal updates are not traced correctly.

    !!! There are no updates via scheduler_tick() -> task_tick_fair() ->
    entity_tick() since current is the idle task. !!!
So the occasions for updating a CPU's cfs_rq.avg.load_avg/util_avg signals are:
- Scheduler tick to trigger load balance;
- Scheduler tick when current is not the idle task;
Do you mean the update_blocked_averages() calls in rebalance_domains() and idle_balance()?
- Enqueue/dequeue one task for this CPU;
Yes, enqueue_task_fair(*) and dequeue_task_fair(*). ( with (*) sched class fair entry points)
Did I miss other paths for updating the CPU's utilization?
update_cfs_rq_load_avg() is called from update_load_avg() too, so there are plenty of other call sites as well (e.g. pick_next_task_fair(*), set_curr_task_fair(*), put_prev_task_fair(*) and task_tick_fair(*)).
Don't forget attach_task_cfs_rq() and detach_task_cfs_rq() from switched_to_fair(*), switched_from_fair(*) and task_move_group_fair(*)
So it's pretty much all over the place.
And for the "dequeue one task" case: although the scheduler will update the running CPU's (CPU_A's) utilization, this doesn't mean the task's utilization will be removed from the task's previous CPU's (CPU_B's) utilization right away (CPU_A may be different from CPU_B); the scheduler just temporarily stashes the task's util_avg value in cfs_rq->removed_util_avg, and only when the task's previous CPU (CPU_B) is really woken up again will it update its own cfs_rq->avg.util_avg. So that means for some period, CPU_B will keep the task's utilization value, right?
You're talking about remove_entity_load_avg() which is called in migrate_task_rq_fair(*) and task_dead_fair(*)? When called we don't have CPU_B's rq lock. We defer CPU_B's &cfs_rq->avg update till the next call to update_cfs_rq_load_avg() on CPU_B.
Also look where cfs_rq->avg.load_avg, cfs_rq->avg.util_avg are actually used in CFS in comparison to the non-blocked version cfs_rq->runnable_load_avg.
[..]
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 7ec9dcfc701a..0724bd03773d 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -641,18 +641,25 @@ TRACE_EVENT(sched_load_avg_cpu,

 	TP_STRUCT__entry(
 		__field( int, cpu )
 		__field( int, id )
 		__field( unsigned long, load_avg )
 		__field( unsigned long, util_avg )
 	),

 	TP_fast_assign(
 		__entry->cpu = cpu;
+#ifdef CONFIG_FAIR_GROUP_SCHED
 		__entry->id = cfs_rq->tg->css.id;
+#else
+		__entry->id = 0;
+#endif
I tried to dump the CPUs' task groups and saw that different CPUs may have the same css.id for their task groups.
That's right.
So we need to use the CPU number together with css.id to identify a unique task group, right? And the root cfs_rq's css.id equals 1.
Exactly. E.g. with pivot/filter or multiple filters in Lisa/trappy it's easily done.
[..]