Hi Leo,
On 01/04/16 07:32, Leo Yan wrote:
Hi Dietmar,
On Thu, Mar 31, 2016 at 12:32:32PM +0200, Dietmar Eggemann wrote:
On 03/30/2016 12:11 PM, Leo Yan wrote:
When logging a CPU's load and utilization, we should use the CPU's cfs_rq directly for tracking. Using the task's cfs_rq can introduce wrong values, since that may be a task_group's cfs_rq rather than the real CPU's cfs_rq.
[...]
commit 6e83d44d8844d8ff21a965a913818e078a54b1b6
Author: Dietmar Eggemann <dietmar.eggemann@arm.com>
Date:   Fri Jan 8 10:55:03 2016 +0000

    Fix/DEBUG: sched: Move trace_sched_load_avg_cpu() to
    update_cfs_rq_load_avg()

    In a system with periodic timer ticks (constant rate, no dynticks), in
    case a cpu becomes idle, the cfs_rq.avg.load_avg/util_avg signals are
    still updated via:

    scheduler_tick()
    -> trigger_load_balance(rq)
       -> run_rebalance_domains()
          -> rebalance_domains()
             -> update_blocked_averages()

    The function update_blocked_averages() calls update_cfs_rq_load_avg()
    and not update_load_avg(), so in this situation
    cfs_rq.avg.load_avg/util_avg signal updates are not traced correctly.

    !!! There are no updates via scheduler_tick() -> task_tick_fair() ->
    entity_tick() since current is the idle task. !!!
So the occasions for updating a CPU's cfs_rq.avg.load_avg/util_avg signals are:
- Scheduler tick to trigger load balance;
- Scheduler tick when current is not the idle task;
Do you mean the update_blocked_averages() calls in rebalance_domains() and idle_balance()?
- Enqueue/dequeue one task for this CPU;
Yes, enqueue_task_fair(*) and dequeue_task_fair(*). ( with (*) sched class fair entry points)
Did I miss other paths for updating the CPU's utilization?
update_cfs_rq_load_avg() is called from update_load_avg() too, so there are plenty of other call sites as well (e.g. pick_next_task_fair(*), set_curr_task_fair(*), put_prev_task_fair(*) and task_tick_fair(*)).
Don't forget attach_task_cfs_rq() and detach_task_cfs_rq() from switched_to_fair(*), switched_from_fair(*) and task_move_group_fair(*)
So it's pretty much all over the place.
And for the "dequeue one task" case: although the scheduler will update the running CPU's (CPU_A's) utilization, this doesn't mean the task's utilization will be removed from the task's previous CPU's (CPU_B's) utilization right away (CPU_A may be different from CPU_B); the scheduler just temporarily stashes the task's util_avg value in cfs_rq->removed_util_avg, and only when the task's previous CPU (CPU_B) is really woken up again will it update its own cfs_rq->avg.util_avg. So that means for some period, CPU_B will keep the task's utilization value, right?
You're talking about remove_entity_load_avg() which is called in migrate_task_rq_fair(*) and task_dead_fair(*)? When called we don't have CPU_B's rq lock. We defer CPU_B's &cfs_rq->avg update till the next call to update_cfs_rq_load_avg() on CPU_B.
Also look where cfs_rq->avg.load_avg, cfs_rq->avg.util_avg are actually used in CFS in comparison to the non-blocked version cfs_rq->runnable_load_avg.
[..]
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 7ec9dcfc701a..0724bd03773d 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -641,18 +641,25 @@ TRACE_EVENT(sched_load_avg_cpu,

 	TP_STRUCT__entry(
 		__field( int, cpu )
 		__field( int, id )
 		__field( unsigned long, load_avg )
 		__field( unsigned long, util_avg )
 	),

 	TP_fast_assign(
 		__entry->cpu = cpu;
+#ifdef CONFIG_FAIR_GROUP_SCHED
 		__entry->id = cfs_rq->tg->css.id;
+#else
+		__entry->id = 0;
+#endif
I tried to dump the CPUs' task groups and saw that different CPUs may have the same css.id for their task groups.
That's right.
So we need to use the CPU number together with css.id to identify a unique task group, right? And the root cfs_rq's css.id equals 1.
Exactly. E.g. with pivot/filter or multiple filters in Lisa/trappy it's easily done.
[..]