Hi Mike,
On 16/03/15 22:08, Michael Turquette wrote:
Quoting Juri Lelli (2015-03-13 03:37:53)
On 11/03/15 17:00, Morten Rasmussen wrote:
On Mon, Mar 09, 2015 at 02:06:17PM +0000, Juri Lelli wrote:
From: Michael Turquette mturquette@linaro.org
[...]
I was about to say that you don't need more than just an int, but if you want to hook into rebalance_domains(), or somewhere in that path where we move tasks between cpus, you would probably want to update both source and destination cpus. It gets more complicated in nohz_idle_balance(), where you may move tasks between multiple sources and destinations in one go. So having a mask might not be a bad idea after all.
Since you only periodically check the cpus in the update_cpus mask, you may have had tasks {en,de}queued on multiple cpus in the meantime. So you need the cpumask anyway.
Right. We need a way to keep track of "pending requests".
Instead of tracking "requests" by tagging cpus that have updated stats in a cpumask, we could track "constraints", where a constraint is the return value of get_cpu_usage() stored in a per-cpu variable whenever a task is migrated (or whenever the load contribution of a task changes significantly; more on that at the bottom).
I'm not sure this is necessary. The information that we have a cpu with a request is sufficient because we can always get the cpu usage just before we kick the kthread. (exactly what you do right now).
The issue is that we could see a lot of requests during the time the em mechanism is throttled.
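Something like the below (untested sketch) would keep requests pending across a throttle period, reusing update_cpus, energy_aware() and arch_eval_cpu_freq() from this thread; eval_throttled() is just a placeholder for whatever throttle check we end up with:

        static struct cpumask update_cpus;

        static void request_cpu_freq_eval(struct rq *rq)
        {
                /* remember that this cpu has a pending request */
                cpumask_set_cpu(cpu_of(rq), &update_cpus);

                /* eval_throttled() is a placeholder for the throttle check */
                if (!energy_aware() || eval_throttled())
                        return;

                arch_eval_cpu_freq(&update_cpus);
                cpumask_clear(&update_cpus);
        }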
Why don't you just add:
        cpumask_set_cpu(cpu_of(rq), &update_cpus);

        if (energy_aware() && !cpumask_empty(&update_cpus))
                arch_eval_cpu_freq(&update_cpus);
into task_tick_fair() to detect cpu usage changes due to a single task running in case there is no enqueue/dequeue happening?
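I.e. roughly the following (untested, the existing body of task_tick_fair() is elided):

        static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
        {
                /* ... existing entity_tick() walk ... */

                /* catch usage changes of a task that just keeps running */
                cpumask_set_cpu(cpu_of(rq), &update_cpus);

                if (energy_aware() && !cpumask_empty(&update_cpus))
                        arch_eval_cpu_freq(&update_cpus);
        }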
In the case of scaling cpu frequency up we can do as Juri says below and kick the logic whenever we cross the up_threshold.
For scaling down we need to know usage (from get_cpu_usage) of every cpu in the frequency domain, or more specifically we need to know the max usage in the freq domain. This is enough to pick the best capacity state based on floor(max_usage).
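Roughly something like this (untested sketch; the freq domain mask and the capacity state table are placeholders for whatever the arch code provides):

        static int find_capacity_state(const struct cpumask *freq_domain_mask,
                                       const unsigned long *cap_states, int nr_states)
        {
                unsigned long usage, max_usage = 0;
                int cpu, idx;

                /* max usage (from get_cpu_usage()) over the frequency domain */
                for_each_cpu(cpu, freq_domain_mask) {
                        usage = get_cpu_usage(cpu);
                        if (usage > max_usage)
                                max_usage = usage;
                }

                /* lowest capacity state that still covers max_usage */
                for (idx = 0; idx < nr_states; idx++)
                        if (cap_states[idx] >= max_usage)
                                return idx;

                return nr_states - 1;
        }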
[...]
How do we normally detect a significant change in task behavior? If we have an answer to this question then we have a call site to potentially evaluate cpu frequency.
If the task is running flat out, it would be the CFS tick function.
A naive solution could be to mark a cpu in update_cpus every time it receives a tick. At the end of task_tick_fair() for example. But then it isn't really event driven anymore. It needs some more thinking.
Right, this bit is missing. Well, the event would be a task crossing the up threshold at the current cap. We kick the thing only if this happened. But yeah, maybe something more clever is possible.
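E.g. something like the following (untested; capacity_curr_of() and up_threshold are just stand-ins for the current capacity and the tunable threshold):

        static bool crossed_up_threshold(int cpu)
        {
                unsigned long usage = get_cpu_usage(cpu);

                /* kick only when usage exceeds up_threshold % of current capacity */
                return usage * 100 > capacity_curr_of(cpu) * up_threshold;
        }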
If we update constraints in per-cpu vars (as I outlined above), then could we move the code that does that out of {en,de}queue_task_fair() and into some common place that solves this problem? I need some help on this one, but I was imagining something like:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6fce9e0..266f4aa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2785,6 +2785,7 @@ static inline void update_entity_load_avg(struct sched_entity *se,
        struct cfs_rq *cfs_rq = cfs_rq_of(se);
        long contrib_delta, utilization_delta;
        int cpu = cpu_of(rq_of(cfs_rq));
+       int cap;
        u64 now;

        /*
@@ -2818,6 +2819,10 @@ static inline void update_entity_load_avg(struct sched_entity *se,
                                             -utilization_delta);
        }

+       /* store updated per-cpu usage when update_cfs_rq is true */
+       cap = get_cpu_usage(cpu);
+       atomic_long_set(&per_cpu(usage, cpu), cap);
+
        trace_sched_load_avg_cpu(cpu, cfs_rq);
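The snippet assumes a per-cpu atomic declared somewhere, e.g. (naming chosen to match the per_cpu(usage, cpu) reference above):

        /* last CFS usage value, written by the owning cpu */
        DEFINE_PER_CPU(atomic_long_t, usage);

        /* reader side, e.g. in the frequency evaluation path */
        static long read_cpu_usage(int cpu)
        {
                return atomic_long_read(&per_cpu(usage, cpu));
        }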
I guess it could work, but why change the current approach (calling arch_eval_cpu_freq() from CFS's enqueue, dequeue and *tick*)?
update_entity_load_avg() is all about updating PELT for the sched_entity se and the two cfs_rq PELT signals (cfs_rq->utilization_load_avg (blue*), cfs_rq->utilization_blocked_avg (green*)).
I just don't see the advantage right now of saving cpu usage (red*) per CFS enqueue, dequeue and tick operation compared to saving the information that a cpu requested a freq change. In case of 'pending requests' happening in a throttle period, what is the value of the per (cfs_)rq signal 'cpu usage' anyway? It could be a stale value once the throttle period is over.
* colors in attached diagram (data from TC2 (2 A15, 2 A7) in update_entity_load_avg())
The above untested snippet only cares about the CFS usage. In the future we might want to aggregate it with DL and RT usage as well, or we might have a per-class, per-cpu atomic.
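A per-class variant could look roughly like this (untested, names are placeholders):

        DEFINE_PER_CPU(atomic_long_t, cfs_usage);
        DEFINE_PER_CPU(atomic_long_t, rt_usage);
        DEFINE_PER_CPU(atomic_long_t, dl_usage);

        /* the consumer sums the per-class contributions */
        static long total_cpu_usage(int cpu)
        {
                return atomic_long_read(&per_cpu(cfs_usage, cpu)) +
                       atomic_long_read(&per_cpu(rt_usage, cpu)) +
                       atomic_long_read(&per_cpu(dl_usage, cpu));
        }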
IMHO, we should focus on CFS (with the underlying PELT mechanism) for now and solve this smaller set of problems first:
(1) Fix up the 'struct cpumask update_cpus' problem.

(2) Fix the problem that we evaluate cpu_usage exactly at the end of a throttling period if there were 'pending requests', even if there are no events at this moment.

(3) Kick the logic whenever we cross the up_threshold for scaling cpu frequency up.
Any ideas on this approach?
Further thoughts:
Assuming that the above idea is agreeable, one of the questions I am grappling with is whether we should store the usage in a per-cpu variable or if we should add it as a member to struct rq (in which case it would be a sum of usage from every sched class).
I am currently only looking at the CFS-centric, per-cpu atomic variable method, following the philosophy of Keep It Simple, Stupid. However, putting it in struct rq has benefits:
I like this 'CFS-centric, per-cpu atomic variable method' because IMHO it gives us the playground for everything we need right now to make it work with EAS.
- locking is already managed for us when we need to update with a new
usage value, as _THIS_ cpu is the one that will update its own usage. This means we can avoid costly atomic operations.
- locking is already managed for us when walking the rq's looking for a
max usage in places like load_balance. It is NOT managed when a new task is placed on the rq though, so this would need some synchronization? Or we could ignore the possibility of scaling cpu when a new task is enqueued.
IMHO, we can live with the little overhead of some atomics for now to get it working in CFS with EAS for the end-of-March RFC.
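For comparison, the struct rq variant from the bullets above would boil down to something like this (untested, the cpu_usage field is made up):

        /*
         * With a (hypothetical) 'unsigned long cpu_usage' member added to
         * struct rq, the owning cpu updates it under its own rq lock, and
         * readers walking the rqs (e.g. in load_balance()) need no atomics.
         */
        static void update_rq_usage(struct rq *rq)
        {
                lockdep_assert_held(&rq->lock);
                rq->cpu_usage = get_cpu_usage(cpu_of(rq));
        }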
-- Dietmar
Regards, Mike