On Fri, Mar 13, 2015 at 10:37:53AM +0000, Juri Lelli wrote:
On 11/03/15 17:00, Morten Rasmussen wrote:
On Mon, Mar 09, 2015 at 02:06:17PM +0000, Juri Lelli wrote:
From: Michael Turquette <mturquette@linaro.org>
{en,de}queue_task_fair are updated to track which cpus will have changed utilization values as function of task queueing. The affected cpus are passed on to arch_eval_cpu_freq for further machine-specific processing based on a selectable policy.
arch_scale_cpu_freq is called from run_rebalance_domains as a way to kick off the scaling process (via wake_up_process), so as to prevent re-entering the {en,de}queue code.
All of the call sites in this patch are up for discussion. Does it make sense to track which cpus have updated statistics in enqueue_fair_task? I chose this because I wanted to gather statistics for all cpus affected in the event CONFIG_FAIR_GROUP_SCHED is enabled. As agreed at LPC14 the next version of this patch will focus on the simpler case of not using scheduler cgroups, which should remove a good chunk of this code, including the cpumask stuff.
Also discussed at LPC14 is the fact that load_balance() is a very interesting place to do this, as frequency can be considered in concert with task placement. Please put forth any ideas on a sensible way to do this.
Is run_rebalance_domains a logical place to change cpu frequency? What other call sites make sense?
Even for platforms that can target a cpu frequency without sleeping (x86, some ARM platforms with PM microcontrollers) it is currently necessary to always kick the frequency target work out into a kthread. This is because of the rw_sem usage in the cpufreq core which might sleep. Replacing that lock type is probably a good idea.
Not-signed-off-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
 kernel/sched/fair.c  | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h |  2 ++
 2 files changed, 44 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d9386b0..1043266 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
[...]
@@ -4275,12 +4283,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_shares(cfs_rq);
 		update_entity_load_avg(se, 1);
+
+		/* track cpus that need to be re-evaluated */
+		cpumask_set_cpu(cpu_of(rq_of(cfs_rq)), &update_cpus);
 	}
 
+	/* !CONFIG_FAIR_GROUP_SCHED */
 	if (!se) {
 		update_rq_runnable_avg(rq, rq->nr_running);
 		add_nr_running(rq, 1);
+
+		/*
+		 * FIXME for !CONFIG_FAIR_GROUP_SCHED it might be nice to
+		 * typedef update_cpus into an int and skip all of the cpumask
+		 * stuff
+		 */
+		cpumask_set_cpu(cpu_of(rq), &update_cpus);
 	}
You are only affecting one cpu in {en,de}queue_task_fair() so update_cpus should never have more than one cpu set. The for_each_sched_entity() loops are just to ensure that all the parent sched groups' sched_entities are enqueued on their parents rqs. They all belong to one cpu and that is cpu_of(rq).
I was about to say that you don't need more than just an int, but if you want to hook into rebalance_domains(), or somewhere in that path where we move tasks between cpus, you would probably want to update both source and destination cpus. It gets more complicated in nohz_idle_balance(), where you may move tasks between multiple sources and destinations in one go. So having a mask might not be a bad idea after all.
Since you only periodically check the cpus in the update_cpus mask you may have had tasks {en,de}queue on multiple cpus in the meantime. So you need the cpumask anyway.
Right. We need a way to keep track of "pending requests".
If you keep the cpumask here in {en,de}queue_task_fair() you can get away with just a single cpumask_set_cpu(cpu_of(rq)).
If you want to experiment with hooking into load_balance(), which would cover periodic, idle, and nohz_idle balancing, I think it could be worth a try setting env->src_cpu and env->dst_cpu in the update_cpus mask just after attach_tasks() in load_balance(). You get there every time tasks have been migrated due to 'normal' load-balancing.
You can do something very similar after attach_one_task() in active_load_balance_cpu_stop() to get active balance migrations too (not 'normal').
{attach,detach}_tasks() (and attach_one_task()) end up calling {activate,deactivate}_task, so we have {enqueue,dequeue}_task for source and destination runqueues. So, even in these cases it seems fine to just evaluate the new situation from {en,de}queue_task_fair() (or even the core calling sites if we want to be more general).
Am I missing something?
No, it is me not doing my homework well enough. You are right.
You would however potentially get multiple enqueue/dequeue calls to the same cpus if you are doing a load_balance() but I guess the overhead would be small.
I guess that should cover all changes in utilization due to task migrations. The last bit missing is changes to utilization caused by changing task behaviour. For example, a small task suddenly turning heavy. Since it is now runnable all the time it won't be dequeued and load_balance() may not find a reason to migrate it. If we don't check the utilization of busy cpus every now and again the frequency we picked a while ago may no longer match the current utilization.
A naive solution could be to mark a cpu in update_cpus every time it receives a tick. At the end of task_tick_fair() for example. But then it isn't really event driven anymore. It needs some more thinking.
Right, this bit is missing. Well, the event would be a task crossing the up threshold at the current capacity; we only kick the machinery when that happens. But yeah, maybe something more clever is possible.
To complete the picture we also need to sort out the problems with blocked load and nohz that you encountered so we don't keep the frequency up unnecessarily.