On Mon, Mar 09, 2015 at 02:06:17PM +0000, Juri Lelli wrote:
From: Michael Turquette <mturquette@linaro.org>
{en,de}queue_task_fair are updated to track which cpus will have changed utilization values as function of task queueing. The affected cpus are passed on to arch_eval_cpu_freq for further machine-specific processing based on a selectable policy.
arch_scale_cpu_freq is called from run_rebalance_domains as a way to kick off the scaling process (via wake_up_process), so as to prevent re-entering the {en,de}queue code.
All of the call sites in this patch are up for discussion. Does it make sense to track which cpus have updated statistics in enqueue_task_fair? I chose this because I wanted to gather statistics for all cpus affected in the event CONFIG_FAIR_GROUP_SCHED is enabled. As agreed at LPC14 the next version of this patch will focus on the simpler case of not using scheduler cgroups, which should remove a good chunk of this code, including the cpumask stuff.
Also discussed at LPC14 is the fact that load_balance is a very interesting place to do this, as frequency can be considered in concert with task placement. Please put forth any ideas on a sensible way to do this.
Is run_rebalance_domains a logical place to change cpu frequency? What other call sites make sense?
Even for platforms that can target a cpu frequency without sleeping (x86, some ARM platforms with PM microcontrollers) it is currently necessary to always kick the frequency target work out into a kthread. This is because of the rw_sem usage in the cpufreq core which might sleep. Replacing that lock type is probably a good idea.
Not-signed-off-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
 kernel/sched/fair.c  | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h |  2 ++
 2 files changed, 44 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d9386b0..1043266 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
[...]
@@ -4275,12 +4283,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_shares(cfs_rq);
 		update_entity_load_avg(se, 1);
+		/* track cpus that need to be re-evaluated */
+		cpumask_set_cpu(cpu_of(rq_of(cfs_rq)), &update_cpus);
 	}

+	/* !CONFIG_FAIR_GROUP_SCHED */
 	if (!se) {
 		update_rq_runnable_avg(rq, rq->nr_running);
 		add_nr_running(rq, 1);
+		/*
+		 * FIXME for !CONFIG_FAIR_GROUP_SCHED it might be nice to
+		 * typedef update_cpus into an int and skip all of the cpumask
+		 * stuff
+		 */
+		cpumask_set_cpu(cpu_of(rq), &update_cpus);
You are only affecting one cpu in {en,de}queue_task_fair(), so update_cpus should never have more than one cpu set. The for_each_sched_entity() loops are just to ensure that all the parent sched groups' sched_entities are enqueued on their parents' rqs. They all belong to one cpu, and that is cpu_of(rq).
I was about to say that you don't need more than just an int, but if you want to hook into rebalance_domains() or somewhere in that path where we move tasks between cpus you would probably want to update both source and destination cpus. It gets more complicated in nohz_idle_balance() where you may move tasks between multiple sources and destinations in one go. So having a mask might not be a bad idea after all.
Since you only periodically check the cpus in the update_cpus mask you may have had tasks {en,de}queue on multiple cpus in the meantime. So you need the cpumask anyway.
If you keep the cpumask here in {en,de}queue_task_fair() you can get away with just a single cpumask_set_cpu(cpu_of(rq)).
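To illustrate, for the enqueue side that would collapse both hunks above into a single set. This is only an untested sketch against the names introduced by the patch (update_cpus):

```c
/*
 * Untested sketch: every sched_entity walked by the
 * for_each_sched_entity() loops in enqueue_task_fair() belongs
 * to the same cpu, cpu_of(rq), so a single set at the end of the
 * function covers the whole hierarchy.
 */
static void
enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
	/* ... existing enqueue logic unchanged ... */

	/* one set covers all entities enqueued above */
	cpumask_set_cpu(cpu_of(rq), &update_cpus);

	hrtick_update(rq);
}
```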
If you want to experiment with hooking into load_balance(), which would cover periodic, idle, and nohz_idle balancing, I think it could be worth a try setting env->src_cpu and env->dst_cpu in the update_cpus mask just after attach_tasks() in load_balance(). You get there every time tasks have been migrated due to 'normal' load-balancing.
You can do something very similar after attach_one_task() in active_load_balance_cpu_stop() to get active balance migrations too (not 'normal').
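Something along these lines, again an untested sketch using the patch's update_cpus mask (the surrounding variables are the existing ones in load_balance() and active_load_balance_cpu_stop()):

```c
	/*
	 * Untested sketch, in load_balance(), right after
	 * attach_tasks(&env): mark both ends of the migration for
	 * frequency re-evaluation.
	 */
	if (cur_ld_moved) {
		cpumask_set_cpu(env.src_cpu, &update_cpus);
		cpumask_set_cpu(env.dst_cpu, &update_cpus);
	}

	/*
	 * Likewise in active_load_balance_cpu_stop(), after
	 * attach_one_task(), to catch active balance migrations:
	 */
	cpumask_set_cpu(busiest_rq->cpu, &update_cpus);
	cpumask_set_cpu(target_cpu, &update_cpus);
```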
I guess that should cover all changes in utilization due to task migrations. The last bit missing is changes to utilization caused by changing task behaviour. For example, a small task suddenly turning heavy. Since it is now runnable all the time it won't be dequeued and load_balance() may not find a reason to migrate it. If we don't check the utilization of busy cpus every now and again the frequency we picked a while ago may no longer match the current utilization.
A naive solution could be to mark a cpu in update_cpus every time it receives a tick. At the end of task_tick_fair() for example. But then it isn't really event driven anymore. It needs some more thinking.
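The naive tick-driven fallback would amount to no more than this (untested sketch, again using the patch's update_cpus):

```c
/*
 * Untested sketch: mark the cpu on every tick so that a task
 * whose behaviour changed (e.g. small -> heavy) still gets its
 * cpu's frequency re-evaluated, even though it never dequeues.
 * Downside: this is polling, not event driven.
 */
static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
{
	/* ... existing tick logic unchanged ... */

	cpumask_set_cpu(cpu_of(rq), &update_cpus);
}
```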
Morten