[PATCH RFC 6/7] sched: cfs: cpu frequency scaling based on task placement

22 Oct 2014

{en,de}queue_task_fair are updated to track which cpus will have changed
utilization values as function of task queueing. The affected cpus are
passed on to arch_eval_cpu_freq for further machine-specific processing
based on a selectable policy.
arch_scale_cpu_freq is called from run_rebalance_domains as a way to
kick off the scaling process (via wake_up_process), so as to prevent
re-entering the {en,de}queue code.
All of the call sites in this patch are up for discussion. Does it make
sense to track which cpus have updated statistics in enqueue_fair_task?
I chose this because I wanted to gather statistics for all cpus affected
in the event CONFIG_FAIR_GROUP_SCHED is enabled. As agreed at LPC14 the
next version of this patch will focus on the simpler case of not using
scheduler cgroups, which should remove a good chunk of this code,
including the cpumask stuff.
Also discussed at LPC14 is that fact that load_balance is a very
interesting place to do this as frequency can be considered in concert
with task placement. Please put forth any ideas on a sensible way to do
this.
Is run_rebalance_domains a logical place to change cpu frequency? What
other call sites make sense?
Even for platforms that can target a cpu frequency without sleeping
(x86, some ARM platforms with PM microcontrollers) it is currently
necessary to always kick the frequency target work out into a kthread.
This is because of the rw_sem usage in the cpufreq core which might
sleep. Replacing that lock type is probably a good idea.
Not-signed-off-by: Mike Turquette mturquette@linaro.org
---
 kernel/sched/fair.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1af6f6d..3619f63 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3999,6 +3999,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
    struct cfs_rq *cfs_rq;
    struct sched_entity *se = &p->se;
+	struct cpumask update_cpus;
+
+	cpumask_clear(&update_cpus);
for_each_sched_entity(se) {
    	if (se->on_rq)
@@ -4028,12 +4031,27 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
update_cfs_shares(cfs_rq);
    	update_entity_load_avg(se, 1);
+		/* track cpus that need to be re-evaluated */
+		cpumask_set_cpu(cpu_of(rq_of(cfs_rq)), &update_cpus);
    }
+	/* !CONFIG_FAIR_GROUP_SCHED */
    if (!se) {
    	update_rq_runnable_avg(rq, rq->nr_running);
    	add_nr_running(rq, 1);
+
+		/*
+		 * FIXME for !CONFIG_FAIR_GROUP_SCHED it might be nice to
+		 * typedef update_cpus into an int and skip all of the cpumask
+		 * stuff
+		 */
+		cpumask_set_cpu(cpu_of(rq), &update_cpus);
    }
+
+	if (energy_aware())
+		if (!cpumask_empty(&update_cpus))
+			arch_eval_cpu_freq(&update_cpus);
+
    hrtick_update(rq);
 }
@@ -4049,6 +4067,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
    struct cfs_rq *cfs_rq;
    struct sched_entity *se = &p->se;
    int task_sleep = flags & DEQUEUE_SLEEP;
+	struct cpumask update_cpus;
+
+	cpumask_clear(&update_cpus);
for_each_sched_entity(se) {
    	cfs_rq = cfs_rq_of(se);
@@ -4089,12 +4110,27 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
update_cfs_shares(cfs_rq);
    	update_entity_load_avg(se, 1);
+		/* track runqueues/cpus that need to be re-evaluated */
+		cpumask_set_cpu(cpu_of(rq_of(cfs_rq)), &update_cpus);
    }
+	/* !CONFIG_FAIR_GROUP_SCHED */
    if (!se) {
    	sub_nr_running(rq, 1);
    	update_rq_runnable_avg(rq, 1);
+
+		/*
+		 * FIXME for !CONFIG_FAIR_GROUP_SCHED it might be nice to
+		 * typedef update_cpus into an int and skip all of the cpumask
+		 * stuff
+		 */
+		cpumask_set_cpu(cpu_of(rq), &update_cpus);
    }
+
+	if (energy_aware())
+		if (!cpumask_empty(&update_cpus))
+			arch_eval_cpu_freq(&update_cpus);
+
    hrtick_update(rq);
 }
@@ -7536,6 +7572,9 @@ static void run_rebalance_domains(struct softirq_action *h)
     * stopped.
     */
    nohz_idle_balance(this_rq, idle);
+
+	if (energy_aware())
+		arch_scale_cpu_freq();
 }
/*
-- 
1.8.3.2



    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

[PATCH RFC 6/7] sched: cfs: cpu frequency scaling based on task placement