Hi Guys,
All of this work was done by Steve before he left. I have made only very minor changes, merged a few patches, and rebased over 4.10-rc5.
More details can be found here:
https://projects.linaro.org/browse/PMWG-1018
With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently on mainline tip, callbacks into schedutil are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run schedutil for some time.
One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. Because of the above mentioned limitation though this does not occur.
This patchset defers the callback into schedutil if the callback would be remote (not for a CPU in the policy of which we are running). If there is no preemption required by the wakeup a late callback into schedutil is made, and schedutil is modified to be able to correctly deal with remote callbacks. If preemption does occur then the scheduler, and schedutil, will run on the remote CPU anyway.
I will be doing further testing on this to get more performance numbers; I just wanted to get some early responses first, and so am sending it to the EAS list.
-- viresh
Steve Muckle (9):
  sched: cpufreq: add cpu to update_util_data
  irq_work: add irq_work_queue_on for !CONFIG_SMP
  sched: cpufreq: extend irq work to support fast switches
  sched: cpufreq: remove smp_processor_id() in remote paths
  sched: create late cpufreq callback
  sched: cpufreq: detect, process remote callbacks
  cpufreq: governor: support scheduler cpufreq callbacks on remote CPUs
  intel_pstate: ignore scheduler cpufreq callbacks on remote CPUs
  sched: cpufreq: enable remote sched cpufreq callbacks
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 drivers/cpufreq/intel_pstate.c     |  3 ++
 include/linux/irq_work.h           |  7 ++++
 include/linux/sched.h              |  1 +
 kernel/sched/core.c                |  4 ++
 kernel/sched/cpufreq.c             |  1 +
 kernel/sched/cpufreq_schedutil.c   | 80 +++++++++++++++++++++++++++-----------
 kernel/sched/fair.c                |  6 ++-
 kernel/sched/sched.h               | 24 +++++++++++-
 9 files changed, 102 insertions(+), 26 deletions(-)
-- 2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
Upcoming support for scheduler cpufreq callbacks on remote wakeups will require the client to know which target CPU the callback is being invoked for. Add this information to the callback data structure.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched.h  | 1 +
 kernel/sched/cpufreq.c | 1 +
 2 files changed, 2 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ad3ec9ec61f7..8d4409287adf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3663,6 +3663,7 @@ static inline unsigned long rlimit_max(unsigned int limit)
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
 	void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
+	int cpu;
 };
 
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index dbc51442ecbc..ee4c596b71b4 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -42,6 +42,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
 		return;
 
 	data->func = func;
+	data->cpu = cpu;
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
 }
 EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
-- 
2.7.1.410.g6faf27b
Hi Viresh,
On 23/01/17 16:51, Viresh Kumar wrote:
From: Steve Muckle <smuckle@linaro.org>
Upcoming support for scheduler cpufreq callbacks on remote wakeups will require the client to know what the target CPU is that the callback is being invoked for. Add this information into the callback data structure.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
 include/linux/sched.h  | 1 +
 kernel/sched/cpufreq.c | 1 +
 2 files changed, 2 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ad3ec9ec61f7..8d4409287adf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3663,6 +3663,7 @@ static inline unsigned long rlimit_max(unsigned int limit)
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
 	void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
+	int cpu;
 };
This bit seems to have whitespace damage issues (spaces instead of tabs), can you please check? I'm actually cherry-picking from
ssh://git@git.linaro.org/people/viresh.kumar/linux.git cpufreq/remote-wakeup
and include/linux/sched.h seems to have spaces instead of tabs at the point above.
Best,
- Juri
On 24-01-17, 11:00, Juri Lelli wrote:
Hi Viresh,
On 23/01/17 16:51, Viresh Kumar wrote:
From: Steve Muckle <smuckle@linaro.org>
Upcoming support for scheduler cpufreq callbacks on remote wakeups will require the client to know what the target CPU is that the callback is being invoked for. Add this information into the callback data structure.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
 include/linux/sched.h  | 1 +
 kernel/sched/cpufreq.c | 1 +
 2 files changed, 2 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ad3ec9ec61f7..8d4409287adf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3663,6 +3663,7 @@ static inline unsigned long rlimit_max(unsigned int limit)
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
 	void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
+	int cpu;
 };
This bit seems to have whitespace damage issues (spaces instead of tabs), can you please check? I'm actually cherry-picking from
ssh://git@git.linaro.org/people/viresh.kumar/linux.git cpufreq/remote-wakeup
and include/linux/sched.h seems to have spaces instead of tabs at the point above.
Thanks for noticing that. Probably Steve was trying to copy how it is done earlier. If you see the earlier line: void (*func)..., it has the same issue. Other places in the file don't have that though.
Anyway, we shall keep tabs going forward. I will fix that one.
-- viresh
From: Steve Muckle <smuckle@linaro.org>
Having irq_work_queue_on() available for !CONFIG_SMP can make some call sites cleaner.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/irq_work.h | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 47b9ebd4a74f..0195c3502d6b 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -1,6 +1,7 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H
 
+#include <linux/bug.h>
 #include <linux/llist.h>
 
 /*
@@ -36,6 +37,12 @@ bool irq_work_queue(struct irq_work *work);
 
 #ifdef CONFIG_SMP
 bool irq_work_queue_on(struct irq_work *work, int cpu);
+#else
+static inline bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+	BUG_ON(cpu != 0);
+	return irq_work_queue(work);
+}
 #endif
 
 void irq_work_tick(void);
-- 
2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
In preparation for schedutil receiving sched cpufreq callbacks for remote CPUs, extend the irq work in schedutil to support policies with fast switching enabled in addition to policies using the slow path.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index fd4659313640..3cf7b8cb4ffe 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -87,6 +87,19 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 	return delta_ns >= sg_policy->freq_update_delay_ns;
 }
 
+static void sugov_fast_switch(struct sugov_policy *sg_policy, int cpu,
+			      unsigned int next_freq)
+{
+	struct cpufreq_policy *policy = sg_policy->policy;
+
+	next_freq = cpufreq_driver_fast_switch(policy, next_freq);
+	if (next_freq == CPUFREQ_ENTRY_INVALID)
+		return;
+
+	policy->cur = next_freq;
+	trace_cpu_frequency(next_freq, cpu);
+}
+
 static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 				unsigned int next_freq)
 {
@@ -100,12 +113,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 			return;
 		}
 		sg_policy->next_freq = next_freq;
-		next_freq = cpufreq_driver_fast_switch(policy, next_freq);
-		if (next_freq == CPUFREQ_ENTRY_INVALID)
-			return;
-
-		policy->cur = next_freq;
-		trace_cpu_frequency(next_freq, smp_processor_id());
+		sugov_fast_switch(sg_policy, smp_processor_id(), next_freq);
 	} else if (sg_policy->next_freq != next_freq) {
 		sg_policy->next_freq = next_freq;
 		sg_policy->work_in_progress = true;
@@ -310,9 +318,16 @@ static void sugov_work(struct kthread_work *work)
 
 static void sugov_irq_work(struct irq_work *irq_work)
 {
-	struct sugov_policy *sg_policy;
+	struct sugov_policy *sg_policy = container_of(irq_work, struct
+						      sugov_policy, irq_work);
+	struct cpufreq_policy *policy = sg_policy->policy;
 
-	sg_policy = container_of(irq_work, struct sugov_policy, irq_work);
+	if (policy->fast_switch_enabled) {
+		sugov_fast_switch(sg_policy, smp_processor_id(),
+				  sg_policy->next_freq);
+		sg_policy->work_in_progress = false;
+		return;
+	}
 
 	/*
 	 * For RT and deadline tasks, the schedutil governor shoots the
-- 
2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
Upcoming support for remote callbacks from the scheduler into schedutil requires that the CPU identified in the hook structure be used to indicate the CPU being operated on, rather than relying on smp_processor_id().
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 3cf7b8cb4ffe..2904f6ee7888 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -100,8 +100,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, int cpu,
 	trace_cpu_frequency(next_freq, cpu);
 }
 
-static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
-				unsigned int next_freq)
+static void sugov_update_commit(struct sugov_policy *sg_policy, int cpu,
+				u64 time, unsigned int next_freq)
 {
 	struct cpufreq_policy *policy = sg_policy->policy;
 
@@ -109,11 +109,11 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 
 	if (policy->fast_switch_enabled) {
 		if (sg_policy->next_freq == next_freq) {
-			trace_cpu_frequency(policy->cur, smp_processor_id());
+			trace_cpu_frequency(policy->cur, cpu);
 			return;
 		}
 		sg_policy->next_freq = next_freq;
-		sugov_fast_switch(sg_policy, smp_processor_id(), next_freq);
+		sugov_fast_switch(sg_policy, cpu, next_freq);
 	} else if (sg_policy->next_freq != next_freq) {
 		sg_policy->next_freq = next_freq;
 		sg_policy->work_in_progress = true;
@@ -159,12 +159,12 @@ static unsigned int get_next_freq(struct sugov_cpu *sg_cpu, unsigned long util,
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
-static void sugov_get_util(unsigned long *util, unsigned long *max)
+static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
 {
-	struct rq *rq = this_rq();
+	struct rq *rq = cpu_rq(cpu);
 	unsigned long cfs_max;
 
-	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+	cfs_max = arch_scale_cpu_capacity(NULL, cpu);
 
 	*util = min(rq->cfs.avg.util_avg, cfs_max);
 	*max = cfs_max;
@@ -218,11 +218,11 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (flags & SCHED_CPUFREQ_RT_DL) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
-		sugov_get_util(&util, &max);
+		sugov_get_util(&util, &max, hook->cpu);
 		sugov_iowait_boost(sg_cpu, &util, &max);
 		next_f = get_next_freq(sg_cpu, util, max);
 	}
-	sugov_update_commit(sg_policy, time, next_f);
+	sugov_update_commit(sg_policy, hook->cpu, time, next_f);
 }
 
 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
@@ -245,10 +245,10 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
-		if (j == smp_processor_id())
+		j_sg_cpu = &per_cpu(sugov_cpu, j);
+		if (j_sg_cpu == sg_cpu)
 			continue;
 
-		j_sg_cpu = &per_cpu(sugov_cpu, j);
 		/*
 		 * If the CPU utilization was last updated before the previous
 		 * frequency update and the time elapsed between the last update
@@ -285,7 +285,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	unsigned long util, max;
 	unsigned int next_f;
 
-	sugov_get_util(&util, &max);
+	sugov_get_util(&util, &max, hook->cpu);
 
 	raw_spin_lock(&sg_policy->update_lock);
 
@@ -298,7 +298,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	if (sugov_should_update_freq(sg_policy, time)) {
 		next_f = sugov_next_freq_shared(sg_cpu, util, max, flags);
-		sugov_update_commit(sg_policy, time, next_f);
+		sugov_update_commit(sg_policy, hook->cpu, time, next_f);
 	}
 
 	raw_spin_unlock(&sg_policy->update_lock);
-- 
2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
In order to trigger a frequency change on the target CPU of a remote wakeup, a client of the scheduler cpufreq hook may need to send an IPI to that target CPU.
This IPI is redundant if the wakeup causes preemption because the scheduler (and the scheduler cpufreq hook) will run soon on the target CPU in that case anyway.
Add a late cpufreq callback to cover the non-preemption case. It can be requested by the scheduler cpufreq hook client by setting rq->cpufreq_late_cb. The callback is only invoked if preemption does not occur.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/core.c  |  4 ++++
 kernel/sched/sched.h | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c56fb57f2991..5e99fdae2590 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -485,6 +485,8 @@ void resched_curr(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 
+	cpufreq_clear_late_cb(rq);
+
 	if (test_tsk_need_resched(curr))
 		return;
 
@@ -970,6 +972,8 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
 		rq_clock_skip_update(rq, true);
+
+	cpufreq_late_update(rq);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7b34c7826ca5..108d159f9a35 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -727,6 +727,10 @@ struct rq {
 	/* Must be inspected within a rcu lock section */
 	struct cpuidle_state *idle_state;
 #endif
+
+#if defined(CONFIG_SMP) && defined(CONFIG_CPU_FREQ)
+	unsigned char cpufreq_late_cb;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
@@ -1817,6 +1821,23 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 static inline void cpufreq_update_this_cpu(struct rq *rq, unsigned int flags) {}
 #endif /* CONFIG_CPU_FREQ */
 
+#if defined(CONFIG_SMP) && defined(CONFIG_CPU_FREQ)
+static inline void cpufreq_late_update(struct rq *rq)
+{
+	if (rq->cpufreq_late_cb) {
+		cpufreq_update_util(rq, 0);
+		rq->cpufreq_late_cb = false;
+	}
+}
+static inline void cpufreq_clear_late_cb(struct rq *rq)
+{
+	rq->cpufreq_late_cb = false;
+}
+#else
+static inline void cpufreq_late_update(struct rq *rq) {}
+static inline void cpufreq_clear_late_cb(struct rq *rq) {}
+#endif /* CONFIG_SMP && CONFIG_CPU_FREQ */
+
 #ifdef arch_scale_freq_capacity
 #ifndef arch_scale_freq_invariant
 #define arch_scale_freq_invariant()	(true)
-- 
2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
An earlier patch provided a flag, rq->cpufreq_late_cb, which can be used to request a second callback into schedutil if preemption does not occur on the target CPU.
Use this flag to implement support for remote callbacks in schedutil. If a remote callback is seen and rq->cpufreq_late_cb is not set, then it is an early callback and it is not yet known whether preemption will happen. Request a deferred callback by setting rq->cpufreq_late_cb.
If a remote callback is seen and rq->cpufreq_late_cb is set, then this is the deferred callback and preemption did not occur. Queue the irq work for this callback on the destination CPU. The irq work will carry out the fast or slow switch as appropriate.
A callback for a CPU which is not the current CPU but is a CPU in the same cpufreq policy as the current CPU is not a remote callback. These callbacks are treated the same as callbacks for the current CPU.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2904f6ee7888..40c2c728602f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -66,6 +66,18 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
 
 /************************ Governor internals ***********************/
 
+static inline bool sugov_defer_remote(bool remote, int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	if (remote && !rq->cpufreq_late_cb) {
+		rq->cpufreq_late_cb = true;
+		return true;
+	} else {
+		return false;
+	}
+}
+
 static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 {
 	s64 delta_ns;
@@ -100,14 +112,14 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, int cpu,
 	trace_cpu_frequency(next_freq, cpu);
 }
 
-static void sugov_update_commit(struct sugov_policy *sg_policy, int cpu,
-				u64 time, unsigned int next_freq)
+static void sugov_update_commit(struct sugov_policy *sg_policy, bool remote,
+				int cpu, u64 time, unsigned int next_freq)
 {
 	struct cpufreq_policy *policy = sg_policy->policy;
 
 	sg_policy->last_freq_update_time = time;
 
-	if (policy->fast_switch_enabled) {
+	if (policy->fast_switch_enabled && !remote) {
 		if (sg_policy->next_freq == next_freq) {
 			trace_cpu_frequency(policy->cur, cpu);
 			return;
@@ -117,7 +129,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, int cpu,
 	} else if (sg_policy->next_freq != next_freq) {
 		sg_policy->next_freq = next_freq;
 		sg_policy->work_in_progress = true;
-		irq_work_queue(&sg_policy->irq_work);
+		irq_work_queue_on(&sg_policy->irq_work, cpu);
 	}
 }
 
@@ -206,9 +218,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
+	bool remote = smp_processor_id() != hook->cpu;
 	unsigned long util, max;
 	unsigned int next_f;
 
+	if (sugov_defer_remote(remote, hook->cpu))
+		return;
+
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
@@ -222,7 +238,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 		sugov_iowait_boost(sg_cpu, &util, &max);
 		next_f = get_next_freq(sg_cpu, util, max);
 	}
-	sugov_update_commit(sg_policy, hook->cpu, time, next_f);
+	sugov_update_commit(sg_policy, remote, hook->cpu, time, next_f);
 }
 
 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
@@ -282,9 +298,14 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+	struct cpufreq_policy *policy = sg_policy->policy;
+	bool remote = !cpumask_test_cpu(smp_processor_id(), policy->cpus);
 	unsigned long util, max;
 	unsigned int next_f;
 
+	if (sugov_defer_remote(remote, hook->cpu))
+		return;
+
 	sugov_get_util(&util, &max, hook->cpu);
 
 	raw_spin_lock(&sg_policy->update_lock);
@@ -298,7 +319,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	if (sugov_should_update_freq(sg_policy, time)) {
 		next_f = sugov_next_freq_shared(sg_cpu, util, max, flags);
-		sugov_update_commit(sg_policy, hook->cpu, time, next_f);
+		sugov_update_commit(sg_policy, remote, hook->cpu, time, next_f);
 	}
 
 	raw_spin_unlock(&sg_policy->update_lock);
-- 
2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
In preparation for the scheduler cpufreq callback happening on remote CPUs, add support for this in the dbs governors. The dbs governors assume the callback occurs on the CPU being updated, so queue the irq work on that CPU rather than the local one.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 0196467280bd..ff25aea6faa8 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -316,7 +316,7 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,
 
 	policy_dbs->last_sample_time = time;
 	policy_dbs->work_in_progress = true;
-	irq_work_queue(&policy_dbs->irq_work);
+	irq_work_queue_on(&policy_dbs->irq_work, data->cpu);
 }
 
 static void gov_set_update_util(struct policy_dbs_info *policy_dbs,
-- 
2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
In preparation for the scheduler cpufreq callback happening on remote CPUs, check for this case in intel_pstate which currently requires the callback run on the local CPU. Such callbacks are ignored for now.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/intel_pstate.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f91c25718d16..45bd812be8bc 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1786,6 +1786,9 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
 	struct cpudata *cpu = container_of(data, struct cpudata, update_util);
 	u64 delta_ns;
 
+	if (smp_processor_id() != data->cpu)
+		return;
+
 	if (pstate_funcs.get_target_pstate == get_target_pstate_use_cpu_load) {
 		if (flags & SCHED_CPUFREQ_IOWAIT) {
 			cpu->iowait_boost = int_tofp(1);
-- 
2.7.1.410.g6faf27b
From: Steve Muckle <smuckle@linaro.org>
Now that all clients properly support (or ignore) remote scheduler cpufreq callbacks, remove the restriction that such callbacks only be made in CFS on the local CPU.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/fair.c  | 6 ++++--
 kernel/sched/sched.h | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6559d197e08a..2e8068b0056c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3169,7 +3169,9 @@ static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq) {}
 
 static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 {
-	if (&this_rq()->cfs == cfs_rq) {
+	struct rq *rq = rq_of(cfs_rq);
+
+	if (&rq->cfs == cfs_rq) {
 		/*
 		 * There are a few boundary cases this might miss but it should
 		 * get called often enough that that should (hopefully) not be
@@ -3186,7 +3188,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 		 *
 		 * See cpu_util().
 		 */
-		cpufreq_update_util(rq_of(cfs_rq), 0);
+		cpufreq_update_util(rq, 0);
 	}
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 108d159f9a35..00daec413615 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1806,7 +1806,8 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
 {
 	struct update_util_data *data;
 
-	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
+	data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
+						  cpu_of(rq)));
 	if (data)
 		data->func(data, rq_clock(rq), flags);
 }
-- 
2.7.1.410.g6faf27b
On 23-01-17, 16:51, Viresh Kumar wrote:
Hi Guys,
All of this work was done by Steve before he left. I have made very minor changes, merged few patches, rebased over 4.10-rc5.
More details can be found here:
https://projects.linaro.org/browse/PMWG-1018
With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently on mainline tip, callbacks into schedutil are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run schedutil for some time.
One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. Because of the above mentioned limitation though this does not occur.
This patchset defers the callback into schedutil if the callback would be remote (not for a CPU in the policy of which we are running). If there is no preemption required by the wakeup a late callback into schedutil is made, and schedutil is modified to be able to correctly deal with remote callbacks. If preemption does occur then the scheduler, and schedutil, will run on the remote CPU anyway.
I would be doing further testing on this to get more performance numbers with it, just wanted to get some early responses and so sending it to the EAS list.
FWIW, Steve provided me a .c file to test this stuff and I have written a short script to get traces along with that. Just in case it is helpful for any of you.
I tested it on an Odroid XU3 (big.LITTLE) and remote wakeup is working for me as well.
-- viresh
Hi Viresh,
On 23/01/17 16:51, Viresh Kumar wrote:
Hi Guys,
All of this work was done by Steve before he left. I have made very minor changes, merged few patches, rebased over 4.10-rc5.
More details can be found here:
https://projects.linaro.org/browse/PMWG-1018
With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently on mainline tip, callbacks into schedutil are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run schedutil for some time.
One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. Because of the above mentioned limitation though this does not occur.
This patchset defers the callback into schedutil if the callback would be remote (not for a CPU in the policy of which we are running). If there is no preemption required by the wakeup a late callback into schedutil is made, and schedutil is modified to be able to correctly deal with remote callbacks. If preemption does occur then the scheduler, and schedutil, will run on the remote CPU anyway.
I would be doing further testing on this to get more performance numbers with it, just wanted to get some early responses and so sending it to the EAS list.
I managed to run jankbench and youtube with and without this set on top of a schedutil backport for Pixel phones. I uploaded results here:
https://drive.google.com/open?id=0B0gETIMiqtYIOHhmUDJyNXpzQkE
Not sure everybody is familiar with what is reported, so please don't hesitate to ask for any sort of clarification, but I couldn't find any evident difference between having or not having the set in. This is a very narrow set of workloads, though. So, are there any other Android workloads/benchmarks/platforms positively affected by the changes?
Best,
- Juri
On 26-01-17, 14:40, Juri Lelli wrote:
I managed to run jankbench and youtube with and without this set on top of a schedutil backport for Pixel phones. I uploaded results here:
https://drive.google.com/open?id=0B0gETIMiqtYIOHhmUDJyNXpzQkE
Not sure everybody is familiar with what reported, so please don't hesitate to ask for any sort of clarification, but I couldn't find any evident difference between having or not having the set in. This is a very narrow set of workloads, though. So, are there any other Android workloads/benchmark/platforms positively affected by the changes?
Hi Juri,
I tried Hackbench, recentfling and galleryfling. The first two didn't give anything special, but I do see improvements with galleryfling.
I tested with workload automation and the results are attached in this email.
Most noticeable are the results for the 90th and 95th percentiles:

                  With remote wakeups    Without remote wakeups
  90th/95th %ile  23/44                  28/57
FWIW, I am using images provided by Steve (around 300 MBs). I am uploading them to google drive for Vincent. Lemme know if you also need a link.
-- viresh
Attaching now, finally :)
On 30-01-17, 15:52, Viresh Kumar wrote:
On 26-01-17, 14:40, Juri Lelli wrote:
I managed to run jankbench and youtube with and without this set on top of a schedutil backport for Pixel phones. I uploaded results here:
https://drive.google.com/open?id=0B0gETIMiqtYIOHhmUDJyNXpzQkE
Not sure everybody is familiar with what reported, so please don't hesitate to ask for any sort of clarification, but I couldn't find any evident difference between having or not having the set in. This is a very narrow set of workloads, though. So, are there any other Android workloads/benchmark/platforms positively affected by the changes?
Hi Juri,
I tried Hackbench, recentfling and galleryfling. The first two didn't gave anything special, but I do see improvements with galleryfling.
I tested with workload automation and the results are attached in this email.
Most noticeable is the results for 90th and 95th percentiles:
                  With remote wakeups    Without remote wakeups
  90th/95th %ile  23/44                  28/57
FWIW, I am using images provided by Steve (around 300 MBs). I am uploading them to google drive for Vincent. Lemme know if you also need a link.
-- viresh
-- viresh
On 30-01-17, 15:54, Viresh Kumar wrote:
FWIW, I am using images provided by Steve (around 300 MBs). I am uploading them to google drive for Vincent. Lemme know if you also need a link.
The images are uploaded here.
https://drive.google.com/file/d/0B3L3wI0xSjydQUlhbVFUV1UtX2M/view?usp=sharin...
-- viresh
Hi,
On 30/01/17 15:52, Viresh Kumar wrote:
On 26-01-17, 14:40, Juri Lelli wrote:
I managed to run jankbench and youtube with and without this set on top of a schedutil backport for Pixel phones. I uploaded results here:
https://drive.google.com/open?id=0B0gETIMiqtYIOHhmUDJyNXpzQkE
Not sure everybody is familiar with what reported, so please don't hesitate to ask for any sort of clarification, but I couldn't find any evident difference between having or not having the set in. This is a very narrow set of workloads, though. So, are there any other Android workloads/benchmark/platforms positively affected by the changes?
Hi Juri,
I tried Hackbench, recentfling and galleryfling. The first two didn't gave anything special, but I do see improvements with galleryfling.
I tested with workload automation and the results are attached in this email.
Thanks for sharing. Any idea why galleryfling in particular is seeing benefits? I'm asking because it might be hard to defend the changes upstream without a clear understanding of why they are needed (and the standard hackbench scores don't seem to help us here :().
Best,
- Juri
Most noticeable is the results for 90th and 95th percentiles:
                  With remote wakeups    Without remote wakeups
  90th/95th %ile  23/44                  28/57
FWIW, I am using images provided by Steve (around 300 MBs). I am uploading them to google drive for Vincent. Lemme know if you also need a link.
-- viresh
On Mon, Jan 23, 2017 at 3:21 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
Hi Guys,
All of this work was done by Steve before he left. I have made very minor changes, merged few patches, rebased over 4.10-rc5.
More details can be found here:
https://projects.linaro.org/browse/PMWG-1018
With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently on mainline tip, callbacks into schedutil are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run schedutil for some time.
One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. Because of the above mentioned limitation though this does not occur.
Some possibly naive questions about this patch, when you say "system is configured such that new tasks should receive maximum demand", do you mean an RT task waking up on a remote CPU or do you mean in general? Can you clarify how/where this configuration for new tasks is done?
Also, if new tasks are not to receive maximum demand initially (as you indicated it's configurable), then shouldn't the late cpufreq callbacks never happen for the cases where new tasks are not to receive maximum demand? With these patches, will they always invoke the late callbacks? In that case the currently running task on the remote CPU should already have brought the demand levels to a high value, so should the callbacks still run? I was just wondering if we can avoid unwanted IPIs from this patch in that case.
Thanks!
Joel
This patchset defers the callback into schedutil if the callback would be remote (not for a CPU in the policy of which we are running). If there is no preemption required by the wakeup a late callback into schedutil is made, and schedutil is modified to be able to correctly deal with remote callbacks. If preemption does occur then the scheduler, and schedutil, will run on the remote CPU anyway.
I would be doing further testing on this to get more performance numbers with it, just wanted to get some early responses and so sending it to the EAS list.
-- viresh
Steve Muckle (9):
  sched: cpufreq: add cpu to update_util_data
  irq_work: add irq_work_queue_on for !CONFIG_SMP
  sched: cpufreq: extend irq work to support fast switches
  sched: cpufreq: remove smp_processor_id() in remote paths
  sched: create late cpufreq callback
  sched: cpufreq: detect, process remote callbacks
  cpufreq: governor: support scheduler cpufreq callbacks on remote CPUs
  intel_pstate: ignore scheduler cpufreq callbacks on remote CPUs
  sched: cpufreq: enable remote sched cpufreq callbacks

 drivers/cpufreq/cpufreq_governor.c |  2 +-
 drivers/cpufreq/intel_pstate.c     |  3 ++
 include/linux/irq_work.h           |  7 ++++
 include/linux/sched.h              |  1 +
 kernel/sched/core.c                |  4 ++
 kernel/sched/cpufreq.c             |  1 +
 kernel/sched/cpufreq_schedutil.c   | 80 +++++++++++++++++++++++++++-----------
 kernel/sched/fair.c                |  6 ++-
 kernel/sched/sched.h               | 24 +++++++++++-
 9 files changed, 102 insertions(+), 26 deletions(-)
-- 2.7.1.410.g6faf27b
eas-dev mailing list eas-dev@lists.linaro.org https://lists.linaro.org/mailman/listinfo/eas-dev
On 26-01-17, 11:15, Joel Fernandes wrote:
Some possibly naive questions about this patch, when you say "system is configured such that new tasks should receive maximum demand", do you mean an RT task waking up on a remote CPU or do you mean in general? Can you clarify how/where this configuration for new tasks is done?
I think Steve (who wrote this stuff initially) meant that in a general way. We don't have any such configurable way right now, but there have been discussions on how much utilization a new task should be assigned when it first comes up.
Also, if new tasks are not to receive maximum demand initially (as you
This should help across the whole range, I hope, not just for maximum initial demand. It's just that the CPU may be running at an OPP much lower than what the task may want initially. Without this patch we will wait for a tick's time to raise it up.
indicated its configurable), then shouldn't the late cpufreq callbacks never happen for the cases where new tasks are not to receive maximum demand? With these patches they always will invoke the late callbacks? In this case the currently running task on the remote CPU should already have brought the demand levels to a high value, then in that case should the callbacks run? I was just wondering if we can avoid unwanted IPIs from this patch in that case.
The IPI isn't sent always. We send it only if it's been long since we last evaluated a frequency change and the current rq load wants us to change the frequency.
-- viresh
Viresh,
If I'm not mistaken, Steve's change for allowing changes to remote CPUs was limiting it to slow path and also limiting it to only CPUs within the same cluster.
Our CPU scaling code doesn't have any such restriction: we can set the frequency of any CPU from any CPU. I think it'll be best to push this detail to the cpufreq driver. Maybe we can have a flag where the cpufreq driver says whether it wants to do remote CPU frequency setting, and then we can let the driver do the filtering as it sees fit.
Can you do that change please?
Thanks, Saravana P.S: I'm not subscribed to eas-dev. I'll figure that out soon.
On 01/23/2017 03:21 AM, Viresh Kumar wrote:
<I'm able to read your email, but it is getting garbled for some reason when I try to reply. Deleted it.>
-- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
On 27-01-17, 19:37, Saravana Kannan wrote:
Viresh,
If I'm not mistaken, Steve's change for allowing changes to remote CPUs was limiting it to slow path
Not completely.
and also limiting it to only CPUs within the same cluster.
There are two cases really:
- Remote call for a CPU in same policy
This isn't considered remote in this series and is taken care of directly from sugov_update_commit().
- Remote call for a CPU from another policy
This is considered remote, and an irq-work is queued from sugov_update_commit() on the target CPU. In the irq-work we again check whether fast-path execution is possible and don't involve the RT thread in that case.
Otherwise, we take care of it as slow path code via the RT thread.
Our CPU scaling code doesn't have any such relation. We can set the frequency of any CPU from any CPU. I think it'll be best to push this detail to the CPUfreq driver. Maybe we can have a flag where the CPUfreq driver says it wants to do remote CPU setting or not. And then we can let the driver do the filtering as they see fit.
You want to allow frequency setting from a CPU in policy X to a CPU in policy Y? And of course that will make sense only for the fast path.
On 01/23/2017 03:21 AM, Viresh Kumar wrote:
<I'm able to read your email, but it is getting garbled for some reason when I try to reply. Deleted it.>
Strange, I sent it with plain git send-email. Shouldn't be a problem with the mail at least.
-- viresh