Hi,
This is based on the work done by Steve Muckle [1] before he left Linaro, and most of the patches are still under his authorship. I have made a couple of improvements (detailed in the individual patches) and removed the late callback support [2], as I wasn't sure of the value it adds. We can include it separately if others feel it is required. This series is based on pm/linux-next with patches [3] and [4] applied on top of it.
With Android UI and benchmarks, the latency of the cpufreq response to certain scheduling events can become very critical. Currently, callbacks into schedutil are made from the scheduler only if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run schedutil for some time.
One test case that shows this behavior is where a task starts running on CPU0, then a new task is spawned on CPU0 by a task on CPU1. If the system is configured such that new tasks should receive maximum demand initially, this should result in CPU0 increasing its frequency immediately. Because of the limitation mentioned above, though, this does not occur. This was verified using ftrace with the sample [5] application.
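For illustration, a minimal userspace reproducer for that scenario might look like the sketch below (just a sketch, not the actual sample [5] application linked at the end):

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <sys/wait.h>

static void pin_to_cpu(int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        sched_setaffinity(0, sizeof(set), &set);
}

int main(void)
{
        pin_to_cpu(1);                  /* parent runs on CPU1 */

        if (fork() == 0) {
                pin_to_cpu(0);          /* child moves to CPU0 ... */
                for (;;)                /* ... and should take it to the max OPP */
                        ;
        }

        wait(NULL);
        return 0;
}

Watching the cpu_frequency trace events while this runs can show whether CPU0 actually ramps up before its next scheduler tick.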
This patchset updates the scheduler to call the cpufreq callbacks for remote CPUs as well, and updates the schedutil governor to deal with it. An additional flag is added to cpufreq policies to avoid sending IPIs to remote CPUs to update the frequency, if any CPU on the platform can change the frequency of any other CPU.
This series was tested with a couple of use cases (Android: hackbench, recentfling, galleryfling, vellamo; Ubuntu: hackbench) on the ARM Hikey board (64-bit octa-core, single policy). Only galleryfling showed minor improvements, while the others didn't show much deviation.
The reason is that this patchset only targets a corner case, where all of the following have to be true to see a performance improvement, and that doesn't happen too often with these tests:

- Task is migrated to another CPU.
- The task has maximum demand initially, and should take the CPU to higher OPPs.
- And, without this patchset, the target CPU doesn't call into schedutil until the next tick.
--
viresh
[1] https://git.linaro.org/people/steve.muckle/kernel.git/log/?h=pmwg-integratio...
[2] https://git.linaro.org/people/steve.muckle/kernel.git/commit/?h=pmwg-integra...
[3] https://marc.info/?l=linux-kernel&m=148766093718487&w=2
[4] https://marc.info/?l=linux-kernel&m=148903231720432&w=2
[5] http://pastebin.com/7LkMSRxE
Steve Muckle (8):
  sched: cpufreq: add cpu to update_util_data
  irq_work: add irq_work_queue_on for !CONFIG_SMP
  sched: cpufreq: extend irq work to support fast switches
  sched: cpufreq: remove smp_processor_id() in remote paths
  sched: cpufreq: detect, process remote callbacks
  cpufreq: governor: support scheduler cpufreq callbacks on remote CPUs
  intel_pstate: ignore scheduler cpufreq callbacks on remote CPUs
  sched: cpufreq: enable remote sched cpufreq callbacks
Viresh Kumar (1):
  cpufreq: Add dvfs_possible_from_any_cpu policy flag
 drivers/cpufreq/cpufreq-dt.c       |  1 +
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 drivers/cpufreq/intel_pstate.c     |  3 ++
 include/linux/cpufreq.h            |  9 +++++
 include/linux/irq_work.h           |  7 ++++
 include/linux/sched/cpufreq.h      |  1 +
 kernel/sched/cpufreq.c             |  1 +
 kernel/sched/cpufreq_schedutil.c   | 80 +++++++++++++++++++++++++++++---------
 kernel/sched/fair.c                |  6 ++-
 kernel/sched/sched.h               |  3 +-
 10 files changed, 90 insertions(+), 23 deletions(-)
From: Steve Muckle smuckle.linux@gmail.com
Upcoming support for scheduler cpufreq callbacks on remote wakeups will require the client to know which CPU the callback is being invoked for. Add this information to the callback data structure.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 include/linux/sched/cpufreq.h | 1 +
 kernel/sched/cpufreq.c        | 1 +
 2 files changed, 2 insertions(+)
diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d2be2ccbb372..f798f63d93e8 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -16,6 +16,7 @@
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
        void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
+       int cpu;
 };

 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index dbc51442ecbc..ee4c596b71b4 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -42,6 +42,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
                return;

        data->func = func;
+       data->cpu = cpu;
        rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
 }
 EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
On Thursday, March 09, 2017 05:15:11 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
Upcoming support for scheduler cpufreq callbacks on remote wakeups will require the client to know which CPU the callback is being invoked for. Add this information to the callback data structure.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 include/linux/sched/cpufreq.h | 1 +
 kernel/sched/cpufreq.c        | 1 +
 2 files changed, 2 insertions(+)
diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d2be2ccbb372..f798f63d93e8 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -16,6 +16,7 @@
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
        void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
+       int cpu;
unsigned int?
 };

 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index dbc51442ecbc..ee4c596b71b4 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -42,6 +42,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
                return;

        data->func = func;
+       data->cpu = cpu;
But I'm not convinced that this helps at all.
        rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
 }
 EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
Thanks, Rafael
From: Steve Muckle smuckle.linux@gmail.com
Having irq_work_queue_on() available for !CONFIG_SMP can make some call sites cleaner.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 include/linux/irq_work.h | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 47b9ebd4a74f..0195c3502d6b 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -1,6 +1,7 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H

+#include <linux/bug.h>
 #include <linux/llist.h>

 /*
@@ -36,6 +37,12 @@ bool irq_work_queue(struct irq_work *work);

 #ifdef CONFIG_SMP
 bool irq_work_queue_on(struct irq_work *work, int cpu);
+#else
+static inline bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+       BUG_ON(cpu != 0);
+       return irq_work_queue(work);
+}
 #endif

 void irq_work_tick(void);
On Thursday, March 09, 2017 05:15:12 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
Having irq_work_queue_on() available for !CONFIG_SMP can make some call sites cleaner.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 include/linux/irq_work.h | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 47b9ebd4a74f..0195c3502d6b 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -1,6 +1,7 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H

+#include <linux/bug.h>
 #include <linux/llist.h>

 /*
@@ -36,6 +37,12 @@ bool irq_work_queue(struct irq_work *work);

 #ifdef CONFIG_SMP
 bool irq_work_queue_on(struct irq_work *work, int cpu);
+#else
+static inline bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+       BUG_ON(cpu != 0);
Would WARN_ON(), or WARN_ON_ONCE() even, be insufficient?
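Something like this, say (just a sketch of what I mean, not tested):

static inline bool irq_work_queue_on(struct irq_work *work, int cpu)
{
        /* Complain (once) about a bogus CPU instead of crashing the box. */
        if (WARN_ON_ONCE(cpu != 0))
                return false;

        return irq_work_queue(work);
}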
+       return irq_work_queue(work);
+}
 #endif

 void irq_work_tick(void);
Thanks, Rafael
On many platforms any CPU (from any cpufreq policy) can perform DVFS on behalf of other CPUs. Add a flag to identify such cpufreq policies.
Also enable it for the cpufreq-dt driver, which is currently used only on ARM platforms.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq-dt.c | 1 +
 include/linux/cpufreq.h      | 9 +++++++++
 2 files changed, 10 insertions(+)
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index c943787d761e..e57b45f20544 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -274,6 +274,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
                transition_latency = CPUFREQ_ETERNAL;

        policy->cpuinfo.transition_latency = transition_latency;
+       policy->dvfs_possible_from_any_cpu = true;

        return 0;

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 87165f06a307..9490a314c515 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -120,6 +120,15 @@ struct cpufreq_policy {
        bool            fast_switch_possible;
        bool            fast_switch_enabled;

+       /*
+        * Remote DVFS flag (Not added to the driver structure as we don't want
+        * to access another structure from scheduler hotpath).
+        *
+        * Should be set if any CPU (from same or different policy) can do DVFS
+        * on behalf of any other CPU.
+        */
+       bool            dvfs_possible_from_any_cpu;
+
        /* Cached frequency lookup from cpufreq_driver_resolve_freq. */
        unsigned int    cached_target_freq;
        int             cached_resolved_idx;
On Thursday, March 09, 2017 05:15:13 PM Viresh Kumar wrote:
On many platforms any CPU (from any cpufreq policy) can perform DVFS on behalf of other CPUs. Add a flag to identify such cpufreq policies.
Also enable it for the cpufreq-dt driver, which is currently used only on ARM platforms.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 drivers/cpufreq/cpufreq-dt.c | 1 +
 include/linux/cpufreq.h      | 9 +++++++++
 2 files changed, 10 insertions(+)
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index c943787d761e..e57b45f20544 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -274,6 +274,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
                transition_latency = CPUFREQ_ETERNAL;

        policy->cpuinfo.transition_latency = transition_latency;
+       policy->dvfs_possible_from_any_cpu = true;

        return 0;

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 87165f06a307..9490a314c515 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -120,6 +120,15 @@ struct cpufreq_policy {
        bool            fast_switch_possible;
        bool            fast_switch_enabled;

+       /*
+        * Remote DVFS flag (Not added to the driver structure as we don't want
+        * to access another structure from scheduler hotpath).
+        *
+        * Should be set if any CPU (from same or different policy) can do DVFS
+        * on behalf of any other CPU.
+        */
+       bool            dvfs_possible_from_any_cpu;
We rely on the assumption that any CPU in a policy can do DVFS in there already.
Why is this flag necessary at all?
+
        /* Cached frequency lookup from cpufreq_driver_resolve_freq. */
        unsigned int    cached_target_freq;
        int             cached_resolved_idx;
Thanks, Rafael
From: Steve Muckle smuckle.linux@gmail.com
In preparation for schedutil receiving sched cpufreq callbacks for remote CPUs, extend the irq work in schedutil to support policies with fast switching enabled in addition to policies using the slow path.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: minor code updates ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index f5ffe241812e..a418544c51b1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -88,6 +88,17 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
        return delta_ns >= sg_policy->freq_update_delay_ns;
 }

+static void sugov_fast_switch(struct cpufreq_policy *policy,
+                             unsigned int next_freq)
+{
+       next_freq = cpufreq_driver_fast_switch(policy, next_freq);
+       if (next_freq == CPUFREQ_ENTRY_INVALID)
+               return;
+
+       policy->cur = next_freq;
+       trace_cpu_frequency(next_freq, smp_processor_id());
+}
+
 static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
                                unsigned int next_freq)
 {
@@ -100,12 +111,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
                }
                sg_policy->next_freq = next_freq;
                sg_policy->last_freq_update_time = time;
-               next_freq = cpufreq_driver_fast_switch(policy, next_freq);
-               if (next_freq == CPUFREQ_ENTRY_INVALID)
-                       return;
-
-               policy->cur = next_freq;
-               trace_cpu_frequency(next_freq, smp_processor_id());
+               sugov_fast_switch(policy, next_freq);
        } else if (sg_policy->next_freq != next_freq) {
                sg_policy->next_freq = next_freq;
                sg_policy->last_freq_update_time = time;
@@ -303,9 +309,15 @@ static void sugov_work(struct kthread_work *work)

 static void sugov_irq_work(struct irq_work *irq_work)
 {
-       struct sugov_policy *sg_policy;
+       struct sugov_policy *sg_policy = container_of(irq_work, struct
+                                                     sugov_policy, irq_work);
+       struct cpufreq_policy *policy = sg_policy->policy;

-       sg_policy = container_of(irq_work, struct sugov_policy, irq_work);
+       if (policy->fast_switch_enabled) {
+               sugov_fast_switch(policy, sg_policy->next_freq);
+               sg_policy->work_in_progress = false;
+               return;
+       }

        /*
         * For RT and deadline tasks, the schedutil governor shoots the
On Thursday, March 09, 2017 05:15:14 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
In preparation for schedutil receiving sched cpufreq callbacks for remote CPUs, extend the irq work in schedutil to support policies with fast switching enabled in addition to policies using the slow path.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: minor code updates ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This should be merged with the [6/9].
As is, it requires me to look at two patches at the same time to make sense out of it.
Thanks, Rafael
From: Steve Muckle smuckle.linux@gmail.com
Upcoming support for remote callbacks from the scheduler into schedutil requires that the CPU identified in the hook structure be used to indicate the CPU being operated on, rather than relying on smp_processor_id().
Note that policy->cpu is passed to trace_cpu_frequency() in the fast switch path, as it is safe to use any CPU managed by the current policy.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: updated commit log, minor code cleanups and use policy->cpu for traces ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 kernel/sched/cpufreq_schedutil.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index a418544c51b1..b168c31f1c8f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -96,7 +96,7 @@ static void sugov_fast_switch(struct cpufreq_policy *policy,
                return;

        policy->cur = next_freq;
-       trace_cpu_frequency(next_freq, smp_processor_id());
+       trace_cpu_frequency(next_freq, policy->cpu);
 }

 static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
@@ -106,7 +106,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,

        if (policy->fast_switch_enabled) {
                if (sg_policy->next_freq == next_freq) {
-                       trace_cpu_frequency(policy->cur, smp_processor_id());
+                       trace_cpu_frequency(policy->cur, policy->cpu);
                        return;
                }
                sg_policy->next_freq = next_freq;
@@ -157,12 +157,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
        return cpufreq_driver_resolve_freq(policy, freq);
 }

-static void sugov_get_util(unsigned long *util, unsigned long *max)
+static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
 {
-       struct rq *rq = this_rq();
+       struct rq *rq = cpu_rq(cpu);
        unsigned long cfs_max;

-       cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+       cfs_max = arch_scale_cpu_capacity(NULL, cpu);

        *util = min(rq->cfs.avg.util_avg, cfs_max);
        *max = cfs_max;
@@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        if (flags & SCHED_CPUFREQ_RT_DL) {
                next_f = policy->cpuinfo.max_freq;
        } else {
-               sugov_get_util(&util, &max);
+               sugov_get_util(&util, &max, hook->cpu);
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
@@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
        unsigned long util, max;
        unsigned int next_f;

-       sugov_get_util(&util, &max);
+       sugov_get_util(&util, &max, hook->cpu);

        raw_spin_lock(&sg_policy->update_lock);
On Thursday, March 09, 2017 05:15:15 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
Upcoming support for remote callbacks from the scheduler into schedutil requires that the CPU identified in the hook structure be used to indicate the CPU being operated on, rather than relying on smp_processor_id().
Note that policy->cpu is passed to trace_cpu_frequency() in the fast switch path, as it is safe to use any CPU managed by the current policy.
This should be commented about in the code too IMO.
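For instance, something along these lines in sugov_fast_switch() (only a sketch):

        policy->cur = next_freq;
        /*
         * All CPUs managed by this policy run at the same frequency, so
         * reporting policy->cpu here is as good as reporting the CPU that
         * actually carried out the switch.
         */
        trace_cpu_frequency(next_freq, policy->cpu);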
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: updated commit log, minor code cleanups and use policy->cpu for traces ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 kernel/sched/cpufreq_schedutil.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index a418544c51b1..b168c31f1c8f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -96,7 +96,7 @@ static void sugov_fast_switch(struct cpufreq_policy *policy,
                return;

        policy->cur = next_freq;
-       trace_cpu_frequency(next_freq, smp_processor_id());
+       trace_cpu_frequency(next_freq, policy->cpu);
 }

 static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
@@ -106,7 +106,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,

        if (policy->fast_switch_enabled) {
                if (sg_policy->next_freq == next_freq) {
-                       trace_cpu_frequency(policy->cur, smp_processor_id());
+                       trace_cpu_frequency(policy->cur, policy->cpu);
                        return;
                }
                sg_policy->next_freq = next_freq;
@@ -157,12 +157,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
        return cpufreq_driver_resolve_freq(policy, freq);
 }

-static void sugov_get_util(unsigned long *util, unsigned long *max)
+static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
 {
-       struct rq *rq = this_rq();
+       struct rq *rq = cpu_rq(cpu);
        unsigned long cfs_max;

-       cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+       cfs_max = arch_scale_cpu_capacity(NULL, cpu);

        *util = min(rq->cfs.avg.util_avg, cfs_max);
        *max = cfs_max;
@@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        if (flags & SCHED_CPUFREQ_RT_DL) {
                next_f = policy->cpuinfo.max_freq;
        } else {
-               sugov_get_util(&util, &max);
+               sugov_get_util(&util, &max, hook->cpu);
Why is this not racy?
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
@@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
        unsigned long util, max;
        unsigned int next_f;

-       sugov_get_util(&util, &max);
+       sugov_get_util(&util, &max, hook->cpu);
And here?
raw_spin_lock(&sg_policy->update_lock);
Thanks, Rafael
On 29-03-17, 23:28, Rafael J. Wysocki wrote:
On Thursday, March 09, 2017 05:15:15 PM Viresh Kumar wrote:
@@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        if (flags & SCHED_CPUFREQ_RT_DL) {
                next_f = policy->cpuinfo.max_freq;
        } else {
-               sugov_get_util(&util, &max);
+               sugov_get_util(&util, &max, hook->cpu);
Why is this not racy?
Why would reading the utilization values be racy? The only dynamic value here is "util_avg", and I am not sure that reading it is racy.
But this whole routine has races, which I ignored, as we may end up updating the frequency simultaneously from two threads.
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
@@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
        unsigned long util, max;
        unsigned int next_f;

-       sugov_get_util(&util, &max);
+       sugov_get_util(&util, &max, hook->cpu);
And here?
raw_spin_lock(&sg_policy->update_lock);
The lock prevents that here, though.

So, if we are going to use this series, we could use the same update lock for single-CPU policies as well.
On Tue, Apr 11, 2017 at 12:35 PM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 29-03-17, 23:28, Rafael J. Wysocki wrote:
On Thursday, March 09, 2017 05:15:15 PM Viresh Kumar wrote:
@@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        if (flags & SCHED_CPUFREQ_RT_DL) {
                next_f = policy->cpuinfo.max_freq;
        } else {
-               sugov_get_util(&util, &max);
+               sugov_get_util(&util, &max, hook->cpu);
Why is this not racy?
Why would reading the utilization values be racy? The only dynamic value here is "util_avg", and I am not sure that reading it is racy.

But this whole routine has races, which I ignored, as we may end up updating the frequency simultaneously from two threads.
Those races aren't there if we don't update cross-CPU, which is my point. :-)
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
@@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
        unsigned long util, max;
        unsigned int next_f;

-       sugov_get_util(&util, &max);
+       sugov_get_util(&util, &max, hook->cpu);
And here?
raw_spin_lock(&sg_policy->update_lock);
The lock prevents that here, though.

So, if we are going to use this series, we could use the same update lock for single-CPU policies as well.
No, we can't.
The lock is unavoidable in the multi-CPU policy case, but there's no way I will agree on using a lock in the single-CPU case.
Thanks, Rafael
On 11-04-17, 16:00, Rafael J. Wysocki wrote:
On Tue, Apr 11, 2017 at 12:35 PM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 29-03-17, 23:28, Rafael J. Wysocki wrote:
On Thursday, March 09, 2017 05:15:15 PM Viresh Kumar wrote:
@@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        if (flags & SCHED_CPUFREQ_RT_DL) {
                next_f = policy->cpuinfo.max_freq;
        } else {
-               sugov_get_util(&util, &max);
+               sugov_get_util(&util, &max, hook->cpu);
Why is this not racy?
Why would reading the utilization values be racy? The only dynamic value here is "util_avg", and I am not sure that reading it is racy.

But this whole routine has races, which I ignored, as we may end up updating the frequency simultaneously from two threads.
Those races aren't there if we don't update cross-CPU, which is my point. :-)
Of course. There are no races without this series.
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
@@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
        unsigned long util, max;
        unsigned int next_f;

-       sugov_get_util(&util, &max);
+       sugov_get_util(&util, &max, hook->cpu);
And here?
raw_spin_lock(&sg_policy->update_lock);
The lock prevents that here, though.

So, if we are going to use this series, we could use the same update lock for single-CPU policies as well.
No, we can't.
The lock is unavoidable in the multi-CPU policy case, but there's no way I will agree on using a lock in the single-CPU case.
How do you suggest we avoid the locking here, then? Some atomic variable read/write, as done in cpufreq_governor.c?
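For example, something like the below (an untested sketch; the update_in_progress field is hypothetical and would need to be added to struct sugov_policy):

static bool sugov_begin_update(struct sugov_policy *sg_policy)
{
        /* Only one CPU at a time may proceed past this point. */
        return !atomic_cmpxchg(&sg_policy->update_in_progress, 0, 1);
}

static void sugov_end_update(struct sugov_policy *sg_policy)
{
        atomic_set(&sg_policy->update_in_progress, 0);
}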
On Wed, Apr 12, 2017 at 4:26 PM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 11-04-17, 16:00, Rafael J. Wysocki wrote:
On Tue, Apr 11, 2017 at 12:35 PM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 29-03-17, 23:28, Rafael J. Wysocki wrote:
On Thursday, March 09, 2017 05:15:15 PM Viresh Kumar wrote:
@@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        if (flags & SCHED_CPUFREQ_RT_DL) {
                next_f = policy->cpuinfo.max_freq;
        } else {
-               sugov_get_util(&util, &max);
+               sugov_get_util(&util, &max, hook->cpu);
Why is this not racy?
Why would reading the utilization values be racy? The only dynamic value here is "util_avg", and I am not sure that reading it is racy.

But this whole routine has races, which I ignored, as we may end up updating the frequency simultaneously from two threads.
Those races aren't there if we don't update cross-CPU, which is my point. :-)
Of course. There are no races without this series.
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
@@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
        unsigned long util, max;
        unsigned int next_f;

-       sugov_get_util(&util, &max);
+       sugov_get_util(&util, &max, hook->cpu);
And here?
raw_spin_lock(&sg_policy->update_lock);
The lock prevents that here, though.

So, if we are going to use this series, we could use the same update lock for single-CPU policies as well.
No, we can't.
The lock is unavoidable in the multi-CPU policy case, but there's no way I will agree on using a lock in the single-CPU case.
How do you suggest we avoid the locking here, then? Some atomic variable read/write, as done in cpufreq_governor.c?
That is a very good question. :-)
I need to look at the scheduler code that invokes those things and see what happens in there. Chances are there already is some sufficient mutual exclusion in place.
From: Steve Muckle smuckle.linux@gmail.com
A callback is considered remote if the target CPU is not the current CPU, and either it is not managed by the policy managing the current CPU or the current CPU can't do DVFS on its behalf.
Queue the irq work for remote callbacks on the destination CPU. The irq work will carry out the fast or slow switch as appropriate.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: commit log, code cleanups, introduce dvfs_possible_from_any_cpu and drop
  late callback support to avoid IPIs on remote CPUs. ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 kernel/sched/cpufreq_schedutil.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index b168c31f1c8f..9bad579b6b08 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -100,11 +100,11 @@ static void sugov_fast_switch(struct cpufreq_policy *policy,
 }

 static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
-                               unsigned int next_freq)
+                               int cpu, bool remote, unsigned int next_freq)
 {
        struct cpufreq_policy *policy = sg_policy->policy;

-       if (policy->fast_switch_enabled) {
+       if (policy->fast_switch_enabled && !remote) {
                if (sg_policy->next_freq == next_freq) {
                        trace_cpu_frequency(policy->cur, policy->cpu);
                        return;
@@ -116,7 +116,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
                sg_policy->next_freq = next_freq;
                sg_policy->last_freq_update_time = time;
                sg_policy->work_in_progress = true;
-               irq_work_queue(&sg_policy->irq_work);
+               irq_work_queue_on(&sg_policy->irq_work, cpu);
        }
 }

@@ -206,6 +206,20 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        struct cpufreq_policy *policy = sg_policy->policy;
        unsigned long util, max;
        unsigned int next_f;
+       int cpu, this_cpu = smp_processor_id();
+       bool remote;
+
+       if (policy->dvfs_possible_from_any_cpu) {
+               /*
+                * Avoid sending IPI to 'hook->cpu' if this CPU can change
+                * frequency on its behalf.
+                */
+               remote = false;
+               cpu = this_cpu;
+       } else {
+               cpu = hook->cpu;
+               remote = this_cpu != hook->cpu;
+       }

        sugov_set_iowait_boost(sg_cpu, time, flags);
        sg_cpu->last_update = time;
@@ -220,7 +234,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
-       sugov_update_commit(sg_policy, time, next_f);
+       sugov_update_commit(sg_policy, time, cpu, remote, next_f);
 }

 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
@@ -269,8 +283,24 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 {
        struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
        struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+       struct cpufreq_policy *policy = sg_policy->policy;
        unsigned long util, max;
        unsigned int next_f;
+       int cpu, this_cpu = smp_processor_id();
+       bool remote;
+
+       if (policy->dvfs_possible_from_any_cpu ||
+           cpumask_test_cpu(this_cpu, policy->cpus)) {
+               /*
+                * Avoid sending IPI to 'hook->cpu' if this CPU can change
+                * frequency on its behalf.
+                */
+               remote = false;
+               cpu = this_cpu;
+       } else {
+               cpu = hook->cpu;
+               remote = this_cpu != hook->cpu;
+       }

        sugov_get_util(&util, &max, hook->cpu);

@@ -289,7 +319,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
        else
                next_f = sugov_next_freq_shared(sg_cpu);

-               sugov_update_commit(sg_policy, time, next_f);
+               sugov_update_commit(sg_policy, time, cpu, remote, next_f);
        }

        raw_spin_unlock(&sg_policy->update_lock);
On Thursday, March 09, 2017 05:15:16 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
A callback is considered remote if the target CPU is not the current CPU, and either it is not managed by the policy managing the current CPU or the current CPU can't do DVFS on its behalf.
Queue the irq work for remote callbacks on the destination CPU. The irq work will carry out the fast or slow switch as appropriate.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: commit log, code cleanups, introduce dvfs_possible_from_any_cpu and drop
  late callback support to avoid IPIs on remote CPUs. ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 kernel/sched/cpufreq_schedutil.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index b168c31f1c8f..9bad579b6b08 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -100,11 +100,11 @@ static void sugov_fast_switch(struct cpufreq_policy *policy,
 }

 static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
-                               unsigned int next_freq)
+                               int cpu, bool remote, unsigned int next_freq)
 {
        struct cpufreq_policy *policy = sg_policy->policy;

-       if (policy->fast_switch_enabled) {
+       if (policy->fast_switch_enabled && !remote) {
                if (sg_policy->next_freq == next_freq) {
                        trace_cpu_frequency(policy->cur, policy->cpu);
                        return;
@@ -116,7 +116,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
                sg_policy->next_freq = next_freq;
                sg_policy->last_freq_update_time = time;
                sg_policy->work_in_progress = true;
-               irq_work_queue(&sg_policy->irq_work);
+               irq_work_queue_on(&sg_policy->irq_work, cpu);
        }
 }

@@ -206,6 +206,20 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
        struct cpufreq_policy *policy = sg_policy->policy;
        unsigned long util, max;
        unsigned int next_f;
+       int cpu, this_cpu = smp_processor_id();
+       bool remote;
+
+       if (policy->dvfs_possible_from_any_cpu) {
+               /*
+                * Avoid sending IPI to 'hook->cpu' if this CPU can change
+                * frequency on its behalf.
+                */
+               remote = false;
+               cpu = this_cpu;
+       } else {
+               cpu = hook->cpu;
+               remote = this_cpu != hook->cpu;
+       }
Honestly, this dvfs_possible_from_any_cpu thing doesn't make the code particularly clear and I wouldn't bother adding it, at least to start with.
I would just not do the fast switch for remote updates at all.
Plus, the single-CPU policy case is additionally complicated by the recent addition of sugov_cpu_is_busy(), so that needs to be taken into account too.
        sugov_set_iowait_boost(sg_cpu, time, flags);
        sg_cpu->last_update = time;
@@ -220,7 +234,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
-       sugov_update_commit(sg_policy, time, next_f);
+       sugov_update_commit(sg_policy, time, cpu, remote, next_f);
 }

 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
@@ -269,8 +283,24 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 {
        struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
        struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+       struct cpufreq_policy *policy = sg_policy->policy;
        unsigned long util, max;
        unsigned int next_f;
+       int cpu, this_cpu = smp_processor_id();
+       bool remote;
+
+       if (policy->dvfs_possible_from_any_cpu ||
+           cpumask_test_cpu(this_cpu, policy->cpus)) {
Again, is this actually worth it?
Thanks, Rafael
From: Steve Muckle smuckle.linux@gmail.com
In preparation for the scheduler cpufreq callback happening on remote CPUs, add support for this in the legacy (ondemand and conservative) governors. The legacy governors make assumptions about the callback occurring on the CPU being updated.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: minor updates in commit log ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq_governor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 47e24b5384b3..c9e786e7ee1f 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -315,7 +315,7 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,

        policy_dbs->last_sample_time = time;
        policy_dbs->work_in_progress = true;
-       irq_work_queue(&policy_dbs->irq_work);
+       irq_work_queue_on(&policy_dbs->irq_work, data->cpu);
 }

 static void gov_set_update_util(struct policy_dbs_info *policy_dbs,
On Thursday, March 09, 2017 05:15:17 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
In preparation for the scheduler cpufreq callback happening on remote CPUs, add support for this in the legacy (ondemand and conservative) governors. The legacy governors make assumptions about the callback occurring on the CPU being updated.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: minor updates in commit log ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 drivers/cpufreq/cpufreq_governor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 47e24b5384b3..c9e786e7ee1f 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -315,7 +315,7 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,
        policy_dbs->last_sample_time = time;
        policy_dbs->work_in_progress = true;
-       irq_work_queue(&policy_dbs->irq_work);
+       irq_work_queue_on(&policy_dbs->irq_work, data->cpu);
I'm totally unconvinced that this is sufficient.
This function carries out lockless computations with the assumption that it will always run on the CPU being updated.
For instance, how is it prevented from being run on two CPUs in parallel in the single-CPU policy case if cross-CPU updates are allowed to happen?
Second, is accessing rq_clock(rq) of a remote runqueue a good idea entirely?
Thanks, Rafael
On 30-03-17, 00:14, Rafael J. Wysocki wrote:
On Thursday, March 09, 2017 05:15:17 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
In preparation for the scheduler cpufreq callback happening on remote CPUs, add support for this in the legacy (ondemand and conservative) governors. The legacy governors make assumptions about the callback occurring on the CPU being updated.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
[ vk: minor updates in commit log ]
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 drivers/cpufreq/cpufreq_governor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 47e24b5384b3..c9e786e7ee1f 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -315,7 +315,7 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,
        policy_dbs->last_sample_time = time;
        policy_dbs->work_in_progress = true;
-       irq_work_queue(&policy_dbs->irq_work);
+       irq_work_queue_on(&policy_dbs->irq_work, data->cpu);
I'm totally unconvinced that this is sufficient.
This function carries out lockless computations with the assumption that it will always run on the CPU being updated.
For instance, how is it prevented from being run on two CPUs in parallel in the single-CPU policy case if cross-CPU updates are allowed to happen?
I am convinced that it is insufficient, and yes, I too missed the obvious race here for single-CPU policies. Sorry about that.
Second, is accessing rq_clock(rq) of a remote runqueue a good idea entirely?
I am not sure how costly that can be.
From: Steve Muckle smuckle.linux@gmail.com
In preparation for the scheduler cpufreq callback happening on remote CPUs, check for this case in intel_pstate, which currently requires that the callback run on the local CPU. Such callbacks are ignored for now.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/intel_pstate.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 3d37219a0dd7..bd60f1cd7ea6 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1930,6 +1930,9 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
        struct cpudata *cpu = container_of(data, struct cpudata, update_util);
        u64 delta_ns;

+       if (smp_processor_id() != data->cpu)
+               return;
+
        if (pstate_funcs.get_target_pstate == get_target_pstate_use_cpu_load) {
                if (flags & SCHED_CPUFREQ_IOWAIT) {
                        cpu->iowait_boost = int_tofp(1);
On Thursday, March 09, 2017 05:15:18 PM Viresh Kumar wrote:
From: Steve Muckle smuckle.linux@gmail.com
In preparation for the scheduler cpufreq callback happening on remote CPUs, check for this case in intel_pstate, which currently requires that the callback run on the local CPU. Such callbacks are ignored for now.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

 drivers/cpufreq/intel_pstate.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 3d37219a0dd7..bd60f1cd7ea6 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1930,6 +1930,9 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
        struct cpudata *cpu = container_of(data, struct cpudata, update_util);
        u64 delta_ns;

+       if (smp_processor_id() != data->cpu)
+               return;
+
        if (pstate_funcs.get_target_pstate == get_target_pstate_use_cpu_load) {
                if (flags & SCHED_CPUFREQ_IOWAIT) {
                        cpu->iowait_boost = int_tofp(1);
This would need to be updated on top of some intel_pstate changes in the queue.
Thanks, Rafael
From: Steve Muckle smuckle.linux@gmail.com
Now that all clients properly support (or ignore) remote scheduler cpufreq callbacks, remove the restriction that such callbacks only be made in CFS on the local CPU.
Signed-off-by: Steve Muckle smuckle.linux@gmail.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 kernel/sched/fair.c  | 6 ++++--
 kernel/sched/sched.h | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3e88b35ac157..12db77814814 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3188,7 +3188,9 @@ static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq) {}

 static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 {
-       if (&this_rq()->cfs == cfs_rq) {
+       struct rq *rq = rq_of(cfs_rq);
+
+       if (&rq->cfs == cfs_rq) {
                /*
                 * There are a few boundary cases this might miss but it should
                 * get called often enough that that should (hopefully) not be
@@ -3205,7 +3207,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
                 *
                 * See cpu_util().
                 */
-               cpufreq_update_util(rq_of(cfs_rq), 0);
+               cpufreq_update_util(rq, 0);
        }
 }

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5cbf92214ad8..30c71fc3e02e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1921,7 +1921,8 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
 {
        struct update_util_data *data;

-       data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
+       data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
+                                                 cpu_of(rq)));
        if (data)
                data->func(data, rq_clock(rq), flags);
 }
On Thursday, March 09, 2017 05:15:10 PM Viresh Kumar wrote:
Hi,
This is based on the work done by Steve Muckle [1] before he left Linaro, and most of the patches are still under his authorship. I have made a couple of improvements (detailed in the individual patches) and removed the late callback support [2], as I wasn't sure of the value it adds. We can include it separately if others feel it is required. This series is based on pm/linux-next with patches [3] and [4] applied on top of it.
With Android UI and benchmarks, the latency of the cpufreq response to certain scheduling events can become very critical. Currently, callbacks into schedutil are made from the scheduler only if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run schedutil for some time.
One test case that shows this behavior is where a task starts running on CPU0, then a new task is spawned on CPU0 by a task on CPU1. If the system is configured such that new tasks should receive maximum demand initially, this should result in CPU0 increasing its frequency immediately. Because of the limitation mentioned above, though, this does not occur. This was verified using ftrace with the sample [5] application.
This patchset updates the scheduler to call the cpufreq callbacks for remote CPUs as well, and updates the schedutil governor to deal with it. An additional flag is added to cpufreq policies to avoid sending IPIs to remote CPUs to update the frequency, if any CPU on the platform can change the frequency of any other CPU.
This series was tested with a couple of use cases (Android: hackbench, recentfling, galleryfling, vellamo; Ubuntu: hackbench) on the ARM Hikey board (64-bit octa-core, single policy). Only galleryfling showed minor improvements, while the others didn't show much deviation.
The reason is that this patchset only targets a corner case, where all of the following have to be true to see a performance improvement, and that doesn't happen too often with these tests:
- Task is migrated to another CPU.
- The task has maximum demand initially, and should take the CPU to higher OPPs.
- And, without this patchset, the target CPU doesn't call into schedutil until the next tick.
From the first quick look, patches [1-3/9] seem to be split out somewhat artificially.
Any chance to fold them into the patches where the new stuff is actually used?
I'll be looking at the rest of the patchset shortly.
Thanks, Rafael
On 15-03-17, 12:45, Rafael J. Wysocki wrote:
From the first quick look, patches [1-3/9] seem to be split out somewhat artificially. Any chance to fold them into the patches where the new stuff is actually used?
Any chance to fold them into the patches where the new stuff is actually used?
I'll be looking at the rest of the patchset shortly.
I thought it would be better to keep them separate, but I can merge them into the others in the next version if you want.
On Thu, Mar 16, 2017 at 4:09 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 15-03-17, 12:45, Rafael J. Wysocki wrote:
From the first quick look, patches [1-3/9] seem to be split out somewhat artificially. Any chance to fold them into the patches where the new stuff is actually used?
Any chance to fold them into the patches where the new stuff is actually used?
I'll be looking at the rest of the patchset shortly.
I thought it would be better to keep them separate, but I can merge them into the others in the next version if you want.
Yes, it is kind of more convenient to see how they are used right away.
Otherwise I need to look at two patches at the same time which is a pain on some machine form factors. :-)
Thanks, Rafael