Hi Rafael,
This series fixes a few more possible race conditions. On top of that, there is some non-trivial cleanup to simplify the code.
Preeti reviewed some of these before she left and shared concerns about others; all of that is sorted out now.
V1->V2:
- Dropped 2/10 from V1 as it wasn't required.
- 3/10 saw some changes because the above patch was dropped.
- 7/10 changed a bit: we now check for pending work items by looking at shared->policy rather than calling delayed_work_pending(). We wanted to check whether the governor is operational, and the new check is enough for that.
Viresh Kumar (9):
  cpufreq: Use __func__ to print function's name
  cpufreq: conservative: remove 'enable' field
  cpufreq: ondemand: only queue canceled works from update_sampling_rate()
  cpufreq: governor: Drop __gov_queue_work()
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: ondemand: queue work for policy->cpus together
  cpufreq: ondemand: update sampling rate immediately
  cpufreq: governor: Quit work-handlers early if governor is stopped
  cpufreq: Get rid of ->governor_enabled and its lock
 drivers/cpufreq/cpufreq.c              | 27 +---------
 drivers/cpufreq/cpufreq_conservative.c | 38 +++++++++------
 drivers/cpufreq/cpufreq_governor.c     | 86 +++++++++++++++-------------------
 drivers/cpufreq/cpufreq_governor.h     |  5 +-
 drivers/cpufreq/cpufreq_ondemand.c     | 54 +++++++--------------
 include/linux/cpufreq.h                |  1 -
 6 files changed, 81 insertions(+), 130 deletions(-)
It's better to use __func__ to print a function's name instead of hard-coding the name in the print statement. This also has the advantage that a change in the function's name doesn't force us to change the print message as well.
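As a hedged illustration (plain userspace C, not the kernel code itself), the difference looks like this; `start_governor` is a made-up function name:

```c
#include <stdio.h>

/*
 * Illustrative userspace sketch (not kernel code): __func__ expands to the
 * enclosing function's name, so renaming the function can never leave a
 * stale name behind in the debug message. 'start_governor' is hypothetical.
 */
static int start_governor(unsigned int cpu, unsigned int event)
{
	/* Bad:  printf("start_governor: for CPU %u ...") duplicates the name. */
	/* Good: let the compiler substitute it: */
	printf("%s: for CPU %u, event %u\n", __func__, cpu, event);
	return 0;
}
```

In the kernel the same idea applies with pr_debug() instead of printf().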
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 46251e8d30f2..f620055279f3 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2074,8 +2074,7 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 	if (!try_module_get(policy->governor->owner))
 		return -EINVAL;
-	pr_debug("__cpufreq_governor for CPU %u, event %u\n",
-		 policy->cpu, event);
+	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 	mutex_lock(&cpufreq_governor_lock);
 	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
The conservative governor has its own 'enable' field to check whether the conservative governor is used for a CPU or not.
This can be checked by comparing policy->governor against 'cpufreq_gov_conservative', and so this field can be dropped.
Because it is not guaranteed that dbs_info->cdbs.shared will be a valid pointer for all CPUs (it will be NULL for CPUs that don't use the ondemand/conservative governors), we can't use it anymore. Let's get the policy with cpufreq_cpu_get() instead.
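The get/check/put pattern being introduced can be sketched in plain C (a userspace analogue with made-up refcounting helpers, not the actual cpufreq API):

```c
#include <stddef.h>

/* Userspace sketch of the get/check/put pattern from the patch: take a
 * reference on the policy, bail out early if the CPU has no policy or runs
 * a different governor, and always drop the reference on the way out.
 * All names here are illustrative stand-ins, not the real cpufreq API. */
enum { GOV_OTHER, GOV_CONSERVATIVE };

struct policy { int refcount; int governor; };

static struct policy *policy_get(struct policy *p)
{
	if (p)
		p->refcount++;	/* stands in for cpufreq_cpu_get() */
	return p;
}

static void policy_put(struct policy *p)
{
	p->refcount--;		/* stands in for cpufreq_cpu_put() */
}

static int notifier(struct policy *p)
{
	struct policy *policy = policy_get(p);

	if (!policy)		/* CPU not managed at all */
		return 0;

	/* policy isn't governed by the conservative governor */
	if (policy->governor != GOV_CONSERVATIVE)
		goto out_put;

	/* ... track dbs_info->requested_freq here ... */

out_put:
	policy_put(policy);
	return 0;
}
```

The point is that the validity check (a NULL return) and the governor check replace the old 'enable' flag, at the cost of a reference count round-trip.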
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 34 +++++++++++++++++++++-------------
 drivers/cpufreq/cpufreq_governor.c     | 12 +-----------
 drivers/cpufreq/cpufreq_governor.h     |  1 -
 3 files changed, 22 insertions(+), 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 84a1506950a7..18bfbc313e48 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -23,6 +23,19 @@
static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);
+static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
+				   unsigned int event);
+
+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
+static
+#endif
+struct cpufreq_governor cpufreq_gov_conservative = {
+	.name			= "conservative",
+	.governor		= cs_cpufreq_governor_dbs,
+	.max_transition_latency	= TRANSITION_LATENCY_LIMIT,
+	.owner			= THIS_MODULE,
+};
+
 static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
 					   struct cpufreq_policy *policy)
 {
@@ -119,12 +132,14 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	struct cpufreq_freqs *freq = data;
 	struct cs_cpu_dbs_info_s *dbs_info =
 			&per_cpu(cs_cpu_dbs_info, freq->cpu);
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu);
-	if (!dbs_info->enable)
+	if (!policy)
 		return 0;
-	policy = dbs_info->cdbs.shared->policy;
+	/* policy isn't governed by conservative governor */
+	if (policy->governor != &cpufreq_gov_conservative)
+		goto policy_put;
 	/*
 	 * we only care if our internally tracked freq moves outside the 'valid'
@@ -134,6 +149,9 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	    || dbs_info->requested_freq < policy->min)
 		dbs_info->requested_freq = freq->new;
+policy_put:
+	cpufreq_cpu_put(policy);
+
 	return 0;
 }
@@ -367,16 +385,6 @@ static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, return cpufreq_governor_dbs(policy, &cs_dbs_cdata, event); }
-#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE -static -#endif -struct cpufreq_governor cpufreq_gov_conservative = { - .name = "conservative", - .governor = cs_cpufreq_governor_dbs, - .max_transition_latency = TRANSITION_LATENCY_LIMIT, - .owner = THIS_MODULE, -}; - static int __init cpufreq_gov_dbs_init(void) { return cpufreq_register_governor(&cpufreq_gov_conservative); diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 939197ffa4ac..750626d8fb03 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -463,7 +463,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, cdata->get_cpu_dbs_info_s(cpu);
 		cs_dbs_info->down_skip = 0;
-		cs_dbs_info->enable = 1;
 		cs_dbs_info->requested_freq = policy->cur;
 	} else {
 		struct od_ops *od_ops = cdata->gov_ops;
@@ -482,9 +481,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data)
 {
-	struct common_dbs_data *cdata = dbs_data->cdata;
-	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu);
 	struct cpu_common_dbs_info *shared = cdbs->shared;
/* State should be equivalent to START */ @@ -493,13 +490,6 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
gov_cancel_work(dbs_data, policy);
-	if (cdata->governor == GOV_CONSERVATIVE) {
-		struct cs_cpu_dbs_info_s *cs_dbs_info =
-			cdata->get_cpu_dbs_info_s(cpu);
-
-		cs_dbs_info->enable = 0;
-	}
-
 	shared->policy = NULL;
 	mutex_destroy(&shared->timer_mutex);
 	return 0;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 50f171796632..5621bb03e874 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -170,7 +170,6 @@ struct cs_cpu_dbs_info_s {
 	struct cpu_dbs_info cdbs;
 	unsigned int down_skip;
 	unsigned int requested_freq;
-	unsigned int enable:1;
 };
/* Per policy Governors sysfs tunables */
The sampling rate is updated with a call to update_sampling_rate(), and we process CPUs one by one there. While the work is canceled on a per-CPU basis, it is (by mistake) getting queued for all policy->cpus.
This results in wasted CPU cycles queuing works that are already queued and were never canceled.
This patch changes the behavior to queue work only on the CPU for which it was canceled earlier.
To do that, replace the 'modify_all' parameter of gov_queue_work() with a mask of CPUs. Also, the last parameter of ->gov_dbs_timer() was named 'modify_all' earlier, but its purpose was to decide whether load has to be evaluated again. Let's rename it to 'load_eval'.
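The new calling convention can be sketched with a plain bitmask standing in for struct cpumask (illustrative userspace C, not the kernel implementation):

```c
#include <assert.h>

/* Userspace sketch: an explicit mask of CPUs replaces the old 'modify_all'
 * bool, so callers state exactly which CPUs get work queued. A plain
 * unsigned bitmask stands in for struct cpumask; names are illustrative. */
#define NCPUS 8

static int delay_of[NCPUS];

static void gov_queue_work(unsigned int cpus, int delay)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (cpus & (1u << cpu))
			delay_of[cpu] = delay;	/* stands in for mod_delayed_work_on() */
}
```

A caller queuing for one CPU passes a single-bit mask (the analogue of cpumask_of(cpu)); queuing for a whole policy passes the policy's mask (the analogue of policy->cpus).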
Fixes: 031299b3be30 ("cpufreq: governors: Avoid unnecessary per cpu timer interrupts")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c |  4 ++--
 drivers/cpufreq/cpufreq_governor.c     | 30 ++++++++++--------------------
 drivers/cpufreq/cpufreq_governor.h     |  4 ++--
 drivers/cpufreq/cpufreq_ondemand.c     |  7 ++++---
 4 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 18bfbc313e48..1aa3bd46cea3 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -116,11 +116,11 @@ static void cs_check_cpu(int cpu, unsigned int load) }
 static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+				 struct dbs_data *dbs_data, bool load_eval)
 {
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
-	if (modify_all)
+	if (load_eval)
 		dbs_check_cpu(dbs_data, cdbs->shared->policy->cpu);
return delay_for_sampling_rate(cs_tuners->sampling_rate); diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 750626d8fb03..a890450711bb 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -167,7 +167,7 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, }
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		    unsigned int delay, bool all_cpus)
+		    unsigned int delay, const struct cpumask *cpus)
 {
 	int i;
@@ -175,19 +175,8 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, if (!policy->governor_enabled) goto out_unlock;
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
-	}
+	for_each_cpu(i, cpus)
+		__gov_queue_work(i, dbs_data, delay);
 out_unlock:
 	mutex_unlock(&cpufreq_governor_lock);
@@ -232,7 +221,8 @@ static void dbs_timer(struct work_struct *work)
 	struct cpufreq_policy *policy = shared->policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	unsigned int sampling_rate, delay;
-	bool modify_all = true;
+	const struct cpumask *cpus;
+	bool load_eval;
mutex_lock(&shared->timer_mutex);
@@ -246,11 +236,11 @@ static void dbs_timer(struct work_struct *work) sampling_rate = od_tuners->sampling_rate; }
-	if (!need_load_eval(cdbs->shared, sampling_rate))
-		modify_all = false;
+	load_eval = need_load_eval(cdbs->shared, sampling_rate);
+	cpus = load_eval ? policy->cpus : cpumask_of(raw_smp_processor_id());
-	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
+	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, load_eval);
+	gov_queue_work(dbs_data, policy, delay, cpus);
mutex_unlock(&shared->timer_mutex); } @@ -474,7 +464,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, }
 	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
-		       true);
+		       policy->cpus);
 	return 0;
 }
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 5621bb03e874..52665a0624b2 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -211,7 +211,7 @@ struct common_dbs_data {
 	void *(*get_cpu_dbs_info_s)(int cpu);
 	unsigned int (*gov_dbs_timer)(struct cpu_dbs_info *cdbs,
 				      struct dbs_data *dbs_data,
-				      bool modify_all);
+				      bool load_eval);
 	void (*gov_check_cpu)(int cpu, unsigned int load);
 	int (*init)(struct dbs_data *dbs_data, bool notify);
 	void (*exit)(struct dbs_data *dbs_data, bool notify);
@@ -273,7 +273,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 			 struct common_dbs_data *cdata, unsigned int event);
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		    unsigned int delay, bool all_cpus);
+		    unsigned int delay, const struct cpumask *cpus);
 void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 1fa9088c84a8..2474c9c34022 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -192,7 +192,7 @@ static void od_check_cpu(int cpu, unsigned int load)
 }
 static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+				 struct dbs_data *dbs_data, bool load_eval)
 {
 	struct cpufreq_policy *policy = cdbs->shared->policy;
 	unsigned int cpu = policy->cpu;
@@ -201,7 +201,7 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int delay = 0, sample_type = dbs_info->sample_type;
-	if (!modify_all)
+	if (!load_eval)
 		goto max_delay;
/* Common NORMAL_SAMPLE setup */ @@ -284,7 +284,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data, mutex_lock(&dbs_info->cdbs.shared->timer_mutex);
 			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate), true);
+				       usecs_to_jiffies(new_rate),
+				       cpumask_of(cpu));
} mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
__gov_queue_work() isn't required anymore and can be merged with gov_queue_work(). Do it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index a890450711bb..3ddc27764e10 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -158,25 +158,20 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) } EXPORT_SYMBOL_GPL(dbs_check_cpu);
-static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-				    unsigned int delay)
-{
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
-
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 		    unsigned int delay, const struct cpumask *cpus)
 {
-	int i;
+	struct cpu_dbs_info *cdbs;
+	int cpu;
mutex_lock(&cpufreq_governor_lock); if (!policy->governor_enabled) goto out_unlock;
-	for_each_cpu(i, cpus)
-		__gov_queue_work(i, dbs_data, delay);
+	for_each_cpu(cpu, cpus) {
+		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+		mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
+	}
out_unlock: mutex_unlock(&cpufreq_governor_lock);
'timer_mutex' is required to synchronize the work-handlers of policy->cpus. update_sampling_rate() is just canceling the works and queuing them again, so the lock isn't protecting anything there and is of no use.
Even if a work-handler is already running for a CPU, cancel_delayed_work_sync() will wait for it to finish.
Drop these unnecessary locks.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 2474c9c34022..f1551fc7b4fd 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -267,28 +267,20 @@ static void update_sampling_rate(struct dbs_data *dbs_data, dbs_info = &per_cpu(od_cpu_dbs_info, cpu); cpufreq_cpu_put(policy);
-		mutex_lock(&dbs_info->cdbs.shared->timer_mutex);
-
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
-			mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
+		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
-		}
next_sampling = jiffies + usecs_to_jiffies(new_rate); appointed_at = dbs_info->cdbs.dwork.timer.expires;
 		if (time_before(next_sampling, appointed_at)) {
-
-			mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
 			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-			mutex_lock(&dbs_info->cdbs.shared->timer_mutex);
gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate), cpumask_of(cpu));
 		}
-		mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
 	}
 }
Currently update_sampling_rate() iterates over each online CPU and cancels/queues work on it. That is very inefficient when a single policy manages multiple CPUs, as those can be processed together.
Also drop the unnecessary cancel_delayed_work_sync(), as gov_queue_work() calls mod_delayed_work_on(), which takes care of pending works for us.
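The second point relies on mod_delayed_work_on() semantics: if the work is already pending, its timer is simply replaced; if not, it is queued fresh. A minimal userspace model of that "cancel + queue in one step" behavior (all names illustrative, not the real workqueue API):

```c
#include <assert.h>

/* Toy model of a delayed work item: mod_delayed_work() leaves the work
 * pending with the new expiry whether or not it was pending before, which
 * is why a separate cancel_delayed_work_sync() beforehand is redundant. */
struct dwork { int pending; long expires; };

static void mod_delayed_work(struct dwork *w, long now, long delay)
{
	w->pending = 1;			/* pending either way afterwards */
	w->expires = now + delay;	/* any old expiry is simply replaced */
}
```

In both the "already pending" and "not pending" cases the work ends up pending with the new expiry, which matches the effect the patch depends on.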
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index f1551fc7b4fd..a6f579e40ce2 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -247,40 +247,45 @@ static void update_sampling_rate(struct dbs_data *dbs_data, unsigned int new_rate) { struct od_dbs_tuners *od_tuners = dbs_data->tuners; + struct cpufreq_policy *policy; + struct od_cpu_dbs_info_s *dbs_info; + unsigned long next_sampling, appointed_at; + struct cpumask cpumask; int cpu;
+	cpumask_copy(&cpumask, cpu_online_mask);
+
 	od_tuners->sampling_rate = new_rate = max(new_rate,
 			dbs_data->min_sampling_rate);
-	for_each_online_cpu(cpu) {
-		struct cpufreq_policy *policy;
-		struct od_cpu_dbs_info_s *dbs_info;
-		unsigned long next_sampling, appointed_at;
-
+	for_each_cpu(cpu, &cpumask) {
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
 			continue;
+
+		/* clear all CPUs of this policy */
+		cpumask_andnot(&cpumask, &cpumask, policy->cpus);
+
 		if (policy->governor != &cpufreq_gov_ondemand) {
 			cpufreq_cpu_put(policy);
 			continue;
 		}
+
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
+		/* Make sure the work is not canceled on policy->cpus */
+		if (!dbs_info->cdbs.shared->policy)
 			continue;
next_sampling = jiffies + usecs_to_jiffies(new_rate); appointed_at = dbs_info->cdbs.dwork.timer.expires;
-		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate),
-				       cpumask_of(cpu));
+		if (!time_before(next_sampling, appointed_at))
+			continue;
-		}
+		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
+			       policy->cpus);
 	}
 }
We currently update the sampling rate immediately for already-queued works only if the new expiry is less than the old one.
But what about the case where the user doesn't want frequent events and wants to increase the sampling time? Shouldn't we cancel the works (and so their interrupts, which might occur very shortly) on all policy->cpus?
This patch removes this special case and simplifies the code by always updating the expiry immediately.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index a6f579e40ce2..1a6f84b42441 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -231,17 +231,8 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs, static struct common_dbs_data od_dbs_cdata;
 /**
- * update_sampling_rate - update sampling rate effective immediately if needed.
+ * update_sampling_rate - update sampling rate immediately.
  * @new_rate: new sampling rate
- *
- * If new rate is smaller than the old, simply updating
- * dbs_tuners_int.sampling_rate might not be appropriate. For example, if the
- * original sampling_rate was 1 second and the requested new sampling rate is 10
- * ms because the user needs immediate reaction from ondemand governor, but not
- * sure if higher frequency will be required or not, then, the governor may
- * change the sampling rate too late; up to 1 second later. Thus, if we are
- * reducing the sampling rate, we need to make the new value effective
- * immediately.
  */
 static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
@@ -249,7 +240,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	struct cpufreq_policy *policy;
 	struct od_cpu_dbs_info_s *dbs_info;
-	unsigned long next_sampling, appointed_at;
 	struct cpumask cpumask;
 	int cpu;
@@ -278,12 +268,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data, if (!dbs_info->cdbs.shared->policy) continue;
-		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.dwork.timer.expires;
-
-		if (!time_before(next_sampling, appointed_at))
-			continue;
-
 		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
 			       policy->cpus);
 	}
cpufreq_governor_lock is abused by being used outside of the cpufreq core, i.e. in the cpufreq governors. But we didn't have a solution at that point in time, and so doing that was the only acceptable solution:
6f1e4efd882e ("cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled")
The cpufreq governor core is now fixed against possible races and things are in much better shape.
The cpufreq core checks for invalid state transitions of governors in __cpufreq_governor() with the help of the governor_enabled flag. The governor core already takes care of that now, so we can get rid of those extra checks in __cpufreq_governor().
To do that, we first need to get rid of the governor core's dependency on the governor_enabled flag in gov_queue_work().
This patch is about getting rid of this dependency.
When a CPU is hot-removed, we cancel all the delayed work items via gov_cancel_work(). Normally this just cancels a delayed timer on each CPU that the policy is managing, and the work won't run. But if the work is already running, the workqueue code will wait for it to finish before continuing, to prevent the work items from re-queuing themselves as they normally do.
This scheme works most of the time, except for the case where the work function determines that it should adjust the delay for all the other CPUs that the policy is managing. If this scenario occurs, the canceling CPU will cancel its own work but queue up the other CPUs' works to run.
And we end up in a situation where gov_cancel_work() has returned with work still queued on a few CPUs.
To fix that in a different (non-hacky) way, set shared->policy to NULL before trying to cancel the work. It must be updated under timer_mutex, which prevents the work-handlers from starting. Once a work-handler finds that we are already trying to stop the governor, it will exit early, which also prevents the works from being queued again.
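The pattern here (clear the shared pointer under the same mutex the handler takes, so late handlers bail out) can be sketched in userspace C with pthreads; all names are illustrative stand-ins for the kernel structures:

```c
#include <pthread.h>
#include <stddef.h>

/* Sketch of the stop protocol: governor_stop() NULLs shared->policy under
 * timer_mutex; any work-handler that runs afterwards takes the same mutex,
 * sees NULL and exits early instead of re-queuing itself. */
struct shared { pthread_mutex_t timer_mutex; void *policy; };

static int requeued;	/* records whether the handler re-armed itself */

static void dbs_timer(struct shared *s)
{
	pthread_mutex_lock(&s->timer_mutex);
	if (s->policy)		/* governor still running? */
		requeued = 1;	/* stands in for gov_queue_work() */
	pthread_mutex_unlock(&s->timer_mutex);
}

static void governor_stop(struct shared *s)
{
	pthread_mutex_lock(&s->timer_mutex);
	s->policy = NULL;	/* handlers entering after this exit early */
	pthread_mutex_unlock(&s->timer_mutex);
	/* gov_cancel_work() would follow here */
}
```

Because both sides serialize on the mutex, a handler observes either the old non-NULL policy (before the stop) or NULL (after it), never a half-stopped state.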
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 3ddc27764e10..bb12acff3ba6 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -164,17 +164,10 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, struct cpu_dbs_info *cdbs; int cpu;
-	mutex_lock(&cpufreq_governor_lock);
-	if (!policy->governor_enabled)
-		goto out_unlock;
-
 	for_each_cpu(cpu, cpus) {
 		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 		mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
 	}
-
-out_unlock:
-	mutex_unlock(&cpufreq_governor_lock);
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);
@@ -213,14 +206,25 @@ static void dbs_timer(struct work_struct *work)
 	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
 						 dwork.work);
 	struct cpu_common_dbs_info *shared = cdbs->shared;
-	struct cpufreq_policy *policy = shared->policy;
-	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpufreq_policy *policy;
+	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
 	const struct cpumask *cpus;
 	bool load_eval;
mutex_lock(&shared->timer_mutex);
+	policy = shared->policy;
+
+	/*
+	 * Governor might already be disabled and there is no point continuing
+	 * with the work-handler.
+	 */
+	if (!policy)
+		goto unlock;
+
+	dbs_data = policy->governor_data;
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
@@ -237,6 +241,7 @@ static void dbs_timer(struct work_struct *work) delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, load_eval); gov_queue_work(dbs_data, policy, delay, cpus);
+unlock:
 	mutex_unlock(&shared->timer_mutex);
 }
@@ -473,9 +478,17 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy, if (!shared || !shared->policy) return -EBUSY;
+	/*
+	 * Work-handler must see this updated, as it should not proceed any
+	 * further after governor is disabled. And so timer_mutex is taken while
+	 * updating this value.
+	 */
+	mutex_lock(&shared->timer_mutex);
+	shared->policy = NULL;
+	mutex_unlock(&shared->timer_mutex);
+
 	gov_cancel_work(dbs_data, policy);
-	shared->policy = NULL;
-
 	mutex_destroy(&shared->timer_mutex);
 	return 0;
 }
Invalid state transitions are verified by the governor core now, and there is no need to replicate that in the cpufreq core.
Stop verifying the same in the cpufreq core. That gets rid of policy->governor_enabled and cpufreq_governor_lock.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 24 ------------------------
 include/linux/cpufreq.h   |  1 -
 2 files changed, 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index f620055279f3..c0d49950db01 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
-DEFINE_MUTEX(cpufreq_governor_lock);
/* Flag to suspend/resume CPUFreq governors */ static bool cpufreq_suspended; @@ -2076,21 +2075,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
-	mutex_lock(&cpufreq_governor_lock);
-	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
-	    || (!policy->governor_enabled
-	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
-		mutex_unlock(&cpufreq_governor_lock);
-		return -EBUSY;
-	}
-
-	if (event == CPUFREQ_GOV_STOP)
-		policy->governor_enabled = false;
-	else if (event == CPUFREQ_GOV_START)
-		policy->governor_enabled = true;
-
-	mutex_unlock(&cpufreq_governor_lock);
-
 	ret = policy->governor->governor(policy, event);
 	if (!ret) {
@@ -2098,14 +2082,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 			policy->governor->initialized++;
 		else if (event == CPUFREQ_GOV_POLICY_EXIT)
 			policy->governor->initialized--;
-	} else {
-		/* Restore original values */
-		mutex_lock(&cpufreq_governor_lock);
-		if (event == CPUFREQ_GOV_STOP)
-			policy->governor_enabled = true;
-		else if (event == CPUFREQ_GOV_START)
-			policy->governor_enabled = false;
-		mutex_unlock(&cpufreq_governor_lock);
 	}
 	if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index bde1e567b3a9..5930c6b3a1d8 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -82,7 +82,6 @@ struct cpufreq_policy {
 	unsigned int		policy; /* see above */
 	struct cpufreq_governor	*governor; /* see below */
 	void			*governor_data;
-	bool			governor_enabled; /* governor start/stop flag */
 	char			last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
struct work_struct update; /* if update_policy() needs to be
On 27-07-15, 17:58, Viresh Kumar wrote:
Hi Rafael,
This series fixes a few more possible race conditions. On top of that, there is some non-trivial cleanup to simplify the code.
Preeti reviewed some of these before she left and shared concerns about others; all of that is sorted out now.
V1->V2:
- Dropped 2/10 from V1 as it wasn't required
- 3/10 saw some changes due to above patch being dropped
- 7/10 changed a bit as we check for pending work items by looking at shared->policy, rather than calling delayed_work_pending. We wanted to check if governor is operational or not and the new check is enough for that.
Viresh Kumar (9):
  cpufreq: Use __func__ to print function's name
  cpufreq: conservative: remove 'enable' field
  cpufreq: ondemand: only queue canceled works from update_sampling_rate()
  cpufreq: governor: Drop __gov_queue_work()
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: ondemand: queue work for policy->cpus together
  cpufreq: ondemand: update sampling rate immediately
  cpufreq: governor: Quit work-handlers early if governor is stopped
  cpufreq: Get rid of ->governor_enabled and its lock
I thought you had some review comments for this series?
On Thursday, September 03, 2015 10:14:54 AM Viresh Kumar wrote:
On 27-07-15, 17:58, Viresh Kumar wrote:
Hi Rafael,
This series fixes a few more possible race conditions. On top of that, there is some non-trivial cleanup, in order to simplify code.
Preeti reviewed some of these before she left and shared concerns about others; all of that is sorted out now.
V1->V2:
- Dropped 2/10 from V1 as it wasn't required
- 3/10 saw some changes due to above patch being dropped
- 7/10 changed a bit as we check for pending work items by looking at shared->policy, rather than calling delayed_work_pending. We wanted to check if governor is operational or not and the new check is enough for that.
Viresh Kumar (9):
  cpufreq: Use __func__ to print function's name
  cpufreq: conservative: remove 'enable' field
  cpufreq: ondemand: only queue canceled works from update_sampling_rate()
  cpufreq: governor: Drop __gov_queue_work()
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: ondemand: queue work for policy->cpus together
  cpufreq: ondemand: update sampling rate immediately
  cpufreq: governor: Quit work-handlers early if governor is stopped
  cpufreq: Get rid of ->governor_enabled and its lock
I thought you had some review comments for this series?
Yes, I did. I'll send them later today if all goes well.
Thanks, Rafael
On Monday, July 27, 2015 05:58:06 PM Viresh Kumar wrote:
It's better to use __func__ to print a function's name instead of hard-coding the name in the print statement. This also has the advantage that a change in the function's name doesn't force us to change the print message as well.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
OK, this is simple enough. Applied, thanks!
 drivers/cpufreq/cpufreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 46251e8d30f2..f620055279f3 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2074,8 +2074,7 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 	if (!try_module_get(policy->governor->owner))
 		return -EINVAL;
-	pr_debug("__cpufreq_governor for CPU %u, event %u\n",
-		 policy->cpu, event);
+	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 	mutex_lock(&cpufreq_governor_lock);
 	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
On Monday, July 27, 2015 05:58:07 PM Viresh Kumar wrote:
The conservative governor has its own 'enable' field to check whether the conservative governor is used for a CPU or not.
This can be checked by comparing policy->governor against 'cpufreq_gov_conservative', and so this field can be dropped.
Because it is not guaranteed that dbs_info->cdbs.shared will be a valid pointer for all CPUs (it will be NULL for CPUs that don't use the ondemand/conservative governors), we can't use it anymore. Let's get the policy with cpufreq_cpu_get() instead.
But previously, if the enable bit was set, we actually knew that the pointer was valid, right?
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
 drivers/cpufreq/cpufreq_conservative.c | 34 +++++++++++++++++++++-------------
 drivers/cpufreq/cpufreq_governor.c     | 12 +-----------
 drivers/cpufreq/cpufreq_governor.h     |  1 -
 3 files changed, 22 insertions(+), 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 84a1506950a7..18bfbc313e48 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -23,6 +23,19 @@ static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info); +static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
+				     unsigned int event);
+
+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
+static
+#endif
+struct cpufreq_governor cpufreq_gov_conservative = {
+	.name			= "conservative",
+	.governor		= cs_cpufreq_governor_dbs,
+	.max_transition_latency	= TRANSITION_LATENCY_LIMIT,
+	.owner			= THIS_MODULE,
+};
static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, struct cpufreq_policy *policy) { @@ -119,12 +132,14 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, struct cpufreq_freqs *freq = data; struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, freq->cpu);
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu);

-	if (!dbs_info->enable)
+	if (!policy)
 		return 0;

-	policy = dbs_info->cdbs.shared->policy;
So here we could get to the policy directly. After the change we have to:
- acquire cpufreq_rwsem
- acquire cpufreq_driver_lock
- do kobject_get on policy->kobj
and then finally drop the reference to the kobject when we're done.
So may I ask where exactly is the improvement?
+	/* policy isn't governed by conservative governor */
+	if (policy->governor != &cpufreq_gov_conservative)
+		goto policy_put;
	/*
	 * we only care if our internally tracked freq moves outside the 'valid'
Thanks, Rafael
On Monday, July 27, 2015 05:58:08 PM Viresh Kumar wrote:
The sampling rate is updated with a call to update_sampling_rate(), and we process CPUs one by one there. While the work is canceled on a per-CPU basis, it is getting queued (by mistake) for all policy->cpus.
This results in wasting CPU cycles queuing works which are already queued and were never canceled.
This patch changes this behavior to queue work only on the CPU for which it was canceled earlier.
To do that, replace the 'modify_all' parameter of gov_queue_work() with a mask of CPUs.
There really are two cases, either you pass a CPU or gov_queue_work() has to walk policy->cpus. Doing it the way you did hides that IMO.
I'd simply pass an int and use a special value to indicate that policy->cpus is to be walked.
Also, the last parameter of ->gov_dbs_timer() was named 'modify_all' earlier, but its purpose was to decide whether load has to be evaluated again or not. Let's rename it to load_eval.
Fixes: 031299b3be30 ("cpufreq: governors: Avoid unnecessary per cpu timer interrupts") Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq_conservative.c | 4 ++-- drivers/cpufreq/cpufreq_governor.c | 30 ++++++++++-------------------- drivers/cpufreq/cpufreq_governor.h | 4 ++-- drivers/cpufreq/cpufreq_ondemand.c | 7 ++++--- 4 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 18bfbc313e48..1aa3bd46cea3 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -116,11 +116,11 @@ static void cs_check_cpu(int cpu, unsigned int load) } static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+				 struct dbs_data *dbs_data, bool load_eval)
{
	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

-	if (modify_all)
+	if (load_eval)
 		dbs_check_cpu(dbs_data, cdbs->shared->policy->cpu);
return delay_for_sampling_rate(cs_tuners->sampling_rate); diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 750626d8fb03..a890450711bb 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -167,7 +167,7 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, } void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		    unsigned int delay, bool all_cpus)
+		    unsigned int delay, const struct cpumask *cpus)
{ int i; @@ -175,19 +175,8 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, if (!policy->governor_enabled) goto out_unlock;
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
This was a useful comment and it should be moved along with the logic it was supposed to explain, not just dropped.
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
-	}
+	for_each_cpu(i, cpus)
+		__gov_queue_work(i, dbs_data, delay);
out_unlock: mutex_unlock(&cpufreq_governor_lock); @@ -232,7 +221,8 @@ static void dbs_timer(struct work_struct *work) struct cpufreq_policy *policy = shared->policy; struct dbs_data *dbs_data = policy->governor_data; unsigned int sampling_rate, delay;
-	bool modify_all = true;
+	const struct cpumask *cpus;
I don't think this local variable is necessary.
+	bool load_eval;
mutex_lock(&shared->timer_mutex); @@ -246,11 +236,11 @@ static void dbs_timer(struct work_struct *work) sampling_rate = od_tuners->sampling_rate; }
-	if (!need_load_eval(cdbs->shared, sampling_rate))
-		modify_all = false;
+	load_eval = need_load_eval(cdbs->shared, sampling_rate);
+	cpus = load_eval ? policy->cpus : cpumask_of(raw_smp_processor_id());
-	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
+	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, load_eval);
+	gov_queue_work(dbs_data, policy, delay, cpus);
mutex_unlock(&shared->timer_mutex); } @@ -474,7 +464,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, } gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
-		       true);
+		       policy->cpus);

	return 0;
} diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 5621bb03e874..52665a0624b2 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -211,7 +211,7 @@ struct common_dbs_data { void *(*get_cpu_dbs_info_s)(int cpu); unsigned int (*gov_dbs_timer)(struct cpu_dbs_info *cdbs, struct dbs_data *dbs_data,
-				       bool modify_all);
+				       bool load_eval);
	void (*gov_check_cpu)(int cpu, unsigned int load);
	int (*init)(struct dbs_data *dbs_data, bool notify);
	void (*exit)(struct dbs_data *dbs_data, bool notify);
@@ -273,7 +273,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		    unsigned int delay, bool all_cpus);
+		    unsigned int delay, const struct cpumask *cpus);
void od_register_powersave_bias_handler(unsigned int (*f) (struct cpufreq_policy *, unsigned int, unsigned int), unsigned int powersave_bias); diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 1fa9088c84a8..2474c9c34022 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -192,7 +192,7 @@ static void od_check_cpu(int cpu, unsigned int load) } static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+				 struct dbs_data *dbs_data, bool load_eval)
{ struct cpufreq_policy *policy = cdbs->shared->policy; unsigned int cpu = policy->cpu; @@ -201,7 +201,7 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs, struct od_dbs_tuners *od_tuners = dbs_data->tuners; int delay = 0, sample_type = dbs_info->sample_type;
-	if (!modify_all)
+	if (!load_eval)
 		goto max_delay;
/* Common NORMAL_SAMPLE setup */ @@ -284,7 +284,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data, mutex_lock(&dbs_info->cdbs.shared->timer_mutex); gov_queue_work(dbs_data, policy,
-					       usecs_to_jiffies(new_rate), true);
+					       usecs_to_jiffies(new_rate),
+					       cpumask_of(cpu));
} mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
Thanks, Rafael
On Monday, July 27, 2015 05:58:09 PM Viresh Kumar wrote:
__gov_queue_work() isn't required anymore and can be merged with gov_queue_work(). Do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Quite frankly I don't see the point.
I'd even remove the inline from its definition and let the compiler decide what to do with it.
drivers/cpufreq/cpufreq_governor.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index a890450711bb..3ddc27764e10 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -158,25 +158,20 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) } EXPORT_SYMBOL_GPL(dbs_check_cpu); -static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-				    unsigned int delay)
-{
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, unsigned int delay, const struct cpumask *cpus) {
-	int i;
+	struct cpu_dbs_info *cdbs;
+	int cpu;

	mutex_lock(&cpufreq_governor_lock);

	if (!policy->governor_enabled)
		goto out_unlock;

-	for_each_cpu(i, cpus)
-		__gov_queue_work(i, dbs_data, delay);
+	for_each_cpu(cpu, cpus) {
+		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+		mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
+	}
out_unlock: mutex_unlock(&cpufreq_governor_lock);
Thanks, Rafael
On 08-09-15, 02:17, Rafael J. Wysocki wrote:
static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, struct cpufreq_policy *policy) { @@ -119,12 +132,14 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, struct cpufreq_freqs *freq = data; struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, freq->cpu);
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu);

-	if (!dbs_info->enable)
+	if (!policy)
 		return 0;

-	policy = dbs_info->cdbs.shared->policy;
So here we could get to the policy directly. After the change we have to:
- acquire cpufreq_rwsem
- acquire cpufreq_driver_lock
- do kobject_get on policy->kobj
Hmm, actually we can do cpufreq_cpu_get_raw() here as this is getting called from notifier and policy isn't going to get freed for sure.
And then it wouldn't be that bad.
and then finally drop the reference to the kobject when we're done.
So may I ask where exactly is the improvement?
Agree. Let me resend it quickly.
+	/* policy isn't governed by conservative governor */
+	if (policy->governor != &cpufreq_gov_conservative)
+		goto policy_put;
	/*
	 * we only care if our internally tracked freq moves outside the 'valid'
Thanks, Rafael
On Monday, July 27, 2015 05:58:11 PM Viresh Kumar wrote:
Currently update_sampling_rate() runs over each online CPU and cancels/queues work on it. It's very inefficient for the case where a single policy manages multiple CPUs, as they can be processed together.
Also drop the unnecessary cancel_delayed_work_sync() as we are doing a mod_delayed_work_on() in gov_queue_work(), which will take care of pending works for us.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq_ondemand.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index f1551fc7b4fd..a6f579e40ce2 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -247,40 +247,45 @@ static void update_sampling_rate(struct dbs_data *dbs_data, unsigned int new_rate) { struct od_dbs_tuners *od_tuners = dbs_data->tuners;
-	struct cpufreq_policy *policy;
-	struct od_cpu_dbs_info_s *dbs_info;
-	unsigned long next_sampling, appointed_at;
+	struct cpumask cpumask;
	int cpu;

+	cpumask_copy(&cpumask, cpu_online_mask);
+
	od_tuners->sampling_rate = new_rate = max(new_rate,
			dbs_data->min_sampling_rate);

-	for_each_online_cpu(cpu) {
+	for_each_cpu(cpu, &cpumask) {
+		struct cpufreq_policy *policy;
+		struct od_cpu_dbs_info_s *dbs_info;
+		unsigned long next_sampling, appointed_at;
+
		policy = cpufreq_cpu_get(cpu);
		if (!policy)
			continue;

+		/* clear all CPUs of this policy */
+		cpumask_andnot(&cpumask, &cpumask, policy->cpus);
Well, this is not exactly straightforward, but should work.
		if (policy->governor != &cpufreq_gov_ondemand) {
			cpufreq_cpu_put(policy);
			continue;
		}

		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
		cpufreq_cpu_put(policy);

-		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
+		/* Make sure the work is not canceled on policy->cpus */
I'm not sure what scenario can lead to that. Care to explain?
+		if (!dbs_info->cdbs.shared->policy)
+			continue;
		next_sampling = jiffies + usecs_to_jiffies(new_rate);
		appointed_at = dbs_info->cdbs.dwork.timer.expires;
For that to work we always need to do stuff for policy->cpus in sync. Do we?
-		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate),
-				       cpumask_of(cpu));
-		}
+		if (!time_before(next_sampling, appointed_at))
+			continue;
+
+		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
+			       policy->cpus);
	}
}
Thanks, Rafael
The conservative governor has its own 'enable' field to check whether the conservative governor is used for a CPU or not.
This can be checked by comparing policy->governor with 'cpufreq_gov_conservative', and so this field can be dropped.
Because it's not guaranteed that dbs_info->cdbs.shared will be a valid pointer for all CPUs (it will be NULL for CPUs that don't use the ondemand/conservative governors), we can't use it anymore. Let's get the policy with cpufreq_cpu_get_raw() instead.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Updates: use cpufreq_cpu_get_raw() instead of cpufreq_cpu_get().
drivers/cpufreq/cpufreq.c | 3 ++- drivers/cpufreq/cpufreq_conservative.c | 31 ++++++++++++++++++------------- drivers/cpufreq/cpufreq_governor.c | 12 +----------- drivers/cpufreq/cpufreq_governor.h | 2 +- 4 files changed, 22 insertions(+), 26 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 82428099c569..2a0c2a26df11 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -239,12 +239,13 @@ int cpufreq_generic_init(struct cpufreq_policy *policy, EXPORT_SYMBOL_GPL(cpufreq_generic_init);
/* Only for cpufreq core internal use */ -static struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu) +struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu) { struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
return policy && cpumask_test_cpu(cpu, policy->cpus) ? policy : NULL; } +EXPORT_SYMBOL_GPL(cpufreq_cpu_get_raw);
unsigned int cpufreq_generic_get(unsigned int cpu) { diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 84a1506950a7..1fa1deb6e91f 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -23,6 +23,19 @@
static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);
+static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, + unsigned int event); + +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE +static +#endif +struct cpufreq_governor cpufreq_gov_conservative = { + .name = "conservative", + .governor = cs_cpufreq_governor_dbs, + .max_transition_latency = TRANSITION_LATENCY_LIMIT, + .owner = THIS_MODULE, +}; + static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, struct cpufreq_policy *policy) { @@ -119,12 +132,14 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, struct cpufreq_freqs *freq = data; struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, freq->cpu); - struct cpufreq_policy *policy; + struct cpufreq_policy *policy = cpufreq_cpu_get_raw(freq->cpu);
- if (!dbs_info->enable) + if (!policy) return 0;
- policy = dbs_info->cdbs.shared->policy; + /* policy isn't governed by conservative governor */ + if (policy->governor != &cpufreq_gov_conservative) + return 0;
/* * we only care if our internally tracked freq moves outside the 'valid' @@ -367,16 +382,6 @@ static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, return cpufreq_governor_dbs(policy, &cs_dbs_cdata, event); }
-#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE -static -#endif -struct cpufreq_governor cpufreq_gov_conservative = { - .name = "conservative", - .governor = cs_cpufreq_governor_dbs, - .max_transition_latency = TRANSITION_LATENCY_LIMIT, - .owner = THIS_MODULE, -}; - static int __init cpufreq_gov_dbs_init(void) { return cpufreq_register_governor(&cpufreq_gov_conservative); diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 939197ffa4ac..750626d8fb03 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -463,7 +463,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, cdata->get_cpu_dbs_info_s(cpu);
cs_dbs_info->down_skip = 0; - cs_dbs_info->enable = 1; cs_dbs_info->requested_freq = policy->cur; } else { struct od_ops *od_ops = cdata->gov_ops; @@ -482,9 +481,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, static int cpufreq_governor_stop(struct cpufreq_policy *policy, struct dbs_data *dbs_data) { - struct common_dbs_data *cdata = dbs_data->cdata; - unsigned int cpu = policy->cpu; - struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu); struct cpu_common_dbs_info *shared = cdbs->shared;
/* State should be equivalent to START */ @@ -493,13 +490,6 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
gov_cancel_work(dbs_data, policy);
- if (cdata->governor == GOV_CONSERVATIVE) { - struct cs_cpu_dbs_info_s *cs_dbs_info = - cdata->get_cpu_dbs_info_s(cpu); - - cs_dbs_info->enable = 0; - } - shared->policy = NULL; mutex_destroy(&shared->timer_mutex); return 0; diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 50f171796632..2ed3c07fea5e 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -170,7 +170,6 @@ struct cs_cpu_dbs_info_s { struct cpu_dbs_info cdbs; unsigned int down_skip; unsigned int requested_freq; - unsigned int enable:1; };
/* Per policy Governors sysfs tunables */ @@ -270,6 +269,7 @@ static ssize_t show_sampling_rate_min_gov_pol \
extern struct mutex cpufreq_governor_lock;
+struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu); void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event);
On 08-09-15, 03:11, Rafael J. Wysocki wrote:
There really are two cases, either you pass a CPU or gov_queue_work() has to walk policy->cpus.
Right (At least for now, we are doing just that.)
Doing it the way you did hides that IMO.
Maybe. But I see it otherwise. Adding special meaning to a variable (like int cpu == -1 being the special case to specify policy->cpus) hides things more, as we need to look at how it is finally decoded in gov_queue_work().
But if we send a mask instead, it is very clear by reading the callers site, what we are trying to do.
I'd simply pass an int and use a special value to indicate that policy->cpus is to be walked.
Like cpu == -1 thing? Or something else?
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
This was a useful comment and it should be moved along the logic it was supposed to explain and not just dropped.
Sigh
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
-	}
+	for_each_cpu(i, cpus)
+		__gov_queue_work(i, dbs_data, delay);
out_unlock: mutex_unlock(&cpufreq_governor_lock); @@ -232,7 +221,8 @@ static void dbs_timer(struct work_struct *work) struct cpufreq_policy *policy = shared->policy; struct dbs_data *dbs_data = policy->governor_data; unsigned int sampling_rate, delay;
-	bool modify_all = true;
+	const struct cpumask *cpus;
I don't think this local variable is necessary.
+	bool load_eval;
mutex_lock(&shared->timer_mutex); @@ -246,11 +236,11 @@ static void dbs_timer(struct work_struct *work) sampling_rate = od_tuners->sampling_rate; }
-	if (!need_load_eval(cdbs->shared, sampling_rate))
-		modify_all = false;
+	load_eval = need_load_eval(cdbs->shared, sampling_rate);
+	cpus = load_eval ? policy->cpus : cpumask_of(raw_smp_processor_id());

-	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
+	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, load_eval);
+	gov_queue_work(dbs_data, policy, delay, cpus);
Avoiding that local variable would have made this a little longer, but I can surely drop it :)
gov_queue_work(dbs_data, policy, delay, load_eval ? policy->cpus : cpumask_of(raw_smp_processor_id()));
On 08-09-15, 03:15, Rafael J. Wysocki wrote:
On Monday, July 27, 2015 05:58:09 PM Viresh Kumar wrote:
__gov_queue_work() isn't required anymore and can be merged with gov_queue_work(). Do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Quite frankly I don't see the point.
But isn't that just an unnecessary wrapper?
I'd even remove the inline from its definition and let the compiler decide what to do with it.
What if the compiler decides not to inline it? Why add a function call for (almost) no use?
On 08-09-15, 03:33, Rafael J. Wysocki wrote:
/* Make sure the work is not canceled on policy->cpus */
I'm not sure what scenario can lead to that. Care to explain?
CPUFREQ_GOV_STOP event called for the policy and so all its works are in canceled state.
if (!dbs_info->cdbs.shared->policy)
	continue;
next_sampling = jiffies + usecs_to_jiffies(new_rate);
appointed_at = dbs_info->cdbs.dwork.timer.expires;
For that to work we always need to do stuff for policy->cpus in sync. Do we?
Hmm, we are not in 100% sync for sure. Will check that again.
On 08-09-15, 07:41, Viresh Kumar wrote:
next_sampling = jiffies + usecs_to_jiffies(new_rate);
appointed_at = dbs_info->cdbs.dwork.timer.expires;
For that to work we always need to do stuff for policy->cpus in sync. Do we?
Hmm, we are not in 100% sync for sure. Will check that again.
On the other hand, if we decide to apply 7/9 as well, then this is anyway going to get removed :)
On Tuesday, September 08, 2015 07:30:44 AM Viresh Kumar wrote:
On 08-09-15, 03:15, Rafael J. Wysocki wrote:
On Monday, July 27, 2015 05:58:09 PM Viresh Kumar wrote:
__gov_queue_work() isn't required anymore and can be merged with gov_queue_work(). Do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Quite frankly I don't see the point.
But isn't that just an unnecessary wrapper ?
It isn't a wrapper, just a separation of code executed in each step of the loop. There's nothing wrong with having a separate function for that in principle.
I wouldn't make a fuss about that if that was new code even, so I don't see why we should change it.
I'd even remove the inline from its definition and let the compiler decide what to do with it.
What if the compiler decides to link it? Why add a function call for (almost) no use?
If the compiler does that, let it do it. :-)
If you think that you can outsmart the compiler people by doing such optimizations at this level manually, you're likely wrong. Serious man-hours go into making that stuff work as well as it can in compilers.
Thanks, Rafael
On Tuesday, September 08, 2015 07:28:31 AM Viresh Kumar wrote:
On 08-09-15, 03:11, Rafael J. Wysocki wrote:
There really are two cases, either you pass a CPU or gov_queue_work() has to walk policy->cpus.
Right (At least for now, we are doing just that.)
Doing it the way you did hides that IMO.
Maybe. But I see it otherwise. Adding special meaning to a variable (like int cpu == -1 being the special case to specify policy->cpus) hides things more, as we need to look at how it is finally decoded in gov_queue_work().
Oh well.
I've just realized that if you combined this patch with the [6/9], you wouldn't need to make any changes to gov_queue_work() at all, because that patch removes the case in point entirely.
Thanks, Rafael
On 09-09-15, 03:06, Rafael J. Wysocki wrote:
I've just realized that if you combined this patch with the [6/9], you wouldn't need to make any changes to gov_queue_work() at all, because that patch removes the case in point entirely.
Yeah, but then these are really two separate issues at hand and so I solved them separately. Lemme know how you want to see that and I can change :)
On Wednesday, September 09, 2015 08:00:20 AM Viresh Kumar wrote:
On 09-09-15, 03:06, Rafael J. Wysocki wrote:
I've just realized that if you combined this patch with the [6/9], you wouldn't need to make any changes to gov_queue_work() at all, because that patch removes the case in point entirely.
Yeah, but then these are really two separate issues at hand and so I solved them separately. Lemme know how you want to see that and I can change :)
I don't want to make artificial changes.
If you can address two problems in one go with one relatively simple patch, why don't you do that?
Thanks, Rafael