Hi Rafael/Preeti,
As Preeti is still around, I wanted her to test/review a few more patches that I had. Sorry Preeti :)
So, this series sits on top of the earlier patches [1], which fixed some serious crashes for us. Both this and the earlier series are 4.3 material.
This series fixes a few more possible race conditions. On top of that, there is some non-trivial cleanup to simplify the code.
Pushed here: ssh://git@git.linaro.org/people/viresh.kumar/linux.git cpufreq/gov-locking
-- viresh
[1] lkml.kernel.org/r/cover.1434713657.git.viresh.kumar@linaro.org
Viresh Kumar (10):
  cpufreq: Use __func__ to print function's name
  cpufreq: conservative: Avoid races with transition notifier
  cpufreq: conservative: remove 'enable' field
  cpufreq: ondemand: only queue canceled works from update_sampling_rate()
  cpufreq: governor: Drop __gov_queue_work()
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: ondemand: queue work for policy->cpus together
  cpufreq: ondemand: update sampling rate immediately
  cpufreq: governor: Quit work-handlers early if governor is stopped
  cpufreq: Get rid of ->governor_enabled and its lock
 drivers/cpufreq/cpufreq.c              | 27 +----------
 drivers/cpufreq/cpufreq_conservative.c | 38 +++++++++------
 drivers/cpufreq/cpufreq_governor.c     | 86 +++++++++++++++-------------------
 drivers/cpufreq/cpufreq_governor.h     |  3 +-
 drivers/cpufreq/cpufreq_ondemand.c     | 57 +++++++++-------
 include/linux/cpufreq.h                |  1 -
 6 files changed, 83 insertions(+), 129 deletions(-)
It's better to use __func__ to print a function's name instead of writing the name in the print statement. This also has the advantage that a change in the function's name doesn't force us to change the print message as well.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 0b3c60861bdf..5863db9213aa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2097,8 +2097,7 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 	if (!try_module_get(policy->governor->owner))
 		return -EINVAL;
 
-	pr_debug("__cpufreq_governor for CPU %u, event %u\n",
-		 policy->cpu, event);
+	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 
 	mutex_lock(&cpufreq_governor_lock);
 	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
It's better to use __func__ to print a function's name instead of writing the name in the print statement. This also has the advantage that a change in the function's name doesn't force us to change the print message as well.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com
It is possible that the cpufreq transition notifier is called while the governor is performing its EXIT operation. If this happens, 'ccdbs' may get updated to NULL while it is being accessed from the notifier callback, and that will result in a NULL pointer dereference.
ccdbs is used here only to get the cpufreq policy, which can be obtained from cpufreq_cpu_get() as well, and so the reference to ccdbs can be avoided.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq_conservative.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 0e4154e584bf..1e3cabfb2b57 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -119,12 +119,13 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	struct cpufreq_freqs *freq = data;
 	struct cs_cpu_dbs_info_s *dbs_info =
 					&per_cpu(cs_cpu_dbs_info, freq->cpu);
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu);
 
-	if (!dbs_info->enable)
+	if (!policy)
 		return 0;
 
-	policy = dbs_info->cdbs.ccdbs->policy;
+	if (!dbs_info->enable)
+		goto policy_put;
 
 	/*
 	 * we only care if our internally tracked freq moves outside the 'valid'
@@ -134,6 +135,9 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	    || dbs_info->requested_freq < policy->min)
 		dbs_info->requested_freq = freq->new;
 
+policy_put:
+	cpufreq_cpu_put(policy);
+
 	return 0;
 }
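The get/put pattern the patch switches to can be sketched in userspace C. Instead of dereferencing a shared back-pointer that another thread may be tearing down, take a counted reference up front, bail out if the object is already gone, and drop the reference on every exit path. `policy_get()`/`policy_put()` below are illustrative stand-ins for cpufreq_cpu_get()/cpufreq_cpu_put(), not the kernel API:

```c
#include <stdio.h>

struct policy {
	int refcount;
	unsigned int min, max;
};

static struct policy *registered;	/* NULL once the policy is torn down */

/* take a reference, or fail if the object no longer exists */
static struct policy *policy_get(void)
{
	if (!registered)
		return NULL;
	registered->refcount++;
	return registered;
}

static void policy_put(struct policy *p)
{
	p->refcount--;
}

/* mirrors dbs_cpufreq_notifier(): returns whether freq left the valid range */
static int notifier(unsigned int new_freq)
{
	struct policy *p = policy_get();
	int out_of_range = 0;

	if (!p)
		return 0;	/* policy went away; nothing to dereference */

	if (new_freq > p->max || new_freq < p->min)
		out_of_range = 1;

	policy_put(p);		/* every exit path drops the reference */
	return out_of_range;
}

int main(void)
{
	struct policy pol = { .refcount = 1, .min = 800, .max = 2000 };

	registered = &pol;
	printf("in-range: %d\n", notifier(1500));
	printf("too-high: %d\n", notifier(2500));

	registered = NULL;	/* simulate governor EXIT */
	printf("after exit: %d\n", notifier(1500));
	return 0;
}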
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
It is possible that the cpufreq transition notifier is called while the governor is performing its EXIT operation. If this happens, 'ccdbs' may get updated to NULL while it is being accessed from the notifier callback, and that will result in a NULL pointer dereference.
When does this happen? As far as I can see, the cpufreq transition notifier gets called from the cpufreq kworker or when we set the cpufreq limits. And from your previous patches, an exit operation only proceeds after ensuring that no kworker is running (check on ccdbs->policy). And the LIMIT operation does not run in parallel either.
Regards
Preeti U Murthy
On 23-06-15, 21:23, Preeti U Murthy wrote:
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
It is possible that cpufreq transition notifier is called while the governor is performing its EXIT operation. If this happens, 'ccdbs'
When does this happen ? As far as I can see, cpufreq transition notifier gets called from the cpufreq kworker or when we set the cpufreq limits.
Which kworker are you talking about here ? The work-handlers of ondemand/conservative governors ?
Conservative governor has registered for the transition notifier, and that will be called every time the frequency of a CPU is updated. That has nothing to do with the governor callbacks; these notifiers are called from the ->target() routines of the drivers.
And from your previous patches, an exit operation only proceeds after ensuring that no kworker is running (check on ccdbs->policy). And LIMIT operation does not run in parallel too.
Does it look clearer with the above description?
On 24-06-15, 06:41, Viresh Kumar wrote:
On 23-06-15, 21:23, Preeti U Murthy wrote:
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
It is possible that cpufreq transition notifier is called while the governor is performing its EXIT operation. If this happens, 'ccdbs'
When does this happen ? As far as I can see, cpufreq transition notifier gets called from the cpufreq kworker or when we set the cpufreq limits.
Which kworker are you talking about here ? The work-handlers of ondemand/conservative governors ?
Conservative governor has registered for transition notifier and that will be called every time frequency of a CPU is updated. And that has nothing to do with the governor callbacks. These notifiers are called from the ->target() routines of the drivers.
And from your previous patches, an exit operation only proceeds after ensuring that no kworker is running (check on ccdbs->policy). And LIMIT operation does not run in parallel too.
Hmm, you were right. I Nack my own patch. :)
Conservative governor has its own 'enable' field to check two things:
- If conservative governor is used for a CPU or not
- If governor is currently enabled or not, as there can be a race around the notifier being called while it was getting unregistered from cpufreq_governor_dbs().
The first one can be checked by comparing policy->governor with 'cpufreq_gov_conservative' and the second one isn't that important. In the worst case, we will end up updating dbs_info->requested_freq. And that wouldn't do any harm.
Let's get rid of this field.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq_conservative.c | 26 +++++++++++++++-----------
 drivers/cpufreq/cpufreq_governor.c     | 12 +-----------
 drivers/cpufreq/cpufreq_governor.h     |  1 -
 3 files changed, 16 insertions(+), 23 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 1e3cabfb2b57..f53719e5bed9 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -23,6 +23,19 @@
 
 static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);
 
+static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
+				   unsigned int event);
+
+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
+static
+#endif
+struct cpufreq_governor cpufreq_gov_conservative = {
+	.name			= "conservative",
+	.governor		= cs_cpufreq_governor_dbs,
+	.max_transition_latency	= TRANSITION_LATENCY_LIMIT,
+	.owner			= THIS_MODULE,
+};
+
 static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
 					   struct cpufreq_policy *policy)
 {
@@ -124,7 +137,8 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	if (!policy)
 		return 0;
 
-	if (!dbs_info->enable)
+	/* policy isn't governed by conservative governor */
+	if (policy->governor != &cpufreq_gov_conservative)
 		goto policy_put;
 
 	/*
@@ -371,16 +385,6 @@ static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
 	return cpufreq_governor_dbs(policy, &cs_dbs_cdata, event);
 }
 
-#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
-static
-#endif
-struct cpufreq_governor cpufreq_gov_conservative = {
-	.name			= "conservative",
-	.governor		= cs_cpufreq_governor_dbs,
-	.max_transition_latency	= TRANSITION_LATENCY_LIMIT,
-	.owner			= THIS_MODULE,
-};
-
 static int __init cpufreq_gov_dbs_init(void)
 {
 	return cpufreq_register_governor(&cpufreq_gov_conservative);
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index af63402a94a9..836aefd03c1b 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -463,7 +463,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 			cdata->get_cpu_dbs_info_s(cpu);
 
 		cs_dbs_info->down_skip = 0;
-		cs_dbs_info->enable = 1;
 		cs_dbs_info->requested_freq = policy->cur;
 	} else {
 		struct od_ops *od_ops = cdata->gov_ops;
@@ -482,9 +481,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data)
 {
-	struct common_dbs_data *cdata = dbs_data->cdata;
-	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu);
 	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
 
 	/* State should be equivalent to START */
@@ -493,13 +490,6 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 
 	gov_cancel_work(dbs_data, policy);
 
-	if (cdata->governor == GOV_CONSERVATIVE) {
-		struct cs_cpu_dbs_info_s *cs_dbs_info =
-			cdata->get_cpu_dbs_info_s(cpu);
-
-		cs_dbs_info->enable = 0;
-	}
-
 	ccdbs->policy = NULL;
 	mutex_destroy(&ccdbs->timer_mutex);
 	return 0;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 2125c299c602..a0d24149f18c 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -170,7 +170,6 @@ struct cs_cpu_dbs_info_s {
 	struct cpu_dbs_info cdbs;
 	unsigned int down_skip;
 	unsigned int requested_freq;
-	unsigned int enable:1;
 };
 
 /* Per policy Governors sysfs tunables */
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
Conservative governor has its own 'enable' field to check two things:
- If conservative governor is used for a CPU or not
- If governor is currently enabled or not, as there can be a race around the notifier being called while it was getting unregistered from cpufreq_governor_dbs().
The race is between changing governors in cpufreq_set_policy() and the notifier being called, isn't it? The governor will get unregistered when we remove the cpufreq module, and here too we do not set policy->governor to NULL nor set the enable bit to 0. So it does not look like we were protecting these checks against unregistering the governor.
The first one can be checked by comparing policy->governor with 'cpufreq_gov_conservative' and the second one isn't that important. In the worst case, we will end up updating dbs_info->requested_freq. And that wouldn't do any harm.
Assuming the race that exists is indeed the one that I mentioned above, it does not look like we will hit this case where we end up updating the requested_freq unnecessarily.
Stopping the conservative governor:

  policy->governor is conservative.
  STOP
    gov_cancel_work()  ----> The notifiers will not run after this point.
    enable = 0

So while the notifiers are running, they will see that policy->governor is conservative and that enable = 1.
Starting the conservative governor:

  START
    enable = 1
    gov_start_work()   ----> The notifiers will run after this point only.

So when they run, they will see that policy->governor = conservative and enable = 1.
Both fields are consistent with each other when the notifiers run. It thus looks like this patch will not bring about a difference in functionality, and is hence harmless.
Regards Preeti U Murthy
On 26-06-15, 11:27, Preeti U Murthy wrote:
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
Conservative governor has its own 'enable' field to check two things:
- If conservative governor is used for a CPU or not
- If governor is currently enabled or not, as there can be a race around the notifier being called while it was getting unregistered from cpufreq_governor_dbs().
The race is between changing governors in cpufreq_set_policy() and the notifier being called, isn't it ? The governor will get unregistered when we remove the cpufreq module and here too we do not set policy->governor to NULL nor set the enable bit to 0. So it does not look like we were protecting these checks against un-registering the governor.
I was talking about the same race which I believed to exist in 2/10 as well. But it seems there is no such race, as we discussed yesterday. So the 'enable' field was only required for the first point.
And because of that getting NAK'd, here is the new version:
-------------------------8<-----------------
Message-Id: 4ce8fc98a46af394118237ef90ed0b80a7652cfb.1435299551.git.viresh.kumar@linaro.org
From: Viresh Kumar viresh.kumar@linaro.org
Date: Fri, 19 Jun 2015 11:40:14 +0530
Subject: [PATCH] cpufreq: conservative: remove 'enable' field
Conservative governor has its own 'enable' field, used to check if the conservative governor is used for a CPU or not.
This can be checked by comparing policy->governor with 'cpufreq_gov_conservative', and so this field can be dropped.
Because it's not guaranteed that dbs_info->cdbs.ccdbs will be a valid pointer for all CPUs (it will be NULL for CPUs that don't use the ondemand/conservative governors), we can't use it anymore. Let's get the policy with cpufreq_cpu_get() instead.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq_conservative.c | 34 +++++++++++++++++++++-------------
 drivers/cpufreq/cpufreq_governor.c     | 12 +-----------
 drivers/cpufreq/cpufreq_governor.h     |  1 -
 3 files changed, 22 insertions(+), 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 0e4154e584bf..f53719e5bed9 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -23,6 +23,19 @@
 
 static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);
 
+static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
+				   unsigned int event);
+
+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
+static
+#endif
+struct cpufreq_governor cpufreq_gov_conservative = {
+	.name			= "conservative",
+	.governor		= cs_cpufreq_governor_dbs,
+	.max_transition_latency	= TRANSITION_LATENCY_LIMIT,
+	.owner			= THIS_MODULE,
+};
+
 static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
 					   struct cpufreq_policy *policy)
 {
@@ -119,12 +132,14 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	struct cpufreq_freqs *freq = data;
 	struct cs_cpu_dbs_info_s *dbs_info =
 					&per_cpu(cs_cpu_dbs_info, freq->cpu);
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu);
 
-	if (!dbs_info->enable)
+	if (!policy)
 		return 0;
 
-	policy = dbs_info->cdbs.ccdbs->policy;
+	/* policy isn't governed by conservative governor */
+	if (policy->governor != &cpufreq_gov_conservative)
+		goto policy_put;
 
 	/*
 	 * we only care if our internally tracked freq moves outside the 'valid'
@@ -134,6 +149,9 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	    || dbs_info->requested_freq < policy->min)
 		dbs_info->requested_freq = freq->new;
 
+policy_put:
+	cpufreq_cpu_put(policy);
+
 	return 0;
 }
 
@@ -367,16 +385,6 @@ static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
 	return cpufreq_governor_dbs(policy, &cs_dbs_cdata, event);
 }
 
-#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
-static
-#endif
-struct cpufreq_governor cpufreq_gov_conservative = {
-	.name			= "conservative",
-	.governor		= cs_cpufreq_governor_dbs,
-	.max_transition_latency	= TRANSITION_LATENCY_LIMIT,
-	.owner			= THIS_MODULE,
-};
-
 static int __init cpufreq_gov_dbs_init(void)
 {
 	return cpufreq_register_governor(&cpufreq_gov_conservative);
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index af63402a94a9..836aefd03c1b 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -463,7 +463,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 			cdata->get_cpu_dbs_info_s(cpu);
 
 		cs_dbs_info->down_skip = 0;
-		cs_dbs_info->enable = 1;
 		cs_dbs_info->requested_freq = policy->cur;
 	} else {
 		struct od_ops *od_ops = cdata->gov_ops;
@@ -482,9 +481,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data)
 {
-	struct common_dbs_data *cdata = dbs_data->cdata;
-	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu);
 	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
 
 	/* State should be equivalent to START */
@@ -493,13 +490,6 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 
 	gov_cancel_work(dbs_data, policy);
 
-	if (cdata->governor == GOV_CONSERVATIVE) {
-		struct cs_cpu_dbs_info_s *cs_dbs_info =
-			cdata->get_cpu_dbs_info_s(cpu);
-
-		cs_dbs_info->enable = 0;
-	}
-
 	ccdbs->policy = NULL;
 	mutex_destroy(&ccdbs->timer_mutex);
 	return 0;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 2125c299c602..a0d24149f18c 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -170,7 +170,6 @@ struct cs_cpu_dbs_info_s {
 	struct cpu_dbs_info cdbs;
 	unsigned int down_skip;
 	unsigned int requested_freq;
-	unsigned int enable:1;
 };
 
 /* Per policy Governors sysfs tunables */
The sampling rate is updated with a call to update_sampling_rate(), and we process CPUs one by one there. While the work is canceled on a per-CPU basis, it is (by mistake) getting enqueued for all policy->cpus.
That results in wasting CPU cycles queuing works which are already queued and were never canceled.
This patch queues the work only on the CPU for which it was canceled earlier.
gov_queue_work() was missing a CPU parameter, and it's better to club 'modify_all' and the new 'cpu' parameter into a single 'cpus' mask. So this patch also changes the prototype of gov_queue_work() and fixes its call sites.
Fixes: 031299b3be30 ("cpufreq: governors: Avoid unnecessary per cpu timer interrupts")
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq_conservative.c |  4 ++--
 drivers/cpufreq/cpufreq_governor.c     | 30 ++++++++++--------------------
 drivers/cpufreq/cpufreq_governor.h     |  2 +-
 drivers/cpufreq/cpufreq_ondemand.c     |  7 ++++---
 4 files changed, 17 insertions(+), 26 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index f53719e5bed9..2ab53d96c078 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -116,11 +116,11 @@ static void cs_check_cpu(int cpu, unsigned int load)
 }
 
 static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+				 struct dbs_data *dbs_data, bool load_eval)
 {
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
-	if (modify_all)
+	if (load_eval)
 		dbs_check_cpu(dbs_data, cdbs->ccdbs->policy->cpu);
 
 	return delay_for_sampling_rate(cs_tuners->sampling_rate);
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 836aefd03c1b..416a8c5665dd 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -167,7 +167,7 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
 }
 
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		    unsigned int delay, bool all_cpus)
+		    unsigned int delay, const struct cpumask *cpus)
 {
 	int i;
 
@@ -175,19 +175,8 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 	if (!policy->governor_enabled)
 		goto out_unlock;
 
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
-	}
+	for_each_cpu(i, cpus)
+		__gov_queue_work(i, dbs_data, delay);
 
 out_unlock:
 	mutex_unlock(&cpufreq_governor_lock);
@@ -232,7 +221,8 @@ static void dbs_timer(struct work_struct *work)
 	struct cpufreq_policy *policy = ccdbs->policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	unsigned int sampling_rate, delay;
-	bool modify_all = true;
+	const struct cpumask *cpus;
+	bool load_eval;
 
 	mutex_lock(&ccdbs->timer_mutex);
 
@@ -246,11 +236,11 @@ static void dbs_timer(struct work_struct *work)
 		sampling_rate = od_tuners->sampling_rate;
 	}
 
-	if (!need_load_eval(cdbs->ccdbs, sampling_rate))
-		modify_all = false;
+	load_eval = need_load_eval(cdbs->ccdbs, sampling_rate);
+	cpus = load_eval ? policy->cpus : cpumask_of(raw_smp_processor_id());
 
-	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
+	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, load_eval);
+	gov_queue_work(dbs_data, policy, delay, cpus);
 
 	mutex_unlock(&ccdbs->timer_mutex);
 }
@@ -474,7 +464,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 	}
 
 	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
-		       true);
+		       policy->cpus);
 	return 0;
 }
 
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index a0d24149f18c..dc2ad8a427f3 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -273,7 +273,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		struct common_dbs_data *cdata, unsigned int event);
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		    unsigned int delay, bool all_cpus);
+		    unsigned int delay, const struct cpumask *cpus);
 void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 11db20079fc6..774bbddae2c9 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -192,7 +192,7 @@ static void od_check_cpu(int cpu, unsigned int load)
 }
 
 static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+				 struct dbs_data *dbs_data, bool load_eval)
 {
 	struct cpufreq_policy *policy = cdbs->ccdbs->policy;
 	unsigned int cpu = policy->cpu;
@@ -201,7 +201,7 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int delay = 0, sample_type = dbs_info->sample_type;
 
-	if (!modify_all)
+	if (!load_eval)
 		goto max_delay;
 
 	/* Common NORMAL_SAMPLE setup */
@@ -284,7 +284,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 			mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
 
 			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate), true);
+				       usecs_to_jiffies(new_rate),
+				       cpumask_of(cpu));
 
 		}
 		mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
The sampling rate is updated with a call to update_sampling_rate(), and we process CPUs one by one here. While the work is canceled on per-cpu basis, it is getting enqueued (by mistake) for all policy->cpus.
That would result in wasting cpu cycles for queuing works which are already queued and never canceled.
This patch is about queuing work only on the cpu for which it was canceled earlier.
gov_queue_work() was missing the CPU parameter and it's better to club 'modify_all' and the new 'cpu' parameter to a 'cpus' mask. And so this patch also changes the prototype of gov_queue_work() and fixes its caller sites.
This looks good, except that I did not understand the motivation for renaming 'modify_all' to 'load_eval'. Neither name conveys the purpose better than the other.
Regards Preeti U Murthy
Fixes: 031299b3be30 ("cpufreq: governors: Avoid unnecessary per cpu timer interrupts") Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq_conservative.c | 4 ++-- drivers/cpufreq/cpufreq_governor.c | 30 ++++++++++-------------------- drivers/cpufreq/cpufreq_governor.h | 2 +- drivers/cpufreq/cpufreq_ondemand.c | 7 ++++--- 4 files changed, 17 insertions(+), 26 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index f53719e5bed9..2ab53d96c078 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -116,11 +116,11 @@ static void cs_check_cpu(int cpu, unsigned int load) }
static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs,
struct dbs_data *dbs_data, bool modify_all)
struct dbs_data *dbs_data, bool load_eval)
{ struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
- if (modify_all)
if (load_eval) dbs_check_cpu(dbs_data, cdbs->ccdbs->policy->cpu);
return delay_for_sampling_rate(cs_tuners->sampling_rate);
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 836aefd03c1b..416a8c5665dd 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -167,7 +167,7 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
 }
 
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		    unsigned int delay, bool all_cpus)
+		    unsigned int delay, const struct cpumask *cpus)
 {
 	int i;
 
@@ -175,19 +175,8 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 	if (!policy->governor_enabled)
 		goto out_unlock;
 
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
-	}
+	for_each_cpu(i, cpus)
+		__gov_queue_work(i, dbs_data, delay);
 
 out_unlock:
 	mutex_unlock(&cpufreq_governor_lock);
@@ -232,7 +221,8 @@ static void dbs_timer(struct work_struct *work)
 	struct cpufreq_policy *policy = ccdbs->policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	unsigned int sampling_rate, delay;
-	bool modify_all = true;
+	const struct cpumask *cpus;
+	bool load_eval;
 
 	mutex_lock(&ccdbs->timer_mutex);
 
@@ -246,11 +236,11 @@ static void dbs_timer(struct work_struct *work)
 		sampling_rate = od_tuners->sampling_rate;
 	}
 
-	if (!need_load_eval(cdbs->ccdbs, sampling_rate))
-		modify_all = false;
+	load_eval = need_load_eval(cdbs->ccdbs, sampling_rate);
+	cpus = load_eval ? policy->cpus : cpumask_of(raw_smp_processor_id());
 
-	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
+	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, load_eval);
+	gov_queue_work(dbs_data, policy, delay, cpus);
 
 	mutex_unlock(&ccdbs->timer_mutex);
 }
@@ -474,7 +464,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 	}
 
 	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
-		       true);
+		       policy->cpus);
 
 	return 0;
 }
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index a0d24149f18c..dc2ad8a427f3 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -273,7 +273,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		struct common_dbs_data *cdata, unsigned int event);
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		unsigned int delay, bool all_cpus);
+		unsigned int delay, const struct cpumask *cpus);
 void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 11db20079fc6..774bbddae2c9 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -192,7 +192,7 @@ static void od_check_cpu(int cpu, unsigned int load)
 }
 
 static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+				 struct dbs_data *dbs_data, bool load_eval)
 {
 	struct cpufreq_policy *policy = cdbs->ccdbs->policy;
 	unsigned int cpu = policy->cpu;
@@ -201,7 +201,7 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int delay = 0, sample_type = dbs_info->sample_type;
 
-	if (!modify_all)
+	if (!load_eval)
 		goto max_delay;
 
 	/* Common NORMAL_SAMPLE setup */
@@ -284,7 +284,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
 
 		gov_queue_work(dbs_data, policy,
-			       usecs_to_jiffies(new_rate), true);
+			       usecs_to_jiffies(new_rate),
+			       cpumask_of(cpu));
 	}
 	mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
On 26-06-15, 12:20, Preeti U Murthy wrote:
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
The sampling rate is updated with a call to update_sampling_rate(), and we process CPUs one by one there. While the work is canceled on a per-cpu basis, it is getting enqueued (by mistake) for all policy->cpus.
That results in wasting CPU cycles on queuing works that are already queued and were never canceled.
This patch is about queuing work only on the CPU for which it was canceled earlier.
gov_queue_work() was missing the CPU parameter, and it's better to club 'modify_all' and the new 'cpu' parameter into a 'cpus' mask. So this patch also changes the prototype of gov_queue_work() and updates its call sites.
This looks good, except that I did not understand the motivation behind changing 'modify_all' to 'load_eval'. Neither name conveys the purpose better than the other.
modify_all was used to decide whether we need to queue work on all CPUs or only a single CPU. But now we pass a cpumask, and that name isn't valid anymore. The only other thing we do with the help of modify_all is to evaluate the load again, and so it is named load_eval.
I think the purpose is much clearer with load_eval now.
__gov_queue_work() isn't required anymore and can be merged with gov_queue_work(). Do it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 416a8c5665dd..ac288f0efb06 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -158,25 +158,20 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
-static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-				    unsigned int delay)
-{
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
-
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 		    unsigned int delay, const struct cpumask *cpus)
 {
-	int i;
+	struct cpu_dbs_info *cdbs;
+	int cpu;
 
 	mutex_lock(&cpufreq_governor_lock);
 
 	if (!policy->governor_enabled)
 		goto out_unlock;
 
-	for_each_cpu(i, cpus)
-		__gov_queue_work(i, dbs_data, delay);
+	for_each_cpu(cpu, cpus) {
+		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+		mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
+	}
 
 out_unlock:
 	mutex_unlock(&cpufreq_governor_lock);
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
__gov_queue_work() isn't required anymore and can be merged with gov_queue_work(). Do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
 drivers/cpufreq/cpufreq_governor.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 416a8c5665dd..ac288f0efb06 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -158,25 +158,20 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
-static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-				    unsigned int delay)
-{
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
Looking at this patch, I now feel that you can simply make the above function non-static and call it from update_sampling_rate() in patch [04/10]. It already gives scope to pass in the cpu parameter, if not the mask, and anyway you want to pass in a single cpu from update_sampling_rate(). That would avoid having to change the other call sites of gov_queue_work().
I also see elsewhere in the kernel that a fn() and a __fn() are sometimes both non-static, with __fn() giving you the additional benefit of passing a critical parameter. E.g. __hrtimer_start_range_ns() and hrtimer_start_range_ns().
Regards Preeti U Murthy
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 		    unsigned int delay, const struct cpumask *cpus)
 {
-	int i;
+	struct cpu_dbs_info *cdbs;
+	int cpu;
 
 	mutex_lock(&cpufreq_governor_lock);
 
 	if (!policy->governor_enabled)
 		goto out_unlock;
 
-	for_each_cpu(i, cpus)
-		__gov_queue_work(i, dbs_data, delay);
+	for_each_cpu(cpu, cpus) {
+		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+		mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
+	}
 
 out_unlock:
 	mutex_unlock(&cpufreq_governor_lock);
On 26-06-15, 12:33, Preeti U Murthy wrote:
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
__gov_queue_work() isn't required anymore and can be merged with gov_queue_work(). Do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
 drivers/cpufreq/cpufreq_governor.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 416a8c5665dd..ac288f0efb06 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -158,25 +158,20 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
-static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-				    unsigned int delay)
-{
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
Looking at this patch, I now feel that you can simply make the above function non-static and call it from update_sampling_rate() in patch [04/10]. It already gives scope to pass in the cpu parameter, if not the mask, and anyway you want to pass in a single cpu from update_sampling_rate(). That would avoid having to change the other call sites of gov_queue_work().
I also see elsewhere in the kernel that a fn() and a __fn() are sometimes both non-static, with __fn() giving you the additional benefit of passing a critical parameter. E.g. __hrtimer_start_range_ns() and hrtimer_start_range_ns().
Keeping two functions that way isn't a problem and is often beneficial, but it doesn't fit here.
The above function doesn't take the necessary lock and so was never used directly. And there is no need to keep two routines when only one is sufficient.
'timer_mutex' is required to sync the work-handlers of policy->cpus. update_sampling_rate() is just canceling the works and queuing them again, so the lock isn't protecting anything at all in update_sampling_rate() and is of no use there.
Even if a work-handler is already running for a CPU, cancel_delayed_work_sync() will wait for it to finish.
Drop these unnecessary locks.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 774bbddae2c9..841e1fa96ee7 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -267,28 +267,20 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
-		mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
-
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
-			mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
+		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
-		}
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
-
-			mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
 			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-			mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
 
 			gov_queue_work(dbs_data, policy,
 				       usecs_to_jiffies(new_rate),
 				       cpumask_of(cpu));
 		}
-
-		mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
 	}
 }
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
'timer_mutex' is required to sync the work-handlers of policy->cpus. update_sampling_rate() is just canceling the works and queuing them again, so the lock isn't protecting anything at all in update_sampling_rate() and is of no use there.
Even if a work-handler is already running for a CPU, cancel_delayed_work_sync() will wait for it to finish.
Drop these unnecessary locks.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com
 drivers/cpufreq/cpufreq_ondemand.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 774bbddae2c9..841e1fa96ee7 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -267,28 +267,20 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
-		mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
-
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
-			mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
+		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
-		}
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
-
-			mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
 			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-			mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
 
 			gov_queue_work(dbs_data, policy,
 				       usecs_to_jiffies(new_rate),
 				       cpumask_of(cpu));
 		}
-
-		mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
 	}
 }
Currently update_sampling_rate() runs over each online CPU and cancels/queues work on it. That's very inefficient when a single policy manages multiple CPUs, as they can be processed together.
Also drop the unnecessary cancel_delayed_work_sync(), as we are doing a mod_delayed_work_on() in gov_queue_work(), which takes care of pending works for us.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 841e1fa96ee7..cfecd3b67cb3 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -247,40 +247,48 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
 {
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+	struct cpufreq_policy *policy;
+	struct od_cpu_dbs_info_s *dbs_info;
+	unsigned long next_sampling, appointed_at;
+	struct cpumask cpumask;
 	int cpu;
 
+	cpumask_copy(&cpumask, cpu_online_mask);
+
 	od_tuners->sampling_rate = new_rate = max(new_rate,
 			dbs_data->min_sampling_rate);
 
-	for_each_online_cpu(cpu) {
-		struct cpufreq_policy *policy;
-		struct od_cpu_dbs_info_s *dbs_info;
-		unsigned long next_sampling, appointed_at;
-
+	for_each_cpu(cpu, &cpumask) {
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
 			continue;
+
+		/* clear all CPUs of this policy */
+		cpumask_andnot(&cpumask, &cpumask, policy->cpus);
+
 		if (policy->governor != &cpufreq_gov_ondemand) {
 			cpufreq_cpu_put(policy);
 			continue;
 		}
+
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
+		/*
+		 * Checking this for any CPU of the policy is fine. As either
+		 * all would have queued work or none.
+		 */
 		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
-		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate),
-				       cpumask_of(cpu));
+		if (!time_before(next_sampling, appointed_at))
+			continue;
 
-		}
+		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
+			       policy->cpus);
 	}
 }
On 06/22/2015 01:32 PM, Viresh Kumar wrote:
Currently update_sampling_rate() runs over each online CPU and cancels/queues work on it. That's very inefficient when a single policy manages multiple CPUs, as they can be processed together.
Also drop the unnecessary cancel_delayed_work_sync(), as we are doing a mod_delayed_work_on() in gov_queue_work(), which takes care of pending works for us.
This looks fine, except for one point. See below:
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
 drivers/cpufreq/cpufreq_ondemand.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 841e1fa96ee7..cfecd3b67cb3 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -247,40 +247,48 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
 {
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+	struct cpufreq_policy *policy;
+	struct od_cpu_dbs_info_s *dbs_info;
+	unsigned long next_sampling, appointed_at;
+	struct cpumask cpumask;
 	int cpu;
 
+	cpumask_copy(&cpumask, cpu_online_mask);
+
 	od_tuners->sampling_rate = new_rate = max(new_rate,
 			dbs_data->min_sampling_rate);
 
-	for_each_online_cpu(cpu) {
-		struct cpufreq_policy *policy;
-		struct od_cpu_dbs_info_s *dbs_info;
-		unsigned long next_sampling, appointed_at;
-
+	for_each_cpu(cpu, &cpumask) {
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
 			continue;
+
+		/* clear all CPUs of this policy */
+		cpumask_andnot(&cpumask, &cpumask, policy->cpus);
+
 		if (policy->governor != &cpufreq_gov_ondemand) {
 			cpufreq_cpu_put(policy);
 			continue;
 		}
+
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
+		/*
+		 * Checking this for any CPU of the policy is fine. As either
+		 * all would have queued work or none.
Are you sure that the state of the work will be the same across all policy cpus? 'Pending' only refers to the work awaiting its timer to fire and then queue itself on the runqueue, right? On some of the policy->cpus the timers may be yet to fire, while on others they might already have.
+		 */
 		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
-		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate),
-				       cpumask_of(cpu));
+		if (!time_before(next_sampling, appointed_at))
+			continue;
 
-		}
+		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
+			       policy->cpus);
 	}
 }
Regards Preeti U Murthy
On 26-06-15, 13:58, Preeti U Murthy wrote:
/*
* Checking this for any CPU of the policy is fine. As either
* all would have queued work or none.
Are you sure that the state of the work will be the same across all policy cpus? 'Pending' only refers to the work awaiting its timer to fire and then queue itself on the runqueue, right? On some of the policy->cpus the timers may be yet to fire, while on others they might already have.
I think a better way to check this is to check whether the governor is stopped or not, i.e. by checking ccdbs->policy. Will fix that.
We currently update the sampling rate immediately for already-queued works only if the new expiry is earlier than the old one.
But what about the case where the user doesn't want frequent events and wants to increase the sampling time? Shouldn't we re-queue the works on all policy->cpus then as well, instead of letting their timer interrupts fire very shortly?
This patch removes this special case and simplifies the code by always updating the expiry immediately.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 18 +----------------
 1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index cfecd3b67cb3..98ad38a350b2 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -231,17 +231,8 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
 static struct common_dbs_data od_dbs_cdata;
 
 /**
- * update_sampling_rate - update sampling rate effective immediately if needed.
+ * update_sampling_rate - update sampling rate immediately.
 * @new_rate: new sampling rate
- *
- * If new rate is smaller than the old, simply updating
- * dbs_tuners_int.sampling_rate might not be appropriate. For example, if the
- * original sampling_rate was 1 second and the requested new sampling rate is 10
- * ms because the user needs immediate reaction from ondemand governor, but not
- * sure if higher frequency will be required or not, then, the governor may
- * change the sampling rate too late; up to 1 second later. Thus, if we are
- * reducing the sampling rate, we need to make the new value effective
- * immediately.
 */
 static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
@@ -249,7 +240,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	struct cpufreq_policy *policy;
 	struct od_cpu_dbs_info_s *dbs_info;
-	unsigned long next_sampling, appointed_at;
 	struct cpumask cpumask;
 	int cpu;
 
@@ -281,12 +271,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
 
-		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.dwork.timer.expires;
-
-		if (!time_before(next_sampling, appointed_at))
-			continue;
-
 		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
 			       policy->cpus);
 	}
cpufreq_governor_lock is abused by being used outside of the cpufreq core, i.e. in the cpufreq governors. We didn't have a better solution at that time, and so doing that was the only acceptable option:
6f1e4efd882e ("cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled")
The cpufreq governor core is fixed now against possible races and things are in much better shape.
The cpufreq core checks for invalid state transitions of governors in __cpufreq_governor() with the help of the governor_enabled flag. The governor core already takes care of that now, so we can get rid of those extra checks in __cpufreq_governor().
To do that, we first need to get rid of the dependency on governor_enabled flag in governor core, in gov_queue_work.
This patch is about getting rid of this dependency.
When a CPU is hot removed, we cancel all the delayed work items via gov_cancel_work(). Normally this just cancels a delayed timer on each CPU that the policy is managing, and the work won't run. But if the work is already running, the workqueue code will wait for it to finish before continuing, to prevent the work items from re-queuing themselves as they normally do.
This scheme works most of the time, except for the case where the work function determines that it should adjust the delay for all other CPUs that the policy is managing. If that scenario occurs, the canceling CPU will cancel its own work but queue up the other CPUs' works to run.
And we end up in a situation where gov_cancel_work() has returned while work is still queued on a few CPUs.
To fix that in a different (non-hacky) way, set ccdbs->policy to NULL before trying to cancel the work. It must be updated under timer_mutex, which will prevent the work-handlers from starting. Once a work-handler finds that we are already trying to stop the governor, it will exit early, and that prevents works from being queued again as well.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index ac288f0efb06..38748e219cae 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -164,17 +164,10 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 	struct cpu_dbs_info *cdbs;
 	int cpu;
 
-	mutex_lock(&cpufreq_governor_lock);
-
-	if (!policy->governor_enabled)
-		goto out_unlock;
-
 	for_each_cpu(cpu, cpus) {
 		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 		mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
 	}
-
-out_unlock:
-	mutex_unlock(&cpufreq_governor_lock);
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);
 
@@ -213,14 +206,25 @@ static void dbs_timer(struct work_struct *work)
 	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
 						 dwork.work);
 	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
-	struct cpufreq_policy *policy = ccdbs->policy;
-	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpufreq_policy *policy;
+	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
 	const struct cpumask *cpus;
 	bool load_eval;
 
 	mutex_lock(&ccdbs->timer_mutex);
 
+	policy = ccdbs->policy;
+
+	/*
+	 * Governor might already be disabled and there is no point continuing
+	 * with the work-handler.
+	 */
+	if (!policy)
+		goto unlock;
+
+	dbs_data = policy->governor_data;
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -237,6 +241,7 @@ static void dbs_timer(struct work_struct *work)
 	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, load_eval);
 	gov_queue_work(dbs_data, policy, delay, cpus);
 
+unlock:
 	mutex_unlock(&ccdbs->timer_mutex);
 }
 
@@ -473,9 +478,17 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	if (!ccdbs || !ccdbs->policy)
 		return -EBUSY;
 
+	/*
+	 * Work-handler must see this updated, as it should not proceed any
+	 * further after governor is disabled. And so timer_mutex is taken while
+	 * updating this value.
+	 */
+	mutex_lock(&ccdbs->timer_mutex);
+	ccdbs->policy = NULL;
+	mutex_unlock(&ccdbs->timer_mutex);
+
 	gov_cancel_work(dbs_data, policy);
 
-	ccdbs->policy = NULL;
 	mutex_destroy(&ccdbs->timer_mutex);
 	return 0;
 }
Invalid state transitions are verified by the governor core now, and there is no need to replicate that in the cpufreq core.
Stop verifying the same in the cpufreq core. That gets rid of policy->governor_enabled and cpufreq_governor_lock.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 24 ------------------------
 include/linux/cpufreq.h   |  1 -
 2 files changed, 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 5863db9213aa..7c46c3f3d938 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
-DEFINE_MUTEX(cpufreq_governor_lock);
 
 /* Flag to suspend/resume CPUFreq governors */
 static bool cpufreq_suspended;
@@ -2099,21 +2098,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 
 	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 
-	mutex_lock(&cpufreq_governor_lock);
-	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
-	    || (!policy->governor_enabled
-	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
-		mutex_unlock(&cpufreq_governor_lock);
-		return -EBUSY;
-	}
-
-	if (event == CPUFREQ_GOV_STOP)
-		policy->governor_enabled = false;
-	else if (event == CPUFREQ_GOV_START)
-		policy->governor_enabled = true;
-
-	mutex_unlock(&cpufreq_governor_lock);
-
 	ret = policy->governor->governor(policy, event);
 
 	if (!ret) {
@@ -2121,14 +2105,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 			policy->governor->initialized++;
 		else if (event == CPUFREQ_GOV_POLICY_EXIT)
 			policy->governor->initialized--;
-	} else {
-		/* Restore original values */
-		mutex_lock(&cpufreq_governor_lock);
-		if (event == CPUFREQ_GOV_STOP)
-			policy->governor_enabled = true;
-		else if (event == CPUFREQ_GOV_START)
-			policy->governor_enabled = false;
-		mutex_unlock(&cpufreq_governor_lock);
 	}
 
 	if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 29ad97c34fd5..bb02f9a6bbc5 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -81,7 +81,6 @@ struct cpufreq_policy {
 	unsigned int		policy; /* see above */
 	struct cpufreq_governor	*governor; /* see below */
 	void			*governor_data;
-	bool			governor_enabled; /* governor start/stop flag */
 
 	char			last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
 
 	struct work_struct	update; /* if update_policy() needs to be