Hi Rafael,
Preeti recently highlighted [1] some issues in cpufreq core locking with respect to governors. I wanted to solve them after the hotplug paths in the cpufreq core had been simplified by my latest patches, but now that she has poked me, I have done some work in that area.
I am trying to solve only a part of the bigger problem (in a way that I feel is the right way ahead). The first two patches restructure the code to make it more readable, and the last patch does all the major changes. The changelog of that one should be good enough to explain why and what I am doing.
The first two shouldn't bring any functional change and so can be applied early if you are confident about them.
@Preeti: I would like you to test these patches. These should get rid of the crashes you were facing but may generate a WARN() from line 447 of cpufreq_governor.c, if the sequence is wrong. That has to be fixed separately.
Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
Rebased over: v4.1-rc6
Tested-on: ARM dual Cortex-A15 Exynos board.
[1] http://marc.info/?i=20150601064031.2972.59208.stgit%40perfhull-ltc.austin.ib...
Viresh Kumar (3):
  cpufreq: governor: register notifier from cs_init()
  cpufreq: governor: split cpufreq_governor_dbs()
  cpufreq: governor: Serialize governor callbacks
 drivers/cpufreq/cpufreq_conservative.c |  28 +--
 drivers/cpufreq/cpufreq_governor.c     | 340 ++++++++++++++++++---------------
 drivers/cpufreq/cpufreq_governor.h     |  16 +-
 drivers/cpufreq/cpufreq_ondemand.c     |   6 +-
 4 files changed, 209 insertions(+), 181 deletions(-)
Notifiers are required only by the conservative governor, and the common governor code is unnecessarily polluted with them. Handle them from cs_init/exit() instead of cpufreq_governor_dbs().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 26 +++++++++++++++-----------
 drivers/cpufreq/cpufreq_governor.c     | 22 +++-------------------
 drivers/cpufreq/cpufreq_governor.h     |  8 ++------
 drivers/cpufreq/cpufreq_ondemand.c     |  4 ++--
 4 files changed, 22 insertions(+), 38 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 25a70d06c5bf..75f875bb155e 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -148,6 +148,10 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	return 0;
 }
 
+static struct notifier_block cs_cpufreq_notifier_block = {
+	.notifier_call = dbs_cpufreq_notifier,
+};
+
 /************************** sysfs interface ************************/
 static struct common_dbs_data cs_dbs_cdata;
 
@@ -317,7 +321,7 @@ static struct attribute_group cs_attr_group_gov_pol = {
 
 /************************** sysfs end ************************/
 
-static int cs_init(struct dbs_data *dbs_data)
+static int cs_init(struct dbs_data *dbs_data, bool notify)
 {
 	struct cs_dbs_tuners *tuners;
 
@@ -336,25 +340,26 @@ static int cs_init(struct dbs_data *dbs_data)
 	dbs_data->tuners = tuners;
 	dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
 		jiffies_to_usecs(10);
+
+	if (notify)
+		cpufreq_register_notifier(&cs_cpufreq_notifier_block,
+					  CPUFREQ_TRANSITION_NOTIFIER);
+
 	mutex_init(&dbs_data->mutex);
 	return 0;
 }
 
-static void cs_exit(struct dbs_data *dbs_data)
+static void cs_exit(struct dbs_data *dbs_data, bool notify)
 {
+	if (notify)
+		cpufreq_unregister_notifier(&cs_cpufreq_notifier_block,
+					    CPUFREQ_TRANSITION_NOTIFIER);
+
 	kfree(dbs_data->tuners);
 }
 
 define_get_cpu_dbs_routines(cs_cpu_dbs_info);
 
-static struct notifier_block cs_cpufreq_notifier_block = {
-	.notifier_call = dbs_cpufreq_notifier,
-};
-
-static struct cs_ops cs_ops = {
-	.notifier_block = &cs_cpufreq_notifier_block,
-};
-
 static struct common_dbs_data cs_dbs_cdata = {
 	.governor = GOV_CONSERVATIVE,
 	.attr_group_gov_sys = &cs_attr_group_gov_sys,
@@ -363,7 +368,6 @@ static struct common_dbs_data cs_dbs_cdata = {
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = cs_dbs_timer,
 	.gov_check_cpu = cs_check_cpu,
-	.gov_ops = &cs_ops,
 	.init = cs_init,
 	.exit = cs_exit,
 };
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 1b44496b2d2b..d64a82e6481a 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -278,7 +278,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		dbs_data->cdata = cdata;
 		dbs_data->usage_count = 1;
-		rc = cdata->init(dbs_data);
+		rc = cdata->init(dbs_data, !policy->governor->initialized);
 		if (rc) {
 			pr_err("%s: POLICY_INIT: init() failed\n", __func__);
 			kfree(dbs_data);
@@ -291,7 +291,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		rc = sysfs_create_group(get_governor_parent_kobj(policy),
 				get_sysfs_attr(dbs_data));
 		if (rc) {
-			cdata->exit(dbs_data);
+			cdata->exit(dbs_data, !policy->governor->initialized);
 			kfree(dbs_data);
 			return rc;
 		}
@@ -309,14 +309,6 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
 					latency * LATENCY_MULTIPLIER));
 
-		if ((cdata->governor == GOV_CONSERVATIVE) &&
-		    (!policy->governor->initialized)) {
-			struct cs_ops *cs_ops = dbs_data->cdata->gov_ops;
-
-			cpufreq_register_notifier(cs_ops->notifier_block,
-						  CPUFREQ_TRANSITION_NOTIFIER);
-		}
-
 		if (!have_governor_per_policy())
 			cdata->gdbs_data = dbs_data;
 
@@ -329,15 +321,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 			if (!have_governor_per_policy())
 				cpufreq_put_global_kobject();
 
-			if ((dbs_data->cdata->governor == GOV_CONSERVATIVE) &&
-				(policy->governor->initialized == 1)) {
-				struct cs_ops *cs_ops = dbs_data->cdata->gov_ops;
-
-				cpufreq_unregister_notifier(cs_ops->notifier_block,
-						CPUFREQ_TRANSITION_NOTIFIER);
-			}
-
-			cdata->exit(dbs_data);
+			cdata->exit(dbs_data, policy->governor->initialized == 1);
 			kfree(dbs_data);
 			cdata->gdbs_data = NULL;
 		}
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index cc401d147e72..1690120df487 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -208,8 +208,8 @@ struct common_dbs_data {
 	void *(*get_cpu_dbs_info_s)(int cpu);
 	void (*gov_dbs_timer)(struct work_struct *work);
 	void (*gov_check_cpu)(int cpu, unsigned int load);
-	int (*init)(struct dbs_data *dbs_data);
-	void (*exit)(struct dbs_data *dbs_data);
+	int (*init)(struct dbs_data *dbs_data, bool notify);
+	void (*exit)(struct dbs_data *dbs_data, bool notify);
 
 	/* Governor specific ops, see below */
 	void *gov_ops;
@@ -234,10 +234,6 @@ struct od_ops {
 	void (*freq_increase)(struct cpufreq_policy *policy, unsigned int freq);
 };
 
-struct cs_ops {
-	struct notifier_block *notifier_block;
-};
-
 static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 {
 	int delay = usecs_to_jiffies(sampling_rate);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index ad3f38fd3eb9..4fe78a9caa04 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -475,7 +475,7 @@ static struct attribute_group od_attr_group_gov_pol = {
 
 /************************** sysfs end ************************/
 
-static int od_init(struct dbs_data *dbs_data)
+static int od_init(struct dbs_data *dbs_data, bool notify)
 {
 	struct od_dbs_tuners *tuners;
 	u64 idle_time;
@@ -517,7 +517,7 @@ static int od_init(struct dbs_data *dbs_data)
 	return 0;
 }
 
-static void od_exit(struct dbs_data *dbs_data)
+static void od_exit(struct dbs_data *dbs_data, bool notify)
 {
 	kfree(dbs_data->tuners);
 }
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
Notifiers are required only by the conservative governor, and the common governor code is unnecessarily polluted with them. Handle them from cs_init/exit() instead of cpufreq_governor_dbs().
I don't see why we need the check on policy->governor->initialized because we call cdata->init() and cdata->exit(), *only* when the first and last references to the governor are being made respectively (filtered by dbs_data->usage_count), which is precisely what the initialized flag checks. So passing policy->governor->initialized seems to be redundant? And this is the case for both gov_per_policy and otherwise.
Regards Preeti U Murthy
On 04-06-15, 11:08, Preeti U Murthy wrote:
I don't see why we need the check on policy->governor->initialized because we call cdata->init() and cdata->exit(), *only* when the first and last references to the governor are being made respectively (filtered by dbs_data->usage_count), which is precisely what the initialized flag checks. So passing policy->governor->initialized seems to be redundant? And this is the case for both gov_per_policy and otherwise.
That's the case only for !gov_per_policy.
In case of gov_per_policy, the same governor is used by multiple policies but with different dbs_data objects. And in this case INIT will be called only once for a dbs_data and so usage count will be max 1 at any time. But we want to register the notifiers for a governor only once and so we need this extra check.
policy->governor->initialized is set to 1 when the governor is initialized for the first policy; for each subsequent policy it is incremented further.
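A minimal userspace sketch of this, with stand-in structs rather than the kernel's (illustration only, not part of the patch): with governor-per-policy tunables every policy's private dbs_data starts with usage_count == 1, so only a per-governor counter like governor->initialized can tell whether this is the first policy to use the governor and hence whether the notifier still needs registering.

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel structures; illustration only. */
struct governor { int initialized; };	/* shared by all policies using the governor */
struct dbs_data { int usage_count; };	/* one per policy when governor-per-policy   */

static bool governor_per_policy = true;	/* pretend CPUFREQ_HAVE_GOVERNOR_PER_POLICY is set */

/* Should cs_init() register the transition notifier for this policy? */
static bool should_register_notifier(struct governor *gov, struct dbs_data *dbs_data)
{
	if (governor_per_policy)
		/* usage_count is 1 for every policy's private dbs_data, so it
		 * cannot identify the first user of the governor. */
		return !gov->initialized;

	/* Single shared dbs_data: the first user sees usage_count == 1. */
	return dbs_data->usage_count == 1;
}

int main(void)
{
	struct governor gov = { .initialized = 0 };
	struct dbs_data policy0_data = { .usage_count = 1 };
	struct dbs_data policy1_data = { .usage_count = 1 };

	printf("policy0 registers notifier: %d\n",
	       should_register_notifier(&gov, &policy0_data));	/* prints 1 */
	gov.initialized++;					/* first policy is done */
	printf("policy1 registers notifier: %d\n",
	       should_register_notifier(&gov, &policy1_data));	/* prints 0 */
	return 0;
}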
On 06/04/2015 11:32 AM, Viresh Kumar wrote:
On 04-06-15, 11:08, Preeti U Murthy wrote:
I don't see why we need the check on policy->governor->initialized because we call cdata->init() and cdata->exit(), *only* when the first and last references to the governor are being made respectively (filtered by dbs_data->usage_count), which is precisely what the initialized flag checks. So passing policy->governor->initialized seems to be redundant? And this is the case for both gov_per_policy and otherwise.
That's the case only for !gov_per_policy.
In case of gov_per_policy, the same governor is used by multiple policies but with different dbs_data objects. And in this case INIT will be called only once for a dbs_data and so usage count will be max 1 at any time. But we want to register the notifiers for a governor only once and so we need this extra check.
policy->governor->initialized is set to 1 when the governor is initialized for the first policy; for each subsequent policy it is incremented further.
Ok I see. The patch looks good to me then.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
cpufreq_governor_dbs() is hardly readable; it is just too big and complicated. Let's make it more readable by splitting out event-specific routines.

The order of statements is changed at a few places, but that shouldn't bring any functional change.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
The best way to verify the changes here is to keep both copies of the code side by side and compare them event-wise.
 drivers/cpufreq/cpufreq_governor.c | 326 +++++++++++++++++++++----------------
 1 file changed, 185 insertions(+), 141 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index d64a82e6481a..dc382a5a2158 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -239,195 +239,239 @@ static void set_sampling_rate(struct dbs_data *dbs_data, } }
-int cpufreq_governor_dbs(struct cpufreq_policy *policy, - struct common_dbs_data *cdata, unsigned int event) +static int cpufreq_governor_init(struct cpufreq_policy *policy, + struct dbs_data *dbs_data, + struct common_dbs_data *cdata) { - struct dbs_data *dbs_data; - struct od_cpu_dbs_info_s *od_dbs_info = NULL; - struct cs_cpu_dbs_info_s *cs_dbs_info = NULL; - struct od_ops *od_ops = NULL; - struct od_dbs_tuners *od_tuners = NULL; - struct cs_dbs_tuners *cs_tuners = NULL; - struct cpu_dbs_common_info *cpu_cdbs; - unsigned int sampling_rate, latency, ignore_nice, j, cpu = policy->cpu; - int io_busy = 0; - int rc; + unsigned int latency; + int ret;
- if (have_governor_per_policy()) - dbs_data = policy->governor_data; - else - dbs_data = cdata->gdbs_data; + if (dbs_data) { + WARN_ON(have_governor_per_policy()); + dbs_data->usage_count++; + policy->governor_data = dbs_data; + return 0; + }
- WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)); + dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL); + if (!dbs_data) + return -ENOMEM;
- switch (event) { - case CPUFREQ_GOV_POLICY_INIT: - if (have_governor_per_policy()) { - WARN_ON(dbs_data); - } else if (dbs_data) { - dbs_data->usage_count++; - policy->governor_data = dbs_data; - return 0; - } + dbs_data->cdata = cdata; + dbs_data->usage_count = 1;
- dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL); - if (!dbs_data) { - pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__); - return -ENOMEM; - } + ret = cdata->init(dbs_data, !policy->governor->initialized); + if (ret) + goto free_dbs_data;
- dbs_data->cdata = cdata; - dbs_data->usage_count = 1; - rc = cdata->init(dbs_data, !policy->governor->initialized); - if (rc) { - pr_err("%s: POLICY_INIT: init() failed\n", __func__); - kfree(dbs_data); - return rc; - } + /* policy latency is in ns. Convert it to us first */ + latency = policy->cpuinfo.transition_latency / 1000; + if (latency == 0) + latency = 1;
- if (!have_governor_per_policy()) - WARN_ON(cpufreq_get_global_kobject()); + /* Bring kernel and HW constraints together */ + dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate, + MIN_LATENCY_MULTIPLIER * latency); + set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate, + latency * LATENCY_MULTIPLIER));
- rc = sysfs_create_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); - if (rc) { - cdata->exit(dbs_data, !policy->governor->initialized); - kfree(dbs_data); - return rc; - } + if (!have_governor_per_policy()) { + WARN_ON(cpufreq_get_global_kobject()); + cdata->gdbs_data = dbs_data; + }
- policy->governor_data = dbs_data; + ret = sysfs_create_group(get_governor_parent_kobj(policy), + get_sysfs_attr(dbs_data)); + if (ret) + goto cdata_exit;
- /* policy latency is in ns. Convert it to us first */ - latency = policy->cpuinfo.transition_latency / 1000; - if (latency == 0) - latency = 1; + policy->governor_data = dbs_data;
- /* Bring kernel and HW constraints together */ - dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate, - MIN_LATENCY_MULTIPLIER * latency); - set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate, - latency * LATENCY_MULTIPLIER)); + return 0;
- if (!have_governor_per_policy()) - cdata->gdbs_data = dbs_data; +cdata_exit: + if (!have_governor_per_policy()) { + cdata->gdbs_data = NULL; + cpufreq_put_global_kobject(); + } + cdata->exit(dbs_data, !policy->governor->initialized); +free_dbs_data: + kfree(dbs_data); + return ret; +}
- return 0; - case CPUFREQ_GOV_POLICY_EXIT: - if (!--dbs_data->usage_count) { - sysfs_remove_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); +static void cpufreq_governor_exit(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata;
- if (!have_governor_per_policy()) - cpufreq_put_global_kobject(); + policy->governor_data = NULL; + if (!--dbs_data->usage_count) { + sysfs_remove_group(get_governor_parent_kobj(policy), + get_sysfs_attr(dbs_data));
- cdata->exit(dbs_data, policy->governor->initialized == 1); - kfree(dbs_data); + if (!have_governor_per_policy()) { cdata->gdbs_data = NULL; + cpufreq_put_global_kobject(); }
- policy->governor_data = NULL; - return 0; + cdata->exit(dbs_data, policy->governor->initialized == 1); + kfree(dbs_data); } +}
- cpu_cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); +static int cpufreq_governor_start(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata; + unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu; + struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + int io_busy = 0; + + if (!policy->cur) + return -EINVAL; + + if (cdata->governor == GOV_CONSERVATIVE) { + struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
- if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { - cs_tuners = dbs_data->tuners; - cs_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); sampling_rate = cs_tuners->sampling_rate; ignore_nice = cs_tuners->ignore_nice_load; } else { - od_tuners = dbs_data->tuners; - od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); + struct od_dbs_tuners *od_tuners = dbs_data->tuners; + sampling_rate = od_tuners->sampling_rate; ignore_nice = od_tuners->ignore_nice_load; - od_ops = dbs_data->cdata->gov_ops; io_busy = od_tuners->io_is_busy; }
- switch (event) { - case CPUFREQ_GOV_START: - if (!policy->cur) - return -EINVAL; + mutex_lock(&dbs_data->mutex);
- mutex_lock(&dbs_data->mutex); + for_each_cpu(j, policy->cpus) { + struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j); + unsigned int prev_load;
- for_each_cpu(j, policy->cpus) { - struct cpu_dbs_common_info *j_cdbs = - dbs_data->cdata->get_cpu_cdbs(j); - unsigned int prev_load; + j_cdbs->cpu = j; + j_cdbs->cur_policy = policy; + j_cdbs->prev_cpu_idle = + get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
- j_cdbs->cpu = j; - j_cdbs->cur_policy = policy; - j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, - &j_cdbs->prev_cpu_wall, io_busy); + prev_load = (unsigned int)(j_cdbs->prev_cpu_wall - + j_cdbs->prev_cpu_idle); + j_cdbs->prev_load = 100 * prev_load / + (unsigned int)j_cdbs->prev_cpu_wall;
- prev_load = (unsigned int) - (j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle); - j_cdbs->prev_load = 100 * prev_load / - (unsigned int) j_cdbs->prev_cpu_wall; + if (ignore_nice) + j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
- if (ignore_nice) - j_cdbs->prev_cpu_nice = - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; + mutex_init(&j_cdbs->timer_mutex); + INIT_DEFERRABLE_WORK(&j_cdbs->work, cdata->gov_dbs_timer); + }
- mutex_init(&j_cdbs->timer_mutex); - INIT_DEFERRABLE_WORK(&j_cdbs->work, - dbs_data->cdata->gov_dbs_timer); - } + if (cdata->governor == GOV_CONSERVATIVE) { + struct cs_cpu_dbs_info_s *cs_dbs_info = + cdata->get_cpu_dbs_info_s(cpu);
- if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { - cs_dbs_info->down_skip = 0; - cs_dbs_info->enable = 1; - cs_dbs_info->requested_freq = policy->cur; - } else { - od_dbs_info->rate_mult = 1; - od_dbs_info->sample_type = OD_NORMAL_SAMPLE; - od_ops->powersave_bias_init_cpu(cpu); - } + cs_dbs_info->down_skip = 0; + cs_dbs_info->enable = 1; + cs_dbs_info->requested_freq = policy->cur; + } else { + struct od_ops *od_ops = cdata->gov_ops; + struct od_cpu_dbs_info_s *od_dbs_info = cdata->get_cpu_dbs_info_s(cpu);
- mutex_unlock(&dbs_data->mutex); + od_dbs_info->rate_mult = 1; + od_dbs_info->sample_type = OD_NORMAL_SAMPLE; + od_ops->powersave_bias_init_cpu(cpu); + }
- /* Initiate timer time stamp */ - cpu_cdbs->time_stamp = ktime_get(); + mutex_unlock(&dbs_data->mutex);
- gov_queue_work(dbs_data, policy, - delay_for_sampling_rate(sampling_rate), true); - break; + /* Initiate timer time stamp */ + cpu_cdbs->time_stamp = ktime_get();
- case CPUFREQ_GOV_STOP: - if (dbs_data->cdata->governor == GOV_CONSERVATIVE) - cs_dbs_info->enable = 0; + gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), + true); + return 0; +} + +static void cpufreq_governor_stop(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata; + unsigned int cpu = policy->cpu; + struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + + if (cdata->governor == GOV_CONSERVATIVE) { + struct cs_cpu_dbs_info_s *cs_dbs_info = + cdata->get_cpu_dbs_info_s(cpu);
- gov_cancel_work(dbs_data, policy); + cs_dbs_info->enable = 0; + } + + gov_cancel_work(dbs_data, policy); + + mutex_lock(&dbs_data->mutex); + mutex_destroy(&cpu_cdbs->timer_mutex); + cpu_cdbs->cur_policy = NULL; + mutex_unlock(&dbs_data->mutex); +}
- mutex_lock(&dbs_data->mutex); - mutex_destroy(&cpu_cdbs->timer_mutex); - cpu_cdbs->cur_policy = NULL; +static void cpufreq_governor_limits(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata; + unsigned int cpu = policy->cpu; + struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+ mutex_lock(&dbs_data->mutex); + if (!cpu_cdbs->cur_policy) { mutex_unlock(&dbs_data->mutex); + return; + }
- break; + mutex_lock(&cpu_cdbs->timer_mutex); + if (policy->max < cpu_cdbs->cur_policy->cur) + __cpufreq_driver_target(cpu_cdbs->cur_policy, policy->max, + CPUFREQ_RELATION_H); + else if (policy->min > cpu_cdbs->cur_policy->cur) + __cpufreq_driver_target(cpu_cdbs->cur_policy, policy->min, + CPUFREQ_RELATION_L); + dbs_check_cpu(dbs_data, cpu); + mutex_unlock(&cpu_cdbs->timer_mutex); + + mutex_unlock(&dbs_data->mutex); +}
+int cpufreq_governor_dbs(struct cpufreq_policy *policy, + struct common_dbs_data *cdata, unsigned int event) +{ + struct dbs_data *dbs_data; + int ret = 0; + + if (have_governor_per_policy()) + dbs_data = policy->governor_data; + else + dbs_data = cdata->gdbs_data; + + WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)); + + switch (event) { + case CPUFREQ_GOV_POLICY_INIT: + ret = cpufreq_governor_init(policy, dbs_data, cdata); + break; + case CPUFREQ_GOV_POLICY_EXIT: + cpufreq_governor_exit(policy, dbs_data); + break; + case CPUFREQ_GOV_START: + ret = cpufreq_governor_start(policy, dbs_data); + break; + case CPUFREQ_GOV_STOP: + cpufreq_governor_stop(policy, dbs_data); + break; case CPUFREQ_GOV_LIMITS: - mutex_lock(&dbs_data->mutex); - if (!cpu_cdbs->cur_policy) { - mutex_unlock(&dbs_data->mutex); - break; - } - mutex_lock(&cpu_cdbs->timer_mutex); - if (policy->max < cpu_cdbs->cur_policy->cur) - __cpufreq_driver_target(cpu_cdbs->cur_policy, - policy->max, CPUFREQ_RELATION_H); - else if (policy->min > cpu_cdbs->cur_policy->cur) - __cpufreq_driver_target(cpu_cdbs->cur_policy, - policy->min, CPUFREQ_RELATION_L); - dbs_check_cpu(dbs_data, cpu); - mutex_unlock(&cpu_cdbs->timer_mutex); - mutex_unlock(&dbs_data->mutex); + cpufreq_governor_limits(policy, dbs_data); break; } - return 0; + + return ret; } EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
cpufreq_governor_dbs() is hardly readable; it is just too big and complicated. Let's make it more readable by splitting out event-specific routines.

The order of statements is changed at a few places, but that shouldn't bring any functional change.
+	if (dbs_data) {
+		WARN_ON(have_governor_per_policy());
Shouldn't this be outside this loop? We warn here and allocate dbs_data freshly in the current code for the case where governor is per policy.
+		dbs_data->usage_count++;
Besides, in the case where a governor exists per policy, we will end up incrementing the usage_count to more than 1 under this condition, which does not make sense.
+		policy->governor_data = dbs_data;
+		return 0;
+	}
Regards Preeti U Murthy
On 04-06-15, 15:34, Preeti U Murthy wrote:
+	if (dbs_data) {
+		WARN_ON(have_governor_per_policy());
Shouldn't this be outside this loop? We warn here and allocate dbs_data
Loop? It's just an 'if' block :)
freshly in the current code for the case where governor is per policy.
So what we are doing in the current code is equally disgusting. We already have a pointer and we overwrite it.
+		dbs_data->usage_count++;
Besides, in the case where a governor exists per policy, we will end up incrementing the usage_count to more than 1 under this condition, which does not make sense.
So, the only sane option here is to return an error immediately I think.
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index ed849a8777dd..57a39f8a92b7 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -247,7 +247,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	int ret;
 
 	if (dbs_data) {
-		WARN_ON(have_governor_per_policy());
+		if (WARN_ON(have_governor_per_policy()))
+			return -EINVAL;
 		dbs_data->usage_count++;
 		policy->governor_data = dbs_data;
 		return 0;
@@ -276,24 +277,28 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 			       latency * LATENCY_MULTIPLIER));
 
 	if (!have_governor_per_policy()) {
-		WARN_ON(cpufreq_get_global_kobject());
+		if (WARN_ON(cpufreq_get_global_kobject())) {
+			ret = -EINVAL;
+			goto cdata_exit;
+		}
 		cdata->gdbs_data = dbs_data;
 	}
 
 	ret = sysfs_create_group(get_governor_parent_kobj(policy),
 				 get_sysfs_attr(dbs_data));
 	if (ret)
-		goto cdata_exit;
+		goto put_kobj;
 
 	policy->governor_data = dbs_data;
 
 	return 0;
 
-cdata_exit:
+put_kobj:
 	if (!have_governor_per_policy()) {
 		cdata->gdbs_data = NULL;
 		cpufreq_put_global_kobject();
 	}
+cdata_exit:
 	cdata->exit(dbs_data, !policy->governor->initialized);
 free_dbs_data:
 	kfree(dbs_data);
cpufreq_governor_dbs() is hardly readable; it is just too big and complicated. Let's make it more readable by splitting out event-specific routines.

The order of statements is changed at a few places, but that shouldn't bring any functional change.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V1->V2: Return errors after hitting WARN_ON. (Preeti)
 drivers/cpufreq/cpufreq_governor.c | 329 +++++++++++++++++++++----------------
 1 file changed, 189 insertions(+), 140 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index d64a82e6481a..ccf6ce7e5983 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -239,195 +239,244 @@ static void set_sampling_rate(struct dbs_data *dbs_data, } }
-int cpufreq_governor_dbs(struct cpufreq_policy *policy, - struct common_dbs_data *cdata, unsigned int event) +static int cpufreq_governor_init(struct cpufreq_policy *policy, + struct dbs_data *dbs_data, + struct common_dbs_data *cdata) { - struct dbs_data *dbs_data; - struct od_cpu_dbs_info_s *od_dbs_info = NULL; - struct cs_cpu_dbs_info_s *cs_dbs_info = NULL; - struct od_ops *od_ops = NULL; - struct od_dbs_tuners *od_tuners = NULL; - struct cs_dbs_tuners *cs_tuners = NULL; - struct cpu_dbs_common_info *cpu_cdbs; - unsigned int sampling_rate, latency, ignore_nice, j, cpu = policy->cpu; - int io_busy = 0; - int rc; + unsigned int latency; + int ret;
- if (have_governor_per_policy()) - dbs_data = policy->governor_data; - else - dbs_data = cdata->gdbs_data; + if (dbs_data) { + if (WARN_ON(have_governor_per_policy())) + return -EINVAL; + dbs_data->usage_count++; + policy->governor_data = dbs_data; + return 0; + }
- WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)); + dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL); + if (!dbs_data) + return -ENOMEM;
- switch (event) { - case CPUFREQ_GOV_POLICY_INIT: - if (have_governor_per_policy()) { - WARN_ON(dbs_data); - } else if (dbs_data) { - dbs_data->usage_count++; - policy->governor_data = dbs_data; - return 0; - } + dbs_data->cdata = cdata; + dbs_data->usage_count = 1;
- dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL); - if (!dbs_data) { - pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__); - return -ENOMEM; - } + ret = cdata->init(dbs_data, !policy->governor->initialized); + if (ret) + goto free_dbs_data;
- dbs_data->cdata = cdata; - dbs_data->usage_count = 1; - rc = cdata->init(dbs_data, !policy->governor->initialized); - if (rc) { - pr_err("%s: POLICY_INIT: init() failed\n", __func__); - kfree(dbs_data); - return rc; - } + /* policy latency is in ns. Convert it to us first */ + latency = policy->cpuinfo.transition_latency / 1000; + if (latency == 0) + latency = 1;
- if (!have_governor_per_policy()) - WARN_ON(cpufreq_get_global_kobject()); + /* Bring kernel and HW constraints together */ + dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate, + MIN_LATENCY_MULTIPLIER * latency); + set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate, + latency * LATENCY_MULTIPLIER));
- rc = sysfs_create_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); - if (rc) { - cdata->exit(dbs_data, !policy->governor->initialized); - kfree(dbs_data); - return rc; + if (!have_governor_per_policy()) { + if (WARN_ON(cpufreq_get_global_kobject())) { + ret = -EINVAL; + goto cdata_exit; } + cdata->gdbs_data = dbs_data; + }
- policy->governor_data = dbs_data; + ret = sysfs_create_group(get_governor_parent_kobj(policy), + get_sysfs_attr(dbs_data)); + if (ret) + goto put_kobj;
- /* policy latency is in ns. Convert it to us first */ - latency = policy->cpuinfo.transition_latency / 1000; - if (latency == 0) - latency = 1; + policy->governor_data = dbs_data;
- /* Bring kernel and HW constraints together */ - dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate, - MIN_LATENCY_MULTIPLIER * latency); - set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate, - latency * LATENCY_MULTIPLIER)); + return 0;
- if (!have_governor_per_policy()) - cdata->gdbs_data = dbs_data; +put_kobj: + if (!have_governor_per_policy()) { + cdata->gdbs_data = NULL; + cpufreq_put_global_kobject(); + } +cdata_exit: + cdata->exit(dbs_data, !policy->governor->initialized); +free_dbs_data: + kfree(dbs_data); + return ret; +}
- return 0; - case CPUFREQ_GOV_POLICY_EXIT: - if (!--dbs_data->usage_count) { - sysfs_remove_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); +static void cpufreq_governor_exit(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata;
- if (!have_governor_per_policy()) - cpufreq_put_global_kobject(); + policy->governor_data = NULL; + if (!--dbs_data->usage_count) { + sysfs_remove_group(get_governor_parent_kobj(policy), + get_sysfs_attr(dbs_data));
- cdata->exit(dbs_data, policy->governor->initialized == 1); - kfree(dbs_data); + if (!have_governor_per_policy()) { cdata->gdbs_data = NULL; + cpufreq_put_global_kobject(); }
- policy->governor_data = NULL; - return 0; + cdata->exit(dbs_data, policy->governor->initialized == 1); + kfree(dbs_data); } +}
- cpu_cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); +static int cpufreq_governor_start(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata; + unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu; + struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + int io_busy = 0; + + if (!policy->cur) + return -EINVAL; + + if (cdata->governor == GOV_CONSERVATIVE) { + struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
- if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { - cs_tuners = dbs_data->tuners; - cs_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); sampling_rate = cs_tuners->sampling_rate; ignore_nice = cs_tuners->ignore_nice_load; } else { - od_tuners = dbs_data->tuners; - od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); + struct od_dbs_tuners *od_tuners = dbs_data->tuners; + sampling_rate = od_tuners->sampling_rate; ignore_nice = od_tuners->ignore_nice_load; - od_ops = dbs_data->cdata->gov_ops; io_busy = od_tuners->io_is_busy; }
- switch (event) { - case CPUFREQ_GOV_START: - if (!policy->cur) - return -EINVAL; + mutex_lock(&dbs_data->mutex);
- mutex_lock(&dbs_data->mutex); + for_each_cpu(j, policy->cpus) { + struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j); + unsigned int prev_load;
- for_each_cpu(j, policy->cpus) { - struct cpu_dbs_common_info *j_cdbs = - dbs_data->cdata->get_cpu_cdbs(j); - unsigned int prev_load; + j_cdbs->cpu = j; + j_cdbs->cur_policy = policy; + j_cdbs->prev_cpu_idle = + get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
- j_cdbs->cpu = j; - j_cdbs->cur_policy = policy; - j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, - &j_cdbs->prev_cpu_wall, io_busy); + prev_load = (unsigned int)(j_cdbs->prev_cpu_wall - + j_cdbs->prev_cpu_idle); + j_cdbs->prev_load = 100 * prev_load / + (unsigned int)j_cdbs->prev_cpu_wall;
- prev_load = (unsigned int) - (j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle); - j_cdbs->prev_load = 100 * prev_load / - (unsigned int) j_cdbs->prev_cpu_wall; + if (ignore_nice) + j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
- if (ignore_nice) - j_cdbs->prev_cpu_nice = - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; + mutex_init(&j_cdbs->timer_mutex); + INIT_DEFERRABLE_WORK(&j_cdbs->work, cdata->gov_dbs_timer); + }
- mutex_init(&j_cdbs->timer_mutex); - INIT_DEFERRABLE_WORK(&j_cdbs->work, - dbs_data->cdata->gov_dbs_timer); - } + if (cdata->governor == GOV_CONSERVATIVE) { + struct cs_cpu_dbs_info_s *cs_dbs_info = + cdata->get_cpu_dbs_info_s(cpu);
- if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { - cs_dbs_info->down_skip = 0; - cs_dbs_info->enable = 1; - cs_dbs_info->requested_freq = policy->cur; - } else { - od_dbs_info->rate_mult = 1; - od_dbs_info->sample_type = OD_NORMAL_SAMPLE; - od_ops->powersave_bias_init_cpu(cpu); - } + cs_dbs_info->down_skip = 0; + cs_dbs_info->enable = 1; + cs_dbs_info->requested_freq = policy->cur; + } else { + struct od_ops *od_ops = cdata->gov_ops; + struct od_cpu_dbs_info_s *od_dbs_info = cdata->get_cpu_dbs_info_s(cpu);
- mutex_unlock(&dbs_data->mutex); + od_dbs_info->rate_mult = 1; + od_dbs_info->sample_type = OD_NORMAL_SAMPLE; + od_ops->powersave_bias_init_cpu(cpu); + }
- /* Initiate timer time stamp */ - cpu_cdbs->time_stamp = ktime_get(); + mutex_unlock(&dbs_data->mutex);
- gov_queue_work(dbs_data, policy, - delay_for_sampling_rate(sampling_rate), true); - break; + /* Initiate timer time stamp */ + cpu_cdbs->time_stamp = ktime_get();
- case CPUFREQ_GOV_STOP: - if (dbs_data->cdata->governor == GOV_CONSERVATIVE) - cs_dbs_info->enable = 0; + gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), + true); + return 0; +} + +static void cpufreq_governor_stop(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata; + unsigned int cpu = policy->cpu; + struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + + if (cdata->governor == GOV_CONSERVATIVE) { + struct cs_cpu_dbs_info_s *cs_dbs_info = + cdata->get_cpu_dbs_info_s(cpu);
- gov_cancel_work(dbs_data, policy); + cs_dbs_info->enable = 0; + } + + gov_cancel_work(dbs_data, policy); + + mutex_lock(&dbs_data->mutex); + mutex_destroy(&cpu_cdbs->timer_mutex); + cpu_cdbs->cur_policy = NULL; + mutex_unlock(&dbs_data->mutex); +}
- mutex_lock(&dbs_data->mutex); - mutex_destroy(&cpu_cdbs->timer_mutex); - cpu_cdbs->cur_policy = NULL; +static void cpufreq_governor_limits(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) +{ + struct common_dbs_data *cdata = dbs_data->cdata; + unsigned int cpu = policy->cpu; + struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+ mutex_lock(&dbs_data->mutex); + if (!cpu_cdbs->cur_policy) { mutex_unlock(&dbs_data->mutex); + return; + }
- break; + mutex_lock(&cpu_cdbs->timer_mutex); + if (policy->max < cpu_cdbs->cur_policy->cur) + __cpufreq_driver_target(cpu_cdbs->cur_policy, policy->max, + CPUFREQ_RELATION_H); + else if (policy->min > cpu_cdbs->cur_policy->cur) + __cpufreq_driver_target(cpu_cdbs->cur_policy, policy->min, + CPUFREQ_RELATION_L); + dbs_check_cpu(dbs_data, cpu); + mutex_unlock(&cpu_cdbs->timer_mutex); + + mutex_unlock(&dbs_data->mutex); +}
+int cpufreq_governor_dbs(struct cpufreq_policy *policy, + struct common_dbs_data *cdata, unsigned int event) +{ + struct dbs_data *dbs_data; + int ret = 0; + + if (have_governor_per_policy()) + dbs_data = policy->governor_data; + else + dbs_data = cdata->gdbs_data; + + WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)); + + switch (event) { + case CPUFREQ_GOV_POLICY_INIT: + ret = cpufreq_governor_init(policy, dbs_data, cdata); + break; + case CPUFREQ_GOV_POLICY_EXIT: + cpufreq_governor_exit(policy, dbs_data); + break; + case CPUFREQ_GOV_START: + ret = cpufreq_governor_start(policy, dbs_data); + break; + case CPUFREQ_GOV_STOP: + cpufreq_governor_stop(policy, dbs_data); + break; case CPUFREQ_GOV_LIMITS: - mutex_lock(&dbs_data->mutex); - if (!cpu_cdbs->cur_policy) { - mutex_unlock(&dbs_data->mutex); - break; - } - mutex_lock(&cpu_cdbs->timer_mutex); - if (policy->max < cpu_cdbs->cur_policy->cur) - __cpufreq_driver_target(cpu_cdbs->cur_policy, - policy->max, CPUFREQ_RELATION_H); - else if (policy->min > cpu_cdbs->cur_policy->cur) - __cpufreq_driver_target(cpu_cdbs->cur_policy, - policy->min, CPUFREQ_RELATION_L); - dbs_check_cpu(dbs_data, cpu); - mutex_unlock(&cpu_cdbs->timer_mutex); - mutex_unlock(&dbs_data->mutex); + cpufreq_governor_limits(policy, dbs_data); break; } - return 0; + + return ret; } EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
On 06/04/2015 04:43 PM, Viresh Kumar wrote:
cpufreq_governor_dbs() is hardly readable; it is just too big and complicated. Let's make it more readable by splitting out event-specific routines.

The order of statements is changed at a few places, but that shouldn't bring any functional change.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
V1->V2: Return errors after hitting WARN_ON. (Preeti)
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Regards Preeti U Murthy
Different people have reported several races in the cpufreq core around governors (only ondemand and conservative).
There are at least two race scenarios present in the governor code:

(a) Concurrent access/updates of the governors' internal structures.
It is possible that fields such as 'dbs_data->usage_count' are accessed simultaneously for different policies using the same governor structure (i.e. with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset), and because of this we can end up dereferencing bad pointers.
For example, consider a system with two CPUs and separate 'struct cpufreq_policy' instances, where CPU0's governor is ondemand and CPU1's is powersave. Now CPU0 switches to powersave and CPU1 to ondemand:

	CPU0				CPU1

	store*				store*

	cpufreq_governor_exit()		cpufreq_governor_init()
					dbs_data = cdata->gdbs_data;
	if (!--dbs_data->usage_count)
		kfree(dbs_data);
					dbs_data->usage_count++;
					*Bad pointer dereference*
There are other races possible between EXIT and START/STOP/LIMITS as well; it's all really complicated.
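For illustration only, here is a compressed userspace analogue of this race and of the serialization this patch introduces. The structure and function names below are made up; the refcounted object stands in for the shared dbs_data and the per-governor mutex makes the EXIT/INIT pair atomic:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for the shared (!governor-per-policy) dbs_data. */
struct shared_dbs_data { int usage_count; };

static struct shared_dbs_data *gdbs_data;	/* plays the role of cdata->gdbs_data */
static pthread_mutex_t gov_mutex = PTHREAD_MUTEX_INITIALIZER;	/* one per governor */

static void governor_init(void)	/* analogue of CPUFREQ_GOV_POLICY_INIT */
{
	pthread_mutex_lock(&gov_mutex);
	if (gdbs_data) {
		gdbs_data->usage_count++;	/* safe: EXIT cannot free it while we hold the lock */
	} else {
		gdbs_data = calloc(1, sizeof(*gdbs_data));
		gdbs_data->usage_count = 1;
	}
	pthread_mutex_unlock(&gov_mutex);
}

static void governor_exit(void)	/* analogue of CPUFREQ_GOV_POLICY_EXIT */
{
	pthread_mutex_lock(&gov_mutex);
	if (gdbs_data && !--gdbs_data->usage_count) {
		free(gdbs_data);
		gdbs_data = NULL;
	}
	pthread_mutex_unlock(&gov_mutex);
}

static void *cpu0_switch_away(void *arg) { governor_exit(); return NULL; }
static void *cpu1_switch_to(void *arg) { governor_init(); return NULL; }

int main(void)
{
	pthread_t t0, t1;

	governor_init();	/* CPU0 already uses the governor */
	pthread_create(&t0, NULL, cpu0_switch_away, NULL);
	pthread_create(&t1, NULL, cpu1_switch_to, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	/* Without gov_mutex, cpu1 could dereference gdbs_data right after
	 * cpu0 freed it: the "Bad pointer dereference" shown above. */
	printf("final usage_count = %d\n", gdbs_data ? gdbs_data->usage_count : 0);
	return 0;
}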
(b) Switching the governor state in a bad sequence:

For example, trying to move a governor to the START state when it is in the EXIT state. There are some checks present in __cpufreq_governor(), but they aren't sufficient, as they compare events against 'policy->governor_enabled', whereas we need to take the governor's own state into account, since a governor can be used by multiple policies.
These two issues need to be solved separately, and the responsibility should be properly divided between the cpufreq core and the governor core.

The first problem is more about the governor core, as it needs to protect its structures properly. The second problem should be fixed in the cpufreq core rather than in the governors, as it's all about the sequence of events.
This patch is trying to solve only the first problem.
There are two types of data we need to protect:
- 'struct common_dbs_data': there is always a single copy of this per governor.
- 'struct dbs_data': with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set there is a per-policy copy of this data, otherwise a single copy.

Because of such complexities, the mutex present in 'struct dbs_data' is insufficient to solve our problem. For example, we need to protect the fetching of 'dbs_data' from the different structures at the beginning of cpufreq_governor_dbs(), to make sure it isn't currently being updated.
This can be fixed if we can guarantee serialization of event parsing code for an individual governor. This is best solved with a mutex per governor, and the placeholder for that is 'struct common_dbs_data'.
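To make this concrete, here is a small illustrative fragment, with simplified userspace stand-ins for the kernel structures (the pthread calls and lookup helpers are not kernel code): a lock embedded in the object being looked up cannot protect the lookup itself, because the object may be freed before the lock is ever taken; only a lock that outlives the object, here one per governor, can.

#include <pthread.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures; illustration only. */
struct dbs_data {
	pthread_mutex_t mutex;		/* old scheme: lock lives inside the object */
	int usage_count;
};

struct common_dbs_data {
	pthread_mutex_t mutex;		/* new scheme: one static lock per governor */
	struct dbs_data *gdbs_data;
};

/* Broken: by the time we take dbs_data->mutex, another CPU may already have
 * dropped the last reference and freed the object. */
static struct dbs_data *lookup_unsafe(struct common_dbs_data *cdata)
{
	struct dbs_data *dbs_data = cdata->gdbs_data;	/* racy fetch */

	if (dbs_data)
		pthread_mutex_lock(&dbs_data->mutex);	/* potential use-after-free */
	return dbs_data;
}

/* Sound: the per-governor lock exists for the governor's whole lifetime, so
 * the fetch and the use of dbs_data are serialized against INIT/EXIT. */
static struct dbs_data *lookup_safe(struct common_dbs_data *cdata)
{
	pthread_mutex_lock(&cdata->mutex);
	return cdata->gdbs_data;	/* caller drops cdata->mutex when done */
}

int main(void)
{
	struct common_dbs_data cdata = { .gdbs_data = NULL };

	pthread_mutex_init(&cdata.mutex, NULL);

	if (!lookup_safe(&cdata))		/* nothing registered yet */
		printf("no dbs_data registered\n");
	pthread_mutex_unlock(&cdata.mutex);

	if (!lookup_unsafe(&cdata))
		printf("unsafe lookup also sees NULL here, but only by luck\n");
	return 0;
}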
And so this patch moves the mutex from 'struct dbs_data' to 'struct common_dbs_data' and takes it at the beginning and drops it at the end of cpufreq_governor_dbs().
Tested with and without the following configuration options:

CONFIG_LOCKDEP_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c |  2 +-
 drivers/cpufreq/cpufreq_governor.c     | 24 +++++++++++-------------
 drivers/cpufreq/cpufreq_governor.h     |  8 +++++---
 drivers/cpufreq/cpufreq_ondemand.c     |  2 +-
 4 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 75f875bb155e..c86a10c30912 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -345,7 +345,6 @@ static int cs_init(struct dbs_data *dbs_data, bool notify)
 		cpufreq_register_notifier(&cs_cpufreq_notifier_block,
 					  CPUFREQ_TRANSITION_NOTIFIER);
 
-	mutex_init(&dbs_data->mutex);
 	return 0;
 }
 
@@ -370,6 +369,7 @@ static struct common_dbs_data cs_dbs_cdata = {
 	.gov_check_cpu = cs_check_cpu,
 	.init = cs_init,
 	.exit = cs_exit,
+	.mutex = __MUTEX_INITIALIZER(cs_dbs_cdata.mutex),
 };
 
 static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index dc382a5a2158..ed849a8777dd 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -344,8 +344,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		io_busy = od_tuners->io_is_busy;
 	}
 
-	mutex_lock(&dbs_data->mutex);
-
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
@@ -383,8 +381,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		od_ops->powersave_bias_init_cpu(cpu);
 	}
 
-	mutex_unlock(&dbs_data->mutex);
-
 	/* Initiate timer time stamp */
 	cpu_cdbs->time_stamp = ktime_get();
 
@@ -409,10 +405,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 
 	gov_cancel_work(dbs_data, policy);
 
-	mutex_lock(&dbs_data->mutex);
 	mutex_destroy(&cpu_cdbs->timer_mutex);
 	cpu_cdbs->cur_policy = NULL;
-	mutex_unlock(&dbs_data->mutex);
 }
 
 static void cpufreq_governor_limits(struct cpufreq_policy *policy,
@@ -422,11 +416,8 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
 
-	mutex_lock(&dbs_data->mutex);
-	if (!cpu_cdbs->cur_policy) {
-		mutex_unlock(&dbs_data->mutex);
+	if (!cpu_cdbs->cur_policy)
 		return;
-	}
 
 	mutex_lock(&cpu_cdbs->timer_mutex);
 	if (policy->max < cpu_cdbs->cur_policy->cur)
@@ -437,8 +428,6 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 					CPUFREQ_RELATION_L);
 	dbs_check_cpu(dbs_data, cpu);
 	mutex_unlock(&cpu_cdbs->timer_mutex);
-
-	mutex_unlock(&dbs_data->mutex);
 }
 
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
@@ -447,12 +436,18 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 	struct dbs_data *dbs_data;
 	int ret = 0;
 
+	/* Lock governor to block concurrent initialization of governor */
+	mutex_lock(&cdata->mutex);
+
 	if (have_governor_per_policy())
 		dbs_data = policy->governor_data;
 	else
 		dbs_data = cdata->gdbs_data;
 
-	WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT));
+	if (WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))) {
+		ret = -EINVAL;
+		goto unlock;
+	}
 
 	switch (event) {
 	case CPUFREQ_GOV_POLICY_INIT:
@@ -472,6 +467,9 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		break;
 	}
 
+unlock:
+	mutex_unlock(&cdata->mutex);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 1690120df487..34736f5e869d 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -213,6 +213,11 @@ struct common_dbs_data {
 
 	/* Governor specific ops, see below */
 	void *gov_ops;
+
+	/*
+	 * Protects governor's data (struct dbs_data and struct common_dbs_data)
+	 */
+	struct mutex mutex;
 };
 
 /* Governor Per policy data */
@@ -221,9 +226,6 @@ struct dbs_data {
 	unsigned int min_sampling_rate;
 	int usage_count;
 	void *tuners;
-
-	/* dbs_mutex protects dbs_enable in governor start/stop */
-	struct mutex mutex;
 };
 
 /* Governor specific ops, will be passed to dbs_data->gov_ops */
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 4fe78a9caa04..3c1e10f2304c 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -513,7 +513,6 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
 	tuners->io_is_busy = should_io_be_busy();
 
 	dbs_data->tuners = tuners;
-	mutex_init(&dbs_data->mutex);
 	return 0;
 }
 
@@ -541,6 +540,7 @@ static struct common_dbs_data od_dbs_cdata = {
 	.gov_ops = &od_ops,
 	.init = od_init,
 	.exit = od_exit,
+	.mutex = __MUTEX_INITIALIZER(od_dbs_cdata.mutex),
 };
 
 static void od_set_powersave_bias(unsigned int powersave_bias)
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
@Preeti: I would like you to test these patches. These should get rid of the crashes you were facing but may generate a WARN() from line 447 of cpufreq_governor.c, if the sequence is wrong. That has to be fixed separately.
I did a hotplug test on a single core alongside changing governors between ondemand and conservative on the same core. The policy is per core on powerpc. Within a second of that run the kernel panics. The backtrace is below:
[  165.981836] Unable to handle kernel paging request for data at address 0x00000000
[  165.981929] Faulting instruction address: 0xc00000000053b3e0
cpu 0x4: Vector: 300 (Data Access) at [c000000fe0b2b880]
    pc: c00000000053b3e0: __bitmap_weight+0x70/0x100
    lr: c00000000085a008: need_load_eval+0x38/0xf0
    sp: c000000fe0b2bb00
   msr: 9000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc000000003e4fc90
  paca    = 0xc000000007da2600   softe: 0   irq_happened: 0x01
    pid   = 812, comm = kworker/4:2
enter ? for help
[c000000fe0b2bb50] c00000000085a008 need_load_eval+0x38/0xf0
[c000000fe0b2bb80] c00000000085815c cs_dbs_timer+0xdc/0x150
[c000000fe0b2bbe0] c0000000000f489c process_one_work+0x24c/0x910
[c000000fe0b2bc90] c0000000000f50dc worker_thread+0x17c/0x540
[c000000fe0b2bd20] c0000000000fed70 kthread+0x120/0x140
[c000000fe0b2be30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
The crash is the same as was reported at http://www.gossamer-threads.com/lists/linux/kernel/2186336.
Regards Preeti U Murthy
On 06/04/2015 10:44 AM, Preeti U Murthy wrote:
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
Hi Rafael,
Preeti recently highlighted [1] some issues in cpufreq core locking with respect to governors. I wanted to solve them after we have simplified the hotplug paths in cpufreq core with my latest patches, but now that she has poked me, I have done some work in that area.
I am trying to solve only a part of the bigger problem (in a way that I feel is the right way ahead). The first patches restructures code to make it more readable and the last patch does all the major changes. The logs in that one should be good enough to explain why and what I am doing.
The first two shouldn't bring any functional change and so can be applied early if you are confident about them.
@Preeti: I would like you to test these patches. These should get rid of the crashes you were facing but may generate a WARN() from line 447 of cpufreq_governor.c, if the sequence is wrong. That has to be fixed separately.
Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
Rebased over: v4.1-rc6 Tested-on: ARM dual Cortex -A15 Exynos board.
[1] http://marc.info/?i=20150601064031.2972.59208.stgit%40perfhull-ltc.austin.ib...
Viresh Kumar (3):
  cpufreq: governor: register notifier from cs_init()
  cpufreq: governor: split cpufreq_governor_dbs()
  cpufreq: governor: Serialize governor callbacks

 drivers/cpufreq/cpufreq_conservative.c |  28 +--
 drivers/cpufreq/cpufreq_governor.c     | 340 ++++++++++++++++++---------------
 drivers/cpufreq/cpufreq_governor.h     |  16 +-
 drivers/cpufreq/cpufreq_ondemand.c     |   6 +-
 4 files changed, 209 insertions(+), 181 deletions(-)
I did a hotplug test on a single core alongside changing governors between ondemand and conservative on the same core. The policy is per core on powerpc. Within a second of that run the kernel panics. The backtrace is below:
[ 165.981836] Unable to handle kernel paging request for data at address 0x00000000
[ 165.981929] Faulting instruction address: 0xc00000000053b3e0
cpu 0x4: Vector: 300 (Data Access) at [c000000fe0b2b880]
    pc: c00000000053b3e0: __bitmap_weight+0x70/0x100
    lr: c00000000085a008: need_load_eval+0x38/0xf0
    sp: c000000fe0b2bb00
   msr: 9000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc000000003e4fc90
  paca    = 0xc000000007da2600
    softe: 0        irq_happened: 0x01
    pid   = 812, comm = kworker/4:2
enter ? for help
[c000000fe0b2bb50] c00000000085a008 need_load_eval+0x38/0xf0
[c000000fe0b2bb80] c00000000085815c cs_dbs_timer+0xdc/0x150
[c000000fe0b2bbe0] c0000000000f489c process_one_work+0x24c/0x910
[c000000fe0b2bc90] c0000000000f50dc worker_thread+0x17c/0x540
[c000000fe0b2bd20] c0000000000fed70 kthread+0x120/0x140
[c000000fe0b2be30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
The crash is the same as was reported at http://www.gossamer-threads.com/lists/linux/kernel/2186336.
Regards Preeti U Murthy
And a crash in the cpufreq worker thread again, due to a data access exception, when I change governors in parallel on a single core.
cpu 0x3: Vector: 300 (Data Access) at [c000000fedb538f0]
    pc: c000000000856750: od_dbs_timer+0x60/0x1e0
    lr: c0000000000f489c: process_one_work+0x24c/0x910
    sp: c000000fedb53b70
   msr: 9000000100009033
   dar: 10
 dsisr: 40000000
  current = 0xc000000fe3d128e0
  paca    = 0xc000000007da1c80
    softe: 0        irq_happened: 0x01
    pid   = 17227, comm = kworker/3:1
With the backtrace being:
[c000000fedb53be0] c0000000000f489c process_one_work+0x24c/0x910
[c000000fedb53c90] c0000000000f50dc worker_thread+0x17c/0x540
[c000000fedb53d20] c0000000000fed70 kthread+0x120/0x140
[c000000fedb53e30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
But the kernel stays sane longer than before with the patchset. The above crash happens around 15 seconds after the test begins, while earlier it wouldn't survive even 2 seconds.
Regards Preeti U Murthy
On 04-06-15, 11:38, Preeti U Murthy wrote:
And a crash at the cpufreq worker thread again due to data access exception when I change governors in parallel on a single core.
cpu 0x3: Vector: 300 (Data Access) at [c000000fedb538f0]
    pc: c000000000856750: od_dbs_timer+0x60/0x1e0
    lr: c0000000000f489c: process_one_work+0x24c/0x910
    sp: c000000fedb53b70
   msr: 9000000100009033
   dar: 10
 dsisr: 40000000
  current = 0xc000000fe3d128e0
  paca    = 0xc000000007da1c80
    softe: 0        irq_happened: 0x01
    pid   = 17227, comm = kworker/3:1
With the backtrace being:
[c000000fedb53be0] c0000000000f489c process_one_work+0x24c/0x910
[c000000fedb53c90] c0000000000f50dc worker_thread+0x17c/0x540
[c000000fedb53d20] c0000000000fed70 kthread+0x120/0x140
[c000000fedb53e30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
But the kernel stays sane longer than before with the patchset. The above crash happens around 15 seconds after the test begins, while earlier it wouldn't survive even 2 seconds.
I haven't attempted to solve the race between the worker threads and the governor callbacks yet. What I have tried to solve is the race between the different callbacks, and you shouldn't see a race there for now; for example, a race between INIT/EXIT/START/STOP/LIMITS.
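Just to make the scope concrete, the shape of what 3/3 does is roughly the sketch below. This is an illustration only: the helper names assume the split done in 2/3, dbs_data_mutex is just a placeholder for whatever lock the actual patch takes, and details like error paths are simplified.

#include <linux/mutex.h>
#include "cpufreq_governor.h"

static DEFINE_MUTEX(dbs_data_mutex);	/* placeholder name for the lock */

/* Rough sketch, not the exact code from 3/3. */
int cpufreq_governor_dbs(struct cpufreq_policy *policy,
			 struct common_dbs_data *cdata, unsigned int event)
{
	/* Simplified: pick per-policy or global governor data as today. */
	struct dbs_data *dbs_data = have_governor_per_policy() ?
			policy->governor_data : cdata->gdbs_data;
	int ret = -EINVAL;

	/* One lock around every event, so INIT/EXIT/START/STOP/LIMITS
	 * are serialized against each other. */
	mutex_lock(&dbs_data_mutex);

	switch (event) {
	case CPUFREQ_GOV_POLICY_INIT:
		ret = cpufreq_governor_init(policy, dbs_data, cdata);
		break;
	case CPUFREQ_GOV_POLICY_EXIT:
		ret = cpufreq_governor_exit(policy, dbs_data);
		break;
	case CPUFREQ_GOV_START:
		ret = cpufreq_governor_start(policy, dbs_data);
		break;
	case CPUFREQ_GOV_STOP:
		ret = cpufreq_governor_stop(policy, dbs_data);
		break;
	case CPUFREQ_GOV_LIMITS:
		ret = cpufreq_governor_limits(policy, dbs_data);
		break;
	}

	mutex_unlock(&dbs_data_mutex);
	return ret;
}

The timer/workqueue side deliberately stays outside that lock; that part belongs to the sequencing problem which still needs to be handled in the cpufreq core.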
On 06/04/2015 11:41 AM, Viresh Kumar wrote:
On 04-06-15, 11:38, Preeti U Murthy wrote:
And a crash at the cpufreq worker thread again due to data access exception when I change governors in parallel on a single core.
cpu 0x3: Vector: 300 (Data Access) at [c000000fedb538f0]
    pc: c000000000856750: od_dbs_timer+0x60/0x1e0
    lr: c0000000000f489c: process_one_work+0x24c/0x910
    sp: c000000fedb53b70
   msr: 9000000100009033
   dar: 10
 dsisr: 40000000
  current = 0xc000000fe3d128e0
  paca    = 0xc000000007da1c80
    softe: 0        irq_happened: 0x01
    pid   = 17227, comm = kworker/3:1
With the backtrace being:
[c000000fedb53be0] c0000000000f489c process_one_work+0x24c/0x910
[c000000fedb53c90] c0000000000f50dc worker_thread+0x17c/0x540
[c000000fedb53d20] c0000000000fed70 kthread+0x120/0x140
[c000000fedb53e30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
But the kernel stays sane longer than before with the patchset. The above crash happens around 15 seconds after the test begins, while earlier it wouldn't survive even 2 seconds.
I haven't attempted to solve the race between the worker threads and governor-callbacks yet. What I have tried to solve is the race between different callbacks. And you shouldn't see a race there for now. For example a race between INIT/EXIT/START/STOP/LIMITS.
Your fix may not be complete and here is why. The reason we see the crash is because we have *only* attempted to serialize calls to cpufreq_governor_dbs() and not attempted to serialize *entire logical sequence of operations*. Let's take a look at what is happening as a consequence.
CPU0                                          CPU1

store_scaling_governor()                      __cpufreq_remove_dev_finish()
__cpufreq_governor(CPUFREQ_GOV_STOP)          __cpufreq_governor(CPUFREQ_GOV_START)
policy->governor_enabled = false
cpufreq_governor_dbs()                        policy->governor_enabled = true
mutex_lock()                                  cpufreq_governor_dbs()
gov_cancel_work()                             wait on lock
  may call gov_queue_work();
  if (!policy->enabled) : fails and
  we end up queuing work
mutex_unlock()
                                              mutex_lock()
                                              gov_queue_work()
                                              mutex_unlock()
__cpufreq_governor(CPUFREQ_GOV_POLICY_EXIT)
cpufreq_governor_dbs()
mutex_lock()
kfree(dbs_data)

timer fires and od_dbs_timer()/cs_dbs_timer() runs.
References governor data structures which are freed.
The issue, as I see it, is that one set of operations must be allowed to run *completely* before another begins. When store_scaling_governor() says STOP, all governor operations must stay stopped until store_scaling_governor() itself gives permission to restart. Somebody else, in this case __cpufreq_remove_dev_finish(), cannot overrule this if it arrives late. This is what is happening above.
So if store_scaling_governor() arrives first, its STOP|EXIT|START|LIMITS must complete before the START|LIMITS of __cpufreq_remove_dev_finish() is allowed to run. So it is not just about serializing, it's about *what needs to be serialized*.
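Concretely, what I am asking for is something along the lines of the sketch below, in the cpufreq core. This is only an illustration: switch_governor() is a hypothetical helper, gov_seq_lock is a made-up per-policy lock, and error handling is simplified. The point is that one caller's whole sequence completes under the lock before another caller's sequence may begin.

#include <linux/cpufreq.h>
#include <linux/mutex.h>

/* Illustration only, not a proposed patch. */
static int switch_governor(struct cpufreq_policy *policy,
			   struct cpufreq_governor *new_gov)
{
	int ret;

	/* Hypothetical per-policy lock held across the *whole* sequence,
	 * so __cpufreq_remove_dev_finish() (or anyone else) cannot
	 * interleave its own START/LIMITS in the middle. */
	mutex_lock(&policy->gov_seq_lock);

	ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
	if (!ret)
		ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);

	if (!ret) {
		policy->governor = new_gov;
		ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
		if (!ret)
			ret = __cpufreq_governor(policy, CPUFREQ_GOV_START);
		if (!ret)
			ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
	}

	mutex_unlock(&policy->gov_seq_lock);
	return ret;
}

Whoever arrives while the lock is held simply waits for the full sequence to finish, instead of seeing a half-stopped governor.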
Regards Preeti U Murthy
On 04-06-15, 12:06, Preeti U Murthy wrote:
Your fix may not be complete and here is why. The reason we see the crash is because we have *only* attempted to serialize calls to cpufreq_governor_dbs() and not attempted to serialize *entire logical sequence of operations*. Let's take a look at what is happening as a consequence.
You missed my logs (For the first time in my life I wrote them so well). This is what I mentioned in 3/3:
" These two issues need to be solved separately and the responsibility should be properly divided between cpufreq and governor core.
The first problem is more about the governor core, as it needs to protect its structures properly. And the second problem should be fixed in the cpufreq core instead of the governor, as it's all about the sequence of events.
This patch is trying to solve only the first problem. "
I NEVER claimed that I solved all the issues.
On 06/04/2015 12:12 PM, Viresh Kumar wrote:
On 04-06-15, 12:06, Preeti U Murthy wrote:
Your fix may not be complete and here is why. The reason we see the crash is because we have *only* attempted to serialize calls to cpufreq_governor_dbs() and not attempted to serialize *entire logical sequence of operations*. Let's take a look at what is happening as a consequence.
You missed my logs (For the first time in my life I wrote them so well). This is what I mentioned in 3/3:
" These two issues need to be solved separately and the responsibility should be properly divided between cpufreq and governor core.
The first problem is more about the governor core, as it needs to protect its structures properly. And the second problem should be fixed in the cpufreq core instead of the governor, as it's all about the sequence of events.
My point is, do we really need to treat them as separate problems? Will not serializing the sequence of events help solve both issues?
When we know the problem, why not fix it properly, rather than breaking it up?
This patch is trying to solve only the first problem. "
I NEVER claimed that I solved all the issues.
That is true. My intention was to point out explicitly what still remains to be solved. It is true that you have mentioned that the problem in the cpufreq core is about sequencing of events. I intended to highlight what it was.
I would have refrained from pointing it out had the issues I am seeing waned even a wee bit, but they have not, which is why I did not see value in having the third patch as a stand-alone patch, with more fixes going in as a follow-up series.
Regards Preeti U Murthy
On 04-06-15, 12:34, Preeti U Murthy wrote:
My point is do we really need to treat them as separate problems ?
Yes.
Will not serializing the sequence of events help solve both issues?
That's not the point. Even if it solves the problem, it may not be the right approach. There are two problems here:
- One lies in cpufreq.c, or in the policies' domain.
- The other one is about the governor. Governor code shouldn't rely on cpufreq.c locking to guarantee that access to its structures isn't racy.
And the way you proposed to solve it (yes, the original idea was from one of my earlier patches) is not the right way to do it.
For example, cpufreq_set_policy() shouldn't care about how the governor code is placed. It should just do enough to get rid of racy code belonging to that policy.
But with our other approach, we are trying to stop the governor from being used by anyone else in the kernel. Who the hell is that 'policy' to decide who can access the governor?
That's why I divided it up, so that we don't come to it again. I hadn't learnt these things earlier and wrote messy locking code back then. But after looking at and working on core code like timers etc., I understood the importance of the right design and proper partitioning of responsibilities.
When we know the problem, why not fix it properly, rather than breaking it up?
What are we breaking here?
This patch is trying to solve only the first problem. "
I NEVER claimed that I solved all the issues.
That is true. My intention was to point out explicitly what still remains to be solved. It is true that you have mentioned that the problem in the cpufreq core is about sequencing of events. I intended to highlight what it was.
Okay, that should be the next target we have to fix after applying these patches.
I would have refrained from pointing it out had the issues I am seeing waned even a wee bit, but they have not, which is why I did not see value in having the third patch as a stand-alone patch, with more fixes going in as a follow-up series.
It's solving what it is intended to solve. But we need more patches over it to fix problems outside the scope of the governors.
On 06/04/2015 12:43 PM, Viresh Kumar wrote:
On 04-06-15, 12:34, Preeti U Murthy wrote:
My point is do we really need to treat them as separate problems ?
Yes.
Will not serializing the sequence of events help solve both issues?
That's not the point. Even if it solves the problem, it may not be the right approach. There are two problems here:
- One lies in cpufreq.c, or in the policies' domain.
- The other one is about the governor. Governor code shouldn't rely on cpufreq.c locking to guarantee that access to its structures isn't racy.
And the way you proposed to solve it (yes, the original idea was from one of my earlier patches) is not the right way to do it.
For example, cpufreq_set_policy() shouldn't care about how the governor code is placed. It should just do enough to get rid of racy code belonging to that policy.
But with our other approach, we are trying to stop the governor from being used by anyone else in the kernel. Who the hell is that 'policy' to decide who can access the governor?
That's why I divided it up, so that we don't come to it again. I hadn't learnt these things earlier and wrote messy locking code back then. But after looking at and working on core code like timers etc., I understood the importance of the right design and proper partitioning of responsibilities.
Ok, fair enough.
Regards Preeti U Murthy
On 04-06-15, 10:44, Preeti U Murthy wrote:
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
Preeti recently highlighted [1] some issues in cpufreq core locking with respect to governors. I wanted to solve them after we have simplified the hotplug paths in cpufreq core with my latest patches, but now that she has poked me, I have done some work in that area.
I am trying to solve only a part of the bigger problem (in a way that I feel is the right way ahead). The first patches restructures code to make it more readable and the last patch does all the major changes. The logs in that one should be good enough to explain why and what I am doing.
The first two shouldn't bring any functional change and so can be applied early if you are confident about them.
@Preeti: I would like you to test these patches. These should get rid of the crashes you were facing but may generate a WARN() from line 447 of cpufreq_governor.c, if the sequence is wrong. That has to be fixed separately.
Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
Hi Preeti,
Thanks for giving your Reviewed-by tags for all the patches. Would you also like to give Tested-by tags if you have done any testing on these?
That is just to confirm it hasn't broken things any further and that we haven't seen any crashes in races between INIT/EXIT/START/STOP/LIMITS.
On 06/05/2015 08:30 AM, Viresh Kumar wrote:
On 04-06-15, 10:44, Preeti U Murthy wrote:
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
Preeti recently highlighted [1] some issues in cpufreq core locking with respect to governors. I wanted to solve them after we have simplified the hotplug paths in cpufreq core with my latest patches, but now that she has poked me, I have done some work in that area.
I am trying to solve only a part of the bigger problem (in a way that I feel is the right way ahead). The first patches restructures code to make it more readable and the last patch does all the major changes. The logs in that one should be good enough to explain why and what I am doing.
The first two shouldn't bring any functional change and so can be applied early if you are confident about them.
@Preeti: I would like you to test these patches. These should get rid of the crashes you were facing but may generate a WARN() from line 447 of cpufreq_governor.c, if the sequence is wrong. That has to be fixed separately.
Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
Hi Preeti,
Thanks for giving your Reviewed-by tags for all the patches. Would you also like to give Tested-by tags if you have done any testing on these?
That is just to confirm it hasn't broken things any further and that we haven't seen any crashes in races between INIT/EXIT/START/STOP/LIMITS.
Let me run a fair bit of testing today and confirm this.
Regards Preeti U Murthy
On 06/05/2015 08:30 AM, Viresh Kumar wrote:
On 04-06-15, 10:44, Preeti U Murthy wrote:
On 06/03/2015 03:57 PM, Viresh Kumar wrote:
Preeti recently highlighted [1] some issues in cpufreq core locking with respect to governors. I wanted to solve them after we have simplified the hotplug paths in cpufreq core with my latest patches, but now that she has poked me, I have done some work in that area.
I am trying to solve only a part of the bigger problem (in a way that I feel is the right way ahead). The first patches restructures code to make it more readable and the last patch does all the major changes. The logs in that one should be good enough to explain why and what I am doing.
The first two shouldn't bring any functional change and so can be applied early if you are confident about them.
@Preeti: I would like you to test these patches. These should get rid of the crashes you were facing but may generate a WARN() from line 447 of cpufreq_governor.c, if the sequence is wrong. That has to be fixed separately.
Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
Hi Preeti,
Thanks for giving your Reviewed-by tags for all the patches. Would you also like to give Tested-by tags if you have done any testing on these?
That is just to confirm it hasn't broken things any further and that we haven't seen any crashes in races between INIT/EXIT/START/STOP/LIMITS.
I realize that I am not currently hitting the races on my machine that this patchset is trying to solve. The races that I am hitting have been conveyed on this thread, and we should be fixing those separately, as Viresh pointed out. I can therefore provide only my Reviewed-bys for this patchset.
Thanks
Regards Preeti U Murthy
On Wednesday, June 03, 2015 03:57:10 PM Viresh Kumar wrote:
Hi Rafael,
Preeti recently highlighted [1] some issues in cpufreq core locking with respect to governors. I wanted to solve them after we have simplified the hotplug paths in cpufreq core with my latest patches, but now that she has poked me, I have done some work in that area.
I am trying to solve only a part of the bigger problem (in a way that I feel is the right way ahead). The first patches restructures code to make it more readable and the last patch does all the major changes. The logs in that one should be good enough to explain why and what I am doing.
The first two shouldn't bring any functional change and so can be applied early if you are confident about them.
@Preeti: I would like you to test these patches. These should get rid of the crashes you were facing but may generate a WARN() from line 447 of cpufreq_governor.c, if the sequence is wrong. That has to be fixed separately.
Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
Rebased over: v4.1-rc6
Tested-on: ARM dual Cortex-A15 Exynos board.
[1] http://marc.info/?i=20150601064031.2972.59208.stgit%40perfhull-ltc.austin.ib...
Viresh Kumar (3):
  cpufreq: governor: register notifier from cs_init()
  cpufreq: governor: split cpufreq_governor_dbs()
  cpufreq: governor: Serialize governor callbacks

 drivers/cpufreq/cpufreq_conservative.c |  28 +--
 drivers/cpufreq/cpufreq_governor.c     | 340 ++++++++++++++++++---------------
 drivers/cpufreq/cpufreq_governor.h     |  16 +-
 drivers/cpufreq/cpufreq_ondemand.c     |   6 +-
 4 files changed, 209 insertions(+), 181 deletions(-)
I've queued up the series (with the replacement [2/3]) for 4.2, thanks!