Hi Rafael,
Sorry for doing this, I know you were also looking to fix this in a possibly different way. But I thought, it would be better if we fix that. We can scrap this version and take yours if that looks better.
The root cause of all the issues we were facing, was that we were taking policy->rwsem while accessing governor sysfs attributes. And that happened because we were sharing the show/store calls present in cpufreq.c.
I thought, perhaps the best way to fix it is to give separate sysfs-ops to governors. And that's what I did.
@Juri: I need your help in testing these. My platform doesn't give me those lockups (even without these patches) and Juno/Tc2 would fit better.
Can you please run some tests on these?
They are pushed here for easy access (and auto test by build-bot): git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/governor-kobject
-- viresh
Viresh Kumar (5): cpufreq: governor: Kill declare_show_sampling_rate_min() cpufreq: governor: Create separate sysfs-ops cpufreq: governor: Remove unused sysfs attribute macros cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT cpufreq: Get rid of ->governor_enabled and its lock
drivers/cpufreq/cpufreq.c | 29 ---------- drivers/cpufreq/cpufreq_conservative.c | 77 ++++++++++--------------- drivers/cpufreq/cpufreq_governor.c | 86 ++++++++++++++++++++-------- drivers/cpufreq/cpufreq_governor.h | 101 +++++++-------------------------- drivers/cpufreq/cpufreq_ondemand.c | 77 ++++++++++--------------- include/linux/cpufreq.h | 5 -- 6 files changed, 143 insertions(+), 232 deletions(-)
This extra macro is required because the variable min_sampling_rate is made part of 'struct dbs_data' instead of governor specific tunables.
For further optimization, its better that we kill declare_show_sampling_rate_min() by moving min_sampling_rate to governor specific tunables.
Lets do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 14 ++++++------- drivers/cpufreq/cpufreq_governor.c | 36 ++++++++++++++++++++-------------- drivers/cpufreq/cpufreq_governor.h | 18 ++--------------- drivers/cpufreq/cpufreq_ondemand.c | 14 ++++++------- 4 files changed, 37 insertions(+), 45 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 606ad74abe6e..57750367bd26 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -185,7 +185,7 @@ static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf, if (ret != 1) return -EINVAL;
- cs_tuners->sampling_rate = max(input, dbs_data->min_sampling_rate); + cs_tuners->sampling_rate = max(input, cs_tuners->min_sampling_rate); return count; }
@@ -281,7 +281,7 @@ show_store_one(cs, up_threshold); show_store_one(cs, down_threshold); show_store_one(cs, ignore_nice_load); show_store_one(cs, freq_step); -declare_show_sampling_rate_min(cs); +show_one(cs, min_sampling_rate);
gov_sys_pol_attr_rw(sampling_rate); gov_sys_pol_attr_rw(sampling_down_factor); @@ -289,10 +289,10 @@ gov_sys_pol_attr_rw(up_threshold); gov_sys_pol_attr_rw(down_threshold); gov_sys_pol_attr_rw(ignore_nice_load); gov_sys_pol_attr_rw(freq_step); -gov_sys_pol_attr_ro(sampling_rate_min); +gov_sys_pol_attr_ro(min_sampling_rate);
static struct attribute *dbs_attributes_gov_sys[] = { - &sampling_rate_min_gov_sys.attr, + &min_sampling_rate_gov_sys.attr, &sampling_rate_gov_sys.attr, &sampling_down_factor_gov_sys.attr, &up_threshold_gov_sys.attr, @@ -308,7 +308,7 @@ static struct attribute_group cs_attr_group_gov_sys = { };
static struct attribute *dbs_attributes_gov_pol[] = { - &sampling_rate_min_gov_pol.attr, + &min_sampling_rate_gov_pol.attr, &sampling_rate_gov_pol.attr, &sampling_down_factor_gov_pol.attr, &up_threshold_gov_pol.attr, @@ -340,10 +340,10 @@ static int cs_init(struct dbs_data *dbs_data, bool notify) tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR; tuners->ignore_nice_load = 0; tuners->freq_step = DEF_FREQUENCY_STEP; + tuners->min_sampling_rate = MIN_SAMPLING_RATE_RATIO * + jiffies_to_usecs(10);
dbs_data->tuners = tuners; - dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO * - jiffies_to_usecs(10);
if (notify) cpufreq_register_notifier(&cs_cpufreq_notifier_block, diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index e0d111024d48..9a7edc91ad57 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -286,16 +286,32 @@ static void dbs_timer_handler(unsigned long data) queue_work(system_wq, &shared->work); }
-static void set_sampling_rate(struct dbs_data *dbs_data, - unsigned int sampling_rate) +static void set_sampling_rate(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) { + unsigned int *sampling_rate; + unsigned int *min_sampling_rate; + unsigned int latency; + + /* policy latency is in ns. Convert it to us first */ + latency = policy->cpuinfo.transition_latency / 1000; + if (latency == 0) + latency = 1; + if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; - cs_tuners->sampling_rate = sampling_rate; + sampling_rate = &cs_tuners->sampling_rate; + min_sampling_rate = &cs_tuners->min_sampling_rate; } else { struct od_dbs_tuners *od_tuners = dbs_data->tuners; - od_tuners->sampling_rate = sampling_rate; + sampling_rate = &od_tuners->sampling_rate; + min_sampling_rate = &od_tuners->min_sampling_rate; } + + /* Bring kernel and HW constraints together */ + *min_sampling_rate = max(*min_sampling_rate, + MIN_LATENCY_MULTIPLIER * latency); + *sampling_rate = max(*min_sampling_rate, latency * LATENCY_MULTIPLIER); }
static int alloc_common_dbs_info(struct cpufreq_policy *policy, @@ -338,7 +354,6 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) { - unsigned int latency; int ret;
/* State should be equivalent to EXIT */ @@ -373,16 +388,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (ret) goto free_common_dbs_info;
- /* policy latency is in ns. Convert it to us first */ - latency = policy->cpuinfo.transition_latency / 1000; - if (latency == 0) - latency = 1; - - /* Bring kernel and HW constraints together */ - dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate, - MIN_LATENCY_MULTIPLIER * latency); - set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate, - latency * LATENCY_MULTIPLIER)); + set_sampling_rate(policy, dbs_data);
if (!have_governor_per_policy()) cdata->gdbs_data = dbs_data; diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 91e767a058a7..ad44a8546a3a 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -183,6 +183,7 @@ struct od_dbs_tuners { unsigned int up_threshold; unsigned int powersave_bias; unsigned int io_is_busy; + unsigned int min_sampling_rate; };
struct cs_dbs_tuners { @@ -192,6 +193,7 @@ struct cs_dbs_tuners { unsigned int up_threshold; unsigned int down_threshold; unsigned int freq_step; + unsigned int min_sampling_rate; };
/* Common Governor data across policies */ @@ -230,7 +232,6 @@ struct common_dbs_data { /* Governor Per policy data */ struct dbs_data { struct common_dbs_data *cdata; - unsigned int min_sampling_rate; int usage_count; void *tuners; }; @@ -254,21 +255,6 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate) return delay; }
-#define declare_show_sampling_rate_min(_gov) \ -static ssize_t show_sampling_rate_min_gov_sys \ -(struct kobject *kobj, struct attribute *attr, char *buf) \ -{ \ - struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data; \ - return sprintf(buf, "%u\n", dbs_data->min_sampling_rate); \ -} \ - \ -static ssize_t show_sampling_rate_min_gov_pol \ -(struct cpufreq_policy *policy, char *buf) \ -{ \ - struct dbs_data *dbs_data = policy->governor_data; \ - return sprintf(buf, "%u\n", dbs_data->min_sampling_rate); \ -} - extern struct mutex cpufreq_governor_lock;
void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay); diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index eae51070c034..b31f64745232 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -250,7 +250,7 @@ static void update_sampling_rate(struct dbs_data *dbs_data, int cpu;
od_tuners->sampling_rate = new_rate = max(new_rate, - dbs_data->min_sampling_rate); + od_tuners->min_sampling_rate);
/* * Lock governor so that governor start/stop can't execute in parallel. @@ -442,7 +442,7 @@ show_store_one(od, up_threshold); show_store_one(od, sampling_down_factor); show_store_one(od, ignore_nice_load); show_store_one(od, powersave_bias); -declare_show_sampling_rate_min(od); +show_one(od, min_sampling_rate);
gov_sys_pol_attr_rw(sampling_rate); gov_sys_pol_attr_rw(io_is_busy); @@ -450,10 +450,10 @@ gov_sys_pol_attr_rw(up_threshold); gov_sys_pol_attr_rw(sampling_down_factor); gov_sys_pol_attr_rw(ignore_nice_load); gov_sys_pol_attr_rw(powersave_bias); -gov_sys_pol_attr_ro(sampling_rate_min); +gov_sys_pol_attr_ro(min_sampling_rate);
static struct attribute *dbs_attributes_gov_sys[] = { - &sampling_rate_min_gov_sys.attr, + &min_sampling_rate_gov_sys.attr, &sampling_rate_gov_sys.attr, &up_threshold_gov_sys.attr, &sampling_down_factor_gov_sys.attr, @@ -469,7 +469,7 @@ static struct attribute_group od_attr_group_gov_sys = { };
static struct attribute *dbs_attributes_gov_pol[] = { - &sampling_rate_min_gov_pol.attr, + &min_sampling_rate_gov_pol.attr, &sampling_rate_gov_pol.attr, &up_threshold_gov_pol.attr, &sampling_down_factor_gov_pol.attr, @@ -509,12 +509,12 @@ static int od_init(struct dbs_data *dbs_data, bool notify) * not depending on HZ, but fixed (very low). The deferred * timer might skip some samples if idle/sleeping as needed. */ - dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE; + tuners->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE; } else { tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
/* For correct statistics, we need 10 ticks for each measure */ - dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO * + tuners->min_sampling_rate = MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10); }
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
This extra macro is required because the variable min_sampling_rate is made part of 'struct dbs_data' instead of governor specific tunables.
For further optimization, its better that we kill declare_show_sampling_rate_min() by moving min_sampling_rate to governor specific tunables.
Lets do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
To me, this is not about the macro, but about moving min_sampling_rate to governor tunables, so my subject would be something like "cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable".
My changelog, then, would be something like the following:
"The min_sampling_rate governor tunable is a field in struct dbs_data, so it has to be handled in a special way separate from the rest of governor tunables. In particular, that requires a special macro to be present for creating its show/store sysfs attribute callbacks.
However, there is no real need for the data structures and code in question to be arranged this way and if min_sampling_rate is moved to data structures holding the other governor tunables, the sysfs attribute creation macros that work with those tunables will also work with min_sampling_rate and the special macro for it won't be necessary any more. That will make it easier to modify the governor code going forward, so do it."
Thanks, Rafael
On 02-02-16, 21:23, Rafael J. Wysocki wrote:
To me, this is not about the macro, but about moving min_sampling_rate to governor tunables, so my subject would be something like "cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable".
My changelog, then, would be something like the following:
"The min_sampling_rate governor tunable is a field in struct dbs_data, so it has to be handled in a special way separate from the rest of governor tunables. In particular, that requires a special macro to be present for creating its show/store sysfs attribute callbacks.
However, there is no real need for the data structures and code in question to be arranged this way and if min_sampling_rate is moved to data structures holding the other governor tunables, the sysfs attribute creation macros that work with those tunables will also work with min_sampling_rate and the special macro for it won't be necessary any more. That will make it easier to modify the governor code going forward, so do it."
I just copy pasted it with:
[ Rafael: Written changelog ]
just before my sign-off. Thanks.
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem.
And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event.
That caused further problems and it never worked perfectly.
This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors.
Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 71 +++++++++++++--------------------- drivers/cpufreq/cpufreq_governor.c | 50 +++++++++++++++++++----- drivers/cpufreq/cpufreq_governor.h | 31 +++++++++++++-- drivers/cpufreq/cpufreq_ondemand.c | 71 +++++++++++++--------------------- 4 files changed, 122 insertions(+), 101 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 57750367bd26..980145da796a 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -275,51 +275,35 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf, return count; }
-show_store_one(cs, sampling_rate); -show_store_one(cs, sampling_down_factor); -show_store_one(cs, up_threshold); -show_store_one(cs, down_threshold); -show_store_one(cs, ignore_nice_load); -show_store_one(cs, freq_step); -show_one(cs, min_sampling_rate); - -gov_sys_pol_attr_rw(sampling_rate); -gov_sys_pol_attr_rw(sampling_down_factor); -gov_sys_pol_attr_rw(up_threshold); -gov_sys_pol_attr_rw(down_threshold); -gov_sys_pol_attr_rw(ignore_nice_load); -gov_sys_pol_attr_rw(freq_step); -gov_sys_pol_attr_ro(min_sampling_rate); - -static struct attribute *dbs_attributes_gov_sys[] = { - &min_sampling_rate_gov_sys.attr, - &sampling_rate_gov_sys.attr, - &sampling_down_factor_gov_sys.attr, - &up_threshold_gov_sys.attr, - &down_threshold_gov_sys.attr, - &ignore_nice_load_gov_sys.attr, - &freq_step_gov_sys.attr, +gov_show_one(cs, sampling_rate); +gov_show_one(cs, sampling_down_factor); +gov_show_one(cs, up_threshold); +gov_show_one(cs, down_threshold); +gov_show_one(cs, ignore_nice_load); +gov_show_one(cs, freq_step); +gov_show_one(cs, min_sampling_rate); + +gov_attr_rw(sampling_rate); +gov_attr_rw(sampling_down_factor); +gov_attr_rw(up_threshold); +gov_attr_rw(down_threshold); +gov_attr_rw(ignore_nice_load); +gov_attr_rw(freq_step); +gov_attr_ro(min_sampling_rate); + +static struct attribute *dbs_attributes[] = { + &min_sampling_rate.attr, + &sampling_rate.attr, + &sampling_down_factor.attr, + &up_threshold.attr, + &down_threshold.attr, + &ignore_nice_load.attr, + &freq_step.attr, NULL };
-static struct attribute_group cs_attr_group_gov_sys = { - .attrs = dbs_attributes_gov_sys, - .name = "conservative", -}; - -static struct attribute *dbs_attributes_gov_pol[] = { - &min_sampling_rate_gov_pol.attr, - &sampling_rate_gov_pol.attr, - &sampling_down_factor_gov_pol.attr, - &up_threshold_gov_pol.attr, - &down_threshold_gov_pol.attr, - &ignore_nice_load_gov_pol.attr, - &freq_step_gov_pol.attr, - NULL -}; - -static struct attribute_group cs_attr_group_gov_pol = { - .attrs = dbs_attributes_gov_pol, +static struct attribute_group cs_attr_group = { + .attrs = dbs_attributes, .name = "conservative", };
@@ -365,8 +349,7 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
static struct common_dbs_data cs_dbs_cdata = { .governor = GOV_CONSERVATIVE, - .attr_group_gov_sys = &cs_attr_group_gov_sys, - .attr_group_gov_pol = &cs_attr_group_gov_pol, + .attr_group = &cs_attr_group, .get_cpu_cdbs = get_cpu_cdbs, .get_cpu_dbs_info_s = get_cpu_dbs_info_s, .gov_dbs_timer = cs_dbs_timer, diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 9a7edc91ad57..e785a118cbdc 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -22,14 +22,37 @@
#include "cpufreq_governor.h"
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj) +#define to_attr(a) container_of(a, struct governor_attr, attr) + +static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf) { - if (have_governor_per_policy()) - return dbs_data->cdata->attr_group_gov_pol; - else - return dbs_data->cdata->attr_group_gov_sys; + struct dbs_data *dbs_data = to_dbs_data(kobj); + struct governor_attr *gattr = to_attr(attr); + + if (gattr->show) + return gattr->show(dbs_data, buf); + + return -EIO; +} + +static ssize_t store(struct kobject *kobj, struct attribute *attr, + const char *buf, size_t count) +{ + struct dbs_data *dbs_data = to_dbs_data(kobj); + struct governor_attr *gattr = to_attr(attr); + + if (gattr->store) + return gattr->store(dbs_data, buf, count); + + return -EIO; }
+static const struct sysfs_ops sysfs_ops = { + .show = show, + .store = store, +}; + void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) { struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) { + struct attribute_group *attr_group; int ret;
/* State should be equivalent to EXIT */ @@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
policy->governor_data = dbs_data;
- ret = sysfs_create_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); - if (ret) + attr_group = dbs_data->cdata->attr_group; + dbs_data->kobj_type.sysfs_ops = &sysfs_ops; + dbs_data->kobj_type.default_attrs = attr_group->attrs; + + ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type, + get_governor_parent_kobj(policy), + attr_group->name); + if (ret) { + pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret); goto reset_gdbs_data; + }
return 0;
@@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy, return -EBUSY;
if (!--dbs_data->usage_count) { - sysfs_remove_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); + kobject_put(&dbs_data->kobj);
policy->governor_data = NULL;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index ad44a8546a3a..59b28133dd68 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol \ show_one(_gov, file_name); \ store_one(_gov, file_name)
+/* Governor's specific attributes */ +struct dbs_data; +struct governor_attr { + struct attribute attr; + ssize_t (*show)(struct dbs_data *dbs_data, char *buf); + ssize_t (*store)(struct dbs_data *dbs_data, const char *buf, + size_t count); +}; + +#define gov_show_one(_gov, file_name) \ +static ssize_t show_##file_name \ +(struct dbs_data *dbs_data, char *buf) \ +{ \ + struct _gov##_dbs_tuners *tuners = dbs_data->tuners; \ + return sprintf(buf, "%u\n", tuners->file_name); \ +} + +#define gov_attr_ro(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0444, show_##_name, NULL) + +#define gov_attr_rw(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0644, show_##_name, store_##_name) + /* create helper routines */ #define define_get_cpu_dbs_routines(_dbs_info) \ static struct cpu_dbs_info *get_cpu_cdbs(int cpu) \ @@ -197,14 +222,12 @@ struct cs_dbs_tuners { };
/* Common Governor data across policies */ -struct dbs_data; struct common_dbs_data { /* Common across governors */ #define GOV_ONDEMAND 0 #define GOV_CONSERVATIVE 1 int governor; - struct attribute_group *attr_group_gov_sys; /* one governor - system */ - struct attribute_group *attr_group_gov_pol; /* one governor - policy */ + struct attribute_group *attr_group; /* one governor - system */
/* * Common data for platforms that don't set @@ -234,6 +257,8 @@ struct dbs_data { struct common_dbs_data *cdata; int usage_count; void *tuners; + struct kobject kobj; + struct kobj_type kobj_type; };
/* Governor specific ops, will be passed to dbs_data->gov_ops */ diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index b31f64745232..b7983dd02e24 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -436,51 +436,35 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf, return count; }
-show_store_one(od, sampling_rate); -show_store_one(od, io_is_busy); -show_store_one(od, up_threshold); -show_store_one(od, sampling_down_factor); -show_store_one(od, ignore_nice_load); -show_store_one(od, powersave_bias); -show_one(od, min_sampling_rate); - -gov_sys_pol_attr_rw(sampling_rate); -gov_sys_pol_attr_rw(io_is_busy); -gov_sys_pol_attr_rw(up_threshold); -gov_sys_pol_attr_rw(sampling_down_factor); -gov_sys_pol_attr_rw(ignore_nice_load); -gov_sys_pol_attr_rw(powersave_bias); -gov_sys_pol_attr_ro(min_sampling_rate); - -static struct attribute *dbs_attributes_gov_sys[] = { - &min_sampling_rate_gov_sys.attr, - &sampling_rate_gov_sys.attr, - &up_threshold_gov_sys.attr, - &sampling_down_factor_gov_sys.attr, - &ignore_nice_load_gov_sys.attr, - &powersave_bias_gov_sys.attr, - &io_is_busy_gov_sys.attr, +gov_show_one(od, sampling_rate); +gov_show_one(od, io_is_busy); +gov_show_one(od, up_threshold); +gov_show_one(od, sampling_down_factor); +gov_show_one(od, ignore_nice_load); +gov_show_one(od, powersave_bias); +gov_show_one(od, min_sampling_rate); + +gov_attr_rw(sampling_rate); +gov_attr_rw(io_is_busy); +gov_attr_rw(up_threshold); +gov_attr_rw(sampling_down_factor); +gov_attr_rw(ignore_nice_load); +gov_attr_rw(powersave_bias); +gov_attr_ro(min_sampling_rate); + +static struct attribute *dbs_attributes[] = { + &min_sampling_rate.attr, + &sampling_rate.attr, + &up_threshold.attr, + &sampling_down_factor.attr, + &ignore_nice_load.attr, + &powersave_bias.attr, + &io_is_busy.attr, NULL };
-static struct attribute_group od_attr_group_gov_sys = { - .attrs = dbs_attributes_gov_sys, - .name = "ondemand", -}; - -static struct attribute *dbs_attributes_gov_pol[] = { - &min_sampling_rate_gov_pol.attr, - &sampling_rate_gov_pol.attr, - &up_threshold_gov_pol.attr, - &sampling_down_factor_gov_pol.attr, - &ignore_nice_load_gov_pol.attr, - &powersave_bias_gov_pol.attr, - &io_is_busy_gov_pol.attr, - NULL -}; - -static struct attribute_group od_attr_group_gov_pol = { - .attrs = dbs_attributes_gov_pol, +static struct attribute_group od_attr_group = { + .attrs = dbs_attributes, .name = "ondemand", };
@@ -542,8 +526,7 @@ static struct od_ops od_ops = {
static struct common_dbs_data od_dbs_cdata = { .governor = GOV_ONDEMAND, - .attr_group_gov_sys = &od_attr_group_gov_sys, - .attr_group_gov_pol = &od_attr_group_gov_pol, + .attr_group = &od_attr_group, .get_cpu_cdbs = get_cpu_cdbs, .get_cpu_dbs_info_s = get_cpu_dbs_info_s, .gov_dbs_timer = od_dbs_timer,
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem.
And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event.
That caused further problems and it never worked perfectly.
This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors.
Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This patch cleans things up a lot, that's good.
One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? It seems that we can call them from different cpus concurrently.
Best,
- Juri
drivers/cpufreq/cpufreq_conservative.c | 71 +++++++++++++--------------------- drivers/cpufreq/cpufreq_governor.c | 50 +++++++++++++++++++----- drivers/cpufreq/cpufreq_governor.h | 31 +++++++++++++-- drivers/cpufreq/cpufreq_ondemand.c | 71 +++++++++++++--------------------- 4 files changed, 122 insertions(+), 101 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 57750367bd26..980145da796a 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -275,51 +275,35 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf, return count; } -show_store_one(cs, sampling_rate); -show_store_one(cs, sampling_down_factor); -show_store_one(cs, up_threshold); -show_store_one(cs, down_threshold); -show_store_one(cs, ignore_nice_load); -show_store_one(cs, freq_step); -show_one(cs, min_sampling_rate);
-gov_sys_pol_attr_rw(sampling_rate); -gov_sys_pol_attr_rw(sampling_down_factor); -gov_sys_pol_attr_rw(up_threshold); -gov_sys_pol_attr_rw(down_threshold); -gov_sys_pol_attr_rw(ignore_nice_load); -gov_sys_pol_attr_rw(freq_step); -gov_sys_pol_attr_ro(min_sampling_rate);
-static struct attribute *dbs_attributes_gov_sys[] = {
- &min_sampling_rate_gov_sys.attr,
- &sampling_rate_gov_sys.attr,
- &sampling_down_factor_gov_sys.attr,
- &up_threshold_gov_sys.attr,
- &down_threshold_gov_sys.attr,
- &ignore_nice_load_gov_sys.attr,
- &freq_step_gov_sys.attr,
+gov_show_one(cs, sampling_rate); +gov_show_one(cs, sampling_down_factor); +gov_show_one(cs, up_threshold); +gov_show_one(cs, down_threshold); +gov_show_one(cs, ignore_nice_load); +gov_show_one(cs, freq_step); +gov_show_one(cs, min_sampling_rate);
+gov_attr_rw(sampling_rate); +gov_attr_rw(sampling_down_factor); +gov_attr_rw(up_threshold); +gov_attr_rw(down_threshold); +gov_attr_rw(ignore_nice_load); +gov_attr_rw(freq_step); +gov_attr_ro(min_sampling_rate);
+static struct attribute *dbs_attributes[] = {
- &min_sampling_rate.attr,
- &sampling_rate.attr,
- &sampling_down_factor.attr,
- &up_threshold.attr,
- &down_threshold.attr,
- &ignore_nice_load.attr,
- &freq_step.attr, NULL
}; -static struct attribute_group cs_attr_group_gov_sys = {
- .attrs = dbs_attributes_gov_sys,
- .name = "conservative",
-};
-static struct attribute *dbs_attributes_gov_pol[] = {
- &min_sampling_rate_gov_pol.attr,
- &sampling_rate_gov_pol.attr,
- &sampling_down_factor_gov_pol.attr,
- &up_threshold_gov_pol.attr,
- &down_threshold_gov_pol.attr,
- &ignore_nice_load_gov_pol.attr,
- &freq_step_gov_pol.attr,
- NULL
-};
-static struct attribute_group cs_attr_group_gov_pol = {
- .attrs = dbs_attributes_gov_pol,
+static struct attribute_group cs_attr_group = {
- .attrs = dbs_attributes, .name = "conservative",
}; @@ -365,8 +349,7 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info); static struct common_dbs_data cs_dbs_cdata = { .governor = GOV_CONSERVATIVE,
- .attr_group_gov_sys = &cs_attr_group_gov_sys,
- .attr_group_gov_pol = &cs_attr_group_gov_pol,
- .attr_group = &cs_attr_group, .get_cpu_cdbs = get_cpu_cdbs, .get_cpu_dbs_info_s = get_cpu_dbs_info_s, .gov_dbs_timer = cs_dbs_timer,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 9a7edc91ad57..e785a118cbdc 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -22,14 +22,37 @@ #include "cpufreq_governor.h" -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj) +#define to_attr(a) container_of(a, struct governor_attr, attr)
+static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf) {
- if (have_governor_per_policy())
return dbs_data->cdata->attr_group_gov_pol;
- else
return dbs_data->cdata->attr_group_gov_sys;
- struct dbs_data *dbs_data = to_dbs_data(kobj);
- struct governor_attr *gattr = to_attr(attr);
- if (gattr->show)
return gattr->show(dbs_data, buf);
- return -EIO;
+}
+static ssize_t store(struct kobject *kobj, struct attribute *attr,
const char *buf, size_t count)
+{
- struct dbs_data *dbs_data = to_dbs_data(kobj);
- struct governor_attr *gattr = to_attr(attr);
- if (gattr->store)
return gattr->store(dbs_data, buf, count);
- return -EIO;
} +static const struct sysfs_ops sysfs_ops = {
- .show = show,
- .store = store,
+};
void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) { struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) {
- struct attribute_group *attr_group; int ret;
/* State should be equivalent to EXIT */ @@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, policy->governor_data = dbs_data;
- ret = sysfs_create_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));
- if (ret)
- attr_group = dbs_data->cdata->attr_group;
- dbs_data->kobj_type.sysfs_ops = &sysfs_ops;
- dbs_data->kobj_type.default_attrs = attr_group->attrs;
- ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
get_governor_parent_kobj(policy),
attr_group->name);
- if (ret) {
goto reset_gdbs_data;pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
- }
return 0; @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy, return -EBUSY; if (!--dbs_data->usage_count) {
sysfs_remove_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));
kobject_put(&dbs_data->kobj);
policy->governor_data = NULL; diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index ad44a8546a3a..59b28133dd68 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol \ show_one(_gov, file_name); \ store_one(_gov, file_name) +/* Governor's specific attributes */ +struct dbs_data; +struct governor_attr {
- struct attribute attr;
- ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
- ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
size_t count);
+};
+#define gov_show_one(_gov, file_name) \ +static ssize_t show_##file_name \ +(struct dbs_data *dbs_data, char *buf) \ +{ \
- struct _gov##_dbs_tuners *tuners = dbs_data->tuners; \
- return sprintf(buf, "%u\n", tuners->file_name); \
+}
+#define gov_attr_ro(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0444, show_##_name, NULL)
+#define gov_attr_rw(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0644, show_##_name, store_##_name)
/* create helper routines */ #define define_get_cpu_dbs_routines(_dbs_info) \ static struct cpu_dbs_info *get_cpu_cdbs(int cpu) \ @@ -197,14 +222,12 @@ struct cs_dbs_tuners { }; /* Common Governor data across policies */ -struct dbs_data; struct common_dbs_data { /* Common across governors */ #define GOV_ONDEMAND 0 #define GOV_CONSERVATIVE 1 int governor;
- struct attribute_group *attr_group_gov_sys; /* one governor - system */
- struct attribute_group *attr_group_gov_pol; /* one governor - policy */
- struct attribute_group *attr_group; /* one governor - system */
/* * Common data for platforms that don't set @@ -234,6 +257,8 @@ struct dbs_data { struct common_dbs_data *cdata; int usage_count; void *tuners;
- struct kobject kobj;
- struct kobj_type kobj_type;
}; /* Governor specific ops, will be passed to dbs_data->gov_ops */ diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index b31f64745232..b7983dd02e24 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -436,51 +436,35 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf, return count; } -show_store_one(od, sampling_rate); -show_store_one(od, io_is_busy); -show_store_one(od, up_threshold); -show_store_one(od, sampling_down_factor); -show_store_one(od, ignore_nice_load); -show_store_one(od, powersave_bias); -show_one(od, min_sampling_rate);
-gov_sys_pol_attr_rw(sampling_rate); -gov_sys_pol_attr_rw(io_is_busy); -gov_sys_pol_attr_rw(up_threshold); -gov_sys_pol_attr_rw(sampling_down_factor); -gov_sys_pol_attr_rw(ignore_nice_load); -gov_sys_pol_attr_rw(powersave_bias); -gov_sys_pol_attr_ro(min_sampling_rate);
-static struct attribute *dbs_attributes_gov_sys[] = {
- &min_sampling_rate_gov_sys.attr,
- &sampling_rate_gov_sys.attr,
- &up_threshold_gov_sys.attr,
- &sampling_down_factor_gov_sys.attr,
- &ignore_nice_load_gov_sys.attr,
- &powersave_bias_gov_sys.attr,
- &io_is_busy_gov_sys.attr,
+gov_show_one(od, sampling_rate); +gov_show_one(od, io_is_busy); +gov_show_one(od, up_threshold); +gov_show_one(od, sampling_down_factor); +gov_show_one(od, ignore_nice_load); +gov_show_one(od, powersave_bias); +gov_show_one(od, min_sampling_rate);
+gov_attr_rw(sampling_rate); +gov_attr_rw(io_is_busy); +gov_attr_rw(up_threshold); +gov_attr_rw(sampling_down_factor); +gov_attr_rw(ignore_nice_load); +gov_attr_rw(powersave_bias); +gov_attr_ro(min_sampling_rate);
+static struct attribute *dbs_attributes[] = {
- &min_sampling_rate.attr,
- &sampling_rate.attr,
- &up_threshold.attr,
- &sampling_down_factor.attr,
- &ignore_nice_load.attr,
- &powersave_bias.attr,
- &io_is_busy.attr, NULL
}; -static struct attribute_group od_attr_group_gov_sys = {
- .attrs = dbs_attributes_gov_sys,
- .name = "ondemand",
-};
-static struct attribute *dbs_attributes_gov_pol[] = {
- &min_sampling_rate_gov_pol.attr,
- &sampling_rate_gov_pol.attr,
- &up_threshold_gov_pol.attr,
- &sampling_down_factor_gov_pol.attr,
- &ignore_nice_load_gov_pol.attr,
- &powersave_bias_gov_pol.attr,
- &io_is_busy_gov_pol.attr,
- NULL
-};
-static struct attribute_group od_attr_group_gov_pol = {
- .attrs = dbs_attributes_gov_pol,
+static struct attribute_group od_attr_group = {
- .attrs = dbs_attributes, .name = "ondemand",
}; @@ -542,8 +526,7 @@ static struct od_ops od_ops = { static struct common_dbs_data od_dbs_cdata = { .governor = GOV_ONDEMAND,
- .attr_group_gov_sys = &od_attr_group_gov_sys,
- .attr_group_gov_pol = &od_attr_group_gov_pol,
- .attr_group = &od_attr_group, .get_cpu_cdbs = get_cpu_cdbs, .get_cpu_dbs_info_s = get_cpu_dbs_info_s, .gov_dbs_timer = od_dbs_timer,
-- 2.7.0.79.gdc08a19
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem.
And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event.
That caused further problems and it never worked perfectly.
This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors.
Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This patch cleans things up a lot, that's good.
One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
That would require some investigation I suppose.
It seems that we can call them from different cpus concurrently.
Yes, we can.
One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback.
Best,
- Juri
BTW, you could have dropped the stuff below this line from your reply message. That at least would have prevented tools like Patchwork from storing useless garbage.
Thanks, Rafael
Hi Rafael,
On 02/02/16 17:35, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem.
And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event.
That caused further problems and it never worked perfectly.
This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors.
Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This patch cleans things up a lot, that's good.
One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
That would require some investigation I suppose.
It seems that we can call them from different cpus concurrently.
Yes, we can.
One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback.
There is value in trying to solve this issue by using some of the existing locks, IMHO.
Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core?
BTW, you could have dropped the stuff below this line from your reply message. That at least would have prevented tools like Patchwork from storing useless garbage.
Right. Sorry for the garbage; I'll check twice that I trim my replies in the future.
Best,
- Juri
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Rafael,
On 02/02/16 17:35, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem.
And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event.
That caused further problems and it never worked perfectly.
This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors.
Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This patch cleans things up a lot, that's good.
One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
That would require some investigation I suppose.
It seems that we can call them from different cpus concurrently.
Yes, we can.
One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback.
There is value in trying to solve this issue by using some of the existing locks, IMHO.
Some value - maybe. I'm not sure how much of it, though.
Finer-grained locking is generally easier to follow, because the locks tend to be used for specific purposes only.
Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core?
No. Too many things depend on that lock already and some of them work by accident rather than by design.
Thanks, Rafael
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Rafael,
On 02/02/16 17:35, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem.
And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event.
That caused further problems and it never worked perfectly.
This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors.
Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This patch cleans things up a lot, that's good.
One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
That would require some investigation I suppose.
It seems that we can call them from different cpus concurrently.
Yes, we can.
One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback.
There is value in trying to solve this issue by using some of the existing locks, IMHO.
Some value - maybe. I'm not sure how much of it, though.
Finer-grained locking is generally easier to follow, because the locks tend to be used for specific purposes only.
Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core?
No. Too many things depend on that lock already and some of them work by accident rather than by design.
Also, wait_for_completion() and complete() is just another way to implement a lock. So, it won't necessarily solve any deadlock issues.
I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint).
The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves.
I'm sorry that I just keep talking about the idea and not sending out the patches.
-Saravana
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan skannan@codeaurora.org wrote:
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Rafael,
On 02/02/16 17:35, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli juri.lelli@arm.com wrote:
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem.
And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event.
That caused further problems and it never worked perfectly.
This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors.
Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This patch cleans things up a lot, that's good.
One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
That would require some investigation I suppose.
It seems that we can call them from different cpus concurrently.
Yes, we can.
One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback.
There is value in trying to solve this issue by using some of the existing locks, IMHO.
Some value - maybe. I'm not sure how much of it, though.
Finer-grained locking is generally easier to follow, because the locks tend to be used for specific purposes only.
Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core?
No. Too many things depend on that lock already and some of them work by accident rather than by design.
Also, wait_for_completion() and complete() is just another way to implement a lock. So, it won't necessarily solve any deadlock issues.
I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint).
The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves.
I'm sorry that I just keep talking about the idea and not sending out the patches.
I think you have a point, though.
The deadlock really is specific to the governors using the code in cpufreq_governor.c.
Thanks, Rafael
On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki rafael@kernel.org wrote:
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan skannan@codeaurora.org wrote:
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli juri.lelli@arm.com wrote:
[cut]
I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint).
The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves.
I'm sorry that I just keep talking about the idea and not sending out the patches.
I think you have a point, though.
The deadlock really is specific to the governors using the code in cpufreq_governor.c.
That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here.
Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too.
Thanks, Rafael
On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki rafael@kernel.org wrote:
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan skannan@codeaurora.org wrote:
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli juri.lelli@arm.com wrote:
[cut]
I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint).
The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves.
I'm sorry that I just keep talking about the idea and not sending out the patches.
I think you have a point, though.
The deadlock really is specific to the governors using the code in cpufreq_governor.c.
That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here.
But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c
Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too.
Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out. And my suggestion would work just as well there.
Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering?
-Saravana
On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan skannan@codeaurora.org wrote:
On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki rafael@kernel.org wrote:
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan skannan@codeaurora.org wrote:
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli juri.lelli@arm.com wrote:
[cut]
I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint).
The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves.
I'm sorry that I just keep talking about the idea and not sending out the patches.
I think you have a point, though.
The deadlock really is specific to the governors using the code in cpufreq_governor.c.
That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here.
But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c
Well, do you honestly think that using the existing stuff in it would be a good idea?
If not, then why it matters at all?
Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too.
Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out.
It generally happens when the "old" governor is going away, whatever the reason.
And my suggestion would work just as well there.
Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering?
Say someone is reading sampling_rate for a policy with 1 CPU in it and someone else is taking the CPU offline. The governor EXIT code path (that will trigger as a result) will try to remove the sampling_rate attribute and (if it does that under policy->rwsem) it'll wait for the read access to finish. Where exactly would you put the deadlock prevention in this case?
Thanks, Rafael
On 02/02/2016 05:52 PM, Rafael J. Wysocki wrote:
On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan skannan@codeaurora.org wrote:
On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki rafael@kernel.org wrote:
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan skannan@codeaurora.org wrote:
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli juri.lelli@arm.com wrote: > >
[cut]
I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint).
The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves.
I'm sorry that I just keep talking about the idea and not sending out the patches.
I think you have a point, though.
The deadlock really is specific to the governors using the code in cpufreq_governor.c.
That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here.
But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c
Well, do you honestly think that using the existing stuff in it would be a good idea?
If not, then why it matters at all?
Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too.
Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out.
It generally happens when the "old" governor is going away, whatever the reason.
And my suggestion would work just as well there.
Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering?
Say someone is reading sampling_rate for a policy with 1 CPU in it and someone else is taking the CPU offline. The governor EXIT code path (that will trigger as a result) will try to remove the sampling_rate attribute and (if it does that under policy->rwsem) it'll wait for the read access to finish. Where exactly would you put the deadlock prevention in this case?
This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there).
Will that still leave any race conditions in?
-Saravana
On 02-02-16, 20:03, Saravana Kannan wrote:
This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there).
IMHO, these are all dirty hacks we should stay away from. Adding such hunks in code is considered a band-aid kind of solution and hurts readability badly. The new solution (new governor show/store) implement this in a very clean and proper way I feel..
Others are free to disagree though :)
On 02/02/2016 10:57 PM, Viresh Kumar wrote:
On 02-02-16, 20:03, Saravana Kannan wrote:
This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there).
IMHO, these are all dirty hacks we should stay away from. Adding such hunks in code is considered a band-aid kind of solution and hurts readability badly. The new solution (new governor show/store) implement this in a very clean and proper way I feel..
Others are free to disagree though :)
I think it looks clean since we haven't sorted out the race conditions that Juri pointed out. So, it's early to call this series clean :)
Also, I don't see it as a dirty hack at all. What's so hacky about it? We are just identifying conditions when we'll have to remove the sysfs files and removing them before grabbing the policy lock.
The unlock/lock that we have now is what is a dirty hack.
-Saravana
On 02-02-16, 17:32, Saravana Kannan wrote:
But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c
We can move the common part to cpufreq core and not make sched-dvfs reuse cpufreq_governor.c
On 03/02/16 12:24, Viresh Kumar wrote:
On 02-02-16, 17:32, Saravana Kannan wrote:
But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c
We can move the common part to cpufreq core and not make sched-dvfs reuse cpufreq_governor.c
I also think that sched-dvfs should not use cpufreq_governor.c. It is useful boilerplate code for ondemand and conservative, as they share lot of data structures and how they work, but it doesn't necessarily suit everybody's needs, IMHO.
OTOH, fixing the current issue in the best way we can come up with has still value of course :).
Best,
- Juri
On 03-02-16, 10:51, Juri Lelli wrote:
I also think that sched-dvfs should not use cpufreq_governor.c. It is useful boilerplate code for ondemand and conservative, as they share lot of data structures and how they work, but it doesn't necessarily suit everybody's needs, IMHO.
OTOH, fixing the current issue in the best way we can come up with has still value of course :).
Right. cpufreq_governor.c is more about the technique where we do load evaluation using deferred timers and workqueues, which isn't required for sched-dvfs.
We can just move the common parts, like, governor_show/governor_store routines, and the new macros being added here.
On 02/02/2016 10:54 PM, Viresh Kumar wrote:
On 02-02-16, 17:32, Saravana Kannan wrote:
But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c
We can move the common part to cpufreq core and not make sched-dvfs reuse cpufreq_governor.c
Let's do this please.
-Saravana
On 02-02-16, 14:21, Saravana Kannan wrote:
I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint).
But who is stopping us from breaking that file and moving some of it into include/linux/cpufreq.h ?
We can do that today as well, but it would be fine to do that, when we add more governors to the core. Though, it would only take a simple patch if people want me to do it now.
The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor
store_scaling_governor ??
and dealt with without holding the policy rwsem if the
Are you saying that we could have taken the rwsem from the generic cpufreq.c:store() and dropped it from store_scaling_governor() ? That would have been something similar to what I tried earlier, which I never posted (I gave the link to that few days back).
governors can provide their per sys and per policy attribute arrays as part of registering themselves.
These per-sys and per-policy attributes really suck. There is nothing really different in the implementation, just that the show/store callbacks have different prototype. One accept 'kboj' as the parameter, other accept 'policy'. I would call that a HACK as well (I only implemented it though).
That should just die. A single list of attributes is what we should have had initially as well.
On 02-02-16, 17:01, Juri Lelli wrote:
Hi Rafael,
On 02/02/16 17:35, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli juri.lelli@arm.com wrote:
This patch cleans things up a lot, that's good.
One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
That would require some investigation I suppose.
Yeah, that protection is required. Sorry about that.
It seems that we can call them from different cpus concurrently.
Yes, we can.
One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback.
s/dirty/sane ? :)
Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core?
policy->rwsem will defeat the purpose of this change as it will reintroduce the ABBA issue.
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
First, the subject might be better. What about something like "cpufreq: governor: New sysfs show/store callbacks for governor tunables", for example?
Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory.
"The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use different governors at the same time and, consequently, on where those attributes are located in sysfs).
Unfortunately, in the freq-attr case, the standard cpufreq show/store sysfs attribute callbacks are applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That may lead to an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely).
We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. Therefore policy->rwsem cannot be dropped in cpufreq_set_policy() at any point, but the deadlock situation described above must be avoided too.
To that end, use the observation that in principle governor tunables may be represented by the same data type regardless of whether the governor is system-wide or per-policy and introduce a new structure, struct governor_attr, for representing them and new corresponding macros for creating show/store sysfs callbacks for them. Also make their parent kobject use a new kobject type whose default show/store callbacks are not related to the standard core cpufreq ones in any way (and they don't acquire policy->rwsem in particular)."
IOW, (1) describe the problem you're addressing so that people unfamiliar with the code in question can understand it, (2) describe what is done to address the problem (what's the idea and what changes are made to implement it).
[cut]
--- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -22,14 +22,37 @@
#include "cpufreq_governor.h"
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj) +#define to_attr(a) container_of(a, struct governor_attr, attr)
Please change the above to static inline routines.
+static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
A better name please. Something that will correspond to the purpose.
{
if (have_governor_per_policy())
return dbs_data->cdata->attr_group_gov_pol;
else
return dbs_data->cdata->attr_group_gov_sys;
struct dbs_data *dbs_data = to_dbs_data(kobj);
struct governor_attr *gattr = to_attr(attr);
if (gattr->show)
return gattr->show(dbs_data, buf);
return -EIO;
+}
+static ssize_t store(struct kobject *kobj, struct attribute *attr,
const char *buf, size_t count)
Ditto.
+{
struct dbs_data *dbs_data = to_dbs_data(kobj);
struct governor_attr *gattr = to_attr(attr);
if (gattr->store)
return gattr->store(dbs_data, buf, count);
Say two instances of this run in parallel with each other, either for the same attribute or for different attributes under the same dbs_data. What's the guarantee that they won't make conflicting changes?
return -EIO;
}
+static const struct sysfs_ops sysfs_ops = {
.show = show,
.store = store,
+};
That is completely enigmatic, so please at least add a comment describing it.
void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) { struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) {
struct attribute_group *attr_group; int ret; /* State should be equivalent to EXIT */
@@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
policy->governor_data = dbs_data;
ret = sysfs_create_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));
if (ret)
attr_group = dbs_data->cdata->attr_group;
dbs_data->kobj_type.sysfs_ops = &sysfs_ops;
dbs_data->kobj_type.default_attrs = attr_group->attrs;
Why can't the kobject type be defined in struct common_dbs_data? Surely, it will be the same for all dbs_data objects corresponding to the same governor, won't it?
ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
get_governor_parent_kobj(policy),
attr_group->name);
if (ret) {
pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
pr_debug() would be better here.
goto reset_gdbs_data;
} return 0;
@@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy, return -EBUSY;
if (!--dbs_data->usage_count) {
sysfs_remove_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));
kobject_put(&dbs_data->kobj);
Don't we need a ->release callback for this kobject?
policy->governor_data = NULL;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index ad44a8546a3a..59b28133dd68 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol \ show_one(_gov, file_name); \ store_one(_gov, file_name)
+/* Governor's specific attributes */ +struct dbs_data; +struct governor_attr {
struct attribute attr;
ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
size_t count);
+};
+#define gov_show_one(_gov, file_name) \ +static ssize_t show_##file_name \ +(struct dbs_data *dbs_data, char *buf) \ +{ \
struct _gov##_dbs_tuners *tuners = dbs_data->tuners; \
return sprintf(buf, "%u\n", tuners->file_name); \
+}
+#define gov_attr_ro(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0444, show_##_name, NULL)
+#define gov_attr_rw(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0644, show_##_name, store_##_name)
/* create helper routines */ #define define_get_cpu_dbs_routines(_dbs_info) \ static struct cpu_dbs_info *get_cpu_cdbs(int cpu) \ @@ -197,14 +222,12 @@ struct cs_dbs_tuners { };
/* Common Governor data across policies */ -struct dbs_data; struct common_dbs_data { /* Common across governors */ #define GOV_ONDEMAND 0 #define GOV_CONSERVATIVE 1 int governor;
struct attribute_group *attr_group_gov_sys; /* one governor - system */
struct attribute_group *attr_group_gov_pol; /* one governor - policy */
struct attribute_group *attr_group; /* one governor - system */ /* * Common data for platforms that don't set
@@ -234,6 +257,8 @@ struct dbs_data { struct common_dbs_data *cdata; int usage_count; void *tuners;
struct kobject kobj;
struct kobj_type kobj_type;
This is questionable. The kobject type doesn't have to be dynamic IMO.
};
Thanks, Rafael
On 02-02-16, 22:23, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
"The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different policy objects can use different governors at the same time
Not exactly. Different policies can always use different governors. What made the difference was that different policies using same governor, with different tunables or separate governor directories.
I have reworded this para like:
The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs).
Please let me know if isn't clear.
--- a/drivers/cpufreq/cpufreq_governor.c
ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
get_governor_parent_kobj(policy),
attr_group->name);
if (ret) {
pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
pr_debug() would be better here.
Its a real error, why pr_debug for that ?
goto reset_gdbs_data;
} return 0;
@@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy, return -EBUSY;
if (!--dbs_data->usage_count) {
sysfs_remove_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));
kobject_put(&dbs_data->kobj);
Don't we need a ->release callback for this kobject?
There is nothing that we need to free from the ->release() callback. We are using the kobject here just to get separate show/store callbacks.
Here is the new version based on the review comments received until now:
-------------------------8<-------------------------
From: Viresh Kumar viresh.kumar@linaro.org Date: Tue, 2 Feb 2016 12:35:01 +0530 Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for governor tunables
The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs).
Unfortunately, in the freq-attr case, the standard cpufreq show/store sysfs attribute callbacks are applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That may lead to an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely).
We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. Therefore policy->rwsem cannot be dropped in cpufreq_set_policy() at any point, but the deadlock situation described above must be avoided too.
To that end, use the observation that in principle governor tunables may be represented by the same data type regardless of whether the governor is system-wide or per-policy and introduce a new structure, struct governor_attr, for representing them and new corresponding macros for creating show/store sysfs callbacks for them. Also make their parent kobject use a new kobject type whose default show/store callbacks are not related to the standard core cpufreq ones in any way (and they don't acquire policy->rwsem in particular).
[ Rafael: Written changelog ] Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 73 ++++++++++++---------------------- drivers/cpufreq/cpufreq_governor.c | 71 ++++++++++++++++++++++++++++----- drivers/cpufreq/cpufreq_governor.h | 34 ++++++++++++++-- drivers/cpufreq/cpufreq_ondemand.c | 73 ++++++++++++---------------------- 4 files changed, 144 insertions(+), 107 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 57750367bd26..c749fb4fe5d2 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -275,54 +275,33 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf, return count; }
-show_store_one(cs, sampling_rate); -show_store_one(cs, sampling_down_factor); -show_store_one(cs, up_threshold); -show_store_one(cs, down_threshold); -show_store_one(cs, ignore_nice_load); -show_store_one(cs, freq_step); -show_one(cs, min_sampling_rate); - -gov_sys_pol_attr_rw(sampling_rate); -gov_sys_pol_attr_rw(sampling_down_factor); -gov_sys_pol_attr_rw(up_threshold); -gov_sys_pol_attr_rw(down_threshold); -gov_sys_pol_attr_rw(ignore_nice_load); -gov_sys_pol_attr_rw(freq_step); -gov_sys_pol_attr_ro(min_sampling_rate); - -static struct attribute *dbs_attributes_gov_sys[] = { - &min_sampling_rate_gov_sys.attr, - &sampling_rate_gov_sys.attr, - &sampling_down_factor_gov_sys.attr, - &up_threshold_gov_sys.attr, - &down_threshold_gov_sys.attr, - &ignore_nice_load_gov_sys.attr, - &freq_step_gov_sys.attr, +gov_show_one(cs, sampling_rate); +gov_show_one(cs, sampling_down_factor); +gov_show_one(cs, up_threshold); +gov_show_one(cs, down_threshold); +gov_show_one(cs, ignore_nice_load); +gov_show_one(cs, freq_step); +gov_show_one(cs, min_sampling_rate); + +gov_attr_rw(sampling_rate); +gov_attr_rw(sampling_down_factor); +gov_attr_rw(up_threshold); +gov_attr_rw(down_threshold); +gov_attr_rw(ignore_nice_load); +gov_attr_rw(freq_step); +gov_attr_ro(min_sampling_rate); + +static struct attribute *cs_attributes[] = { + &min_sampling_rate.attr, + &sampling_rate.attr, + &sampling_down_factor.attr, + &up_threshold.attr, + &down_threshold.attr, + &ignore_nice_load.attr, + &freq_step.attr, NULL };
-static struct attribute_group cs_attr_group_gov_sys = { - .attrs = dbs_attributes_gov_sys, - .name = "conservative", -}; - -static struct attribute *dbs_attributes_gov_pol[] = { - &min_sampling_rate_gov_pol.attr, - &sampling_rate_gov_pol.attr, - &sampling_down_factor_gov_pol.attr, - &up_threshold_gov_pol.attr, - &down_threshold_gov_pol.attr, - &ignore_nice_load_gov_pol.attr, - &freq_step_gov_pol.attr, - NULL -}; - -static struct attribute_group cs_attr_group_gov_pol = { - .attrs = dbs_attributes_gov_pol, - .name = "conservative", -}; - /************************** sysfs end ************************/
static int cs_init(struct dbs_data *dbs_data, bool notify) @@ -365,8 +344,8 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
static struct common_dbs_data cs_dbs_cdata = { .governor = GOV_CONSERVATIVE, - .attr_group_gov_sys = &cs_attr_group_gov_sys, - .attr_group_gov_pol = &cs_attr_group_gov_pol, + .kobj_name = "conservative", + .kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = cs_attributes }, .get_cpu_cdbs = get_cpu_cdbs, .get_cpu_dbs_info_s = get_cpu_dbs_info_s, .gov_dbs_timer = cs_dbs_timer, diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 9a7edc91ad57..a9f335c4e461 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -22,14 +22,62 @@
#include "cpufreq_governor.h"
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) +static inline struct dbs_data *to_dbs_data(struct kobject *kobj) { - if (have_governor_per_policy()) - return dbs_data->cdata->attr_group_gov_pol; - else - return dbs_data->cdata->attr_group_gov_sys; + return container_of(kobj, struct dbs_data, kobj); +} + +static inline struct governor_attr *to_gov_attr(struct attribute *attr) +{ + return container_of(attr, struct governor_attr, attr); +} + +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + struct dbs_data *dbs_data = to_dbs_data(kobj); + struct governor_attr *gattr = to_gov_attr(attr); + int ret = -EIO; + + down_read(&dbs_data->rwsem); + + if (gattr->show) + ret = gattr->show(dbs_data, buf); + + up_read(&dbs_data->rwsem); + + return ret; }
+static ssize_t governor_store(struct kobject *kobj, struct attribute *attr, + const char *buf, size_t count) +{ + struct dbs_data *dbs_data = to_dbs_data(kobj); + struct governor_attr *gattr = to_gov_attr(attr); + int ret = -EIO; + + down_write(&dbs_data->rwsem); + + if (gattr->store) + ret = gattr->store(dbs_data, buf, count); + + up_write(&dbs_data->rwsem); + + return ret; +} + +/* + * Sysfs Ops for accessing governor attributes. + * + * All show/store invocations for governor specific sysfs attributes, will first + * call the below show/store callbacks and the attribute specific callback will + * be called from within it. + */ +const struct sysfs_ops governor_sysfs_ops = { + .show = governor_show, + .store = governor_store, +}; + void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) { struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); @@ -383,6 +431,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
dbs_data->cdata = cdata; dbs_data->usage_count = 1; + init_rwsem(&dbs_data->rwsem);
ret = cdata->init(dbs_data, !policy->governor->initialized); if (ret) @@ -395,10 +444,13 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
policy->governor_data = dbs_data;
- ret = sysfs_create_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); - if (ret) + ret = kobject_init_and_add(&dbs_data->kobj, &cdata->kobj_type, + get_governor_parent_kobj(policy), + cdata->kobj_name); + if (ret) { + pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret); goto reset_gdbs_data; + }
return 0;
@@ -426,8 +478,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy, return -EBUSY;
if (!--dbs_data->usage_count) { - sysfs_remove_group(get_governor_parent_kobj(policy), - get_sysfs_attr(dbs_data)); + kobject_put(&dbs_data->kobj);
policy->governor_data = NULL;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index ad44a8546a3a..67500a1015cf 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol \ show_one(_gov, file_name); \ store_one(_gov, file_name)
+/* Governor's specific attributes */ +struct dbs_data; +struct governor_attr { + struct attribute attr; + ssize_t (*show)(struct dbs_data *dbs_data, char *buf); + ssize_t (*store)(struct dbs_data *dbs_data, const char *buf, + size_t count); +}; + +#define gov_show_one(_gov, file_name) \ +static ssize_t show_##file_name \ +(struct dbs_data *dbs_data, char *buf) \ +{ \ + struct _gov##_dbs_tuners *tuners = dbs_data->tuners; \ + return sprintf(buf, "%u\n", tuners->file_name); \ +} + +#define gov_attr_ro(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0444, show_##_name, NULL) + +#define gov_attr_rw(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0644, show_##_name, store_##_name) + /* create helper routines */ #define define_get_cpu_dbs_routines(_dbs_info) \ static struct cpu_dbs_info *get_cpu_cdbs(int cpu) \ @@ -197,14 +222,13 @@ struct cs_dbs_tuners { };
/* Common Governor data across policies */ -struct dbs_data; struct common_dbs_data { /* Common across governors */ #define GOV_ONDEMAND 0 #define GOV_CONSERVATIVE 1 int governor; - struct attribute_group *attr_group_gov_sys; /* one governor - system */ - struct attribute_group *attr_group_gov_pol; /* one governor - policy */ + const char *kobj_name; + struct kobj_type kobj_type;
/* * Common data for platforms that don't set @@ -234,6 +258,9 @@ struct dbs_data { struct common_dbs_data *cdata; int usage_count; void *tuners; + struct kobject kobj; + /* Protect concurrent updates to governor tunables from sysfs */ + struct rw_semaphore rwsem; };
/* Governor specific ops, will be passed to dbs_data->gov_ops */ @@ -256,6 +283,7 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate) }
extern struct mutex cpufreq_governor_lock; +extern const struct sysfs_ops governor_sysfs_ops;
void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay); void gov_cancel_work(struct cpu_common_dbs_info *shared); diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index b31f64745232..82ed490f7de0 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -436,54 +436,33 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf, return count; }
-show_store_one(od, sampling_rate); -show_store_one(od, io_is_busy); -show_store_one(od, up_threshold); -show_store_one(od, sampling_down_factor); -show_store_one(od, ignore_nice_load); -show_store_one(od, powersave_bias); -show_one(od, min_sampling_rate); - -gov_sys_pol_attr_rw(sampling_rate); -gov_sys_pol_attr_rw(io_is_busy); -gov_sys_pol_attr_rw(up_threshold); -gov_sys_pol_attr_rw(sampling_down_factor); -gov_sys_pol_attr_rw(ignore_nice_load); -gov_sys_pol_attr_rw(powersave_bias); -gov_sys_pol_attr_ro(min_sampling_rate); - -static struct attribute *dbs_attributes_gov_sys[] = { - &min_sampling_rate_gov_sys.attr, - &sampling_rate_gov_sys.attr, - &up_threshold_gov_sys.attr, - &sampling_down_factor_gov_sys.attr, - &ignore_nice_load_gov_sys.attr, - &powersave_bias_gov_sys.attr, - &io_is_busy_gov_sys.attr, +gov_show_one(od, sampling_rate); +gov_show_one(od, io_is_busy); +gov_show_one(od, up_threshold); +gov_show_one(od, sampling_down_factor); +gov_show_one(od, ignore_nice_load); +gov_show_one(od, powersave_bias); +gov_show_one(od, min_sampling_rate); + +gov_attr_rw(sampling_rate); +gov_attr_rw(io_is_busy); +gov_attr_rw(up_threshold); +gov_attr_rw(sampling_down_factor); +gov_attr_rw(ignore_nice_load); +gov_attr_rw(powersave_bias); +gov_attr_ro(min_sampling_rate); + +static struct attribute *od_attributes[] = { + &min_sampling_rate.attr, + &sampling_rate.attr, + &up_threshold.attr, + &sampling_down_factor.attr, + &ignore_nice_load.attr, + &powersave_bias.attr, + &io_is_busy.attr, NULL };
-static struct attribute_group od_attr_group_gov_sys = { - .attrs = dbs_attributes_gov_sys, - .name = "ondemand", -}; - -static struct attribute *dbs_attributes_gov_pol[] = { - &min_sampling_rate_gov_pol.attr, - &sampling_rate_gov_pol.attr, - &up_threshold_gov_pol.attr, - &sampling_down_factor_gov_pol.attr, - &ignore_nice_load_gov_pol.attr, - &powersave_bias_gov_pol.attr, - &io_is_busy_gov_pol.attr, - NULL -}; - -static struct attribute_group od_attr_group_gov_pol = { - .attrs = dbs_attributes_gov_pol, - .name = "ondemand", -}; - /************************** sysfs end ************************/
static int od_init(struct dbs_data *dbs_data, bool notify) @@ -542,8 +521,8 @@ static struct od_ops od_ops = {
static struct common_dbs_data od_dbs_cdata = { .governor = GOV_ONDEMAND, - .attr_group_gov_sys = &od_attr_group_gov_sys, - .attr_group_gov_pol = &od_attr_group_gov_pol, + .kobj_name = "ondemand", + .kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = od_attributes }, .get_cpu_cdbs = get_cpu_cdbs, .get_cpu_dbs_info_s = get_cpu_dbs_info_s, .gov_dbs_timer = od_dbs_timer,
On Wed, Feb 3, 2016 at 7:58 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 02-02-16, 22:23, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
"The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different policy objects can use different governors at the same time
Not exactly. Different policies can always use different governors. What made the difference was that different policies using same governor, with different tunables or separate governor directories.
I have reworded this para like:
The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs).
Please let me know if isn't clear.
That's OK. IMO you should say "use the same governor", but that's easily fixable. :-)
--- a/drivers/cpufreq/cpufreq_governor.c
ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
get_governor_parent_kobj(policy),
attr_group->name);
if (ret) {
pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
pr_debug() would be better here.
Its a real error, why pr_debug for that ?
What's the value of printing that on user systems? It contains debug information only and it is not useful to anyone unfamiliar with the code in question anyway.
goto reset_gdbs_data;
} return 0;
@@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy, return -EBUSY;
if (!--dbs_data->usage_count) {
sysfs_remove_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));
kobject_put(&dbs_data->kobj);
Don't we need a ->release callback for this kobject?
There is nothing that we need to free from the ->release() callback. We are using the kobject here just to get separate show/store callbacks.
Well, I guess the answer should be that there can't be more active references to the kobject, so it is safe to free it synchronously later.
Here is the new version based on the review comments received until now:
-------------------------8<-------------------------
From: Viresh Kumar viresh.kumar@linaro.org Date: Tue, 2 Feb 2016 12:35:01 +0530 Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for governor tunables
[cut]
@@ -22,14 +22,62 @@
#include "cpufreq_governor.h"
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) +static inline struct dbs_data *to_dbs_data(struct kobject *kobj) {
if (have_governor_per_policy())
return dbs_data->cdata->attr_group_gov_pol;
else
return dbs_data->cdata->attr_group_gov_sys;
return container_of(kobj, struct dbs_data, kobj);
+}
+static inline struct governor_attr *to_gov_attr(struct attribute *attr) +{
return container_of(attr, struct governor_attr, attr);
+}
+static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
char *buf)
+{
struct dbs_data *dbs_data = to_dbs_data(kobj);
struct governor_attr *gattr = to_gov_attr(attr);
int ret = -EIO;
down_read(&dbs_data->rwsem);
if (gattr->show)
ret = gattr->show(dbs_data, buf);
up_read(&dbs_data->rwsem);
Do we need the lock here too?
show() is only going to read the value, isn't it? And everything u32 or smaller is read atomically anyway.
Apart from this it looks good to me.
When you're ready, please resend the whole series without patch [5/5] which is premature IMO.
Thanks, Rafael
On 03-02-16, 13:42, Rafael J. Wysocki wrote:
+static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
char *buf)
+{
struct dbs_data *dbs_data = to_dbs_data(kobj);
struct governor_attr *gattr = to_gov_attr(attr);
int ret = -EIO;
down_read(&dbs_data->rwsem);
if (gattr->show)
ret = gattr->show(dbs_data, buf);
up_read(&dbs_data->rwsem);
Do we need the lock here too?
show() is only going to read the value, isn't it? And everything u32 or smaller is read atomically anyway.
Okay, will drop it for now.
Apart from this it looks good to me.
When you're ready, please resend the whole series without patch [5/5] which is premature IMO.
I have changed that patch a bit, and am dropping just the lock now and not governor_enable thing. That should be sane enough I believe.
Anyway I will be posting 7 patches now, pick only first 4 if you aren't confident about the rest.
On Wed, Feb 3, 2016 at 2:21 PM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 03-02-16, 13:42, Rafael J. Wysocki wrote:
+static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
char *buf)
+{
struct dbs_data *dbs_data = to_dbs_data(kobj);
struct governor_attr *gattr = to_gov_attr(attr);
int ret = -EIO;
down_read(&dbs_data->rwsem);
if (gattr->show)
ret = gattr->show(dbs_data, buf);
up_read(&dbs_data->rwsem);
Do we need the lock here too?
show() is only going to read the value, isn't it? And everything u32 or smaller is read atomically anyway.
Okay, will drop it for now.
Apart from this it looks good to me.
When you're ready, please resend the whole series without patch [5/5] which is premature IMO.
I have changed that patch a bit, and am dropping just the lock now and not governor_enable thing. That should be sane enough I believe.
In any case this is not suitable for 4.5 IMO.
Anyway I will be posting 7 patches now, pick only first 4 if you aren't confident about the rest.
OK
Thanks, Rafael
The older macros aren't used anymore, remove them.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.h | 68 -------------------------------------- 1 file changed, 68 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 59b28133dd68..5a7de52815c4 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -40,74 +40,6 @@ /* Ondemand Sampling types */ enum {OD_NORMAL_SAMPLE, OD_SUB_SAMPLE};
-/* - * Macro for creating governors sysfs routines - * - * - gov_sys: One governor instance per whole system - * - gov_pol: One governor instance per policy - */ - -/* Create attributes */ -#define gov_sys_attr_ro(_name) \ -static struct global_attr _name##_gov_sys = \ -__ATTR(_name, 0444, show_##_name##_gov_sys, NULL) - -#define gov_sys_attr_rw(_name) \ -static struct global_attr _name##_gov_sys = \ -__ATTR(_name, 0644, show_##_name##_gov_sys, store_##_name##_gov_sys) - -#define gov_pol_attr_ro(_name) \ -static struct freq_attr _name##_gov_pol = \ -__ATTR(_name, 0444, show_##_name##_gov_pol, NULL) - -#define gov_pol_attr_rw(_name) \ -static struct freq_attr _name##_gov_pol = \ -__ATTR(_name, 0644, show_##_name##_gov_pol, store_##_name##_gov_pol) - -#define gov_sys_pol_attr_rw(_name) \ - gov_sys_attr_rw(_name); \ - gov_pol_attr_rw(_name) - -#define gov_sys_pol_attr_ro(_name) \ - gov_sys_attr_ro(_name); \ - gov_pol_attr_ro(_name) - -/* Create show/store routines */ -#define show_one(_gov, file_name) \ -static ssize_t show_##file_name##_gov_sys \ -(struct kobject *kobj, struct attribute *attr, char *buf) \ -{ \ - struct _gov##_dbs_tuners *tuners = _gov##_dbs_cdata.gdbs_data->tuners; \ - return sprintf(buf, "%u\n", tuners->file_name); \ -} \ - \ -static ssize_t show_##file_name##_gov_pol \ -(struct cpufreq_policy *policy, char *buf) \ -{ \ - struct dbs_data *dbs_data = policy->governor_data; \ - struct _gov##_dbs_tuners *tuners = dbs_data->tuners; \ - return sprintf(buf, "%u\n", tuners->file_name); \ -} - -#define store_one(_gov, file_name) \ -static ssize_t store_##file_name##_gov_sys \ -(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) \ -{ \ - struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data; \ - return store_##file_name(dbs_data, buf, count); \ -} \ - \ -static ssize_t store_##file_name##_gov_pol \ -(struct cpufreq_policy *policy, const char *buf, size_t count) \ -{ \ - struct dbs_data *dbs_data = policy->governor_data; \ - return store_##file_name(dbs_data, buf, count); \ -} - -#define show_store_one(_gov, file_name) \ -show_one(_gov, file_name); \ -store_one(_gov, file_name) - /* Governor's specific attributes */ struct dbs_data; struct governor_attr {
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
The older macros aren't used anymore, remove them.
It doesn't follow from either the subject or the changelog which macros *in* *particular* are in question.
What about changing the subject to something like "cpufreq: governor: Drop unused macros for creating governor tunable attributes" and the changelog to something like:
"The previous commit introduced a new set of macros for creating sysfs attributes that represent governor tunables and the old macros used for this purpose are not needed any more, so drop them."
Thanks, Rafael
Commit 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT") dropped these because of some ABBA lockup issues.
The previous commit has fixed them all and we don't need to drop these locks anymore.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 5 ----- include/linux/cpufreq.h | 4 ---- 2 files changed, 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index e979ec78b695..5f7e24567e0e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2155,10 +2155,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, return ret; }
- up_write(&policy->rwsem); ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); - if (ret) { pr_err("%s: Failed to Exit Governor: %s (%d)\n", __func__, old_gov->name, ret); @@ -2174,9 +2171,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, if (!ret) goto out;
- up_write(&policy->rwsem); __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); }
/* new governor failed, so re-start old one */ diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 88a4215125bc..79b87cebaa9c 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -100,10 +100,6 @@ struct cpufreq_policy { * - Any routine that will write to the policy structure and/or may take away * the policy altogether (eg. CPU hotplug), will hold this lock in write * mode before doing so. - * - * Additional rules: - * - Lock should not be held across - * __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); */ struct rw_semaphore rwsem;
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
Commit 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT") dropped these because of some ABBA lockup issues.
The previous commit has fixed them all and we don't need to drop these locks anymore.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
First of all, this is effectively reverting commit 955ef4833574, so the subject should be "Revert commit 955ef4833574 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)".
There should be a Fixes: tag pointing to commit 955ef4833574 and a Reported-by: for Juri.
If there is a link to a bug report related to this, it should be pointed to by a Link: tag.
The changelog should say why the original commit was there and why the way it attempted to solve the problem was incorrect. It also should say that the original problem was addressed by a previous commit, so this one can be reverted without consequences.
But I'm not going to write that changelog. I actually am not going to write any changelogs for you any more, because I'm seriously tired of doing that. Moreover, if I see a patch from you with a changelog that's not acceptable to me, it will immediately go to the "not applicable" trash bin no matter what the changes below look like. You *have* *to* write useful changelogs. This isn't optional or best effort. This is mandatory and important.
Now, I'm not really sure if the ordering of this patchset is right. Maybe we should just revert upfront with the "we'll address the original problem in the following commits" statement in the changelog and fix it in a different way? It looks like patches [1-3/5] fix a problem that isn't there even, but would appear after the [4/5] if they were not applied previously, which doesn't sound really straightforward to me.
Thanks, Rafael
On 02-02-16, 22:53, Rafael J. Wysocki wrote:
First of all, this is effectively reverting commit 955ef4833574, so the subject should be "Revert commit 955ef4833574 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)".
There should be a Fixes: tag pointing to commit 955ef4833574 and a Reported-by: for Juri.
If there is a link to a bug report related to this, it should be pointed to by a Link: tag.
The changelog should say why the original commit was there and why the way it attempted to solve the problem was incorrect. It also should say that the original problem was addressed by a previous commit, so this one can be reverted without consequences.
How about this:
Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"
Earlier, when the struct freq-attr was used to represent governor attributes, the standard cpufreq show/store sysfs attribute callbacks were applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That could have resulted in an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely).
We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time.
The previous commit, "cpufreq: governor: New sysfs show/store callbacks for governor tunables", fixed the original ABBA deadlock by adding new governor specific show/store callbacks.
We don't have to drop rwsem around invocations of governor event CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.
Fixes: 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT") Reported-by: Juri Lelli juri.lelli@arm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
But I'm not going to write that changelog. I actually am not going to write any changelogs for you any more, because I'm seriously tired of doing that. Moreover, if I see a patch from you with a changelog that's not acceptable to me, it will immediately go to the "not applicable" trash bin no matter what the changes below look like. You *have* *to* write useful changelogs. This isn't optional or best effort. This is mandatory and important.
Will try to improve, sorry about that (again).
Now, I'm not really sure if the ordering of this patchset is right. Maybe we should just revert upfront with the "we'll address the original problem in the following commits" statement in the changelog and fix it in a different way?
Wouldn't that break things like 'git bisect'? People running kernels after the reverted commits may start hitting lockdeps.
It looks like patches [1-3/5] fix a problem that isn't there even, but would appear after the [4/5] if they were not applied previously, which doesn't sound really straightforward to me.
I am going to fight hard for it, if you feel 4/5 should be the first patch here, I will do that.
On Wed, Feb 3, 2016 at 6:51 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 02-02-16, 22:53, Rafael J. Wysocki wrote:
First of all, this is effectively reverting commit 955ef4833574, so the subject should be "Revert commit 955ef4833574 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)".
There should be a Fixes: tag pointing to commit 955ef4833574 and a Reported-by: for Juri.
If there is a link to a bug report related to this, it should be pointed to by a Link: tag.
The changelog should say why the original commit was there and why the way it attempted to solve the problem was incorrect. It also should say that the original problem was addressed by a previous commit, so this one can be reverted without consequences.
How about this:
Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT" Earlier, when the struct freq-attr was used to represent governor attributes, the standard cpufreq show/store sysfs attribute callbacks were applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That could have resulted in an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. The previous commit, "cpufreq: governor: New sysfs show/store callbacks for governor tunables", fixed the original ABBA deadlock by adding new governor specific show/store callbacks. We don't have to drop rwsem around invocations of governor event CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now. Fixes: 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT") Reported-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Much better.
But I'm not going to write that changelog. I actually am not going to write any changelogs for you any more, because I'm seriously tired of doing that. Moreover, if I see a patch from you with a changelog that's not acceptable to me, it will immediately go to the "not applicable" trash bin no matter what the changes below look like. You *have* *to* write useful changelogs. This isn't optional or best effort. This is mandatory and important.
Will try to improve, sorry about that (again).
Now, I'm not really sure if the ordering of this patchset is right. Maybe we should just revert upfront with the "we'll address the original problem in the following commits" statement in the changelog and fix it in a different way?
Wouldn't that break things like 'git bisect'? People running kernels after the reverted commits may start hitting lockdeps.
Well, we have at least one bug in there before applying the whole series anyway, regardless of the ordering.
It looks like patches [1-3/5] fix a problem that isn't there even, but would appear after the [4/5] if they were not applied previously, which doesn't sound really straightforward to me.
I am going to fight hard for it, if you feel 4/5 should be the first patch here, I will do that.
I guess this was supposed to be "I am not ...".
With the above changelog the current order of patches in the series is fine.
Thanks, Rafael
Invalid state-transitions is verified by governor core now and there is no need to replicate that in cpufreq core. Also we don't drop policy->rwsem anymore, which makes rest of the races go away.
Simplify code a bit now.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 24 ------------------------ include/linux/cpufreq.h | 1 - 2 files changed, 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 5f7e24567e0e..052ad1b9372c 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list); static struct cpufreq_driver *cpufreq_driver; static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); static DEFINE_RWLOCK(cpufreq_driver_lock); -DEFINE_MUTEX(cpufreq_governor_lock);
/* Flag to suspend/resume CPUFreq governors */ static bool cpufreq_suspended; @@ -1963,21 +1962,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
- mutex_lock(&cpufreq_governor_lock); - if ((policy->governor_enabled && event == CPUFREQ_GOV_START) - || (!policy->governor_enabled - && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) { - mutex_unlock(&cpufreq_governor_lock); - return -EBUSY; - } - - if (event == CPUFREQ_GOV_STOP) - policy->governor_enabled = false; - else if (event == CPUFREQ_GOV_START) - policy->governor_enabled = true; - - mutex_unlock(&cpufreq_governor_lock); - ret = policy->governor->governor(policy, event);
if (!ret) { @@ -1985,14 +1969,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, policy->governor->initialized++; else if (event == CPUFREQ_GOV_POLICY_EXIT) policy->governor->initialized--; - } else { - /* Restore original values */ - mutex_lock(&cpufreq_governor_lock); - if (event == CPUFREQ_GOV_STOP) - policy->governor_enabled = true; - else if (event == CPUFREQ_GOV_START) - policy->governor_enabled = false; - mutex_unlock(&cpufreq_governor_lock); }
if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) || diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 79b87cebaa9c..e90cf5d31e85 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -80,7 +80,6 @@ struct cpufreq_policy { unsigned int last_policy; /* policy before unplug */ struct cpufreq_governor *governor; /* see below */ void *governor_data; - bool governor_enabled; /* governor start/stop flag */ char last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
struct work_struct update; /* if update_policy() needs to be
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Invalid state-transitions is verified by governor core now and there is no need to replicate that in cpufreq core. Also we don't drop policy->rwsem anymore, which makes rest of the races go away.
There are still paths where we call __cpufreq_governor() without holding policy->rwsem, but those should be fixed with my cleanups (that I intend to refresh and post soon). So, I'm not sure we can safely remove this yet.
Simplify code a bit now.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 24 ------------------------ include/linux/cpufreq.h | 1 - 2 files changed, 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 5f7e24567e0e..052ad1b9372c 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list); static struct cpufreq_driver *cpufreq_driver; static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); static DEFINE_RWLOCK(cpufreq_driver_lock); -DEFINE_MUTEX(cpufreq_governor_lock); /* Flag to suspend/resume CPUFreq governors */ static bool cpufreq_suspended; @@ -1963,21 +1962,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
- mutex_lock(&cpufreq_governor_lock);
- if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
|| (!policy->governor_enabled
&& (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
mutex_unlock(&cpufreq_governor_lock);
return -EBUSY;
- }
- if (event == CPUFREQ_GOV_STOP)
policy->governor_enabled = false;
- else if (event == CPUFREQ_GOV_START)
policy->governor_enabled = true;
- mutex_unlock(&cpufreq_governor_lock);
- ret = policy->governor->governor(policy, event);
So, __cpufreq_governor() becomes effectively a wrapper around ->governor() calls and governors are left responsible for implementing the state machine with appropriate checks.
I'm wondering if this approach is completely sane, but what we end up with your changes should work (and we kill a lock! :)).
Maybe we add a comment somewhere stating exactly how things are meant to work?
Thanks,
- Juri
On 02-02-16, 16:49, Juri Lelli wrote:
There are still paths where we call __cpufreq_governor() without holding policy->rwsem, but those should be fixed with my cleanups (that I intend to refresh and post soon). So, I'm not sure we can safely remove this yet.
No, we can't.. Though I haven't seen any races from happening even after removing it, but it doesn't mean we can't.
The deal is that, the entire sequence of events needs to be guaranteed to happen in a particular order without any other code performing similar operations concurrently.
And so we need to preserve the other sites with proper rwsem locking first.
So, __cpufreq_governor() becomes effectively a wrapper around ->governor() calls and governors are left responsible for implementing the state machine with appropriate checks.
Not really. The core can now guarantee that the entire sequence happens atomically. For example, on governor switch, we need to guarantee that STOP/EXIT happen without any intervention for the old governor. Or, INIT/START/LIMITS happen without any intervention for the new governor, etc..
Maybe we add a comment somewhere stating exactly how things are meant to work?
Hmm.
On 03/02/16 11:35, Viresh Kumar wrote:
On 02-02-16, 16:49, Juri Lelli wrote:
There are still paths where we call __cpufreq_governor() without holding policy->rwsem, but those should be fixed with my cleanups (that I intend to refresh and post soon). So, I'm not sure we can safely remove this yet.
No, we can't.. Though I haven't seen any races from happening even after removing it, but it doesn't mean we can't.
The deal is that, the entire sequence of events needs to be guaranteed to happen in a particular order without any other code performing similar operations concurrently.
And so we need to preserve the other sites with proper rwsem locking first.
Right. I guess it is what I was trying to do with my cleanups, adding assertions and fixing paths that didn't verify those.
It should be easy to rebase that set (or a part of it) on top of your and/or Rafael changes. I realize that there are multiple sets of changes under discussion; so, please tell me how do you, and Rafael, want to proceed about this.
So, __cpufreq_governor() becomes effectively a wrapper around ->governor() calls and governors are left responsible for implementing the state machine with appropriate checks.
Not really. The core can now guarantee that the entire sequence happens atomically. For example, on governor switch, we need to guarantee that STOP/EXIT happen without any intervention for the old governor. Or, INIT/START/LIMITS happen without any intervention for the new governor, etc..
OK, checking for invalid state transitions (for ondemand and conservative) is still in done cpufreq_governor.c.
Maybe we add a comment somewhere stating exactly how things are meant to work?
But, I guess any other governor that will bypass cpufreq_governor.c, it will also have to implement such checks. I was just proposing to state this somewhere, so that we don't forget.
Best,
- Juri
On 03-02-16, 11:05, Juri Lelli wrote:
It should be easy to rebase that set (or a part of it) on top of your and/or Rafael changes. I realize that there are multiple sets of changes under discussion; so, please tell me how do you, and Rafael, want to proceed about this.
Yeah, please wait for a bit for Rafael to apply both the series (if they pass the litmus test) to PM tree :)
But, I guess any other governor that will bypass cpufreq_governor.c, it will also have to implement such checks. I was just proposing to state this somewhere, so that we don't forget.
We can surely add a comment for that.
But I would like to review the state after these patches are applied, as we may be able to guarantee that from cpufreq-core instead.
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
Invalid state-transitions is verified by governor core now
What about the governors that don't use cpufreq_governor_dbs()?
and there is no need to replicate that in cpufreq core. Also we don't drop policy->rwsem anymore, which makes rest of the races go away.
Simplify code a bit now.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
I won't make this change just yet. At least not in 4.5 (provided that the other patches go into it).
Thanks, Rafael
Hi Viresh,
On 02/02/16 16:27, Viresh Kumar wrote:
Hi Rafael,
Sorry for doing this, I know you were also looking to fix this in a possibly different way. But I thought, it would be better if we fix that. We can scrap this version and take yours if that looks better.
The root cause of all the issues we were facing, was that we were taking policy->rwsem while accessing governor sysfs attributes. And that happened because we were sharing the show/store calls present in cpufreq.c.
I thought, perhaps the best way to fix it is to give separate sysfs-ops to governors. And that's what I did.
@Juri: I need your help in testing these. My platform doesn't give me those lockups (even without these patches) and Juno/Tc2 would fit better.
Can you please run some tests on these?
Sure! Will do in the next few days.
Best,
- Juri
They are pushed here for easy access (and auto test by build-bot): git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/governor-kobject
-- viresh
Viresh Kumar (5): cpufreq: governor: Kill declare_show_sampling_rate_min() cpufreq: governor: Create separate sysfs-ops cpufreq: governor: Remove unused sysfs attribute macros cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT cpufreq: Get rid of ->governor_enabled and its lock
drivers/cpufreq/cpufreq.c | 29 ---------- drivers/cpufreq/cpufreq_conservative.c | 77 ++++++++++--------------- drivers/cpufreq/cpufreq_governor.c | 86 ++++++++++++++++++++-------- drivers/cpufreq/cpufreq_governor.h | 101 +++++++-------------------------- drivers/cpufreq/cpufreq_ondemand.c | 77 ++++++++++--------------- include/linux/cpufreq.h | 5 -- 6 files changed, 143 insertions(+), 232 deletions(-)
-- 2.7.0.79.gdc08a19
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
Hi Rafael,
Sorry for doing this, I know you were also looking to fix this in a possibly different way. But I thought, it would be better if we fix that. We can scrap this version and take yours if that looks better.
That's not nice to be honest. At the very least you could have asked me about the status of my work before sending this.
Fortunately, though, when I started to look deeper into fixing this problem I thought I didn't like the overall design of things in the governor land, so I started to change that and my modifications turn out to be sort of complementary with respect to this patchset. Of course, they do conflict, but I can redo my patches on top of these if that's necessary. That said I'm going to post them in their current form anyway, at least to show the direction I want to take going forward.
The root cause of all the issues we were facing, was that we were taking policy->rwsem while accessing governor sysfs attributes. And that happened because we were sharing the show/store calls present in cpufreq.c.
I thought, perhaps the best way to fix it is to give separate sysfs-ops to governors. And that's what I did.
Yes, that's what I was talking about in the other thread.
My overall impression here is that the code changes make sense. Some details need to be improved (like the concurrent ->store for governor tunables pointed out by Juri). The patch changelogs suck, though.
If your hope was that this might go into 4.5, there is a small chance of that happening, but only if it can be made ready this week. Otherwise, I'd prefer it to be redone on top of my changes.
Let me comment on the individual patches.
Thanks, Rafael
On 02-02-16, 21:04, Rafael J. Wysocki wrote:
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
Hi Rafael,
Sorry for doing this, I know you were also looking to fix this in a possibly different way. But I thought, it would be better if we fix that. We can scrap this version and take yours if that looks better.
That's not nice to be honest. At the very least you could have asked me about the status of my work before sending this.
I started looking into this yesterday morning, while you were away and finished it before you came back. So, it would have taken more time and so I just sent them. I don't have any issues (as mentioned earlier) is discarding them for the work you might have already done.
Fortunately, though, when I started to look deeper into fixing this problem I thought I didn't like the overall design of things in the governor land, so I started to change that and my modifications turn out to be sort of complementary with respect to this patchset.
That's good.
Of course, they do conflict, but I can redo my patches on top of these if that's necessary.
Yeah, even I don't have any issues in rebasing over your work, if I have to.
That said I'm going to post them in their current form anyway, at least to show the direction I want to take going forward.
Sure.
I thought, perhaps the best way to fix it is to give separate sysfs-ops to governors. And that's what I did.
Yes, that's what I was talking about in the other thread.
I must have missed that then :(
My overall impression here is that the code changes make sense. Some details need to be improved (like the concurrent ->store for governor tunables pointed out by Juri).
The patch changelogs suck, though.
Like always :(
If your hope was that this might go into 4.5, there is a small chance of that happening, but only if it can be made ready this week.
Will try my best.
Otherwise, I'd prefer it to be redone on top of my changes.
No worries.
Let me comment on the individual patches.
Thanks for taking this up Rafael, really appreciate it :)
On 03-02-16, 07:52, Viresh Kumar wrote:
My overall impression here is that the code changes make sense. Some details need to be improved (like the concurrent ->store for governor tunables pointed out by Juri).
The next version is ready with some improvements, will send it once you are done with this thread.
linaro-kernel@lists.linaro.org