Hi,
This is the next step of the earlier work I posted [1].
The first 5 patches are minor cleanups; the 6th and 7th optimize a few things to make the code less complex.
Patches 8-11 actually solve (or try to solve :)) the synchronization problems, i.e. the crashes I was getting.
And the last one optimizes the code a bit more.
I don't get the crashes anymore and would like others to try it out, at least Preeti and Ke Wang (sorry if I haven't spelled it properly :( ), as you two reported the issues to me.
This patchset should be rebased over the earlier one [1].
To keep things simple, all these patches are pushed here [2]; they are rebased over pm/bleeding-edge because of some recent important changes there:
[1] http://marc.info/?i=cover.1433326032.git.viresh.kumar%40linaro.org
[2] ssh://git@git.linaro.org/people/viresh.kumar/linux.git cpufreq/gov-locking
-- viresh
Viresh Kumar (12):
  cpufreq: governor: Name delayed-work as dwork
  cpufreq: governor: Drop unused field 'cpu'
  cpufreq: governor: Rename 'cpu_dbs_common_info' to 'cpu_dbs_info'
  cpufreq: governor: name pointer to cpu_dbs_info as 'cdbs'
  cpufreq: governor: rename cur_policy as policy
  cpufreq: governor: Keep single copy of information common to policy->cpus
  cpufreq: governor: split out common part of {cs|od}_dbs_timer()
  cpufreq: governor: synchronize work-handler with governor callbacks
  cpufreq: governor: Avoid invalid states with additional checks
  cpufreq: governor: Don't WARN on invalid states
  cpufreq: propagate errors returned from __cpufreq_governor()
  cpufreq: conservative: remove 'enable' field
 drivers/cpufreq/cpufreq.c              |  22 ++--
 drivers/cpufreq/cpufreq_conservative.c |  35 ++----
 drivers/cpufreq/cpufreq_governor.c     | 200 +++++++++++++++++++++++----------
 drivers/cpufreq/cpufreq_governor.h     |  36 +++---
 drivers/cpufreq/cpufreq_ondemand.c     |  68 +++++------
 5 files changed, 214 insertions(+), 147 deletions(-)
The delayed work was named 'work', so accessing the work_struct within it reads as work.work, which is not very readable. Rename the delayed_work field to 'dwork'.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 2 +-
 drivers/cpufreq/cpufreq_governor.c     | 6 +++---
 drivers/cpufreq/cpufreq_governor.h     | 2 +-
 drivers/cpufreq/cpufreq_ondemand.c     | 8 ++++----
 4 files changed, 9 insertions(+), 9 deletions(-)
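For reference, this is roughly why the old name read poorly (a trimmed sketch of the relevant kernel types, not part of the patch):

struct delayed_work {
	struct work_struct work;	/* the embedded work item */
	struct timer_list timer;
	/* ... */
};

/* With the member named 'work', fetching the container reads as: */
dbs_info = container_of(work, struct cs_cpu_dbs_info_s, cdbs.work.work);

/* With it named 'dwork', the double 'work.work' disappears: */
dbs_info = container_of(work, struct cs_cpu_dbs_info_s, cdbs.dwork.work);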
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index c86a10c30912..0e3ec1d968d9 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -105,7 +105,7 @@ static void cs_check_cpu(int cpu, unsigned int load)
 static void cs_dbs_timer(struct work_struct *work)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = container_of(work,
-			struct cs_cpu_dbs_info_s, cdbs.work.work);
+			struct cs_cpu_dbs_info_s, cdbs.dwork.work);
 	unsigned int cpu = dbs_info->cdbs.cur_policy->cpu;
 	struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info,
 			cpu);
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 57a39f8a92b7..6024bbc782c0 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -165,7 +165,7 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
 {
 	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 
-	mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
+	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
 }
 
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
@@ -204,7 +204,7 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data,
 
 	for_each_cpu(i, policy->cpus) {
 		cdbs = dbs_data->cdata->get_cpu_cdbs(i);
-		cancel_delayed_work_sync(&cdbs->work);
+		cancel_delayed_work_sync(&cdbs->dwork);
 	}
 }
 
@@ -367,7 +367,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 			j_cdbs->prev_cpu_nice =
 				kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
 		mutex_init(&j_cdbs->timer_mutex);
-		INIT_DEFERRABLE_WORK(&j_cdbs->work, cdata->gov_dbs_timer);
+		INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer);
 	}
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 34736f5e869d..352eecaae789 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -142,7 +142,7 @@ struct cpu_dbs_common_info {
 	 */
 	unsigned int prev_load;
 	struct cpufreq_policy *cur_policy;
-	struct delayed_work work;
+	struct delayed_work dwork;
 	/*
 	 * percpu mutex that serializes governor limit change with gov_dbs_timer
 	 * invocation. We do not want gov_dbs_timer to run when user is changing
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 3c1e10f2304c..830aef1db8c3 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -194,7 +194,7 @@ static void od_check_cpu(int cpu, unsigned int load)
 static void od_dbs_timer(struct work_struct *work)
 {
 	struct od_cpu_dbs_info_s *dbs_info =
-		container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work);
+		container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work);
 	unsigned int cpu = dbs_info->cdbs.cur_policy->cpu;
 	struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info,
 			cpu);
@@ -275,18 +275,18 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 
 		mutex_lock(&dbs_info->cdbs.timer_mutex);
 
-		if (!delayed_work_pending(&dbs_info->cdbs.work)) {
+		if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
 			mutex_unlock(&dbs_info->cdbs.timer_mutex);
 			continue;
 		}
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.work.timer.expires;
+		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
 
 			mutex_unlock(&dbs_info->cdbs.timer_mutex);
-			cancel_delayed_work_sync(&dbs_info->cdbs.work);
+			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
 			mutex_lock(&dbs_info->cdbs.timer_mutex);
 
 			gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy,
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
The delayed work was named 'work', so accessing the work_struct within it reads as work.work, which is not very readable. Rename the delayed_work field to 'dwork'.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
It's not used at all; drop it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 1 -
 drivers/cpufreq/cpufreq_governor.h | 1 -
 2 files changed, 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 6024bbc782c0..2149ba7d32a8 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -353,7 +353,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
-		j_cdbs->cpu = j;
 		j_cdbs->cur_policy = policy;
 		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
						&j_cdbs->prev_cpu_wall, io_busy);
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 352eecaae789..1bbf8c87fdd5 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -130,7 +130,6 @@ static void *get_cpu_dbs_info_s(int cpu)				\
 
 /* Per cpu structures */
 struct cpu_dbs_common_info {
-	int cpu;
 	u64 prev_cpu_idle;
 	u64 prev_cpu_wall;
 	u64 prev_cpu_nice;
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
It's not used at all; drop it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
 drivers/cpufreq/cpufreq_governor.c | 1 -
 drivers/cpufreq/cpufreq_governor.h | 1 -
 2 files changed, 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 6024bbc782c0..2149ba7d32a8 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -353,7 +353,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
-		j_cdbs->cpu = j;
 		j_cdbs->cur_policy = policy;
 		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
						&j_cdbs->prev_cpu_wall, io_busy);
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 352eecaae789..1bbf8c87fdd5 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -130,7 +130,6 @@ static void *get_cpu_dbs_info_s(int cpu)				\
 
 /* Per cpu structures */
 struct cpu_dbs_common_info {
-	int cpu;
 	u64 prev_cpu_idle;
 	u64 prev_cpu_wall;
 	u64 prev_cpu_nice;
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
It's not info common to all CPUs, but a structure representing the type of CPU info common to both governor types. Let's drop 'common_' from its name.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 19 +++++++++----------
 drivers/cpufreq/cpufreq_governor.h | 13 ++++++-------
 2 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 2149ba7d32a8..529f236f2d05 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -32,7 +32,7 @@ static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
 
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 {
-	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	struct cpufreq_policy *policy;
@@ -64,7 +64,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 	/* Get Absolute Load */
 	for_each_cpu(j, policy->cpus) {
-		struct cpu_dbs_common_info *j_cdbs;
+		struct cpu_dbs_info *j_cdbs;
 		u64 cur_wall_time, cur_idle_time;
 		unsigned int idle_time, wall_time;
 		unsigned int load;
@@ -163,7 +163,7 @@ EXPORT_SYMBOL_GPL(dbs_check_cpu);
 static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
 		unsigned int delay)
 {
-	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 
 	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
 }
@@ -199,7 +199,7 @@ EXPORT_SYMBOL_GPL(gov_queue_work);
 static inline void gov_cancel_work(struct dbs_data *dbs_data,
 		struct cpufreq_policy *policy)
 {
-	struct cpu_dbs_common_info *cdbs;
+	struct cpu_dbs_info *cdbs;
 	int i;
 
 	for_each_cpu(i, policy->cpus) {
@@ -209,8 +209,7 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data,
 }
 
 /* Will return if we need to evaluate cpu load again or not */
-bool need_load_eval(struct cpu_dbs_common_info *cdbs,
-		    unsigned int sampling_rate)
+bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate)
 {
 	if (policy_is_shared(cdbs->cur_policy)) {
 		ktime_t time_now = ktime_get();
@@ -330,7 +329,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 {
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu;
-	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
 	int io_busy = 0;
 
 	if (!policy->cur)
@@ -350,7 +349,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 	}
 
 	for_each_cpu(j, policy->cpus) {
-		struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j);
+		struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
 		j_cdbs->cur_policy = policy;
@@ -398,7 +397,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 {
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_cpu_dbs_info_s *cs_dbs_info =
@@ -418,7 +417,7 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 {
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
 
 	if (!cpu_cdbs->cur_policy)
 		return;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 1bbf8c87fdd5..6b5e33f68064 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -109,7 +109,7 @@ store_one(_gov, file_name)
 
 /* create helper routines */
 #define define_get_cpu_dbs_routines(_dbs_info)				\
-static struct cpu_dbs_common_info *get_cpu_cdbs(int cpu)		\
+static struct cpu_dbs_info *get_cpu_cdbs(int cpu)			\
 {									\
 	return &per_cpu(_dbs_info, cpu).cdbs;				\
 }									\
@@ -129,7 +129,7 @@ static void *get_cpu_dbs_info_s(int cpu)				\
  */
 
 /* Per cpu structures */
-struct cpu_dbs_common_info {
+struct cpu_dbs_info {
 	u64 prev_cpu_idle;
 	u64 prev_cpu_wall;
 	u64 prev_cpu_nice;
@@ -152,7 +152,7 @@ struct cpu_dbs_common_info {
 };
 
 struct od_cpu_dbs_info_s {
-	struct cpu_dbs_common_info cdbs;
+	struct cpu_dbs_info cdbs;
 	struct cpufreq_frequency_table *freq_table;
 	unsigned int freq_lo;
 	unsigned int freq_lo_jiffies;
@@ -162,7 +162,7 @@ struct od_cpu_dbs_info_s {
 };
 
 struct cs_cpu_dbs_info_s {
-	struct cpu_dbs_common_info cdbs;
+	struct cpu_dbs_info cdbs;
 	unsigned int down_skip;
 	unsigned int requested_freq;
 	unsigned int enable:1;
@@ -203,7 +203,7 @@ struct common_dbs_data {
 	 */
 	struct dbs_data *gdbs_data;
 
-	struct cpu_dbs_common_info *(*get_cpu_cdbs)(int cpu);
+	struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu);
 	void *(*get_cpu_dbs_info_s)(int cpu);
 	void (*gov_dbs_timer)(struct work_struct *work);
 	void (*gov_check_cpu)(int cpu, unsigned int load);
@@ -264,8 +264,7 @@ static ssize_t show_sampling_rate_min_gov_pol				\
 extern struct mutex cpufreq_governor_lock;
 
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
-bool need_load_eval(struct cpu_dbs_common_info *cdbs,
-		    unsigned int sampling_rate);
+bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		struct common_dbs_data *cdata, unsigned int event);
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
It's not info common to all CPUs, but a structure representing the type of CPU info common to both governor types. Let's drop 'common_' from its name.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
It is called 'cdbs' in most places and 'cpu_dbs' in others. Let's use 'cdbs' consistently for better readability.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 529f236f2d05..4ea13f182154 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -329,7 +329,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 {
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu;
-	struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
 	int io_busy = 0;
 
 	if (!policy->cur)
@@ -385,7 +385,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 	}
 
 	/* Initiate timer time stamp */
-	cpu_cdbs->time_stamp = ktime_get();
+	cdbs->time_stamp = ktime_get();
 
 	gov_queue_work(dbs_data, policy,
 			delay_for_sampling_rate(sampling_rate), true);
@@ -397,7 +397,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 {
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_cpu_dbs_info_s *cs_dbs_info =
@@ -408,8 +408,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 
 	gov_cancel_work(dbs_data, policy);
 
-	mutex_destroy(&cpu_cdbs->timer_mutex);
-	cpu_cdbs->cur_policy = NULL;
+	mutex_destroy(&cdbs->timer_mutex);
+	cdbs->cur_policy = NULL;
 }
 
 static void cpufreq_governor_limits(struct cpufreq_policy *policy,
@@ -417,20 +417,20 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 {
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
 
-	if (!cpu_cdbs->cur_policy)
+	if (!cdbs->cur_policy)
 		return;
 
-	mutex_lock(&cpu_cdbs->timer_mutex);
-	if (policy->max < cpu_cdbs->cur_policy->cur)
-		__cpufreq_driver_target(cpu_cdbs->cur_policy, policy->max,
+	mutex_lock(&cdbs->timer_mutex);
+	if (policy->max < cdbs->cur_policy->cur)
+		__cpufreq_driver_target(cdbs->cur_policy, policy->max,
 					CPUFREQ_RELATION_H);
-	else if (policy->min > cpu_cdbs->cur_policy->cur)
-		__cpufreq_driver_target(cpu_cdbs->cur_policy, policy->min,
+	else if (policy->min > cdbs->cur_policy->cur)
+		__cpufreq_driver_target(cdbs->cur_policy, policy->min,
 					CPUFREQ_RELATION_L);
 	dbs_check_cpu(dbs_data, cpu);
-	mutex_unlock(&cpu_cdbs->timer_mutex);
+	mutex_unlock(&cdbs->timer_mutex);
 }
 
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
It is called 'cdbs' in most places and 'cpu_dbs' in others. Let's use 'cdbs' consistently for better readability.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Just call it 'policy'; 'cur_policy' is unnecessarily long and doesn't have any special meaning.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 10 +++++-----
 drivers/cpufreq/cpufreq_governor.c     | 18 +++++++++---------
 drivers/cpufreq/cpufreq_governor.h     |  2 +-
 drivers/cpufreq/cpufreq_ondemand.c     | 19 ++++++++++---------
 4 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 0e3ec1d968d9..af47d322679e 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
 static void cs_check_cpu(int cpu, unsigned int load)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);
-	struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
+	struct cpufreq_policy *policy = dbs_info->cdbs.policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -106,10 +106,10 @@ static void cs_dbs_timer(struct work_struct *work)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = container_of(work,
			struct cs_cpu_dbs_info_s, cdbs.dwork.work);
-	unsigned int cpu = dbs_info->cdbs.cur_policy->cpu;
+	unsigned int cpu = dbs_info->cdbs.policy->cpu;
 	struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info,
 			cpu);
-	struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data;
+	struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	int delay = delay_for_sampling_rate(cs_tuners->sampling_rate);
 	bool modify_all = true;
@@ -120,7 +120,7 @@ static void cs_dbs_timer(struct work_struct *work)
 	else
 		dbs_check_cpu(dbs_data, cpu);
 
-	gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy, delay, modify_all);
+	gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);
 	mutex_unlock(&core_dbs_info->cdbs.timer_mutex);
 }
 
@@ -135,7 +135,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	if (!dbs_info->enable)
 		return 0;
 
-	policy = dbs_info->cdbs.cur_policy;
+	policy = dbs_info->cdbs.policy;
 
 	/*
	 * we only care if our internally tracked freq moves outside the 'valid'
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4ea13f182154..c0566f86caed 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -60,7 +60,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 		ignore_nice = cs_tuners->ignore_nice_load;
 	}
 
-	policy = cdbs->cur_policy;
+	policy = cdbs->policy;
 
 	/* Get Absolute Load */
 	for_each_cpu(j, policy->cpus) {
@@ -211,7 +211,7 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data,
 /* Will return if we need to evaluate cpu load again or not */
 bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate)
 {
-	if (policy_is_shared(cdbs->cur_policy)) {
+	if (policy_is_shared(cdbs->policy)) {
 		ktime_t time_now = ktime_get();
 		s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp);
 
@@ -352,7 +352,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
-		j_cdbs->cur_policy = policy;
+		j_cdbs->policy = policy;
 		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
						&j_cdbs->prev_cpu_wall, io_busy);
 
@@ -409,7 +409,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 	gov_cancel_work(dbs_data, policy);
 
 	mutex_destroy(&cdbs->timer_mutex);
-	cdbs->cur_policy = NULL;
+	cdbs->policy = NULL;
 }
 
 static void cpufreq_governor_limits(struct cpufreq_policy *policy,
@@ -419,15 +419,15 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
 
-	if (!cdbs->cur_policy)
+	if (!cdbs->policy)
 		return;
 
 	mutex_lock(&cdbs->timer_mutex);
-	if (policy->max < cdbs->cur_policy->cur)
-		__cpufreq_driver_target(cdbs->cur_policy, policy->max,
+	if (policy->max < cdbs->policy->cur)
+		__cpufreq_driver_target(cdbs->policy, policy->max,
 					CPUFREQ_RELATION_H);
-	else if (policy->min > cdbs->cur_policy->cur)
-		__cpufreq_driver_target(cdbs->cur_policy, policy->min,
+	else if (policy->min > cdbs->policy->cur)
+		__cpufreq_driver_target(cdbs->policy, policy->min,
 					CPUFREQ_RELATION_L);
 	dbs_check_cpu(dbs_data, cpu);
 	mutex_unlock(&cdbs->timer_mutex);
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 6b5e33f68064..a0f8eb79ee6d 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -140,7 +140,7 @@ struct cpu_dbs_info {
 	 * wake-up from idle.
 	 */
 	unsigned int prev_load;
-	struct cpufreq_policy *cur_policy;
+	struct cpufreq_policy *policy;
 	struct delayed_work dwork;
 	/*
 	 * percpu mutex that serializes governor limit change with gov_dbs_timer
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 830aef1db8c3..d29c6f9c6e3e 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq)
 static void od_check_cpu(int cpu, unsigned int load)
 {
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
-	struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
+	struct cpufreq_policy *policy = dbs_info->cdbs.policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 
@@ -195,10 +195,10 @@ static void od_dbs_timer(struct work_struct *work)
 {
 	struct od_cpu_dbs_info_s *dbs_info =
		container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work);
-	unsigned int cpu = dbs_info->cdbs.cur_policy->cpu;
+	unsigned int cpu = dbs_info->cdbs.policy->cpu;
 	struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info,
 			cpu);
-	struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data;
+	struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int delay = 0, sample_type = core_dbs_info->sample_type;
 	bool modify_all = true;
@@ -213,8 +213,9 @@ static void od_dbs_timer(struct work_struct *work)
 	core_dbs_info->sample_type = OD_NORMAL_SAMPLE;
 	if (sample_type == OD_SUB_SAMPLE) {
 		delay = core_dbs_info->freq_lo_jiffies;
-		__cpufreq_driver_target(core_dbs_info->cdbs.cur_policy,
-				core_dbs_info->freq_lo, CPUFREQ_RELATION_H);
+		__cpufreq_driver_target(core_dbs_info->cdbs.policy,
+					core_dbs_info->freq_lo,
+					CPUFREQ_RELATION_H);
 	} else {
 		dbs_check_cpu(dbs_data, cpu);
 		if (core_dbs_info->freq_lo) {
@@ -229,7 +230,7 @@ static void od_dbs_timer(struct work_struct *work)
 		delay = delay_for_sampling_rate(od_tuners->sampling_rate
 				* core_dbs_info->rate_mult);
 
-	gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy, delay, modify_all);
+	gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);
 	mutex_unlock(&core_dbs_info->cdbs.timer_mutex);
 }
 
@@ -289,8 +290,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
 			mutex_lock(&dbs_info->cdbs.timer_mutex);
 
-			gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy,
-					usecs_to_jiffies(new_rate), true);
+			gov_queue_work(dbs_data, dbs_info->cdbs.policy,
+				       usecs_to_jiffies(new_rate), true);
 
 		}
 		mutex_unlock(&dbs_info->cdbs.timer_mutex);
@@ -559,7 +560,7 @@ static void od_set_powersave_bias(unsigned int powersave_bias)
 		if (cpumask_test_cpu(cpu, &done))
 			continue;
 
-		policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.cur_policy;
+		policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.policy;
 		if (!policy)
 			continue;
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
Just call it 'policy'; 'cur_policy' is unnecessarily long and doesn't have any special meaning.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Some information is common to all CPUs belonging to a policy, but it is kept on a per-cpu basis. Let's keep it in a separate structure, common to all policy->cpus, so that updates/reads of it become less complex and less error prone.
The memory for ccdbs is allocated/freed at INIT/EXIT, so that we don't reallocate it for every STOP/START sequence. It will also be used (in the next patch) while the governor is stopped, and so must not be freed that early.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 18 ++++---
 drivers/cpufreq/cpufreq_governor.c     | 91 +++++++++++++++++++++++---------
 drivers/cpufreq/cpufreq_governor.h     | 24 +++++----
 drivers/cpufreq/cpufreq_ondemand.c     | 38 +++++------
 4 files changed, 113 insertions(+), 58 deletions(-)
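In other words, the split looks roughly like this (a condensed sketch of the structures in the diff below, with only a subset of the fields shown):

/* One instance shared by all policy->cpus */
struct cpu_common_dbs_info {
	struct cpufreq_policy *policy;
	struct mutex timer_mutex;	/* limit changes vs. gov_dbs_timer */
	ktime_t time_stamp;
};

/* Still one instance per CPU */
struct cpu_dbs_info {
	u64 prev_cpu_idle;
	u64 prev_cpu_wall;
	unsigned int prev_load;
	struct delayed_work dwork;
	struct cpu_common_dbs_info *ccdbs;	/* points to the shared copy */
};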
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index af47d322679e..5b5c01ca556c 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
 static void cs_check_cpu(int cpu, unsigned int load)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);
-	struct cpufreq_policy *policy = dbs_info->cdbs.policy;
+	struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -106,22 +106,24 @@ static void cs_dbs_timer(struct work_struct *work)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = container_of(work,
			struct cs_cpu_dbs_info_s, cdbs.dwork.work);
-	unsigned int cpu = dbs_info->cdbs.policy->cpu;
+	struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;
+	unsigned int cpu = policy->cpu;
 	struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info,
 			cpu);
-	struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	int delay = delay_for_sampling_rate(cs_tuners->sampling_rate);
 	bool modify_all = true;
 
-	mutex_lock(&core_dbs_info->cdbs.timer_mutex);
-	if (!need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate))
+	mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex);
+	if (!need_load_eval(core_dbs_info->cdbs.ccdbs,
+			    cs_tuners->sampling_rate))
 		modify_all = false;
 	else
 		dbs_check_cpu(dbs_data, cpu);
 
-	gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);
-	mutex_unlock(&core_dbs_info->cdbs.timer_mutex);
+	gov_queue_work(dbs_data, policy, delay, modify_all);
+	mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex);
 }
 
 static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
@@ -135,7 +137,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	if (!dbs_info->enable)
 		return 0;
 
-	policy = dbs_info->cdbs.policy;
+	policy = dbs_info->cdbs.ccdbs->policy;
 
 	/*
	 * we only care if our internally tracked freq moves outside the 'valid'
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index c0566f86caed..0ebff6fd0a0b 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -35,7 +35,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = cdbs->ccdbs->policy;
 	unsigned int sampling_rate;
 	unsigned int max_load = 0;
 	unsigned int ignore_nice;
@@ -60,8 +60,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 		ignore_nice = cs_tuners->ignore_nice_load;
 	}
 
-	policy = cdbs->policy;
-
 	/* Get Absolute Load */
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs;
@@ -209,17 +207,18 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data,
 }
 
 /* Will return if we need to evaluate cpu load again or not */
-bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate)
+bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
+		    unsigned int sampling_rate)
 {
-	if (policy_is_shared(cdbs->policy)) {
+	if (policy_is_shared(ccdbs->policy)) {
 		ktime_t time_now = ktime_get();
-		s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp);
+		s64 delta_us = ktime_us_delta(time_now, ccdbs->time_stamp);
 
 		/* Do nothing if we recently have sampled */
 		if (delta_us < (s64)(sampling_rate / 2))
 			return false;
 		else
-			cdbs->time_stamp = time_now;
+			ccdbs->time_stamp = time_now;
 	}
 
 	return true;
@@ -238,6 +237,39 @@ static void set_sampling_rate(struct dbs_data *dbs_data,
 	}
 }
 
+static int alloc_ccdbs(struct cpufreq_policy *policy,
+		       struct common_dbs_data *cdata)
+{
+	struct cpu_common_dbs_info *ccdbs;
+	int j;
+
+	/* Allocate memory for the common information for policy->cpus */
+	ccdbs = kzalloc(sizeof(*ccdbs), GFP_KERNEL);
+	if (!ccdbs)
+		return -ENOMEM;
+
+	ccdbs->policy = policy;
+
+	/* Set ccdbs for all CPUs, online+offline */
+	for_each_cpu(j, policy->related_cpus)
+		cdata->get_cpu_cdbs(j)->ccdbs = ccdbs;
+
+	return 0;
+}
+
+static void free_ccdbs(struct cpufreq_policy *policy,
+		       struct common_dbs_data *cdata)
+{
+	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
+	int j;
+
+	for_each_cpu(j, policy->cpus)
+		cdata->get_cpu_cdbs(j)->ccdbs = NULL;
+
+	kfree(ccdbs);
+}
+
 static int cpufreq_governor_init(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data,
 				 struct common_dbs_data *cdata)
@@ -248,6 +280,11 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (dbs_data) {
 		if (WARN_ON(have_governor_per_policy()))
 			return -EINVAL;
+
+		ret = alloc_ccdbs(policy, cdata);
+		if (ret)
+			return ret;
+
 		dbs_data->usage_count++;
 		policy->governor_data = dbs_data;
 		return 0;
@@ -257,12 +294,16 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (!dbs_data)
 		return -ENOMEM;
 
+	ret = alloc_ccdbs(policy, cdata);
+	if (ret)
+		goto free_dbs_data;
+
 	dbs_data->cdata = cdata;
 	dbs_data->usage_count = 1;
 
 	ret = cdata->init(dbs_data, !policy->governor->initialized);
 	if (ret)
-		goto free_dbs_data;
+		goto free_ccdbs;
 
 	/* policy latency is in ns. Convert it to us first */
 	latency = policy->cpuinfo.transition_latency / 1000;
@@ -299,6 +340,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	}
 cdata_exit:
 	cdata->exit(dbs_data, !policy->governor->initialized);
+free_ccdbs:
+	free_ccdbs(policy, cdata);
 free_dbs_data:
 	kfree(dbs_data);
 	return ret;
@@ -320,6 +363,7 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy,
 	}
 
 	cdata->exit(dbs_data, policy->governor->initialized == 1);
+	free_ccdbs(policy, cdata);
 		kfree(dbs_data);
 	}
 }
@@ -330,6 +374,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
 	int io_busy = 0;
 
 	if (!policy->cur)
@@ -348,11 +393,13 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		io_busy = od_tuners->io_is_busy;
 	}
 
+	ccdbs->time_stamp = ktime_get();
+	mutex_init(&ccdbs->timer_mutex);
+
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
-		j_cdbs->policy = policy;
 		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
						&j_cdbs->prev_cpu_wall, io_busy);
 
@@ -364,7 +411,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		if (ignore_nice)
 			j_cdbs->prev_cpu_nice =
 				kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-		mutex_init(&j_cdbs->timer_mutex);
 		INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer);
 	}
 
@@ -384,9 +430,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		od_ops->powersave_bias_init_cpu(cpu);
 	}
 
-	/* Initiate timer time stamp */
-	cdbs->time_stamp = ktime_get();
-
 	gov_queue_work(dbs_data, policy,
 			delay_for_sampling_rate(sampling_rate), true);
 	return 0;
@@ -398,6 +441,9 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
+
+	gov_cancel_work(dbs_data, policy);
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_cpu_dbs_info_s *cs_dbs_info =
@@ -406,10 +452,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 		cs_dbs_info->enable = 0;
 	}
 
-	gov_cancel_work(dbs_data, policy);
-
-	mutex_destroy(&cdbs->timer_mutex);
-	cdbs->policy = NULL;
+	mutex_destroy(&ccdbs->timer_mutex);
 }
 
 static void cpufreq_governor_limits(struct cpufreq_policy *policy,
@@ -419,18 +462,18 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
 
-	if (!cdbs->policy)
+	if (!cdbs->ccdbs)
 		return;
 
-	mutex_lock(&cdbs->timer_mutex);
-	if (policy->max < cdbs->policy->cur)
-		__cpufreq_driver_target(cdbs->policy, policy->max,
+	mutex_lock(&cdbs->ccdbs->timer_mutex);
+	if (policy->max < cdbs->ccdbs->policy->cur)
+		__cpufreq_driver_target(cdbs->ccdbs->policy, policy->max,
 					CPUFREQ_RELATION_H);
-	else if (policy->min > cdbs->policy->cur)
-		__cpufreq_driver_target(cdbs->policy, policy->min,
+	else if (policy->min > cdbs->ccdbs->policy->cur)
+		__cpufreq_driver_target(cdbs->ccdbs->policy, policy->min,
 					CPUFREQ_RELATION_L);
 	dbs_check_cpu(dbs_data, cpu);
-	mutex_unlock(&cdbs->timer_mutex);
+	mutex_unlock(&cdbs->ccdbs->timer_mutex);
 }
 
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index a0f8eb79ee6d..153412cafb05 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -128,6 +128,18 @@ static void *get_cpu_dbs_info_s(int cpu)				\
  * cs_*: Conservative governor
  */
 
+/* Common to all CPUs of a policy */
+struct cpu_common_dbs_info {
+	struct cpufreq_policy *policy;
+	/*
+	 * percpu mutex that serializes governor limit change with gov_dbs_timer
+	 * invocation. We do not want gov_dbs_timer to run when user is changing
+	 * the governor or limits.
+	 */
+	struct mutex timer_mutex;
+	ktime_t time_stamp;
+};
+
 /* Per cpu structures */
 struct cpu_dbs_info {
 	u64 prev_cpu_idle;
@@ -140,15 +152,8 @@ struct cpu_dbs_info {
 	 * wake-up from idle.
 	 */
 	unsigned int prev_load;
-	struct cpufreq_policy *policy;
 	struct delayed_work dwork;
-	/*
-	 * percpu mutex that serializes governor limit change with gov_dbs_timer
-	 * invocation. We do not want gov_dbs_timer to run when user is changing
-	 * the governor or limits.
-	 */
-	struct mutex timer_mutex;
-	ktime_t time_stamp;
+	struct cpu_common_dbs_info *ccdbs;
 };
 
 struct od_cpu_dbs_info_s {
@@ -264,7 +269,8 @@ static ssize_t show_sampling_rate_min_gov_pol				\
 extern struct mutex cpufreq_governor_lock;
 
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
-bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate);
+bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
+		    unsigned int sampling_rate);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		struct common_dbs_data *cdata, unsigned int event);
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index d29c6f9c6e3e..651677cfa581 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq)
 static void od_check_cpu(int cpu, unsigned int load)
 {
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
-	struct cpufreq_policy *policy = dbs_info->cdbs.policy;
+	struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 
@@ -195,16 +195,18 @@ static void od_dbs_timer(struct work_struct *work)
 {
 	struct od_cpu_dbs_info_s *dbs_info =
		container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work);
-	unsigned int cpu = dbs_info->cdbs.policy->cpu;
+	struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;
+	unsigned int cpu = policy->cpu;
 	struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info,
 			cpu);
-	struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int delay = 0, sample_type = core_dbs_info->sample_type;
 	bool modify_all = true;
 
-	mutex_lock(&core_dbs_info->cdbs.timer_mutex);
-	if (!need_load_eval(&core_dbs_info->cdbs, od_tuners->sampling_rate)) {
+	mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex);
+	if (!need_load_eval(core_dbs_info->cdbs.ccdbs,
+			    od_tuners->sampling_rate)) {
 		modify_all = false;
 		goto max_delay;
 	}
@@ -213,8 +215,7 @@ static void od_dbs_timer(struct work_struct *work)
 	core_dbs_info->sample_type = OD_NORMAL_SAMPLE;
 	if (sample_type == OD_SUB_SAMPLE) {
 		delay = core_dbs_info->freq_lo_jiffies;
-		__cpufreq_driver_target(core_dbs_info->cdbs.policy,
-					core_dbs_info->freq_lo,
+		__cpufreq_driver_target(policy, core_dbs_info->freq_lo,
 					CPUFREQ_RELATION_H);
 	} else {
 		dbs_check_cpu(dbs_data, cpu);
@@ -230,8 +231,8 @@ static void od_dbs_timer(struct work_struct *work)
 		delay = delay_for_sampling_rate(od_tuners->sampling_rate
 				* core_dbs_info->rate_mult);
 
-	gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);
-	mutex_unlock(&core_dbs_info->cdbs.timer_mutex);
+	gov_queue_work(dbs_data, policy, delay, modify_all);
+	mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex);
 }
 
 /************************** sysfs interface ************************/
@@ -274,10 +275,10 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
-		mutex_lock(&dbs_info->cdbs.timer_mutex);
+		mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
 
 		if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
-			mutex_unlock(&dbs_info->cdbs.timer_mutex);
+			mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
 			continue;
 		}
 
@@ -286,15 +287,15 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 
 		if (time_before(next_sampling, appointed_at)) {
 
-			mutex_unlock(&dbs_info->cdbs.timer_mutex);
+			mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
 			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-			mutex_lock(&dbs_info->cdbs.timer_mutex);
+			mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
 
-			gov_queue_work(dbs_data, dbs_info->cdbs.policy,
+			gov_queue_work(dbs_data, policy,
 					usecs_to_jiffies(new_rate), true);
 
 		}
-		mutex_unlock(&dbs_info->cdbs.timer_mutex);
+		mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
 	}
 }
 
@@ -557,13 +558,16 @@ static void od_set_powersave_bias(unsigned int powersave_bias)
 
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
+		struct cpu_common_dbs_info *ccdbs;
+
 		if (cpumask_test_cpu(cpu, &done))
 			continue;
 
-		policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.policy;
-		if (!policy)
+		ccdbs = per_cpu(od_cpu_dbs_info, cpu).cdbs.ccdbs;
+		if (!ccdbs)
 			continue;
 
+		policy = ccdbs->policy;
 		cpumask_or(&done, &done, policy->cpus);
 
 		if (policy->governor != &cpufreq_gov_ondemand)
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
Some information is common to all CPUs belonging to a policy, but it is kept on a per-cpu basis. Let's keep it in a separate structure, common to all policy->cpus, so that updates/reads of it become less complex and less error prone.
Good point.
The memory for ccdbs is allocated/freed at INIT/EXIT, so that we don't reallocate it for every STOP/START sequence. It will also be used (in the next patch) while the governor is stopped, and so must not be freed that early.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index c0566f86caed..0ebff6fd0a0b 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
+static int alloc_ccdbs(struct cpufreq_policy *policy,
+		       struct common_dbs_data *cdata)
+{
+	struct cpu_common_dbs_info *ccdbs;
+	int j;
+
+	/* Allocate memory for the common information for policy->cpus */
+	ccdbs = kzalloc(sizeof(*ccdbs), GFP_KERNEL);
+	if (!ccdbs)
+		return -ENOMEM;
+
+	ccdbs->policy = policy;
+
+	/* Set ccdbs for all CPUs, online+offline */
+	for_each_cpu(j, policy->related_cpus)
+		cdata->get_cpu_cdbs(j)->ccdbs = ccdbs;
+
+	return 0;
+}
+
+static void free_ccdbs(struct cpufreq_policy *policy,
+		       struct common_dbs_data *cdata)
+{
+	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
+	int j;
+
+	for_each_cpu(j, policy->cpus)
+		cdata->get_cpu_cdbs(j)->ccdbs = NULL;
+
+	kfree(ccdbs);
+}
 static int cpufreq_governor_init(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data,
 				 struct common_dbs_data *cdata)
@@ -248,6 +280,11 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (dbs_data) {
 		if (WARN_ON(have_governor_per_policy()))
 			return -EINVAL;
+
+		ret = alloc_ccdbs(policy, cdata);
+		if (ret)
+			return ret;
+
 		dbs_data->usage_count++;
 		policy->governor_data = dbs_data;
 		return 0;
@@ -257,12 +294,16 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (!dbs_data)
 		return -ENOMEM;
 
+	ret = alloc_ccdbs(policy, cdata);
+	if (ret)
+		goto free_dbs_data;
+
 	dbs_data->cdata = cdata;
 	dbs_data->usage_count = 1;
 
 	ret = cdata->init(dbs_data, !policy->governor->initialized);
 	if (ret)
-		goto free_dbs_data;
+		goto free_ccdbs;
 
 	/* policy latency is in ns. Convert it to us first */
 	latency = policy->cpuinfo.transition_latency / 1000;
@@ -299,6 +340,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	}
 cdata_exit:
 	cdata->exit(dbs_data, !policy->governor->initialized);
+free_ccdbs:
+	free_ccdbs(policy, cdata);
 free_dbs_data:
 	kfree(dbs_data);
 	return ret;
@@ -320,6 +363,7 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy,
 	}
 
 	cdata->exit(dbs_data, policy->governor->initialized == 1);
+	free_ccdbs(policy, cdata);
This is a per-policy data structure, so why free it only when all users of the governor are gone? We should free it when a policy is asked to exit, which is independent of references to this governor by other policies' CPUs. This would mean freeing it outside this if condition.
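Concretely, the suggested rework would look something like this (a sketch only; the usage_count handling is paraphrased from the surrounding hunks, and the actual rework appears later in the thread):

static void cpufreq_governor_exit(struct cpufreq_policy *policy,
				  struct dbs_data *dbs_data)
{
	struct common_dbs_data *cdata = dbs_data->cdata;

	if (!--dbs_data->usage_count) {
		/* Last user of the governor: drop governor-wide data */
		cdata->exit(dbs_data, policy->governor->initialized == 1);
		kfree(dbs_data);
	}

	/* Per-policy data: free it whenever *this* policy exits */
	free_ccdbs(policy, cdata);
}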
 		kfree(dbs_data);
 	}
 }
@@ -330,6 +374,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
 	int io_busy = 0;
 
 	if (!policy->cur)
@@ -348,11 +393,13 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		io_busy = od_tuners->io_is_busy;
 	}
 
+	ccdbs->time_stamp = ktime_get();
+	mutex_init(&ccdbs->timer_mutex);
+
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
-		j_cdbs->policy = policy;
This is not convincing. INIT and EXIT should typically be used to initialize and free governor-specific data structures, while START and STOP should be used for policy-wide/cpu-wide initialization and NULLing. At least that's how the current code appears to be designed.
Now, ccdbs is a policy-wide data structure. We can perhaps allocate and free ccdbs during INIT and EXIT, but initializing its values and NULLing them must be done in START and STOP respectively. You initialize the time_stamp and timer_mutex in START, so why not initialize policy here as well? That would help maintain consistency in the code too.
 		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
						&j_cdbs->prev_cpu_wall, io_busy);
@@ -364,7 +411,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		if (ignore_nice)
 			j_cdbs->prev_cpu_nice =
 				kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-		mutex_init(&j_cdbs->timer_mutex);
 		INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer);
 	}
 
@@ -384,9 +430,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		od_ops->powersave_bias_init_cpu(cpu);
 	}
 
-	/* Initiate timer time stamp */
-	cdbs->time_stamp = ktime_get();
-
 	gov_queue_work(dbs_data, policy,
 			delay_for_sampling_rate(sampling_rate), true);
 	return 0;
@@ -398,6 +441,9 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
+
+	gov_cancel_work(dbs_data, policy);
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_cpu_dbs_info_s *cs_dbs_info =
@@ -406,10 +452,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 		cs_dbs_info->enable = 0;
 	}
 
-	gov_cancel_work(dbs_data, policy);
-
-	mutex_destroy(&cdbs->timer_mutex);
-	cdbs->policy = NULL;
Same here. For the same reason as above, the policy value must be NULLed in STOP. Besides, policy is explicitly assigned a value in INIT but invalidated in EXIT by freeing ccdbs in this patch; there is a lack of consistency.
There is another consequence. Freeing a data structure does not necessarily set the pointer to NULL; it can be left holding some random (stale) address. This will break places that check for a NULL policy.
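To illustrate that point with a minimal standalone sketch (hypothetical struct foo; not from the patch):

struct foo {
	int x;
};

static void kfree_does_not_clear(void)
{
	struct foo *p = kzalloc(sizeof(*p), GFP_KERNEL);

	kfree(p);
	/*
	 * kfree() does not touch the caller's copy of the pointer: p still
	 * holds the old, now dangling, address. So NULL checks elsewhere
	 * cannot be relied upon to detect that the object is gone.
	 */
	if (!p)
		pr_info("never reached\n");
}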
+	mutex_destroy(&ccdbs->timer_mutex);
Another point, which you may have taken care of in the subsequent patches; I will mention it here nevertheless.
The timer_mutex is destroyed, but cdbs->policy is not NULL until we call EXIT. So when cpufreq_governor_limits() is called, it checks for the existence of ccdbs, and that check succeeds. But when it then tries to take the timer_mutex, it dereferences NULL.
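To make the window concrete, the interleaving described above is roughly this (an assumed timeline, reconstructed from the paragraph; not code from the patch):

/*
 *   governor STOP                        cpufreq_governor_limits()
 *   -------------                        -------------------------
 *   gov_cancel_work(...);
 *   mutex_destroy(&ccdbs->timer_mutex);
 *                                        if (!cdbs->ccdbs)    // still set,
 *                                                return;      // not taken
 *                                        mutex_lock(&cdbs->ccdbs->timer_mutex);
 *                                        // takes a mutex already destroyed
 *   (ccdbs itself is only freed at EXIT)
 */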
 }
 
 static void cpufreq_governor_limits(struct cpufreq_policy *policy,
@@ -419,18 +462,18 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
 
-	if (!cdbs->policy)
+	if (!cdbs->ccdbs)
 		return;
 
-	mutex_lock(&cdbs->timer_mutex);
-	if (policy->max < cdbs->policy->cur)
-		__cpufreq_driver_target(cdbs->policy, policy->max,
+	mutex_lock(&cdbs->ccdbs->timer_mutex);
+	if (policy->max < cdbs->ccdbs->policy->cur)
+		__cpufreq_driver_target(cdbs->ccdbs->policy, policy->max,
 					CPUFREQ_RELATION_H);
-	else if (policy->min > cdbs->policy->cur)
-		__cpufreq_driver_target(cdbs->policy, policy->min,
+	else if (policy->min > cdbs->ccdbs->policy->cur)
+		__cpufreq_driver_target(cdbs->ccdbs->policy, policy->min,
 					CPUFREQ_RELATION_L);
 	dbs_check_cpu(dbs_data, cpu);
-	mutex_unlock(&cdbs->timer_mutex);
+	mutex_unlock(&cdbs->ccdbs->timer_mutex);
}
int cpufreq_governor_dbs(struct cpufreq_policy *policy,
Regards
Preeti U Murthy
On 15-06-15, 11:45, Preeti U Murthy wrote:
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
@@ -320,6 +363,7 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy,
 	}
 
 	cdata->exit(dbs_data, policy->governor->initialized == 1);
+	free_ccdbs(policy, cdata);
This is a per-policy data structure, so why free it only when all users of the governor are gone? We should free it when a policy is asked to exit, which is independent of references to this governor by other policies' CPUs. This would mean freeing it outside this if condition.
Right.
@@ -348,11 +393,13 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		io_busy = od_tuners->io_is_busy;
 	}
 
+	ccdbs->time_stamp = ktime_get();
+	mutex_init(&ccdbs->timer_mutex);
+
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
-		j_cdbs->policy = policy;
This is not convincing. INIT and EXIT should typically be used to initialize and free governor-specific data structures, while START and STOP should be used for policy-wide/cpu-wide initialization and NULLing. At least that's how the current code appears to be designed.
Now, ccdbs is a policy-wide data structure. We can perhaps allocate and free ccdbs during INIT and EXIT, but initializing its values and NULLing them must be done in START and STOP respectively. You initialize the time_stamp and timer_mutex in START, so why not initialize policy here as well? That would help maintain consistency in the code too.
I don't think it was done that way in order to use it early, so moving it to START/STOP should be fine.
@@ -406,10 +452,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 		cs_dbs_info->enable = 0;
 	}
 
-	gov_cancel_work(dbs_data, policy);
-
-	mutex_destroy(&cdbs->timer_mutex);
-	cdbs->policy = NULL;
Same here. For the same reason as above, the policy value must be NULLed in STOP. Besides, policy is explicitly assigned a value in INIT but invalidated in EXIT by freeing ccdbs in this patch; there is a lack of consistency.
There is another consequence. Freeing a data structure does not necessarily set the pointer to NULL; it can be left holding some random (stale) address. This will break places that check for a NULL policy.
If we are freeing ccdbs, then using ccdbs->policy isn't valid anymore. And so while freeing ccdbs, we don't really need to set its fields to NULL.
+	mutex_destroy(&ccdbs->timer_mutex);
Another point, which you may have taken care of in the subsequent patches; I will mention it here nevertheless.
The timer_mutex is destroyed, but cdbs->policy is not NULL until we call EXIT. So when cpufreq_governor_limits() is called, it checks for the existence of ccdbs, and that check succeeds. But when it then tries to take the timer_mutex, it dereferences NULL.
Hmm, so I should keep checking cdbs->ccdbs->policy instead, and make it NULL in STOP.
Nice work, Preeti. Thanks.
On 15-06-15, 11:45, Preeti U Murthy wrote:
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
This is a per-policy data structure, so why free it only when all users of the governor are gone? We should free it when a policy is asked to exit, which is independent of references to this governor by other policies' CPUs. This would mean freeing it outside this if condition.
Okay, based on the review, here is the diff against the earlier patch:
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 0ebff6fd0a0b..72a92fa71ba5 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -248,8 +248,6 @@ static int alloc_ccdbs(struct cpufreq_policy *policy,
 	if (!ccdbs)
 		return -ENOMEM;
 
-	ccdbs->policy = policy;
-
 	/* Set ccdbs for all CPUs, online+offline */
 	for_each_cpu(j, policy->related_cpus)
 		cdata->get_cpu_cdbs(j)->ccdbs = ccdbs;
@@ -363,9 +361,10 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy,
 	}
 
 	cdata->exit(dbs_data, policy->governor->initialized == 1);
-	free_ccdbs(policy, cdata);
 		kfree(dbs_data);
 	}
+
+	free_ccdbs(policy, cdata);
 }
 
 static int cpufreq_governor_start(struct cpufreq_policy *policy,
@@ -393,6 +392,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		io_busy = od_tuners->io_is_busy;
 	}
 
+	ccdbs->policy = policy;
 	ccdbs->time_stamp = ktime_get();
 	mutex_init(&ccdbs->timer_mutex);
 
@@ -452,6 +452,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 		cs_dbs_info->enable = 0;
 	}
 
+	ccdbs->policy = NULL;
 	mutex_destroy(&ccdbs->timer_mutex);
 }
 
@@ -462,7 +463,7 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
 
-	if (!cdbs->ccdbs)
+	if (!cdbs->ccdbs || !cdbs->ccdbs->policy)
 		return;
 
 	mutex_lock(&cdbs->ccdbs->timer_mutex);
And here is the complete patch which you can Ack :)
------------8<---------------
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Fri, 5 Jun 2015 13:09:45 +0530
Subject: [PATCH] cpufreq: governor: Keep single copy of information common to policy->cpus
Some information is common to all CPUs belonging to a policy, but it is kept on a per-cpu basis. Let's keep it in a separate structure, common to all policy->cpus, so that updates/reads of it become less complex and less error prone.
The memory for ccdbs is allocated/freed at INIT/EXIT, so that we don't reallocate it for every STOP/START sequence. It will also be used (in the next patch) while the governor is stopped, and so must not be freed that early.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 18 ++++---
 drivers/cpufreq/cpufreq_governor.c     | 92 +++++++++++++++++++++++---------
 drivers/cpufreq/cpufreq_governor.h     | 24 +++++----
 drivers/cpufreq/cpufreq_ondemand.c     | 38 +++++------
 4 files changed, 114 insertions(+), 58 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index af47d322679e..5b5c01ca556c 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
 static void cs_check_cpu(int cpu, unsigned int load)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);
-	struct cpufreq_policy *policy = dbs_info->cdbs.policy;
+	struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;
 	struct dbs_data *dbs_data = policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -106,22 +106,24 @@ static void cs_dbs_timer(struct work_struct *work)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = container_of(work,
			struct cs_cpu_dbs_info_s, cdbs.dwork.work);
-	unsigned int cpu = dbs_info->cdbs.policy->cpu;
+	struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;
+	unsigned int cpu = policy->cpu;
 	struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info,
 			cpu);
-	struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	int delay = delay_for_sampling_rate(cs_tuners->sampling_rate);
 	bool modify_all = true;
 
-	mutex_lock(&core_dbs_info->cdbs.timer_mutex);
-	if (!need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate))
+	mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex);
+	if (!need_load_eval(core_dbs_info->cdbs.ccdbs,
+			    cs_tuners->sampling_rate))
 		modify_all = false;
 	else
 		dbs_check_cpu(dbs_data, cpu);
 
-	gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);
-	mutex_unlock(&core_dbs_info->cdbs.timer_mutex);
+	gov_queue_work(dbs_data, policy, delay, modify_all);
+	mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex);
 }
 
 static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
@@ -135,7 +137,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	if (!dbs_info->enable)
 		return 0;
 
-	policy = dbs_info->cdbs.policy;
+	policy = dbs_info->cdbs.ccdbs->policy;
 
 	/*
	 * we only care if our internally tracked freq moves outside the 'valid'
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index c0566f86caed..72a92fa71ba5 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -35,7 +35,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = cdbs->ccdbs->policy;
 	unsigned int sampling_rate;
 	unsigned int max_load = 0;
 	unsigned int ignore_nice;
@@ -60,8 +60,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 		ignore_nice = cs_tuners->ignore_nice_load;
 	}
 
-	policy = cdbs->policy;
-
 	/* Get Absolute Load */
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs;
@@ -209,17 +207,18 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data,
 }
 
 /* Will return if we need to evaluate cpu load again or not */
-bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate)
+bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
+		    unsigned int sampling_rate)
 {
-	if (policy_is_shared(cdbs->policy)) {
+	if (policy_is_shared(ccdbs->policy)) {
 		ktime_t time_now = ktime_get();
-		s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp);
+		s64 delta_us = ktime_us_delta(time_now, ccdbs->time_stamp);
 
 		/* Do nothing if we recently have sampled */
 		if (delta_us < (s64)(sampling_rate / 2))
 			return false;
 		else
-			cdbs->time_stamp = time_now;
+			ccdbs->time_stamp = time_now;
 	}
 
 	return true;
@@ -238,6 +237,37 @@ static void set_sampling_rate(struct dbs_data *dbs_data,
 	}
 }
 
+static int alloc_ccdbs(struct cpufreq_policy *policy,
+		       struct common_dbs_data *cdata)
+{
+	struct cpu_common_dbs_info *ccdbs;
+	int j;
+
+	/* Allocate memory for the common information for policy->cpus */
+	ccdbs = kzalloc(sizeof(*ccdbs), GFP_KERNEL);
+	if (!ccdbs)
+		return -ENOMEM;
+
+	/* Set ccdbs for all CPUs, online+offline */
+	for_each_cpu(j, policy->related_cpus)
+		cdata->get_cpu_cdbs(j)->ccdbs = ccdbs;
+
+	return 0;
+}
+
+static void free_ccdbs(struct cpufreq_policy *policy,
+		       struct common_dbs_data *cdata)
+{
+	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
+	int j;
+
+	for_each_cpu(j, policy->cpus)
+		cdata->get_cpu_cdbs(j)->ccdbs = NULL;
+
+	kfree(ccdbs);
+}
+
 static int cpufreq_governor_init(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data,
 				 struct common_dbs_data *cdata)
@@ -248,6 +278,11 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (dbs_data) {
 		if (WARN_ON(have_governor_per_policy()))
 			return -EINVAL;
+
+		ret = alloc_ccdbs(policy, cdata);
+		if (ret)
+			return ret;
+
 		dbs_data->usage_count++;
 		policy->governor_data = dbs_data;
 		return 0;
@@ -257,12 +292,16 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (!dbs_data)
 		return -ENOMEM;
 
+	ret = alloc_ccdbs(policy, cdata);
+	if (ret)
+		goto free_dbs_data;
+
 	dbs_data->cdata = cdata;
 	dbs_data->usage_count = 1;
 
 	ret = cdata->init(dbs_data, !policy->governor->initialized);
 	if (ret)
-		goto free_dbs_data;
+		goto free_ccdbs;
 
 	/* policy latency is in ns. Convert it to us first */
 	latency = policy->cpuinfo.transition_latency / 1000;
@@ -299,6 +338,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	}
 cdata_exit:
 	cdata->exit(dbs_data, !policy->governor->initialized);
+free_ccdbs:
+	free_ccdbs(policy, cdata);
 free_dbs_data:
 	kfree(dbs_data);
 	return ret;
@@ -322,6 +363,8 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy,
 		cdata->exit(dbs_data, policy->governor->initialized == 1);
 		kfree(dbs_data);
 	}
+
+	free_ccdbs(policy, cdata);
 }
 
 static int cpufreq_governor_start(struct cpufreq_policy *policy,
@@ -330,6 +373,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 	struct common_dbs_data *cdata = dbs_data->cdata;
 	unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu;
 	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
 	int io_busy = 0;
 
 	if (!policy->cur)
@@ -348,11 +392,14 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		io_busy = od_tuners->io_is_busy;
 	}
 
+	ccdbs->policy = policy;
+	ccdbs->time_stamp = ktime_get();
+	mutex_init(&ccdbs->timer_mutex);
+
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
 
-		j_cdbs->policy = policy;
 		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
						&j_cdbs->prev_cpu_wall, io_busy);
@@ -364,7 +411,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
- mutex_init(&j_cdbs->timer_mutex); INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); }
@@ -384,9 +430,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, od_ops->powersave_bias_init_cpu(cpu); }
- /* Initiate timer time stamp */ - cdbs->time_stamp = ktime_get(); - gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), true); return 0; @@ -398,6 +441,9 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; + + gov_cancel_work(dbs_data, policy);
if (cdata->governor == GOV_CONSERVATIVE) { struct cs_cpu_dbs_info_s *cs_dbs_info = @@ -406,10 +452,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, cs_dbs_info->enable = 0; }
- gov_cancel_work(dbs_data, policy); - - mutex_destroy(&cdbs->timer_mutex); - cdbs->policy = NULL; + ccdbs->policy = NULL; + mutex_destroy(&ccdbs->timer_mutex); }
static void cpufreq_governor_limits(struct cpufreq_policy *policy, @@ -419,18 +463,18 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
- if (!cdbs->policy) + if (!cdbs->ccdbs || !cdbs->ccdbs->policy) return;
- mutex_lock(&cdbs->timer_mutex); - if (policy->max < cdbs->policy->cur) - __cpufreq_driver_target(cdbs->policy, policy->max, + mutex_lock(&cdbs->ccdbs->timer_mutex); + if (policy->max < cdbs->ccdbs->policy->cur) + __cpufreq_driver_target(cdbs->ccdbs->policy, policy->max, CPUFREQ_RELATION_H); - else if (policy->min > cdbs->policy->cur) - __cpufreq_driver_target(cdbs->policy, policy->min, + else if (policy->min > cdbs->ccdbs->policy->cur) + __cpufreq_driver_target(cdbs->ccdbs->policy, policy->min, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); - mutex_unlock(&cdbs->timer_mutex); + mutex_unlock(&cdbs->ccdbs->timer_mutex); }
int cpufreq_governor_dbs(struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index a0f8eb79ee6d..153412cafb05 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -128,6 +128,18 @@ static void *get_cpu_dbs_info_s(int cpu) \ * cs_*: Conservative governor */
+/* Common to all CPUs of a policy */ +struct cpu_common_dbs_info { + struct cpufreq_policy *policy; + /* + * percpu mutex that serializes governor limit change with gov_dbs_timer + * invocation. We do not want gov_dbs_timer to run when user is changing + * the governor or limits. + */ + struct mutex timer_mutex; + ktime_t time_stamp; +}; + /* Per cpu structures */ struct cpu_dbs_info { u64 prev_cpu_idle; @@ -140,15 +152,8 @@ struct cpu_dbs_info { * wake-up from idle. */ unsigned int prev_load; - struct cpufreq_policy *policy; struct delayed_work dwork; - /* - * percpu mutex that serializes governor limit change with gov_dbs_timer - * invocation. We do not want gov_dbs_timer to run when user is changing - * the governor or limits. - */ - struct mutex timer_mutex; - ktime_t time_stamp; + struct cpu_common_dbs_info *ccdbs; };
struct od_cpu_dbs_info_s { @@ -264,7 +269,8 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock;
void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate); +bool need_load_eval(struct cpu_common_dbs_info *ccdbs, + unsigned int sampling_rate); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index d29c6f9c6e3e..651677cfa581 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq) static void od_check_cpu(int cpu, unsigned int load) { struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct cpufreq_policy *policy = dbs_info->cdbs.policy; + struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners;
@@ -195,16 +195,18 @@ static void od_dbs_timer(struct work_struct *work) { struct od_cpu_dbs_info_s *dbs_info = container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); - unsigned int cpu = dbs_info->cdbs.policy->cpu; + struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; + unsigned int cpu = policy->cpu; struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data; + struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; int delay = 0, sample_type = core_dbs_info->sample_type; bool modify_all = true;
- mutex_lock(&core_dbs_info->cdbs.timer_mutex); - if (!need_load_eval(&core_dbs_info->cdbs, od_tuners->sampling_rate)) { + mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex); + if (!need_load_eval(core_dbs_info->cdbs.ccdbs, + od_tuners->sampling_rate)) { modify_all = false; goto max_delay; } @@ -213,8 +215,7 @@ static void od_dbs_timer(struct work_struct *work) core_dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { delay = core_dbs_info->freq_lo_jiffies; - __cpufreq_driver_target(core_dbs_info->cdbs.policy, - core_dbs_info->freq_lo, + __cpufreq_driver_target(policy, core_dbs_info->freq_lo, CPUFREQ_RELATION_H); } else { dbs_check_cpu(dbs_data, cpu); @@ -230,8 +231,8 @@ static void od_dbs_timer(struct work_struct *work) delay = delay_for_sampling_rate(od_tuners->sampling_rate * core_dbs_info->rate_mult);
- gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.timer_mutex); + gov_queue_work(dbs_data, policy, delay, modify_all); + mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex); }
/************************** sysfs interface ************************/ @@ -274,10 +275,10 @@ static void update_sampling_rate(struct dbs_data *dbs_data, dbs_info = &per_cpu(od_cpu_dbs_info, cpu); cpufreq_cpu_put(policy);
- mutex_lock(&dbs_info->cdbs.timer_mutex); + mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
if (!delayed_work_pending(&dbs_info->cdbs.dwork)) { - mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); continue; }
@@ -286,15 +287,15 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
if (time_before(next_sampling, appointed_at)) {
- mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); cancel_delayed_work_sync(&dbs_info->cdbs.dwork); - mutex_lock(&dbs_info->cdbs.timer_mutex); + mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
- gov_queue_work(dbs_data, dbs_info->cdbs.policy, + gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate), true);
} - mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); } }
@@ -557,13 +558,16 @@ static void od_set_powersave_bias(unsigned int powersave_bias)
get_online_cpus(); for_each_online_cpu(cpu) { + struct cpu_common_dbs_info *ccdbs; + if (cpumask_test_cpu(cpu, &done)) continue;
- policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.policy; - if (!policy) + ccdbs = per_cpu(od_cpu_dbs_info, cpu).cdbs.ccdbs; + if (!ccdbs) continue;
+ policy = ccdbs->policy; cpumask_or(&done, &done, policy->cpus);
if (policy->governor != &cpufreq_gov_ondemand)
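To illustrate the resulting layout (a minimal sketch, not part of the patch; example_reader() is a made-up name), any CPU of a policy now reaches the one shared copy through its own per-cpu entry:

/*
 * Illustrative only: the common fields live in one cpu_common_dbs_info
 * shared by all of policy->cpus, reached via the per-cpu cpu_dbs_info.
 */
static void example_reader(struct common_dbs_data *cdata, int cpu)
{
	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;

	/* policy, timer_mutex and time_stamp are now single copies */
	mutex_lock(&ccdbs->timer_mutex);
	pr_info("cpu%d belongs to policy of cpu%u\n", cpu, ccdbs->policy->cpu);
	mutex_unlock(&ccdbs->timer_mutex);
}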
On 06/18/2015 11:29 AM, Viresh Kumar wrote:
From: Viresh Kumar viresh.kumar@linaro.org Date: Fri, 5 Jun 2015 13:09:45 +0530 Subject: [PATCH] cpufreq: governor: Keep single copy of information common to policy->cpus
Some information is common to all CPUs belonging to a policy, but is kept on a per-cpu basis. Let's keep it in a separate structure, common to all policy->cpus. That will make updates/reads to it less complex and less error prone.
The memory for ccdbs is allocated/freed at INIT/EXIT, so that we don't reallocate it for every STOP/START sequence. It will also be used (in a later patch) while the governor is stopped, and so must not be freed that early.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com
Part of cs_dbs_timer() and od_dbs_timer() is exactly the same and is unnecessarily duplicated.
Create the real work handler in cpufreq_governor.c and move the common code into that routine (dbs_timer()).
This shouldn't make any functional change.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 27 +++++++----------------- drivers/cpufreq/cpufreq_governor.c | 38 ++++++++++++++++++++++++++++++---- drivers/cpufreq/cpufreq_governor.h | 10 ++++----- drivers/cpufreq/cpufreq_ondemand.c | 36 +++++++++++++------------------- 4 files changed, 60 insertions(+), 51 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 5b5c01ca556c..0e4154e584bf 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -102,28 +102,15 @@ static void cs_check_cpu(int cpu, unsigned int load) } }
-static void cs_dbs_timer(struct work_struct *work) +static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, bool modify_all) { - struct cs_cpu_dbs_info_s *dbs_info = container_of(work, - struct cs_cpu_dbs_info_s, cdbs.dwork.work); - struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; - unsigned int cpu = policy->cpu; - struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, - cpu); - struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; - int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); - bool modify_all = true; - - mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex); - if (!need_load_eval(core_dbs_info->cdbs.ccdbs, - cs_tuners->sampling_rate)) - modify_all = false; - else - dbs_check_cpu(dbs_data, cpu); - - gov_queue_work(dbs_data, policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex); + + if (modify_all) + dbs_check_cpu(dbs_data, cdbs->ccdbs->policy->cpu); + + return delay_for_sampling_rate(cs_tuners->sampling_rate); }
static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 0ebff6fd0a0b..08bb67225b85 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -207,8 +207,8 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, }
/* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_common_dbs_info *ccdbs, - unsigned int sampling_rate) +static bool need_load_eval(struct cpu_common_dbs_info *ccdbs, + unsigned int sampling_rate) { if (policy_is_shared(ccdbs->policy)) { ktime_t time_now = ktime_get(); @@ -223,7 +223,37 @@ bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
return true; } -EXPORT_SYMBOL_GPL(need_load_eval); + +static void dbs_timer(struct work_struct *work) +{ + struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info, + dwork.work); + struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; + struct cpufreq_policy *policy = ccdbs->policy; + struct dbs_data *dbs_data = policy->governor_data; + unsigned int sampling_rate, delay; + bool modify_all = true; + + mutex_lock(&ccdbs->timer_mutex); + + if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { + struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; + + sampling_rate = cs_tuners->sampling_rate; + } else { + struct od_dbs_tuners *od_tuners = dbs_data->tuners; + + sampling_rate = od_tuners->sampling_rate; + } + + if (!need_load_eval(cdbs->ccdbs, sampling_rate)) + modify_all = false; + + delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all); + gov_queue_work(dbs_data, policy, delay, modify_all); + + mutex_unlock(&ccdbs->timer_mutex); +}
static void set_sampling_rate(struct dbs_data *dbs_data, unsigned int sampling_rate) @@ -411,7 +441,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
- INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); + INIT_DEFERRABLE_WORK(&j_cdbs->dwork, dbs_timer); }
if (cdata->governor == GOV_CONSERVATIVE) { diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 153412cafb05..2125c299c602 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -132,8 +132,8 @@ static void *get_cpu_dbs_info_s(int cpu) \ struct cpu_common_dbs_info { struct cpufreq_policy *policy; /* - * percpu mutex that serializes governor limit change with gov_dbs_timer - * invocation. We do not want gov_dbs_timer to run when user is changing + * percpu mutex that serializes governor limit change with dbs_timer + * invocation. We do not want dbs_timer to run when user is changing * the governor or limits. */ struct mutex timer_mutex; @@ -210,7 +210,9 @@ struct common_dbs_data {
struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu); void *(*get_cpu_dbs_info_s)(int cpu); - void (*gov_dbs_timer)(struct work_struct *work); + unsigned int (*gov_dbs_timer)(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, + bool modify_all); void (*gov_check_cpu)(int cpu, unsigned int load); int (*init)(struct dbs_data *dbs_data, bool notify); void (*exit)(struct dbs_data *dbs_data, bool notify); @@ -269,8 +271,6 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock;
void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_common_dbs_info *ccdbs, - unsigned int sampling_rate); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 651677cfa581..11db20079fc6 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -191,48 +191,40 @@ static void od_check_cpu(int cpu, unsigned int load) } }
-static void od_dbs_timer(struct work_struct *work) +static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, bool modify_all) { - struct od_cpu_dbs_info_s *dbs_info = - container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); - struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; + struct cpufreq_policy *policy = cdbs->ccdbs->policy; unsigned int cpu = policy->cpu; - struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, + struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; - int delay = 0, sample_type = core_dbs_info->sample_type; - bool modify_all = true; + int delay = 0, sample_type = dbs_info->sample_type;
- mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex); - if (!need_load_eval(core_dbs_info->cdbs.ccdbs, - od_tuners->sampling_rate)) { - modify_all = false; + if (!modify_all) goto max_delay; - }
/* Common NORMAL_SAMPLE setup */ - core_dbs_info->sample_type = OD_NORMAL_SAMPLE; + dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { - delay = core_dbs_info->freq_lo_jiffies; - __cpufreq_driver_target(policy, core_dbs_info->freq_lo, + delay = dbs_info->freq_lo_jiffies; + __cpufreq_driver_target(policy, dbs_info->freq_lo, CPUFREQ_RELATION_H); } else { dbs_check_cpu(dbs_data, cpu); - if (core_dbs_info->freq_lo) { + if (dbs_info->freq_lo) { /* Setup timer for SUB_SAMPLE */ - core_dbs_info->sample_type = OD_SUB_SAMPLE; - delay = core_dbs_info->freq_hi_jiffies; + dbs_info->sample_type = OD_SUB_SAMPLE; + delay = dbs_info->freq_hi_jiffies; } }
max_delay: if (!delay) delay = delay_for_sampling_rate(od_tuners->sampling_rate - * core_dbs_info->rate_mult); + * dbs_info->rate_mult);
- gov_queue_work(dbs_data, policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex); + return delay; }
/************************** sysfs interface ************************/
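For reference, the per-governor wiring now looks roughly like this (a sketch; only the fields touched by this patch are shown, the remaining members of cs_dbs_cdata are unchanged):

static struct common_dbs_data cs_dbs_cdata = {
	.governor	= GOV_CONSERVATIVE,
	.gov_dbs_timer	= cs_dbs_timer,	/* now returns the next delay, in jiffies */
	.gov_check_cpu	= cs_check_cpu,
	/* get_cpu_cdbs, get_cpu_dbs_info_s, init, exit: unchanged */
};

The common dbs_timer() now owns the locking, the need_load_eval() decision and the re-queueing, so the two governors can no longer diverge on any of those.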
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
Part of cs_dbs_timer() and od_dbs_timer() is exactly the same and is unnecessarily duplicated.
Create the real work handler in cpufreq_governor.c and move the common code into that routine (dbs_timer()).
This shouldn't make any functional change.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com
Currently the work handlers for all the CPUs are synchronized with one another, but not with cpufreq_governor_dbs(). Both the work handlers and cpufreq_governor_dbs() enqueue work, and these not being in sync is an open invitation to all kinds of races.
So, this patch replaces 'timer_mutex' with 'cdata->mutex' so that both the work handlers and the governor callbacks are in sync.
Also, in update_sampling_rate() we are dropping the mutex while canceling the delayed work. There is no need to do that; keep the lock held for the entire length of the routine, as it also enqueues work.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 14 +++----------- drivers/cpufreq/cpufreq_governor.h | 6 ------ drivers/cpufreq/cpufreq_ondemand.c | 13 ++++--------- 3 files changed, 7 insertions(+), 26 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 08bb67225b85..aa24aa9a9eb3 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -228,13 +228,12 @@ static void dbs_timer(struct work_struct *work) { struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info, dwork.work); - struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; - struct cpufreq_policy *policy = ccdbs->policy; + struct cpufreq_policy *policy = cdbs->ccdbs->policy; struct dbs_data *dbs_data = policy->governor_data; unsigned int sampling_rate, delay; bool modify_all = true;
- mutex_lock(&ccdbs->timer_mutex); + mutex_lock(&dbs_data->cdata->mutex);
if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; @@ -252,7 +251,7 @@ static void dbs_timer(struct work_struct *work) delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all); gov_queue_work(dbs_data, policy, delay, modify_all);
- mutex_unlock(&ccdbs->timer_mutex); + mutex_unlock(&dbs_data->cdata->mutex); }
static void set_sampling_rate(struct dbs_data *dbs_data, @@ -424,7 +423,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, }
ccdbs->time_stamp = ktime_get(); - mutex_init(&ccdbs->timer_mutex);
for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j); @@ -470,8 +468,6 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; - struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); - struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;
gov_cancel_work(dbs_data, policy);
@@ -481,8 +477,6 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
cs_dbs_info->enable = 0; } - - mutex_destroy(&ccdbs->timer_mutex); }
static void cpufreq_governor_limits(struct cpufreq_policy *policy, @@ -495,7 +489,6 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, if (!cdbs->ccdbs) return;
- mutex_lock(&cdbs->ccdbs->timer_mutex); if (policy->max < cdbs->ccdbs->policy->cur) __cpufreq_driver_target(cdbs->ccdbs->policy, policy->max, CPUFREQ_RELATION_H); @@ -503,7 +496,6 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, __cpufreq_driver_target(cdbs->ccdbs->policy, policy->min, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); - mutex_unlock(&cdbs->ccdbs->timer_mutex); }
int cpufreq_governor_dbs(struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 2125c299c602..2b2884f91fe7 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -131,12 +131,6 @@ static void *get_cpu_dbs_info_s(int cpu) \ /* Common to all CPUs of a policy */ struct cpu_common_dbs_info { struct cpufreq_policy *policy; - /* - * percpu mutex that serializes governor limit change with dbs_timer - * invocation. We do not want dbs_timer to run when user is changing - * the governor or limits. - */ - struct mutex timer_mutex; ktime_t time_stamp; };
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 11db20079fc6..87813830e169 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -249,6 +249,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data, struct od_dbs_tuners *od_tuners = dbs_data->tuners; int cpu;
+ mutex_lock(&dbs_data->cdata->mutex); + od_tuners->sampling_rate = new_rate = max(new_rate, dbs_data->min_sampling_rate);
@@ -267,28 +269,21 @@ static void update_sampling_rate(struct dbs_data *dbs_data, dbs_info = &per_cpu(od_cpu_dbs_info, cpu); cpufreq_cpu_put(policy);
- mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex); - - if (!delayed_work_pending(&dbs_info->cdbs.dwork)) { - mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); + if (!delayed_work_pending(&dbs_info->cdbs.dwork)) continue; - }
next_sampling = jiffies + usecs_to_jiffies(new_rate); appointed_at = dbs_info->cdbs.dwork.timer.expires;
if (time_before(next_sampling, appointed_at)) {
- mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); cancel_delayed_work_sync(&dbs_info->cdbs.dwork); - mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex); - gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate), true);
} - mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); } + mutex_unlock(&dbs_data->cdata->mutex); }
static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf,
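Condensed, the new shape of update_sampling_rate() is (illustrative; the body of the loop is elided):

static void update_sampling_rate(struct dbs_data *dbs_data,
				 unsigned int new_rate)
{
	int cpu;

	/* held across the whole walk; dbs_timer() takes the same lock */
	mutex_lock(&dbs_data->cdata->mutex);

	for_each_online_cpu(cpu) {
		/*
		 * cancel_delayed_work_sync() and gov_queue_work() now run
		 * under the same mutex as the work handler, so no work can
		 * sneak in between the cancel and the re-queue.
		 */
	}

	mutex_unlock(&dbs_data->cdata->mutex);
}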
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
Currently the work handlers for all the CPUs are synchronized with one another, but not with cpufreq_governor_dbs(). Both the work handlers and cpufreq_governor_dbs() enqueue work, and these not being in sync is an open invitation to all kinds of races.
This patch is not convincing. What are the precise races between cpufreq_governor_dbs() and the work handlers?
It is true that the work handlers can get queued after a governor exit due to a race between different callers of cpufreq_governor_dbs() and they can dereference freed data structures. But that is about invalid interleaving of states which your next patch aims at fixing.
AFAICT, preventing invalid states from succeeding will avoid this race. I don't see the need for another lock here.
So, this patch replaces 'timer_mutex' with 'cdata->mutex' so that both the work handlers and the governor callbacks are in sync.
Also, in update_sampling_rate() we are dropping the mutex while canceling the delayed work. There is no need to do that; keep the lock held
Why? What is the race that we are fixing by taking the lock here?
for the entire length of the routine, as it also enqueues work.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Regards Preeti U Murthy
On 15-06-15, 13:53, Preeti U Murthy wrote:
This patch is not convincing. What are the precise races between cpufreq_governor_dbs() and the work handlers?
The most important problem was the re-queueing of works from the handler, which can happen at the same time as the governor callbacks are in progress.
It is true that the work handlers can get queued after a governor exit due to a race between different callers of cpufreq_governor_dbs() and they can dereference freed data structures. But that is about invalid interleaving of states which your next patch aims at fixing.
AFAICT, preventing invalid states from succeeding will avoid this race. I don't see the need for another lock here.
Another lock? I am taking exactly the same lock in all places, and am actually removing the need for two separate locks.
So, this patch replaces 'timer_mutex' with 'cdata->mutex' so that both the work handlers and the governor callbacks are in sync.
Also, in update_sampling_rate() we are dropping the mutex while canceling the delayed work. There is no need to do that; keep the lock held
Why? What is the race that we are fixing by taking the lock here?
This can also queue work on CPUs.
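Putting the two answers together, the interleaving being closed is roughly this (my reading of the thread, sketched as a timeline):

/*
 * work handler (CPU A)                cpufreq core (CPU B)
 * --------------------                --------------------
 * dbs_timer()
 *   evaluate load                     __cpufreq_governor(GOV_STOP)
 *   gov_queue_work()   <-- races -->    gov_cancel_work()
 *                                     __cpufreq_governor(GOV_POLICY_EXIT)
 *                                       kfree(dbs_data)
 * [the re-queued work later runs and dereferences freed dbs_data]
 *
 * timer_mutex only serialized the work handlers against each other;
 * taking cdata->mutex on both sides serializes them against the
 * governor callbacks as well.
 */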
There can be races where a request arrives for the wrong state. For example, INIT followed by STOP (instead of START), or START followed by EXIT (instead of STOP).
Also make sure that, once we have started canceling queued works, we don't queue any new work. Otherwise the work handler may find that many data structures have already been freed, resulting in NULL pointer dereferences.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 56 ++++++++++++++++++++++++++++++-------- drivers/cpufreq/cpufreq_governor.h | 1 + 2 files changed, 45 insertions(+), 12 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index aa24aa9a9eb3..ee2e19a1218a 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -169,8 +169,12 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, unsigned int delay, bool all_cpus) { + struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu); int i;
+ if (!cdbs->ccdbs->enabled) + return; + mutex_lock(&cpufreq_governor_lock); if (!policy->governor_enabled) goto out_unlock; @@ -234,6 +238,8 @@ static void dbs_timer(struct work_struct *work) bool modify_all = true;
mutex_lock(&dbs_data->cdata->mutex); + if (!cdbs->ccdbs->enabled) + goto unlock;
if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; @@ -251,6 +257,7 @@ static void dbs_timer(struct work_struct *work) delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all); gov_queue_work(dbs_data, policy, delay, modify_all);
+unlock: mutex_unlock(&dbs_data->cdata->mutex); }
@@ -376,10 +383,15 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, return ret; }
-static void cpufreq_governor_exit(struct cpufreq_policy *policy, - struct dbs_data *dbs_data) +static int cpufreq_governor_exit(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) { struct common_dbs_data *cdata = dbs_data->cdata; + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu); + + /* STOP should have been called by now */ + if (cdbs->ccdbs->enabled) + return -EBUSY;
policy->governor_data = NULL; if (!--dbs_data->usage_count) { @@ -395,6 +407,8 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, free_ccdbs(policy, cdata); kfree(dbs_data); } + + return 0; }
static int cpufreq_governor_start(struct cpufreq_policy *policy, @@ -409,6 +423,10 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (!policy->cur) return -EINVAL;
+ /* START shouldn't be already called */ + if (ccdbs->enabled) + return -EBUSY; + if (cdata->governor == GOV_CONSERVATIVE) { struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
@@ -458,17 +476,25 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, od_ops->powersave_bias_init_cpu(cpu); }
+ ccdbs->enabled = true; gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), true); return 0; }
-static void cpufreq_governor_stop(struct cpufreq_policy *policy, - struct dbs_data *dbs_data) +static int cpufreq_governor_stop(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; + + /* Shouldn't be already stopped */ + if (!ccdbs->enabled) + return -EBUSY;
+ ccdbs->enabled = false; gov_cancel_work(dbs_data, policy);
if (cdata->governor == GOV_CONSERVATIVE) { @@ -477,17 +503,19 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
cs_dbs_info->enable = 0; } + + return 0; }
-static void cpufreq_governor_limits(struct cpufreq_policy *policy, - struct dbs_data *dbs_data) +static int cpufreq_governor_limits(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
- if (!cdbs->ccdbs) - return; + if (!cdbs->ccdbs->enabled) + return -EBUSY;
if (policy->max < cdbs->ccdbs->policy->cur) __cpufreq_driver_target(cdbs->ccdbs->policy, policy->max, @@ -496,13 +524,15 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, __cpufreq_driver_target(cdbs->ccdbs->policy, policy->min, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); + + return 0; }
int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event) { struct dbs_data *dbs_data; - int ret = 0; + int ret;
/* Lock governor to block concurrent initialization of governor */ mutex_lock(&cdata->mutex); @@ -522,17 +552,19 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, ret = cpufreq_governor_init(policy, dbs_data, cdata); break; case CPUFREQ_GOV_POLICY_EXIT: - cpufreq_governor_exit(policy, dbs_data); + ret = cpufreq_governor_exit(policy, dbs_data); break; case CPUFREQ_GOV_START: ret = cpufreq_governor_start(policy, dbs_data); break; case CPUFREQ_GOV_STOP: - cpufreq_governor_stop(policy, dbs_data); + ret = cpufreq_governor_stop(policy, dbs_data); break; case CPUFREQ_GOV_LIMITS: - cpufreq_governor_limits(policy, dbs_data); + ret = cpufreq_governor_limits(policy, dbs_data); break; + default: + ret = -EINVAL; }
unlock: diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 2b2884f91fe7..7da5aedb8174 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -130,6 +130,7 @@ static void *get_cpu_dbs_info_s(int cpu) \
/* Common to all CPUs of a policy */ struct cpu_common_dbs_info { + bool enabled; struct cpufreq_policy *policy; ktime_t time_stamp; };
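Collected in one place, the checks this patch adds amount to the following (an illustrative rearrangement of the hunks above, not a new hunk):

/* legal order: INIT -> START -> STOP -> EXIT; LIMITS only while started */
if (event == CPUFREQ_GOV_START && ccdbs->enabled)
	return -EBUSY;	/* START while already started */
if (event == CPUFREQ_GOV_STOP && !ccdbs->enabled)
	return -EBUSY;	/* STOP without a preceding START */
if (event == CPUFREQ_GOV_POLICY_EXIT && ccdbs->enabled)
	return -EBUSY;	/* EXIT before STOP */
if (event == CPUFREQ_GOV_LIMITS && !ccdbs->enabled)
	return -EBUSY;	/* LIMITS while stopped */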
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
There can be races where a request arrives for the wrong state. For example, INIT followed by STOP (instead of START), or START followed by EXIT (instead of STOP).
Also make sure that, once we have started canceling queued works, we don't queue any new work. Otherwise the work handler may find that many data structures have already been freed, resulting in NULL pointer dereferences.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq_governor.c | 56 ++++++++++++++++++++++++++++++-------- drivers/cpufreq/cpufreq_governor.h | 1 + 2 files changed, 45 insertions(+), 12 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index aa24aa9a9eb3..ee2e19a1218a 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -169,8 +169,12 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, unsigned int delay, bool all_cpus) {
struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu); int i;
if (!cdbs->ccdbs->enabled)
return;
policy->governor_enabled is already doing this job. Why this additional check?
- mutex_lock(&cpufreq_governor_lock); if (!policy->governor_enabled) goto out_unlock;
@@ -234,6 +238,8 @@ static void dbs_timer(struct work_struct *work) bool modify_all = true;
mutex_lock(&dbs_data->cdata->mutex);
- if (!cdbs->ccdbs->enabled)
goto unlock;
This should not trigger at all if we get the entries into cpufreq_governor_dbs() fixed. I don't like the idea of adding checks/locks in places where it can be avoided.
if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; @@ -251,6 +257,7 @@ static void dbs_timer(struct work_struct *work) delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all); gov_queue_work(dbs_data, policy, delay, modify_all);
+unlock: mutex_unlock(&dbs_data->cdata->mutex); }
@@ -376,10 +383,15 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, return ret; }
-static void cpufreq_governor_exit(struct cpufreq_policy *policy,
struct dbs_data *dbs_data)
+static int cpufreq_governor_exit(struct cpufreq_policy *policy,
struct dbs_data *dbs_data)
{ struct common_dbs_data *cdata = dbs_data->cdata;
- struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu);
- /* STOP should have been called by now */
This is not true, at least in the races that I have seen. The problem is not about STOP not being called before an EXIT. It is about a START being called after a STOP and before an EXIT. The comment should ideally be "The policy is active, stop it before exit" or similar.
- if (cdbs->ccdbs->enabled)
return -EBUSY;
And, in such a scenario, we must not be aborting EXIT; rather it must cancel the queued work and successfully exit the policy. An EXIT is a more urgent operation than START, given its call sites. Also an EXIT will not leave the cpufreq governors in a limbo state; it is bound to restart a new policy or quit a policy if the last cpu goes down. A racing START operation however is typically from a call site referencing an older policy. It's better to abort this than the EXIT operation.
It may mean a user is trying to switch governors, and the exit operation is quitting the old governor as a result. A START from a cpufreq_remove_dev_finish() racing in just before this is no reason to prevent switching governors.
policy->governor_data = NULL; if (!--dbs_data->usage_count) { @@ -395,6 +407,8 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, free_ccdbs(policy, cdata); kfree(dbs_data); }
- return 0;
}
static int cpufreq_governor_start(struct cpufreq_policy *policy, @@ -409,6 +423,10 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (!policy->cur) return -EINVAL;
- /* START shouldn't be already called */
- if (ccdbs->enabled)
return -EBUSY;
Why not reuse policy->governor_enabled in each of these places?
Regards Preeti U Murthy
On 15-06-15, 14:29, Preeti U Murthy wrote:
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
@@ -169,8 +169,12 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, unsigned int delay, bool all_cpus) {
struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu); int i;
if (!cdbs->ccdbs->enabled)
return;
policy->governor_enabled is already doing this job. Why this additional check?
That can be removed after this series. The idea was to get/set this information from within the governor instead of the cpufreq core. And all those checks in __cpufreq_governor() should die as well.
- mutex_lock(&cpufreq_governor_lock); if (!policy->governor_enabled) goto out_unlock;
@@ -234,6 +238,8 @@ static void dbs_timer(struct work_struct *work) bool modify_all = true;
mutex_lock(&dbs_data->cdata->mutex);
- if (!cdbs->ccdbs->enabled)
goto unlock;
This should not trigger at all if we get the entries into cpufreq_governor_dbs() fixed. I don't like the idea of adding checks/locks in places where it can be avoided.
We will get the order of events fixed in cpufreq.c, but this is about the sanity of the governor.
-static void cpufreq_governor_exit(struct cpufreq_policy *policy,
struct dbs_data *dbs_data)
+static int cpufreq_governor_exit(struct cpufreq_policy *policy,
struct dbs_data *dbs_data)
{ struct common_dbs_data *cdata = dbs_data->cdata;
- struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu);
- /* STOP should have been called by now */
This is not true, at least in the races that I have seen. The problem is not about STOP not being called before an EXIT. It is about a START being called after a STOP and before an EXIT. The comment should ideally be "The policy is active, stop it before exit" or similar.
- if (cdbs->ccdbs->enabled)
return -EBUSY;
And, in such a scenario, we must not be aborting EXIT; rather it must cancel the queued work and successfully exit the policy. An EXIT is a more urgent operation than START, given its call sites. Also an EXIT will not leave the cpufreq governors in a limbo state; it is bound to restart a new policy or quit a policy if the last cpu goes down. A racing START operation however is typically from a call site referencing an older policy. It's better to abort this than the EXIT operation.
It may mean a user is trying to switch governors, and the exit operation is quitting the old governor as a result. A START from a cpufreq_remove_dev_finish() racing in just before this is no reason to prevent switching governors.
We can't decide the priority of events here. All we can do is make sure that the governor doesn't end up in a bad state.
So, suppose the sequence is STOP followed by START and then EXIT. Because START has restarted the works, EXIT can't just exit, and pushing the work-cancel part into EXIT is absolutely wrong.
What we need to prevent is START interfering with the STOP, EXIT sequence. We will do that separately; here we are just making sure that user code isn't following a wrong sequence of events.
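The offending caller-side ordering, sketched (illustrative, not code from the series):

__cpufreq_governor(policy, CPUFREQ_GOV_STOP);	/* works canceled */
__cpufreq_governor(policy, CPUFREQ_GOV_START);	/* works re-queued */
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
	/* now fails with -EBUSY: works are live, and freeing dbs_data
	 * under them would crash the handlers */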
The last commit returned errors on invalid state requests for a governor. But we are already issuing a WARN for an invalid state in cpufreq_governor_dbs().
Let's stop warning on that until the cpufreq core is fixed to serialize state changes of the governor.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index ee2e19a1218a..c26f535d3d91 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -542,7 +542,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, else dbs_data = cdata->gdbs_data;
- if (WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))) { + if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) { ret = -EINVAL; goto unlock; }
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
The last commit returned errors on invalid state requests for a governor. But we are already issuing a WARN for an invalid state in cpufreq_governor_dbs().
Let's stop warning on it until the cpufreq core is fixed to serialize the governor's state changes.
The way I see it is that if this patch series manages to capture the invalid states right, we don't necessarily need to serialize state changes in the cpufreq core to get rid of the race conditions. So a START arriving after an EXIT will not be fatal (the WARN_ON() would trigger then) if it is identified as an invalid operation at that point and prevented.
So yes, we must get rid of the WARN_ON(), not because we want to reintroduce it later when all the races are fixed, but because the condition it warns on can happen even once this is fixed right, making it essentially a false positive.
And it's OK to get rid of the WARN_ON() now rather than waiting till all the races are fixed, because today we are bound to crash the moment we hit the warning, so we will know anyway that things went out of hand :P
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq_governor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index ee2e19a1218a..c26f535d3d91 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -542,7 +542,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 	else
 		dbs_data = cdata->gdbs_data;

-	if (WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))) {
+	if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) {
 		ret = -EINVAL;
 		goto unlock;
 	}
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com
At a few places in cpufreq_set_policy() we either aren't checking the return values of __cpufreq_governor() or aren't returning them as-is to the callers. This sometimes propagates wrong errors to sysfs, or we try to do more operations even after we have already failed.
Now that we return -EBUSY from __cpufreq_governor() on invalid requests, there are more chances of hitting these errors. Let's fix them by checking and returning errors properly.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b612411655f9..da672b910760 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2284,16 +2284,20 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	old_gov = policy->governor;
 	/* end old governor */
 	if (old_gov) {
-		__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
-		up_write(&policy->rwsem);
-		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
+		if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP))) {
+			up_write(&policy->rwsem);
+			ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
+			down_write(&policy->rwsem);
+		}
+
+		if (ret)
+			return ret;
 	}

 	/* start new governor */
 	policy->governor = new_policy->governor;
-	if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
-		if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
+	if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))) {
+		if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_START)))
 			goto out;

 		up_write(&policy->rwsem);
@@ -2305,11 +2309,11 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	pr_debug("starting governor %s failed\n", policy->governor->name);
 	if (old_gov) {
 		policy->governor = old_gov;
-		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
-		__cpufreq_governor(policy, CPUFREQ_GOV_START);
+		if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
+			__cpufreq_governor(policy, CPUFREQ_GOV_START);
 	}

-	return -EINVAL;
+	return ret;

 out:
 	pr_debug("governor: change or update limits\n");
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
At a few places in cpufreq_set_policy() we either aren't checking the return values of __cpufreq_governor() or aren't returning them as-is to the callers. This sometimes propagates wrong errors to sysfs, or we try to do more operations even after we have already failed.
Now that we return -EBUSY from __cpufreq_governor() on invalid requests, there are more chances of hitting these errors. Let's fix them by checking and returning errors properly.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b612411655f9..da672b910760 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2284,16 +2284,20 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	old_gov = policy->governor;
 	/* end old governor */
 	if (old_gov) {
-		__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
-		up_write(&policy->rwsem);
-		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
+		if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP))) {
+			up_write(&policy->rwsem);
+			ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
From my comments on the previous patches, EXIT should always succeed. In that case we only need to take care of STOP. So we can perhaps do:

	ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
	if (!ret) {
		...
	}

It looks better this way.
+			down_write(&policy->rwsem);
+		}
+
+		if (ret)
How about a pr_debug("Failed to stop old governor %s", policy->governor->name) here?
+			return ret;
 	}
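Putting the two suggestions together (the error check plus the pr_debug()), the block could read something like this; an untested sketch of the suggestion, not the posted patch:

	if (old_gov) {
		ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
		if (!ret) {
			up_write(&policy->rwsem);
			ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
			down_write(&policy->rwsem);
		}

		if (ret) {
			pr_debug("failed to stop old governor %s\n",
				 old_gov->name);
			return ret;
		}
	}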
 	/* start new governor */
 	policy->governor = new_policy->governor;
-	if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
-		if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
+	if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))) {
+		if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_START)))
Do we really need to capture the return values here? If there are errors we fall through to return -EINVAL. Would that not be a valid error condition?
 			goto out;

 		up_write(&policy->rwsem);
@@ -2305,11 +2309,11 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	pr_debug("starting governor %s failed\n", policy->governor->name);
 	if (old_gov) {
 		policy->governor = old_gov;
-		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
-		__cpufreq_governor(policy, CPUFREQ_GOV_START);
+		if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
+			__cpufreq_governor(policy, CPUFREQ_GOV_START);
I would suggest printing a debug message if INIT fails, and calling __cpufreq_governor(POLICY_EXIT) if START fails; EXIT is not supposed to fail. This will leave the cpufreq governor in a sane state (see the sketch after the quoted hunk below).
}
-	return -EINVAL;
+	return ret;
 out:
 	pr_debug("governor: change or update limits\n");
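An untested sketch of that fallback path, again just illustrating the suggestion:

	/* Restore the old governor; if even that fails, report and give up */
	if (old_gov) {
		policy->governor = old_gov;
		if (__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
			pr_debug("re-initing governor %s failed\n",
				 old_gov->name);
		else if (__cpufreq_governor(policy, CPUFREQ_GOV_START))
			/* START failed; EXIT is not supposed to */
			__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
	}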
Regards Preeti U Murthy
Conservative governor has its own 'enable' field to check in the notifier whether a notification is required or not. The same functionality can now be achieved with 'ccdbs->enabled' instead. Let's get rid of 'enable'.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 12 ++++++------ drivers/cpufreq/cpufreq_governor.c | 13 +------------ drivers/cpufreq/cpufreq_governor.h | 1 - 3 files changed, 7 insertions(+), 19 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 0e4154e584bf..e0b49729307d 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -21,6 +21,7 @@
 #define DEF_SAMPLING_DOWN_FACTOR		(1)
 #define MAX_SAMPLING_DOWN_FACTOR		(10)

+static struct common_dbs_data cs_dbs_cdata;
 static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);

 static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
@@ -119,13 +120,13 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	struct cpufreq_freqs *freq = data;
 	struct cs_cpu_dbs_info_s *dbs_info =
 			&per_cpu(cs_cpu_dbs_info, freq->cpu);
-	struct cpufreq_policy *policy;
+	struct cpu_common_dbs_info *ccdbs = dbs_info->cdbs.ccdbs;
+	struct cpufreq_policy *policy = ccdbs->policy;

-	if (!dbs_info->enable)
+	mutex_lock(&cs_dbs_cdata.mutex);
+	if (!ccdbs->enabled)
 		return 0;

-	policy = dbs_info->cdbs.ccdbs->policy;
-
 	/*
 	 * we only care if our internally tracked freq moves outside the 'valid'
 	 * ranges of frequency available to us otherwise we do not change it
@@ -133,6 +134,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	if (dbs_info->requested_freq > policy->max
 			|| dbs_info->requested_freq < policy->min)
 		dbs_info->requested_freq = freq->new;
+	mutex_unlock(&cs_dbs_cdata.mutex);

 	return 0;
 }
@@ -142,8 +144,6 @@ static struct notifier_block cs_cpufreq_notifier_block = {
 };

 /************************** sysfs interface ************************/
-static struct common_dbs_data cs_dbs_cdata;
-
 static ssize_t store_sampling_down_factor(struct dbs_data *dbs_data,
 		const char *buf, size_t count)
 {
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index c26f535d3d91..7f348c3a4782 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -465,7 +465,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 			cdata->get_cpu_dbs_info_s(cpu);

 		cs_dbs_info->down_skip = 0;
-		cs_dbs_info->enable = 1;
 		cs_dbs_info->requested_freq = policy->cur;
 	} else {
 		struct od_ops *od_ops = cdata->gov_ops;
@@ -485,9 +484,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data)
 {
-	struct common_dbs_data *cdata = dbs_data->cdata;
-	unsigned int cpu = policy->cpu;
-	struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);
+	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(policy->cpu);
 	struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;

 	/* Shouldn't be already stopped */
@@ -496,14 +493,6 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,

 	ccdbs->enabled = false;
 	gov_cancel_work(dbs_data, policy);
-
-	if (cdata->governor == GOV_CONSERVATIVE) {
-		struct cs_cpu_dbs_info_s *cs_dbs_info =
-			cdata->get_cpu_dbs_info_s(cpu);
-
-		cs_dbs_info->enable = 0;
-	}
-
 	return 0;
 }

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 7da5aedb8174..7f651bdf43ae 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -165,7 +165,6 @@ struct cs_cpu_dbs_info_s {
 	struct cpu_dbs_info cdbs;
 	unsigned int down_skip;
 	unsigned int requested_freq;
-	unsigned int enable:1;
 };
/* Per policy Governors sysfs tunables */
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
Conservative governor has its own 'enable' field to check in the notifier whether a notification is required or not. The same functionality can now be achieved with 'ccdbs->enabled' instead. Let's get rid of 'enable'.
Since this is a policy-wide value, is a race possible between switching to a new governor and the check of this value in the notifier? We don't want a scenario where we have switched from conservative to ondemand and ccdbs->enabled is set again, while a parallel notifier thread is still running and thinks the conservative governor is enabled.
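Spelled out, the window I am worried about looks something like this (hypothetical timeline):

/*
 * CPU0 (governor switch)              CPU1 (freq-change notifier)
 * ----------------------              ---------------------------
 * STOP conservative
 *   ccdbs->enabled = false;
 * EXIT conservative, INIT ondemand
 * START ondemand
 *   ccdbs->enabled = true;
 *                                     dbs_cpufreq_notifier()
 *                                       sees ccdbs->enabled == true
 *                                       and acts, although conservative
 *                                       no longer governs this CPU
 */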
Regards Preeti U Murthy
On 06/11/2015 04:21 PM, Viresh Kumar wrote:
Hi,
This is a next step of the earlier work I have posted [1].
The first 5 patches are minor cleanups, 6th & 7th try to optimize few things to make code less complex.
Patches 8-11 actually solve (or try to solve :)) the synchronization problems, or the crashes I was getting.
And the last one again optimizes code a bit.
I don't get the crashes anymore and want others to try. At least Preeti and Ke Wang (Sorry if I haven't spelled it properly :( ), as you two have reported the issues to me.
This patchset should be rebased over the earlier one [1].
To make things simple, all these patches are pushed here [2] and are rebased over pm/bleeeding-edge because of some recent important changes there:
[1] http://marc.info/?i=cover.1433326032.git.viresh.kumar%40linaro.org [2] ssh://git@git.linaro.org/people/viresh.kumar/linux.git cpufreq/gov-locking
I ran the kernel compiled from the above ^^ branch. I get a data access exception with the following backtrace:
[   67.834689] Unable to handle kernel paging request for data at address 0x00000008
[   67.834800] Faulting instruction address: 0xc000000000859708
cpu 0x0: Vector: 300 (Data Access) at [c00000000382b4b0]
    pc: c000000000859708: dbs_cpufreq_notifier+0x68/0xe0
    lr: c000000000100dec: notifier_call_chain+0xbc/0x120
    sp: c00000000382b730
   msr: 9000000100009033
   dar: 8
 dsisr: 40000000
  current = 0xc0000000038876c0
  paca    = 0xc000000007da0000
 softe: 0
 irq_happened: 0x01
   pid = 737, comm = kworker/0:2
enter ? for help
[c00000000382b780] c000000000100dec notifier_call_chain+0xbc/0x120
[c00000000382b7d0] c000000000101638 __srcu_notifier_call_chain+0xa8/0x110
[c00000000382b830] c000000000850844 cpufreq_notify_transition+0x1a4/0x540
[c00000000382b920] c000000000850d1c cpufreq_freq_transition_begin+0x13c/0x180
[c00000000382b9b0] c000000000851958 __cpufreq_driver_target+0x2b8/0x4a0
[c00000000382ba70] c0000000008578b0 od_check_cpu+0x120/0x140
[c00000000382baa0] c00000000085ac7c dbs_check_cpu+0x25c/0x310
[c00000000382bb50] c0000000008580f0 od_dbs_timer+0x120/0x190
[c00000000382bb90] c00000000085b2c0 dbs_timer+0xf0/0x150
[c00000000382bbe0] c0000000000f489c process_one_work+0x24c/0x910
[c00000000382bc90] c0000000000f50dc worker_thread+0x17c/0x540
[c00000000382bd20] c0000000000fed70 kthread+0x120/0x140
[c00000000382be30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
I also get the following lockdep warning just before hitting the above panic.
[   64.414664]
[   64.414724] ======================================================
[   64.414810] [ INFO: possible circular locking dependency detected ]
[   64.414883] 4.1.0-rc7-00513-g3af78d9 #44 Not tainted
[   64.414934] -------------------------------------------------------
[   64.414998] ppc64_cpu/3378 is trying to acquire lock:
[   64.415049]  ((&(&j_cdbs->dwork)->work)){+.+...}, at: [<c0000000000f5848>] flush_work+0x8/0x350
[   64.415172]
[   64.415172] but task is already holding lock:
[   64.415236]  (od_dbs_cdata.mutex){+.+.+.}, at: [<c00000000085b3a0>] cpufreq_governor_dbs+0x80/0x940
[   64.415366]
[   64.415366] which lock already depends on the new lock.
[   64.415366]
[   64.415443]
[   64.415443] the existing dependency chain (in reverse order) is:
[   64.415518] -> #1 (od_dbs_cdata.mutex){+.+.+.}:
[   64.415608]        [<c0000000001489c8>] lock_acquire+0xf8/0x340
[   64.415674]        [<c000000000a6dc28>] mutex_lock_nested+0xb8/0x5b0
[   64.415752]        [<c00000000085b220>] dbs_timer+0x50/0x150
[   64.415824]        [<c0000000000f489c>] process_one_work+0x24c/0x910
[   64.415909]        [<c0000000000f50dc>] worker_thread+0x17c/0x540
[   64.415981]        [<c0000000000fed70>] kthread+0x120/0x140
[   64.416052]        [<c000000000009678>] ret_from_kernel_thread+0x5c/0x64
[   64.416139] -> #0 ((&(&j_cdbs->dwork)->work)){+.+...}:
[   64.416240]        [<c000000000147764>] __lock_acquire+0x1114/0x1990
[   64.416321]        [<c0000000001489c8>] lock_acquire+0xf8/0x340
[   64.416385]        [<c0000000000f58a8>] flush_work+0x68/0x350
[   64.416453]        [<c0000000000f5c74>] __cancel_work_timer+0xe4/0x270
[   64.416530]        [<c00000000085b8d0>] cpufreq_governor_dbs+0x5b0/0x940
[   64.416605]        [<c000000000857e3c>] od_cpufreq_governor_dbs+0x3c/0x60
[   64.416684]        [<c000000000852b04>] __cpufreq_governor+0xe4/0x320
[   64.416762]        [<c000000000855bb8>] __cpufreq_remove_dev_prepare.isra.22+0x78/0x340
[   64.416851]        [<c000000000855f44>] cpufreq_cpu_callback+0xc4/0xe0
[   64.416928]        [<c000000000100dec>] notifier_call_chain+0xbc/0x120
[   64.417005]        [<c00000000025a3bc>] _cpu_down+0xec/0x440
[   64.417072]        [<c00000000025a76c>] cpu_down+0x5c/0xa0
[   64.417137]        [<c00000000064f52c>] cpu_subsys_offline+0x2c/0x50
[   64.417214]        [<c000000000646de4>] device_offline+0x114/0x150
[   64.417291]        [<c000000000646fac>] online_store+0x6c/0xc0
[   64.417355]        [<c000000000642cc8>] dev_attr_store+0x68/0xa0
[   64.417421]        [<c0000000003bfd44>] sysfs_kf_write+0x94/0xc0
[   64.417488]        [<c0000000003be94c>] kernfs_fop_write+0x18c/0x1f0
[   64.417564]        [<c000000000304dfc>] __vfs_write+0x6c/0x190
[   64.417630]        [<c000000000305b10>] vfs_write+0xc0/0x200
[   64.417694]        [<c000000000306b0c>] SyS_write+0x6c/0x110
[   64.417759]        [<c000000000009364>] system_call+0x38/0xd0
[   64.417823]
[   64.417823] other info that might help us debug this:
[   64.417823]
[   64.417901]  Possible unsafe locking scenario:
[   64.417901]
[   64.417965]        CPU0                    CPU1
[   64.418015]        ----                    ----
[   64.418065]   lock(od_dbs_cdata.mutex);
[   64.418129]                                lock((&(&j_cdbs->dwork)->work));
[   64.418217]                                lock(od_dbs_cdata.mutex);
[   64.418304]   lock((&(&j_cdbs->dwork)->work));
[   64.418368]
[   64.418368]  *** DEADLOCK ***
[   64.418368]
[   64.418432] 9 locks held by ppc64_cpu/3378:
[   64.418471]  #0:  (sb_writers#3){.+.+.+}, at: [<c000000000305c20>] vfs_write+0x1d0/0x200
[   64.418600]  #1:  (&of->mutex){+.+.+.}, at: [<c0000000003be83c>] kernfs_fop_write+0x7c/0x1f0
[   64.418728]  #2:  (s_active#54){.+.+.+}, at: [<c0000000003be848>] kernfs_fop_write+0x88/0x1f0
[   64.418868]  #3:  (device_hotplug_lock){+.+...}, at: [<c0000000006452e8>] lock_device_hotplug_sysfs+0x28/0x90
[   64.419009]  #4:  (&dev->mutex){......}, at: [<c000000000646d60>] device_offline+0x90/0x150
[   64.419124]  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<c00000000025a74c>] cpu_down+0x3c/0xa0
[   64.419252]  #6:  (cpu_hotplug.lock){++++++}, at: [<c0000000000cb458>] cpu_hotplug_begin+0x8/0x110
[   64.419382]  #7:  (cpu_hotplug.lock#2){+.+.+.}, at: [<c0000000000cb4f0>] cpu_hotplug_begin+0xa0/0x110
[   64.419522]  #8:  (od_dbs_cdata.mutex){+.+.+.}, at: [<c00000000085b3a0>] cpufreq_governor_dbs+0x80/0x940
[   64.419662]
[   64.419662] stack backtrace:
[   64.419716] CPU: 125 PID: 3378 Comm: ppc64_cpu Not tainted 4.1.0-rc7-00513-g3af78d9 #44
[   64.419795] Call Trace:
[   64.419824] [c000000fbe7832e0] [c000000000a80fe8] dump_stack+0xa0/0xdc (unreliable)
[   64.419913] [c000000fbe783310] [c000000000a7b2e8] print_circular_bug+0x354/0x388
[   64.420003] [c000000fbe7833b0] [c000000000145480] check_prev_add+0x8c0/0x8d0
[   64.420080] [c000000fbe7834b0] [c000000000147764] __lock_acquire+0x1114/0x1990
[   64.420169] [c000000fbe7835d0] [c0000000001489c8] lock_acquire+0xf8/0x340
[   64.420245] [c000000fbe783690] [c0000000000f58a8] flush_work+0x68/0x350
[   64.420321] [c000000fbe783780] [c0000000000f5c74] __cancel_work_timer+0xe4/0x270
[   64.420410] [c000000fbe783810] [c00000000085b8d0] cpufreq_governor_dbs+0x5b0/0x940
[   64.420499] [c000000fbe7838b0] [c000000000857e3c] od_cpufreq_governor_dbs+0x3c/0x60
[   64.420588] [c000000fbe7838f0] [c000000000852b04] __cpufreq_governor+0xe4/0x320
[   64.420678] [c000000fbe783970] [c000000000855bb8] __cpufreq_remove_dev_prepare.isra.22+0x78/0x340
[   64.420780] [c000000fbe7839f0] [c000000000855f44] cpufreq_cpu_callback+0xc4/0xe0
[   64.420869] [c000000fbe783a20] [c000000000100dec] notifier_call_chain+0xbc/0x120
[   64.420957] [c000000fbe783a70] [c00000000025a3bc] _cpu_down+0xec/0x440
[   64.421034] [c000000fbe783b30] [c00000000025a76c] cpu_down+0x5c/0xa0
[   64.421110] [c000000fbe783b60] [c00000000064f52c] cpu_subsys_offline+0x2c/0x50
[   64.421199] [c000000fbe783b90] [c000000000646de4] device_offline+0x114/0x150
[   64.421275] [c000000fbe783bd0] [c000000000646fac] online_store+0x6c/0xc0
[   64.421352] [c000000fbe783c20] [c000000000642cc8] dev_attr_store+0x68/0xa0
[   64.421428] [c000000fbe783c60] [c0000000003bfd44] sysfs_kf_write+0x94/0xc0
[   64.421504] [c000000fbe783ca0] [c0000000003be94c] kernfs_fop_write+0x18c/0x1f0
[   64.421594] [c000000fbe783cf0] [c000000000304dfc] __vfs_write+0x6c/0x190
[   64.421670] [c000000fbe783d90] [c000000000305b10] vfs_write+0xc0/0x200
[   64.421747] [c000000fbe783de0] [c000000000306b0c] SyS_write+0x6c/0x110
ppc64_cpu is the utility used to perform cpu hotplug.
Regards Preeti U Murthy
On 15-06-15, 10:19, Preeti U Murthy wrote:
I ran the kernel compiled from the above ^^ branch. I get data access exception with the following backtrace:
Were you trying to run some tests with it? Or was that just on normal boot?
[   67.834689] Unable to handle kernel paging request for data at address 0x00000008
[   67.834800] Faulting instruction address: 0xc000000000859708
cpu 0x0: Vector: 300 (Data Access) at [c00000000382b4b0]
    pc: c000000000859708: dbs_cpufreq_notifier+0x68/0xe0
This belongs to the conservative governor..
    lr: c000000000100dec: notifier_call_chain+0xbc/0x120
    sp: c00000000382b730
   msr: 9000000100009033
   dar: 8
 dsisr: 40000000
  current = 0xc0000000038876c0
  paca    = 0xc000000007da0000
 softe: 0
 irq_happened: 0x01
   pid = 737, comm = kworker/0:2
enter ? for help
[c00000000382b780] c000000000100dec notifier_call_chain+0xbc/0x120
[c00000000382b7d0] c000000000101638 __srcu_notifier_call_chain+0xa8/0x110
[c00000000382b830] c000000000850844 cpufreq_notify_transition+0x1a4/0x540
[c00000000382b920] c000000000850d1c cpufreq_freq_transition_begin+0x13c/0x180
[c00000000382b9b0] c000000000851958 __cpufreq_driver_target+0x2b8/0x4a0
[c00000000382ba70] c0000000008578b0 od_check_cpu+0x120/0x140
[c00000000382baa0] c00000000085ac7c dbs_check_cpu+0x25c/0x310
[c00000000382bb50] c0000000008580f0 od_dbs_timer+0x120/0x190
... And this is about the ondemand governor. Why is the callback getting called for a different governor?
Well, here is the answer:
From: Viresh Kumar viresh.kumar@linaro.org
Date: Mon, 15 Jun 2015 11:06:36 +0530
Subject: [PATCH] cpufreq: conservative: only manage relevant CPUs' notifier
Conservative governor registers a freq-change notifier for its operation. The notifier layer has no information about which CPUs to notify for, and hence notifies for all CPUs.
Conservative governor might not be managing all present CPUs on a system, so notifications for CPUs it isn't managing must be discarded.
Compare the policy's governor against the conservative governor to check this.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 18 ++++++++++++++++-- drivers/cpufreq/cpufreq_governor.h | 1 + 2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index e0b49729307d..f63a79d6d557 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -24,6 +24,11 @@
 static struct common_dbs_data cs_dbs_cdata;
 static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);

+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
+static
+#endif
+struct cpufreq_governor cpufreq_gov_conservative;
+
 static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
 					   struct cpufreq_policy *policy)
 {
@@ -120,8 +125,17 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	struct cpufreq_freqs *freq = data;
 	struct cs_cpu_dbs_info_s *dbs_info =
 			&per_cpu(cs_cpu_dbs_info, freq->cpu);
-	struct cpu_common_dbs_info *ccdbs = dbs_info->cdbs.ccdbs;
-	struct cpufreq_policy *policy = ccdbs->policy;
+	struct cpufreq_policy *policy = cpufreq_cpu_get_raw(freq->cpu);
+	struct cpu_common_dbs_info *ccdbs;
+
+	if (WARN_ON(!policy))
+		return -EINVAL;
+
+	/* policy isn't governed by conservative governor */
+	if (policy->governor != &cpufreq_gov_conservative)
+		return 0;
+
+	ccdbs = dbs_info->cdbs.ccdbs;

 	mutex_lock(&cs_dbs_cdata.mutex);
 	if (!ccdbs->enabled)
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 7f651bdf43ae..1c551237ac8d 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -273,4 +273,5 @@ void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
 void od_unregister_powersave_bias_handler(void);
+struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu);
 #endif /* _CPUFREQ_GOVERNOR_H */
It applies on top of that series; please try once more.
I also get the following lockdep warning just before hitting the above panic.
[   64.414810] [ INFO: possible circular locking dependency detected ]
[...]
ppc64_cpu is the utility used to perform cpu hotplug.
Sigh.. these ghosts will never leave us. Okay, let's fix this separately. Check whether you still get any of the crashes you were getting earlier..
On Thursday, June 11, 2015 04:21:43 PM Viresh Kumar wrote:
Hi,
This is a next step of the earlier work I have posted [1].
The first 5 patches are minor cleanups, 6th & 7th try to optimize few things to make code less complex.
Patches 8-11 actually solve (or try to solve :)) the synchronization problems, or the crashes I was getting.
And the last one again optimizes code a bit.
I don't get the crashes anymore and want others to try. At least Preeti and Ke Wang (Sorry if I haven't spelled it properly :( ), as you two have reported the issues to me.
This patchset should be rebased over the earlier one [1].
To make things simple, all these patches are pushed here [2] and are rebased over pm/bleeeding-edge because of some recent important changes there:
[1] http://marc.info/?i=cover.1433326032.git.viresh.kumar%40linaro.org [2] ssh://git@git.linaro.org/people/viresh.kumar/linux.git cpufreq/gov-locking
Given Preeti's comments on these patches, I don't think they are quite ready, and the 4.2 merge window is next week. I don't think this series is 4.2 material, so please target it at the next one.
On 16-06-15, 01:29, Rafael J. Wysocki wrote:
Given Preeti's comments on these patches, I don't think they are quite ready, and the 4.2 merge window is next week. I don't think this series is 4.2 material, so please target it at the next one.
I agreed to this earlier, yes. But will it be possible to push [1-5]/12 for 4.2? It's not that they are really important or anything, but there is no point sending these trivial patches to the lists again (they already carry Preeti's Reviewed-by, except 3/12, which I have asked her to review today). I want to avoid unnecessary noise on the lists, that's it.
I see that these are no longer in patchwork, so you might want me to send them again. I will do that once you confirm.
Thanks.