[PATCH V2 00/10] cpufreq: governor: Avoid invalid state-transitions

List overview All Threads
Download

newer

older

[RFC V2] Add gup trace points...

[RESEND PATCH] arm64: bpf: add...

Viresh Kumar

19 Jun 2015 19 Jun '15

11:48 a.m.

Hi Rafael/Preeti,

This is another attempt to fix the crashes reported by Preeti. They work quite well for me now, and I hope they would work well for Preeti as well :)

So, patches [1-7,9] are already Reviewed by Preeti.

The first 5 patches are minor cleanups, 6th & 7th try to optimize few things to make code less complex.

Patches 8-10 actually solve (or try to solve :)) the synchronization problems, or the crashes I was getting.

V1->V2: - 7/11 is dropped and only 8/11 is updated, which is 8/10 now. - Avoid taking the same mutex in both cpufreq_governor_dbs() and work-handler as that has given us some locdeps, classic ABBA stuff. - And so timer_mutex, responsible for synchronizing governor work-handlers is kept as is. - Later patches are almost same with minor updates. - Invalid state-transitions are sensed better now with improved checks. - I have run enough tests on my exynos dual core board and failed to crash at all. Not that I wanted to crash :)

Rebased over pm/bleeeding-edge.

Viresh Kumar (10): cpufreq: governor: Name delayed-work as dwork cpufreq: governor: Drop unused field 'cpu' cpufreq: governor: Rename 'cpu_dbs_common_info' to 'cpu_dbs_info' cpufreq: governor: name pointer to cpu_dbs_info as 'cdbs' cpufreq: governor: rename cur_policy as policy cpufreq: governor: Keep single copy of information common to policy->cpus cpufreq: governor: split out common part of {cs|od}_dbs_timer() cpufreq: governor: Avoid invalid states with additional checks cpufreq: governor: Don't WARN on invalid states cpufreq: propagate errors returned from __cpufreq_governor()

drivers/cpufreq/cpufreq.c | 22 ++-- drivers/cpufreq/cpufreq_conservative.c | 25 ++--- drivers/cpufreq/cpufreq_governor.c | 196 ++++++++++++++++++++++++--------- drivers/cpufreq/cpufreq_governor.h | 40 ++++--- drivers/cpufreq/cpufreq_ondemand.c | 67 ++++++----- 5 files changed, 220 insertions(+), 130 deletions(-)

-- 2.4.0

Show replies by date

Viresh Kumar

19 Jun 19 Jun

11:48 a.m.

New subject: [PATCH V2 01/10] cpufreq: governor: Name delayed-work as dwork

Delayed work was named as 'work' and to access work within it we do work.work. Not much readable. Rename delayed_work as 'dwork'.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 2 +- drivers/cpufreq/cpufreq_governor.c | 6 +++--- drivers/cpufreq/cpufreq_governor.h | 2 +- drivers/cpufreq/cpufreq_ondemand.c | 8 ++++---- 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index c86a10c30912..0e3ec1d968d9 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -105,7 +105,7 @@ static void cs_check_cpu(int cpu, unsigned int load) static void cs_dbs_timer(struct work_struct *work) { struct cs_cpu_dbs_info_s *dbs_info = container_of(work, - struct cs_cpu_dbs_info_s, cdbs.work.work); + struct cs_cpu_dbs_info_s, cdbs.dwork.work); unsigned int cpu = dbs_info->cdbs.cur_policy->cpu; struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, cpu); diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 57a39f8a92b7..6024bbc782c0 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -165,7 +165,7 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, { struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);

- mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay); + mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay); }

void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, @@ -204,7 +204,7 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data,

for_each_cpu(i, policy->cpus) { cdbs = dbs_data->cdata->get_cpu_cdbs(i); - cancel_delayed_work_sync(&cdbs->work); + cancel_delayed_work_sync(&cdbs->dwork); } }

@@ -367,7 +367,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];

mutex_init(&j_cdbs->timer_mutex); - INIT_DEFERRABLE_WORK(&j_cdbs->work, cdata->gov_dbs_timer); + INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); }

if (cdata->governor == GOV_CONSERVATIVE) { diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 34736f5e869d..352eecaae789 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -142,7 +142,7 @@ struct cpu_dbs_common_info { */ unsigned int prev_load; struct cpufreq_policy *cur_policy; - struct delayed_work work; + struct delayed_work dwork; /* * percpu mutex that serializes governor limit change with gov_dbs_timer * invocation. We do not want gov_dbs_timer to run when user is changing diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 3c1e10f2304c..830aef1db8c3 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -194,7 +194,7 @@ static void od_check_cpu(int cpu, unsigned int load) static void od_dbs_timer(struct work_struct *work) { struct od_cpu_dbs_info_s *dbs_info = - container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work); + container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); unsigned int cpu = dbs_info->cdbs.cur_policy->cpu; struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, cpu); @@ -275,18 +275,18 @@ static void update_sampling_rate(struct dbs_data *dbs_data,

mutex_lock(&dbs_info->cdbs.timer_mutex);

- if (!delayed_work_pending(&dbs_info->cdbs.work)) { + if (!delayed_work_pending(&dbs_info->cdbs.dwork)) { mutex_unlock(&dbs_info->cdbs.timer_mutex); continue; }

next_sampling = jiffies + usecs_to_jiffies(new_rate); - appointed_at = dbs_info->cdbs.work.timer.expires; + appointed_at = dbs_info->cdbs.dwork.timer.expires;

if (time_before(next_sampling, appointed_at)) {

mutex_unlock(&dbs_info->cdbs.timer_mutex); - cancel_delayed_work_sync(&dbs_info->cdbs.work); + cancel_delayed_work_sync(&dbs_info->cdbs.dwork); mutex_lock(&dbs_info->cdbs.timer_mutex);

gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy,

-- 2.4.0

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 02/10] cpufreq: governor: Drop unused field 'cpu'

Its not used at all, drop it.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 1 - drivers/cpufreq/cpufreq_governor.h | 1 - 2 files changed, 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 6024bbc782c0..2149ba7d32a8 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -353,7 +353,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j); unsigned int prev_load;

- j_cdbs->cpu = j; j_cdbs->cur_policy = policy; j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy); diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 352eecaae789..1bbf8c87fdd5 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -130,7 +130,6 @@ static void *get_cpu_dbs_info_s(int cpu) \

/* Per cpu structures */ struct cpu_dbs_common_info { - int cpu; u64 prev_cpu_idle; u64 prev_cpu_wall; u64 prev_cpu_nice;

-- 2.4.0

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 03/10] cpufreq: governor: Rename 'cpu_dbs_common_info' to 'cpu_dbs_info'

Its not common info to all CPUs, but a structure representing common type of cpu info to both governor types. Lets drop 'common_' from its name.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 19 +++++++++---------- drivers/cpufreq/cpufreq_governor.h | 13 ++++++------- 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 2149ba7d32a8..529f236f2d05 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -32,7 +32,7 @@ static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)

void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) { - struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); struct od_dbs_tuners *od_tuners = dbs_data->tuners; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; struct cpufreq_policy *policy; @@ -64,7 +64,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)

/* Get Absolute Load */ for_each_cpu(j, policy->cpus) { - struct cpu_dbs_common_info *j_cdbs; + struct cpu_dbs_info *j_cdbs; u64 cur_wall_time, cur_idle_time; unsigned int idle_time, wall_time; unsigned int load; @@ -163,7 +163,7 @@ EXPORT_SYMBOL_GPL(dbs_check_cpu); static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data, unsigned int delay) { - struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);

mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay); } @@ -199,7 +199,7 @@ EXPORT_SYMBOL_GPL(gov_queue_work); static inline void gov_cancel_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy) { - struct cpu_dbs_common_info *cdbs; + struct cpu_dbs_info *cdbs; int i;

for_each_cpu(i, policy->cpus) { @@ -209,8 +209,7 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, }

/* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_dbs_common_info *cdbs, - unsigned int sampling_rate) +bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate) { if (policy_is_shared(cdbs->cur_policy)) { ktime_t time_now = ktime_get(); @@ -330,7 +329,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu; - struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); int io_busy = 0;

if (!policy->cur) @@ -350,7 +349,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, }

for_each_cpu(j, policy->cpus) { - struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j); + struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j); unsigned int prev_load;

j_cdbs->cur_policy = policy; @@ -398,7 +397,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; - struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);

if (cdata->governor == GOV_CONSERVATIVE) { struct cs_cpu_dbs_info_s *cs_dbs_info = @@ -418,7 +417,7 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; - struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);

if (!cpu_cdbs->cur_policy) return; diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 1bbf8c87fdd5..6b5e33f68064 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -109,7 +109,7 @@ store_one(_gov, file_name)

/* create helper routines */ #define define_get_cpu_dbs_routines(_dbs_info) \ -static struct cpu_dbs_common_info *get_cpu_cdbs(int cpu) \ +static struct cpu_dbs_info *get_cpu_cdbs(int cpu) \ { \ return &per_cpu(_dbs_info, cpu).cdbs; \ } \ @@ -129,7 +129,7 @@ static void *get_cpu_dbs_info_s(int cpu) \ */

/* Per cpu structures */ -struct cpu_dbs_common_info { +struct cpu_dbs_info { u64 prev_cpu_idle; u64 prev_cpu_wall; u64 prev_cpu_nice; @@ -152,7 +152,7 @@ struct cpu_dbs_common_info { };

struct od_cpu_dbs_info_s { - struct cpu_dbs_common_info cdbs; + struct cpu_dbs_info cdbs; struct cpufreq_frequency_table *freq_table; unsigned int freq_lo; unsigned int freq_lo_jiffies; @@ -162,7 +162,7 @@ struct od_cpu_dbs_info_s { };

struct cs_cpu_dbs_info_s { - struct cpu_dbs_common_info cdbs; + struct cpu_dbs_info cdbs; unsigned int down_skip; unsigned int requested_freq; unsigned int enable:1; @@ -203,7 +203,7 @@ struct common_dbs_data { */ struct dbs_data *gdbs_data;

- struct cpu_dbs_common_info *(*get_cpu_cdbs)(int cpu); + struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu); void *(*get_cpu_dbs_info_s)(int cpu); void (*gov_dbs_timer)(struct work_struct *work); void (*gov_check_cpu)(int cpu, unsigned int load); @@ -264,8 +264,7 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock;

void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_dbs_common_info *cdbs, - unsigned int sampling_rate); +bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,

-- 2.4.0

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 04/10] cpufreq: governor: name pointer to cpu_dbs_info as 'cdbs'

It is called as 'cdbs' at most of the places and 'cpu_dbs' at others. Lets use 'cdbs' consistently for better readability.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 529f236f2d05..4ea13f182154 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -329,7 +329,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu; - struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); int io_busy = 0;

if (!policy->cur) @@ -385,7 +385,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, }

/* Initiate timer time stamp */ - cpu_cdbs->time_stamp = ktime_get(); + cdbs->time_stamp = ktime_get();

gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), true); @@ -397,7 +397,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; - struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

if (cdata->governor == GOV_CONSERVATIVE) { struct cs_cpu_dbs_info_s *cs_dbs_info = @@ -408,8 +408,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,

gov_cancel_work(dbs_data, policy);

- mutex_destroy(&cpu_cdbs->timer_mutex); - cpu_cdbs->cur_policy = NULL; + mutex_destroy(&cdbs->timer_mutex); + cdbs->cur_policy = NULL; }

static void cpufreq_governor_limits(struct cpufreq_policy *policy, @@ -417,20 +417,20 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; - struct cpu_dbs_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

- if (!cpu_cdbs->cur_policy) + if (!cdbs->cur_policy) return;

- mutex_lock(&cpu_cdbs->timer_mutex); - if (policy->max < cpu_cdbs->cur_policy->cur) - __cpufreq_driver_target(cpu_cdbs->cur_policy, policy->max, + mutex_lock(&cdbs->timer_mutex); + if (policy->max < cdbs->cur_policy->cur) + __cpufreq_driver_target(cdbs->cur_policy, policy->max, CPUFREQ_RELATION_H); - else if (policy->min > cpu_cdbs->cur_policy->cur) - __cpufreq_driver_target(cpu_cdbs->cur_policy, policy->min, + else if (policy->min > cdbs->cur_policy->cur) + __cpufreq_driver_target(cdbs->cur_policy, policy->min, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); - mutex_unlock(&cpu_cdbs->timer_mutex); + mutex_unlock(&cdbs->timer_mutex); }

int cpufreq_governor_dbs(struct cpufreq_policy *policy,

-- 2.4.0

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 05/10] cpufreq: governor: rename cur_policy as policy

Just call it 'policy', cur_policy is unnecessarily long and doesn't have any special meaning.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 10 +++++----- drivers/cpufreq/cpufreq_governor.c | 18 +++++++++--------- drivers/cpufreq/cpufreq_governor.h | 2 +- drivers/cpufreq/cpufreq_ondemand.c | 19 ++++++++++--------- 4 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 0e3ec1d968d9..af47d322679e 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, static void cs_check_cpu(int cpu, unsigned int load) { struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu); - struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy; + struct cpufreq_policy *policy = dbs_info->cdbs.policy; struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

@@ -106,10 +106,10 @@ static void cs_dbs_timer(struct work_struct *work) { struct cs_cpu_dbs_info_s *dbs_info = container_of(work, struct cs_cpu_dbs_info_s, cdbs.dwork.work); - unsigned int cpu = dbs_info->cdbs.cur_policy->cpu; + unsigned int cpu = dbs_info->cdbs.policy->cpu; struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data; + struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); bool modify_all = true; @@ -120,7 +120,7 @@ static void cs_dbs_timer(struct work_struct *work) else dbs_check_cpu(dbs_data, cpu);

- gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy, delay, modify_all); + gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all); mutex_unlock(&core_dbs_info->cdbs.timer_mutex); }

@@ -135,7 +135,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, if (!dbs_info->enable) return 0;

- policy = dbs_info->cdbs.cur_policy; + policy = dbs_info->cdbs.policy;

/* * we only care if our internally tracked freq moves outside the 'valid' diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 4ea13f182154..c0566f86caed 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -60,7 +60,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) ignore_nice = cs_tuners->ignore_nice_load; }

- policy = cdbs->cur_policy; + policy = cdbs->policy;

/* Get Absolute Load */ for_each_cpu(j, policy->cpus) { @@ -211,7 +211,7 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, /* Will return if we need to evaluate cpu load again or not */ bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate) { - if (policy_is_shared(cdbs->cur_policy)) { + if (policy_is_shared(cdbs->policy)) { ktime_t time_now = ktime_get(); s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp);

@@ -352,7 +352,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j); unsigned int prev_load;

- j_cdbs->cur_policy = policy; + j_cdbs->policy = policy; j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);

@@ -409,7 +409,7 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, gov_cancel_work(dbs_data, policy);

mutex_destroy(&cdbs->timer_mutex); - cdbs->cur_policy = NULL; + cdbs->policy = NULL; }

static void cpufreq_governor_limits(struct cpufreq_policy *policy, @@ -419,15 +419,15 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

- if (!cdbs->cur_policy) + if (!cdbs->policy) return;

mutex_lock(&cdbs->timer_mutex); - if (policy->max < cdbs->cur_policy->cur) - __cpufreq_driver_target(cdbs->cur_policy, policy->max, + if (policy->max < cdbs->policy->cur) + __cpufreq_driver_target(cdbs->policy, policy->max, CPUFREQ_RELATION_H); - else if (policy->min > cdbs->cur_policy->cur) - __cpufreq_driver_target(cdbs->cur_policy, policy->min, + else if (policy->min > cdbs->policy->cur) + __cpufreq_driver_target(cdbs->policy, policy->min, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); mutex_unlock(&cdbs->timer_mutex); diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 6b5e33f68064..a0f8eb79ee6d 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -140,7 +140,7 @@ struct cpu_dbs_info { * wake-up from idle. */ unsigned int prev_load; - struct cpufreq_policy *cur_policy; + struct cpufreq_policy *policy; struct delayed_work dwork; /* * percpu mutex that serializes governor limit change with gov_dbs_timer diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 830aef1db8c3..d29c6f9c6e3e 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq) static void od_check_cpu(int cpu, unsigned int load) { struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy; + struct cpufreq_policy *policy = dbs_info->cdbs.policy; struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners;

@@ -195,10 +195,10 @@ static void od_dbs_timer(struct work_struct *work) { struct od_cpu_dbs_info_s *dbs_info = container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); - unsigned int cpu = dbs_info->cdbs.cur_policy->cpu; + unsigned int cpu = dbs_info->cdbs.policy->cpu; struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data; + struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; int delay = 0, sample_type = core_dbs_info->sample_type; bool modify_all = true; @@ -213,8 +213,9 @@ static void od_dbs_timer(struct work_struct *work) core_dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { delay = core_dbs_info->freq_lo_jiffies; - __cpufreq_driver_target(core_dbs_info->cdbs.cur_policy, - core_dbs_info->freq_lo, CPUFREQ_RELATION_H); + __cpufreq_driver_target(core_dbs_info->cdbs.policy, + core_dbs_info->freq_lo, + CPUFREQ_RELATION_H); } else { dbs_check_cpu(dbs_data, cpu); if (core_dbs_info->freq_lo) { @@ -229,7 +230,7 @@ static void od_dbs_timer(struct work_struct *work) delay = delay_for_sampling_rate(od_tuners->sampling_rate * core_dbs_info->rate_mult);

- gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy, delay, modify_all); + gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all); mutex_unlock(&core_dbs_info->cdbs.timer_mutex); }

@@ -289,8 +290,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data, cancel_delayed_work_sync(&dbs_info->cdbs.dwork); mutex_lock(&dbs_info->cdbs.timer_mutex);

- gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy, - usecs_to_jiffies(new_rate), true); + gov_queue_work(dbs_data, dbs_info->cdbs.policy, + usecs_to_jiffies(new_rate), true);

} mutex_unlock(&dbs_info->cdbs.timer_mutex); @@ -559,7 +560,7 @@ static void od_set_powersave_bias(unsigned int powersave_bias) if (cpumask_test_cpu(cpu, &done)) continue;

- policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.cur_policy; + policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.policy; if (!policy) continue;

-- 2.4.0

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

Some information is common to all CPUs belonging to a policy, but are kept on per-cpu basis. Lets keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone.

The memory for ccdbs is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 18 ++++--- drivers/cpufreq/cpufreq_governor.c | 92 +++++++++++++++++++++++++--------- drivers/cpufreq/cpufreq_governor.h | 24 +++++---- drivers/cpufreq/cpufreq_ondemand.c | 38 +++++++------- 4 files changed, 114 insertions(+), 58 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index af47d322679e..5b5c01ca556c 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, static void cs_check_cpu(int cpu, unsigned int load) { struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu); - struct cpufreq_policy *policy = dbs_info->cdbs.policy; + struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

@@ -106,22 +106,24 @@ static void cs_dbs_timer(struct work_struct *work) { struct cs_cpu_dbs_info_s *dbs_info = container_of(work, struct cs_cpu_dbs_info_s, cdbs.dwork.work); - unsigned int cpu = dbs_info->cdbs.policy->cpu; + struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; + unsigned int cpu = policy->cpu; struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data; + struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); bool modify_all = true;

- mutex_lock(&core_dbs_info->cdbs.timer_mutex); - if (!need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate)) + mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex); + if (!need_load_eval(core_dbs_info->cdbs.ccdbs, + cs_tuners->sampling_rate)) modify_all = false; else dbs_check_cpu(dbs_data, cpu);

- gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.timer_mutex); + gov_queue_work(dbs_data, policy, delay, modify_all); + mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex); }

static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, @@ -135,7 +137,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, if (!dbs_info->enable) return 0;

- policy = dbs_info->cdbs.policy; + policy = dbs_info->cdbs.ccdbs->policy;

/* * we only care if our internally tracked freq moves outside the 'valid' diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index c0566f86caed..72a92fa71ba5 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -35,7 +35,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); struct od_dbs_tuners *od_tuners = dbs_data->tuners; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; - struct cpufreq_policy *policy; + struct cpufreq_policy *policy = cdbs->ccdbs->policy; unsigned int sampling_rate; unsigned int max_load = 0; unsigned int ignore_nice; @@ -60,8 +60,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) ignore_nice = cs_tuners->ignore_nice_load; }

- policy = cdbs->policy; - /* Get Absolute Load */ for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs; @@ -209,17 +207,18 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, }

/* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate) +bool need_load_eval(struct cpu_common_dbs_info *ccdbs, + unsigned int sampling_rate) { - if (policy_is_shared(cdbs->policy)) { + if (policy_is_shared(ccdbs->policy)) { ktime_t time_now = ktime_get(); - s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp); + s64 delta_us = ktime_us_delta(time_now, ccdbs->time_stamp);

/* Do nothing if we recently have sampled */ if (delta_us < (s64)(sampling_rate / 2)) return false; else - cdbs->time_stamp = time_now; + ccdbs->time_stamp = time_now; }

return true; @@ -238,6 +237,37 @@ static void set_sampling_rate(struct dbs_data *dbs_data, } }

+static int alloc_ccdbs(struct cpufreq_policy *policy, + struct common_dbs_data *cdata) +{ + struct cpu_common_dbs_info *ccdbs; + int j; + + /* Allocate memory for the common information for policy->cpus */ + ccdbs = kzalloc(sizeof(*ccdbs), GFP_KERNEL); + if (!ccdbs) + return -ENOMEM; + + /* Set ccdbs for all CPUs, online+offline */ + for_each_cpu(j, policy->related_cpus) + cdata->get_cpu_cdbs(j)->ccdbs = ccdbs; + + return 0; +} + +static void free_ccdbs(struct cpufreq_policy *policy, + struct common_dbs_data *cdata) +{ + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu); + struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; + int j; + + for_each_cpu(j, policy->cpus) + cdata->get_cpu_cdbs(j)->ccdbs = NULL; + + kfree(ccdbs); +} + static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) @@ -248,6 +278,11 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (dbs_data) { if (WARN_ON(have_governor_per_policy())) return -EINVAL; + + ret = alloc_ccdbs(policy, cdata); + if (ret) + return ret; + dbs_data->usage_count++; policy->governor_data = dbs_data; return 0; @@ -257,12 +292,16 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (!dbs_data) return -ENOMEM;

+ ret = alloc_ccdbs(policy, cdata); + if (ret) + goto free_dbs_data; + dbs_data->cdata = cdata; dbs_data->usage_count = 1;

ret = cdata->init(dbs_data, !policy->governor->initialized); if (ret) - goto free_dbs_data; + goto free_ccdbs;

/* policy latency is in ns. Convert it to us first */ latency = policy->cpuinfo.transition_latency / 1000; @@ -299,6 +338,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, } cdata_exit: cdata->exit(dbs_data, !policy->governor->initialized); +free_ccdbs: + free_ccdbs(policy, cdata); free_dbs_data: kfree(dbs_data); return ret; @@ -322,6 +363,8 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, cdata->exit(dbs_data, policy->governor->initialized == 1); kfree(dbs_data); } + + free_ccdbs(policy, cdata); }

static int cpufreq_governor_start(struct cpufreq_policy *policy, @@ -330,6 +373,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; int io_busy = 0;

if (!policy->cur) @@ -348,11 +392,14 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, io_busy = od_tuners->io_is_busy; }

+ ccdbs->policy = policy; + ccdbs->time_stamp = ktime_get(); + mutex_init(&ccdbs->timer_mutex); + for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j); unsigned int prev_load;

- j_cdbs->policy = policy; j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);

@@ -364,7 +411,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];

- mutex_init(&j_cdbs->timer_mutex); INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); }

@@ -384,9 +430,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, od_ops->powersave_bias_init_cpu(cpu); }

- /* Initiate timer time stamp */ - cdbs->time_stamp = ktime_get(); - gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), true); return 0; @@ -398,6 +441,9 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; + + gov_cancel_work(dbs_data, policy);

if (cdata->governor == GOV_CONSERVATIVE) { struct cs_cpu_dbs_info_s *cs_dbs_info = @@ -406,10 +452,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, cs_dbs_info->enable = 0; }

- gov_cancel_work(dbs_data, policy); - - mutex_destroy(&cdbs->timer_mutex); - cdbs->policy = NULL; + ccdbs->policy = NULL; + mutex_destroy(&ccdbs->timer_mutex); }

static void cpufreq_governor_limits(struct cpufreq_policy *policy, @@ -419,18 +463,18 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

- if (!cdbs->policy) + if (!cdbs->ccdbs || !cdbs->ccdbs->policy) return;

- mutex_lock(&cdbs->timer_mutex); - if (policy->max < cdbs->policy->cur) - __cpufreq_driver_target(cdbs->policy, policy->max, + mutex_lock(&cdbs->ccdbs->timer_mutex); + if (policy->max < cdbs->ccdbs->policy->cur) + __cpufreq_driver_target(cdbs->ccdbs->policy, policy->max, CPUFREQ_RELATION_H); - else if (policy->min > cdbs->policy->cur) - __cpufreq_driver_target(cdbs->policy, policy->min, + else if (policy->min > cdbs->ccdbs->policy->cur) + __cpufreq_driver_target(cdbs->ccdbs->policy, policy->min, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); - mutex_unlock(&cdbs->timer_mutex); + mutex_unlock(&cdbs->ccdbs->timer_mutex); }

int cpufreq_governor_dbs(struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index a0f8eb79ee6d..153412cafb05 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -128,6 +128,18 @@ static void *get_cpu_dbs_info_s(int cpu) \ * cs_*: Conservative governor */

+/* Common to all CPUs of a policy */ +struct cpu_common_dbs_info { + struct cpufreq_policy *policy; + /* + * percpu mutex that serializes governor limit change with gov_dbs_timer + * invocation. We do not want gov_dbs_timer to run when user is changing + * the governor or limits. + */ + struct mutex timer_mutex; + ktime_t time_stamp; +}; + /* Per cpu structures */ struct cpu_dbs_info { u64 prev_cpu_idle; @@ -140,15 +152,8 @@ struct cpu_dbs_info { * wake-up from idle. */ unsigned int prev_load; - struct cpufreq_policy *policy; struct delayed_work dwork; - /* - * percpu mutex that serializes governor limit change with gov_dbs_timer - * invocation. We do not want gov_dbs_timer to run when user is changing - * the governor or limits. - */ - struct mutex timer_mutex; - ktime_t time_stamp; + struct cpu_common_dbs_info *ccdbs; };

struct od_cpu_dbs_info_s { @@ -264,7 +269,8 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock;

void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate); +bool need_load_eval(struct cpu_common_dbs_info *ccdbs, + unsigned int sampling_rate); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index d29c6f9c6e3e..651677cfa581 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq) static void od_check_cpu(int cpu, unsigned int load) { struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct cpufreq_policy *policy = dbs_info->cdbs.policy; + struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners;

@@ -195,16 +195,18 @@ static void od_dbs_timer(struct work_struct *work) { struct od_cpu_dbs_info_s *dbs_info = container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); - unsigned int cpu = dbs_info->cdbs.policy->cpu; + struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; + unsigned int cpu = policy->cpu; struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data; + struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; int delay = 0, sample_type = core_dbs_info->sample_type; bool modify_all = true;

- mutex_lock(&core_dbs_info->cdbs.timer_mutex); - if (!need_load_eval(&core_dbs_info->cdbs, od_tuners->sampling_rate)) { + mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex); + if (!need_load_eval(core_dbs_info->cdbs.ccdbs, + od_tuners->sampling_rate)) { modify_all = false; goto max_delay; } @@ -213,8 +215,7 @@ static void od_dbs_timer(struct work_struct *work) core_dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { delay = core_dbs_info->freq_lo_jiffies; - __cpufreq_driver_target(core_dbs_info->cdbs.policy, - core_dbs_info->freq_lo, + __cpufreq_driver_target(policy, core_dbs_info->freq_lo, CPUFREQ_RELATION_H); } else { dbs_check_cpu(dbs_data, cpu); @@ -230,8 +231,8 @@ static void od_dbs_timer(struct work_struct *work) delay = delay_for_sampling_rate(od_tuners->sampling_rate * core_dbs_info->rate_mult);

/************************** sysfs interface ************************/ @@ -274,10 +275,10 @@ static void update_sampling_rate(struct dbs_data *dbs_data, dbs_info = &per_cpu(od_cpu_dbs_info, cpu); cpufreq_cpu_put(policy);

- mutex_lock(&dbs_info->cdbs.timer_mutex); + mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);

if (!delayed_work_pending(&dbs_info->cdbs.dwork)) { - mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); continue; }

@@ -286,15 +287,15 @@ static void update_sampling_rate(struct dbs_data *dbs_data,

if (time_before(next_sampling, appointed_at)) {

- mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); cancel_delayed_work_sync(&dbs_info->cdbs.dwork); - mutex_lock(&dbs_info->cdbs.timer_mutex); + mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);

- gov_queue_work(dbs_data, dbs_info->cdbs.policy, + gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate), true);

} - mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex); } }

@@ -557,13 +558,16 @@ static void od_set_powersave_bias(unsigned int powersave_bias)

get_online_cpus(); for_each_online_cpu(cpu) { + struct cpu_common_dbs_info *ccdbs; + if (cpumask_test_cpu(cpu, &done)) continue;

- policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.policy; - if (!policy) + ccdbs = per_cpu(od_cpu_dbs_info, cpu).cdbs.ccdbs; + if (!ccdbs) continue;

+ policy = ccdbs->policy; cpumask_or(&done, &done, policy->cpus);

if (policy->governor != &cpufreq_gov_ondemand)

-- 2.4.0

Rafael J. Wysocki

17 Jul 17 Jul

10:11 p.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

On Friday, June 19, 2015 05:18:06 PM Viresh Kumar wrote:

...

Some information is common to all CPUs belonging to a policy, but are kept on per-cpu basis. Lets keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone.

The memory for ccdbs is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

drivers/cpufreq/cpufreq_conservative.c | 18 ++++--- drivers/cpufreq/cpufreq_governor.c | 92 +++++++++++++++++++++++++--------- drivers/cpufreq/cpufreq_governor.h | 24 +++++---- drivers/cpufreq/cpufreq_ondemand.c | 38 +++++++------- 4 files changed, 114 insertions(+), 58 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index af47d322679e..5b5c01ca556c 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, static void cs_check_cpu(int cpu, unsigned int load) { struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);

struct cpufreq_policy *policy = dbs_info->cdbs.policy;

struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

@@ -106,22 +106,24 @@ static void cs_dbs_timer(struct work_struct *work) { struct cs_cpu_dbs_info_s *dbs_info = container_of(work, struct cs_cpu_dbs_info_s, cdbs.dwork.work);

unsigned int cpu = dbs_info->cdbs.policy->cpu;

struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;

unsigned int cpu = policy->cpu; struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);

struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;

struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); bool modify_all = true;

mutex_lock(&core_dbs_info->cdbs.timer_mutex);

if (!need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate))
mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex);

if (!need_load_eval(core_dbs_info->cdbs.ccdbs,
	    cs_tuners->sampling_rate))
modify_all = false; else dbs_check_cpu(dbs_data, cpu);
gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);

mutex_unlock(&core_dbs_info->cdbs.timer_mutex);

gov_queue_work(dbs_data, policy, delay, modify_all);

mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex);

} static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, @@ -135,7 +137,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, if (!dbs_info->enable) return 0;

policy = dbs_info->cdbs.policy;

policy = dbs_info->cdbs.ccdbs->policy;

/* * we only care if our internally tracked freq moves outside the 'valid' diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index c0566f86caed..72a92fa71ba5 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -35,7 +35,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); struct od_dbs_tuners *od_tuners = dbs_data->tuners; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

struct cpufreq_policy *policy;

struct cpufreq_policy *policy = cdbs->ccdbs->policy; unsigned int sampling_rate; unsigned int max_load = 0; unsigned int ignore_nice;

@@ -60,8 +60,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) ignore_nice = cs_tuners->ignore_nice_load; }

policy = cdbs->policy;

/* Get Absolute Load */ for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs;

@@ -209,17 +207,18 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, } /* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate) +bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
    unsigned int sampling_rate)
{

if (policy_is_shared(cdbs->policy)) {

if (policy_is_shared(ccdbs->policy)) { ktime_t time_now = ktime_get();
s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp);
s64 delta_us = ktime_us_delta(time_now, ccdbs->time_stamp);
/* Do nothing if we recently have sampled */ if (delta_us < (s64)(sampling_rate / 2)) return false; else
	cdbs->time_stamp = time_now;
	ccdbs->time_stamp = time_now;
}
return true; @@ -238,6 +237,37 @@ static void set_sampling_rate(struct dbs_data *dbs_data, } } +static int alloc_ccdbs(struct cpufreq_policy *policy,
       struct common_dbs_data *cdata)
+{
struct cpu_common_dbs_info *ccdbs;

int j;

/* Allocate memory for the common information for policy->cpus */

ccdbs = kzalloc(sizeof(*ccdbs), GFP_KERNEL);

if (!ccdbs)
return -ENOMEM;
/* Set ccdbs for all CPUs, online+offline */

for_each_cpu(j, policy->related_cpus)
cdata->get_cpu_cdbs(j)->ccdbs = ccdbs;
return 0;
+}

+static void free_ccdbs(struct cpufreq_policy *policy,
       struct common_dbs_data *cdata)
+{
struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu);

struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;

int j;

for_each_cpu(j, policy->cpus)
cdata->get_cpu_cdbs(j)->ccdbs = NULL;
kfree(ccdbs);
+}

static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) @@ -248,6 +278,11 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (dbs_data) { if (WARN_ON(have_governor_per_policy())) return -EINVAL;
ret = alloc_ccdbs(policy, cdata);
if (ret)
	return ret;
dbs_data->usage_count++; policy->governor_data = dbs_data; return 0;
@@ -257,12 +292,16 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (!dbs_data) return -ENOMEM;
ret = alloc_ccdbs(policy, cdata);

if (ret)
goto free_dbs_data;
dbs_data->cdata = cdata; dbs_data->usage_count = 1;
ret = cdata->init(dbs_data, !policy->governor->initialized); if (ret)
goto free_dbs_data;
goto free_ccdbs;
/* policy latency is in ns. Convert it to us first */ latency = policy->cpuinfo.transition_latency / 1000; @@ -299,6 +338,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, } cdata_exit: cdata->exit(dbs_data, !policy->governor->initialized); +free_ccdbs:

free_ccdbs(policy, cdata);

free_dbs_data: kfree(dbs_data); return ret; @@ -322,6 +363,8 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, cdata->exit(dbs_data, policy->governor->initialized == 1); kfree(dbs_data); }

free_ccdbs(policy, cdata);

} static int cpufreq_governor_start(struct cpufreq_policy *policy, @@ -330,6 +373,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; int io_busy = 0;

if (!policy->cur) @@ -348,11 +392,14 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, io_busy = od_tuners->io_is_busy; }

ccdbs->policy = policy;

ccdbs->time_stamp = ktime_get();

mutex_init(&ccdbs->timer_mutex);

for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j); unsigned int prev_load;
j_cdbs->policy = policy;
j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
@@ -364,7 +411,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
mutex_init(&j_cdbs->timer_mutex);
INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); }
@@ -384,9 +430,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, od_ops->powersave_bias_init_cpu(cpu); }

/* Initiate timer time stamp */

cdbs->time_stamp = ktime_get();

gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), true); return 0;

@@ -398,6 +441,9 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;

gov_cancel_work(dbs_data, policy);

if (cdata->governor == GOV_CONSERVATIVE) { struct cs_cpu_dbs_info_s *cs_dbs_info = @@ -406,10 +452,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, cs_dbs_info->enable = 0; }

gov_cancel_work(dbs_data, policy);

mutex_destroy(&cdbs->timer_mutex);

cdbs->policy = NULL;

ccdbs->policy = NULL;

mutex_destroy(&ccdbs->timer_mutex);

} static void cpufreq_governor_limits(struct cpufreq_policy *policy, @@ -419,18 +463,18 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

if (!cdbs->policy)

if (!cdbs->ccdbs || !cdbs->ccdbs->policy) return;
mutex_lock(&cdbs->timer_mutex);

if (policy->max < cdbs->policy->cur)
__cpufreq_driver_target(cdbs->policy, policy->max,
mutex_lock(&cdbs->ccdbs->timer_mutex);

if (policy->max < cdbs->ccdbs->policy->cur)
__cpufreq_driver_target(cdbs->ccdbs->policy, policy->max,
		CPUFREQ_RELATION_H);
else if (policy->min > cdbs->policy->cur)
__cpufreq_driver_target(cdbs->policy, policy->min,
else if (policy->min > cdbs->ccdbs->policy->cur)
__cpufreq_driver_target(cdbs->ccdbs->policy, policy->min,
		CPUFREQ_RELATION_L);
dbs_check_cpu(dbs_data, cpu);
mutex_unlock(&cdbs->timer_mutex);

mutex_unlock(&cdbs->ccdbs->timer_mutex);

} int cpufreq_governor_dbs(struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index a0f8eb79ee6d..153412cafb05 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -128,6 +128,18 @@ static void *get_cpu_dbs_info_s(int cpu) \

cs_*: Conservative governor

*/ +/* Common to all CPUs of a policy */ +struct cpu_common_dbs_info {
struct cpufreq_policy *policy;

/*
* percpu mutex that serializes governor limit change with gov_dbs_timer
* invocation. We do not want gov_dbs_timer to run when user is changing
* the governor or limits.
*/
struct mutex timer_mutex;

ktime_t time_stamp;
+};

/* Per cpu structures */ struct cpu_dbs_info { u64 prev_cpu_idle; @@ -140,15 +152,8 @@ struct cpu_dbs_info { * wake-up from idle. */ unsigned int prev_load;
struct cpufreq_policy *policy; struct delayed_work dwork;

/*
* percpu mutex that serializes governor limit change with gov_dbs_timer
* invocation. We do not want gov_dbs_timer to run when user is changing
* the governor or limits.
*/
struct mutex timer_mutex;

ktime_t time_stamp;
struct cpu_common_dbs_info *ccdbs;

I don't like this new field name to be honest. It's just too cryptic to me.

Why don't you simply call it "common" or "shared", for example?

...

}; struct od_cpu_dbs_info_s { @@ -264,7 +269,8 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock; void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate); +bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
    unsigned int sampling_rate);
int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index d29c6f9c6e3e..651677cfa581 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq) static void od_check_cpu(int cpu, unsigned int load) { struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);

struct cpufreq_policy *policy = dbs_info->cdbs.policy;

struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;

If you rename ccdbs to "shared" that will look as follows:

struct cpufreq_policy *policy = dbs_info->cdbs.shared->policy;

which is way more readable to me.

Thanks, Rafael

Lucas Stach

27 Nov 27 Nov

11:56 a.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

Am Freitag, den 19.06.2015, 17:18 +0530 schrieb Viresh Kumar:

...

Some information is common to all CPUs belonging to a policy, but are kept on per-cpu basis. Lets keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone.

The memory for ccdbs is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

Sorry for being late to the party, as this has long been applied, but I stumbled upon this while looking at other stuff and wanted to express my finding.

...

From a short look this change is totally bogus.

The *_dbs_timer functions are called from a workqueue on each CPU. Moving the time_stamp and the timer_mutex to a shared structure now causes a thundering herd issue on multicore systems. All workqueues wake up at the same time with all but one of the worker bouncing on the contended mutex.

Also sharing the time_stamp causes all but the first worker to acquire the mutex to not properly update their sampling information, I guess. I didn't look to deeply into this.

Regards, Lucas

...

drivers/cpufreq/cpufreq_conservative.c | 18 ++++--- drivers/cpufreq/cpufreq_governor.c | 92 +++++++++++++++++++++++++--------- drivers/cpufreq/cpufreq_governor.h | 24 +++++---- drivers/cpufreq/cpufreq_ondemand.c | 38 +++++++------- 4 files changed, 114 insertions(+), 58 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index af47d322679e..5b5c01ca556c 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, static void cs_check_cpu(int cpu, unsigned int load) { struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);

struct cpufreq_policy *policy = dbs_info->cdbs.policy;

struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

@@ -106,22 +106,24 @@ static void cs_dbs_timer(struct work_struct *work) { struct cs_cpu_dbs_info_s *dbs_info = container_of(work, struct cs_cpu_dbs_info_s, cdbs.dwork.work);

unsigned int cpu = dbs_info->cdbs.policy->cpu;

struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;

unsigned int cpu = policy->cpu; struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);

struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;

struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); bool modify_all = true;

mutex_lock(&core_dbs_info->cdbs.timer_mutex);

if (!need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate))
mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex);

if (!need_load_eval(core_dbs_info->cdbs.ccdbs,
	    cs_tuners->sampling_rate))
modify_all = false; else dbs_check_cpu(dbs_data, cpu);
gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);

mutex_unlock(&core_dbs_info->cdbs.timer_mutex);

gov_queue_work(dbs_data, policy, delay, modify_all);

mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex);

} static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, @@ -135,7 +137,7 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, if (!dbs_info->enable) return 0;

policy = dbs_info->cdbs.policy;

policy = dbs_info->cdbs.ccdbs->policy;

/* * we only care if our internally tracked freq moves outside the 'valid' diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index c0566f86caed..72a92fa71ba5 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -35,7 +35,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); struct od_dbs_tuners *od_tuners = dbs_data->tuners; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

struct cpufreq_policy *policy;

struct cpufreq_policy *policy = cdbs->ccdbs->policy; unsigned int sampling_rate; unsigned int max_load = 0; unsigned int ignore_nice;

@@ -60,8 +60,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) ignore_nice = cs_tuners->ignore_nice_load; }

policy = cdbs->policy;

/* Get Absolute Load */ for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs;

@@ -209,17 +207,18 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, } /* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate) +bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
    unsigned int sampling_rate)
{

if (policy_is_shared(cdbs->policy)) {

if (policy_is_shared(ccdbs->policy)) { ktime_t time_now = ktime_get();
s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp);
s64 delta_us = ktime_us_delta(time_now, ccdbs->time_stamp);
/* Do nothing if we recently have sampled */ if (delta_us < (s64)(sampling_rate / 2)) return false; else
	cdbs->time_stamp = time_now;
	ccdbs->time_stamp = time_now;
}
return true; @@ -238,6 +237,37 @@ static void set_sampling_rate(struct dbs_data *dbs_data, } } +static int alloc_ccdbs(struct cpufreq_policy *policy,
       struct common_dbs_data *cdata)
+{
struct cpu_common_dbs_info *ccdbs;

int j;

/* Allocate memory for the common information for policy->cpus */

ccdbs = kzalloc(sizeof(*ccdbs), GFP_KERNEL);

if (!ccdbs)
return -ENOMEM;
/* Set ccdbs for all CPUs, online+offline */

for_each_cpu(j, policy->related_cpus)
cdata->get_cpu_cdbs(j)->ccdbs = ccdbs;
return 0;
+}

+static void free_ccdbs(struct cpufreq_policy *policy,
       struct common_dbs_data *cdata)
+{
struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu);

struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;

int j;

for_each_cpu(j, policy->cpus)
cdata->get_cpu_cdbs(j)->ccdbs = NULL;
kfree(ccdbs);
+}

static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) @@ -248,6 +278,11 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (dbs_data) { if (WARN_ON(have_governor_per_policy())) return -EINVAL;
ret = alloc_ccdbs(policy, cdata);
if (ret)
	return ret;
dbs_data->usage_count++; policy->governor_data = dbs_data; return 0;
@@ -257,12 +292,16 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (!dbs_data) return -ENOMEM;
ret = alloc_ccdbs(policy, cdata);

if (ret)
goto free_dbs_data;
dbs_data->cdata = cdata; dbs_data->usage_count = 1;
ret = cdata->init(dbs_data, !policy->governor->initialized); if (ret)
goto free_dbs_data;
goto free_ccdbs;
/* policy latency is in ns. Convert it to us first */ latency = policy->cpuinfo.transition_latency / 1000; @@ -299,6 +338,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, } cdata_exit: cdata->exit(dbs_data, !policy->governor->initialized); +free_ccdbs:

free_ccdbs(policy, cdata);

free_dbs_data: kfree(dbs_data); return ret; @@ -322,6 +363,8 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, cdata->exit(dbs_data, policy->governor->initialized == 1); kfree(dbs_data); }

free_ccdbs(policy, cdata);

} static int cpufreq_governor_start(struct cpufreq_policy *policy, @@ -330,6 +373,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; int io_busy = 0;

if (!policy->cur) @@ -348,11 +392,14 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, io_busy = od_tuners->io_is_busy; }

ccdbs->policy = policy;

ccdbs->time_stamp = ktime_get();

mutex_init(&ccdbs->timer_mutex);

for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j); unsigned int prev_load;
j_cdbs->policy = policy;
j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
@@ -364,7 +411,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
mutex_init(&j_cdbs->timer_mutex);
INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); }
@@ -384,9 +430,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, od_ops->powersave_bias_init_cpu(cpu); }

/* Initiate timer time stamp */

cdbs->time_stamp = ktime_get();

gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), true); return 0;

@@ -398,6 +441,9 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;

gov_cancel_work(dbs_data, policy);

if (cdata->governor == GOV_CONSERVATIVE) { struct cs_cpu_dbs_info_s *cs_dbs_info = @@ -406,10 +452,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, cs_dbs_info->enable = 0; }

gov_cancel_work(dbs_data, policy);

mutex_destroy(&cdbs->timer_mutex);

cdbs->policy = NULL;

ccdbs->policy = NULL;

mutex_destroy(&ccdbs->timer_mutex);

} static void cpufreq_governor_limits(struct cpufreq_policy *policy, @@ -419,18 +463,18 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

if (!cdbs->policy)

if (!cdbs->ccdbs || !cdbs->ccdbs->policy) return;
mutex_lock(&cdbs->timer_mutex);

if (policy->max < cdbs->policy->cur)
__cpufreq_driver_target(cdbs->policy, policy->max,
mutex_lock(&cdbs->ccdbs->timer_mutex);

if (policy->max < cdbs->ccdbs->policy->cur)
__cpufreq_driver_target(cdbs->ccdbs->policy, policy->max,
		CPUFREQ_RELATION_H);
else if (policy->min > cdbs->policy->cur)
__cpufreq_driver_target(cdbs->policy, policy->min,
else if (policy->min > cdbs->ccdbs->policy->cur)
__cpufreq_driver_target(cdbs->ccdbs->policy, policy->min,
		CPUFREQ_RELATION_L);
dbs_check_cpu(dbs_data, cpu);
mutex_unlock(&cdbs->timer_mutex);

mutex_unlock(&cdbs->ccdbs->timer_mutex);

} int cpufreq_governor_dbs(struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index a0f8eb79ee6d..153412cafb05 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -128,6 +128,18 @@ static void *get_cpu_dbs_info_s(int cpu) \

cs_*: Conservative governor

*/ +/* Common to all CPUs of a policy */ +struct cpu_common_dbs_info {
struct cpufreq_policy *policy;

/*
* percpu mutex that serializes governor limit change with gov_dbs_timer
* invocation. We do not want gov_dbs_timer to run when user is changing
* the governor or limits.
*/
struct mutex timer_mutex;

ktime_t time_stamp;
+};

/* Per cpu structures */ struct cpu_dbs_info { u64 prev_cpu_idle; @@ -140,15 +152,8 @@ struct cpu_dbs_info { * wake-up from idle. */ unsigned int prev_load;
struct cpufreq_policy *policy; struct delayed_work dwork;

/*
* percpu mutex that serializes governor limit change with gov_dbs_timer
* invocation. We do not want gov_dbs_timer to run when user is changing
* the governor or limits.
*/
struct mutex timer_mutex;

ktime_t time_stamp;
struct cpu_common_dbs_info *ccdbs;

}; struct od_cpu_dbs_info_s { @@ -264,7 +269,8 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock; void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate); +bool need_load_eval(struct cpu_common_dbs_info *ccdbs,
    unsigned int sampling_rate);
int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index d29c6f9c6e3e..651677cfa581 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq) static void od_check_cpu(int cpu, unsigned int load) { struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);

struct cpufreq_policy *policy = dbs_info->cdbs.policy;

struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners;

@@ -195,16 +195,18 @@ static void od_dbs_timer(struct work_struct *work) { struct od_cpu_dbs_info_s *dbs_info = container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work);

unsigned int cpu = dbs_info->cdbs.policy->cpu;

struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy;

unsigned int cpu = policy->cpu; struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, cpu);

struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data;

struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; int delay = 0, sample_type = core_dbs_info->sample_type; bool modify_all = true;

mutex_lock(&core_dbs_info->cdbs.timer_mutex);

if (!need_load_eval(&core_dbs_info->cdbs, od_tuners->sampling_rate)) {
mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex);

if (!need_load_eval(core_dbs_info->cdbs.ccdbs,
	    od_tuners->sampling_rate)) {
modify_all = false; goto max_delay; }
@@ -213,8 +215,7 @@ static void od_dbs_timer(struct work_struct *work) core_dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { delay = core_dbs_info->freq_lo_jiffies;
__cpufreq_driver_target(core_dbs_info->cdbs.policy,
			core_dbs_info->freq_lo,
__cpufreq_driver_target(policy, core_dbs_info->freq_lo,
		CPUFREQ_RELATION_H);
} else { dbs_check_cpu(dbs_data, cpu);
@@ -230,8 +231,8 @@ static void od_dbs_timer(struct work_struct *work) delay = delay_for_sampling_rate(od_tuners->sampling_rate * core_dbs_info->rate_mult);

gov_queue_work(dbs_data, dbs_info->cdbs.policy, delay, modify_all);

mutex_unlock(&core_dbs_info->cdbs.timer_mutex);

gov_queue_work(dbs_data, policy, delay, modify_all);

mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex);

} /************************** sysfs interface ************************/ @@ -274,10 +275,10 @@ static void update_sampling_rate(struct dbs_data *dbs_data, dbs_info = &per_cpu(od_cpu_dbs_info, cpu); cpufreq_cpu_put(policy);
mutex_lock(&dbs_info->cdbs.timer_mutex);
mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
	mutex_unlock(&dbs_info->cdbs.timer_mutex);
	mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
continue;
}
@@ -286,15 +287,15 @@ static void update_sampling_rate(struct dbs_data *dbs_data, if (time_before(next_sampling, appointed_at)) {
	mutex_unlock(&dbs_info->cdbs.timer_mutex);
	mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
	mutex_lock(&dbs_info->cdbs.timer_mutex);
	mutex_lock(&dbs_info->cdbs.ccdbs->timer_mutex);
	gov_queue_work(dbs_data, dbs_info->cdbs.policy,
	gov_queue_work(dbs_data, policy,
	       usecs_to_jiffies(new_rate), true);
}
mutex_unlock(&dbs_info->cdbs.timer_mutex);
mutex_unlock(&dbs_info->cdbs.ccdbs->timer_mutex);
}
} @@ -557,13 +558,16 @@ static void od_set_powersave_bias(unsigned int powersave_bias) get_online_cpus(); for_each_online_cpu(cpu) {
struct cpu_common_dbs_info *ccdbs;
if (cpumask_test_cpu(cpu, &done)) continue;
policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.policy;
if (!policy)
ccdbs = per_cpu(od_cpu_dbs_info, cpu).cdbs.ccdbs;
if (!ccdbs)
continue;
policy = ccdbs->policy;
cpumask_or(&done, &done, policy->cpus);
if (policy->governor != &cpufreq_gov_ondemand)

-- Pengutronix e.K. | Lucas Stach | Industrial Linux Solutions | http://www.pengutronix.de/ |

Viresh Kumar

30 Nov 30 Nov

5:19 a.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

On 27-11-15, 12:56, Lucas Stach wrote:

...

Sorry for being late to the party, as this has long been applied, but I stumbled upon this while looking at other stuff and wanted to express my finding.

No issues, we can still fix things if they have gone wrong.

...

From a short look this change is totally bogus.

That's a bit shocking, after two people reviewed and tested it :)

...

The *_dbs_timer functions are called from a workqueue on each CPU.

Right.

...

Moving the time_stamp and the timer_mutex to a shared structure now causes a thundering herd issue on multicore systems.

You faced an issue or you just think that something is wrong ?

...

All workqueues wake up at the same time with all but one of the worker bouncing on the contended mutex.

This is exactly what used to be done earlier, but it was designed badly. For example, for ondemand governor, each CPU had a struct od_cpu_dbs_info_s and we only used the od_cpu_dbs_info_s for the 'policy->cpu'. So, even if all CPUs had this structure (i.e. the time_stamp and the lock), we only used structure for policy->cpu.

...

Also sharing the time_stamp causes all but the first worker to acquire the mutex to not properly update their sampling information, I guess.

Yeah, that's what we want per-policy.

...

I didn't look to deeply into this.

Perhaps, yes.

-- viresh

Lucas Stach

10:34 a.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

Am Montag, den 30.11.2015, 10:49 +0530 schrieb Viresh Kumar:

...

On 27-11-15, 12:56, Lucas Stach wrote:

...
Sorry for being late to the party, as this has long been applied, but I stumbled upon this while looking at other stuff and wanted to express my finding.

No issues, we can still fix things if they have gone wrong.

...
From a short look this change is totally bogus.

That's a bit shocking, after two people reviewed and tested it :)

...
The *_dbs_timer functions are called from a workqueue on each CPU.

Right.

...
Moving the time_stamp and the timer_mutex to a shared structure now causes a thundering herd issue on multicore systems.

You faced an issue or you just think that something is wrong ?

The timer_mutex is one of the top contended locks already on a quad core system. That really doesn't look right, as the timer is quite low frequency. It's causing excessive wake ups, as all 4 CPUs wake up to handle the WQ, 3 of them directly go back to sleep waiting for the mutex to be released, then same thing happens for CPUs-1 until we rippled down to a single CPU.

...

...
All workqueues wake up at the same time with all but one of the worker bouncing on the contended mutex.

This is exactly what used to be done earlier, but it was designed badly. For example, for ondemand governor, each CPU had a struct od_cpu_dbs_info_s and we only used the od_cpu_dbs_info_s for the 'policy->cpu'. So, even if all CPUs had this structure (i.e. the time_stamp and the lock), we only used structure for policy->cpu.

Ok, so this patch isn't actually at fault. My apologies. It was just the first thing down the git history that catched my eye when looking into why the timer mutex is this badly contended. I would say it's still worth fixing. Perhaps by not waking all the workqueues at the same time, but spreading the wake times out over a jiffy.

...

...
Also sharing the time_stamp causes all but the first worker to acquire the mutex to not properly update their sampling information, I guess.

Yeah, that's what we want per-policy.

That I still don't really understand. If we only sample a single CPU, why do we also wake all the others? Also if we serialize on the timer_mutex and only the first CPU to acquire it updates the sampling information, is it ok that essentially a random CPU will do the work?

Again, I didn't really take the time to understand the code in depth, hoping that you could provide the answers to those questions more easily. :)

Thanks, Lucas

-- Pengutronix e.K. | Lucas Stach | Industrial Linux Solutions | http://www.pengutronix.de/ |

Viresh Kumar

11:03 a.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

On 30-11-15, 11:34, Lucas Stach wrote:

...

The timer_mutex is one of the top contended locks already on a quad core system.

My new series (un-committed) should fix that to some level hopefully:

http://marc.info/?l=linux-pm&m=144612165211091&w=2

...

That really doesn't look right, as the timer is quite low frequency. It's causing excessive wake ups, as all 4 CPUs wake up to handle the WQ, 3 of them directly go back to sleep waiting for the mutex to be released, then same thing happens for CPUs-1 until we rippled down to a single CPU.

There is a reason why we need to do this on all CPUs today. The timers are deferrable by nature, as we shouldn't wake up a CPU to change it frequency. Now group of CPUs that change their DVFS state together, or that change their voltage/clock rails are part of the same cpufreq-policy. We need to take load of all the CPUs, that are part of the policy, while update the frequency of the group.

If we queue the timer on any one CPU, then that CPU can go into idle and the deferred timer will not fire. But the other CPUs of the policy can still be active and the frequency of the group wouldn't change with load.

Hope that answers the query related to timers on all CPUs.

...

I would say it's still worth fixing. Perhaps by not waking all the workqueues at the same time, but spreading the wake times out over a jiffy.

Maybe.

-- viresh

Saravana Kannan

2 Dec 2 Dec

9:43 p.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

On 11/30/2015 03:03 AM, Viresh Kumar wrote:

...

On 30-11-15, 11:34, Lucas Stach wrote:

...
The timer_mutex is one of the top contended locks already on a quad core system.

My new series (un-committed) should fix that to some level hopefully:

http://marc.info/?l=linux-pm&m=144612165211091&w=2

...
That really doesn't look right, as the timer is quite low frequency. It's causing excessive wake ups, as all 4 CPUs wake up to handle the WQ, 3 of them directly go back to sleep waiting for the mutex to be released, then same thing happens for CPUs-1 until we rippled down to a single CPU.

There is a reason why we need to do this on all CPUs today. The timers are deferrable by nature, as we shouldn't wake up a CPU to change it frequency. Now group of CPUs that change their DVFS state together, or that change their voltage/clock rails are part of the same cpufreq-policy. We need to take load of all the CPUs, that are part of the policy, while update the frequency of the group.

If we queue the timer on any one CPU, then that CPU can go into idle and the deferred timer will not fire. But the other CPUs of the policy can still be active and the frequency of the group wouldn't change with load.

Hope that answers the query related to timers on all CPUs.

...
I would say it's still worth fixing. Perhaps by not waking all the workqueues at the same time, but spreading the wake times out over a jiffy.

Maybe.

There's a separate thread where we proposed a fix to deferrable timers that are stored globally if they are not CPU bound. That way, even if one CPU is up, they get handled. But TGLX had some valid concerns with cache thrashing and impact on some network code. So, last I heard, he was going to rewrite and fixed the deferrable timer problem by having the "orphaned" (because CPU has gone idle) deferrable timers being adopted by other CPUs while the original CPU is idle.

Once that's fixed, we just need one timer per policy. Long story short, CPU freq is working around a poor API semantic of deferrable timers.

-Saravana

-- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

Viresh Kumar

3 Dec 3 Dec

4:54 a.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

On 02-12-15, 13:43, Saravana Kannan wrote:

...

There's a separate thread where we proposed a fix to deferrable timers that are stored globally if they are not CPU bound. That way, even if one CPU is up, they get handled. But TGLX had some valid concerns with cache thrashing and impact on some network code. So, last I heard, he was going to rewrite and fixed the deferrable timer problem by having the "orphaned" (because CPU has gone idle) deferrable timers being adopted by other CPUs while the original CPU is idle.

Once that's fixed, we just need one timer per policy. Long story short, CPU freq is working around a poor API semantic of deferrable timers.

There is another idea that I have.

Lets sacrifice idleness of CPU0 (which is already considered as housekeeping CPU in scheduler) to save us from all the complexity we have today.

Suppose we have 16 CPUs, with 4 CPUs per policy and hence 4 policies. - Keep a single delayed-work (non-deferrable) per policy and queue them as: queue_work_on(CPU0). - This will work because any CPU can calculate the load of other CPUs, and there is no dependency on the local CPU. - CPU0 will hence get interrupted, check if the policy->cpus are idle or not, and if not, update their frequency (perhaps with an IPI).

Not sure if this will be better performance wise though.

-- viresh

Robert Schöne

6:53 a.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

...

There is another idea that I have.

Lets sacrifice idleness of CPU0 (which is already considered as housekeeping CPU in scheduler) to save us from all the complexity we have today.

Suppose we have 16 CPUs, with 4 CPUs per policy and hence 4 policies.

Keep a single delayed-work (non-deferrable) per policy and queue

them as: queue_work_on(CPU0).

This will work because any CPU can calculate the load of other

CPUs, and there is no dependency on the local CPU.

CPU0 will hence get interrupted, check if the policy->cpus are idle or not, and if not, update their frequency (perhaps with an IPI).

Not sure if this will be better performance wise though.

No it is not. In your example, everything could be okay with many ifs. I'm from the HPC field and would expect load imbalances because this approach does not scale with the number of frequency domains.

In 2011, AMD released Bulldozer servers with 4 sockets and each of these sockets had 2 dies with 4 modules. Each module has its own frequency domain, which sums up to 32 different domains. Putting all the load to CPU0 will be unfavorable for a balanced performance over all CPUS. This is worsened by the cache coherence protocol which might delay accesses to shared and foreign memory significantly.

And this had been in 2011. Today we have Haswell-EP quad socket servers with up to 72 frequency domains for all CPUs of all sockets put together. If you have an SGI Altix UV, you even have 256 sockets with 18 domains each, which sums up to 4608 domains.

I would expect architectures with more frequency domains in the future.

Robert

-- Dipl.-Inf. Robert Schoene Computer Scientist - R&D Energy Efficient Computing Technische Universitaet Dresden Center for Information Services and High Performance Computing Distributed and Data Intensive Computing 01062 Dresden Tel.: +49 (351) 463-42483 Fax : +49 (351) 463-37773 E-Mail: Robert.Schoene@tu-dresden.de

Saravana Kannan

10 p.m.

New subject: [PATCH V2 06/10] cpufreq: governor: Keep single copy of information common to policy->cpus

On 12/02/2015 10:53 PM, Robert Schöne wrote:

...

...
There is another idea that I have.

Lets sacrifice idleness of CPU0 (which is already considered as housekeeping CPU in scheduler) to save us from all the complexity we have today.

Suppose we have 16 CPUs, with 4 CPUs per policy and hence 4 policies.

Keep a single delayed-work (non-deferrable) per policy and queue

them as:

This is a 100% no go. Non-deferrable is going to kill mobile idle power.

...

...
queue_work_on(CPU0).

This will work because any CPU can calculate the load of other

CPUs, and there is no dependency on the local CPU.

CPU0 will hence get interrupted, check if the policy->cpus are idle or not, and if not, update their frequency (perhaps with an IPI).

Not sure if this will be better performance wise though.

No it is not. In your example, everything could be okay with many ifs. I'm from the HPC field and would expect load imbalances because this approach does not scale with the number of frequency domains.

Agree. I think that's why we had per policy timers (forced to be implemented with per CPU deferrable timers) in the first place.

-Saravana

-- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

Viresh Kumar

19 Jun 19 Jun

11:48 a.m.

New subject: [PATCH V2 07/10] cpufreq: governor: split out common part of {cs|od}_dbs_timer()

Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is unnecessarily duplicated.

Create the real work-handler in cpufreq_governor.c and put the common code in this routine (dbs_timer()).

Shouldn't make any functional change.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 27 +++++++----------------- drivers/cpufreq/cpufreq_governor.c | 38 ++++++++++++++++++++++++++++++---- drivers/cpufreq/cpufreq_governor.h | 10 ++++----- drivers/cpufreq/cpufreq_ondemand.c | 36 +++++++++++++------------------- 4 files changed, 60 insertions(+), 51 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 5b5c01ca556c..0e4154e584bf 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -102,28 +102,15 @@ static void cs_check_cpu(int cpu, unsigned int load) } }

-static void cs_dbs_timer(struct work_struct *work) +static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, bool modify_all) { - struct cs_cpu_dbs_info_s *dbs_info = container_of(work, - struct cs_cpu_dbs_info_s, cdbs.dwork.work); - struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; - unsigned int cpu = policy->cpu; - struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, - cpu); - struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; - int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); - bool modify_all = true; - - mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex); - if (!need_load_eval(core_dbs_info->cdbs.ccdbs, - cs_tuners->sampling_rate)) - modify_all = false; - else - dbs_check_cpu(dbs_data, cpu); - - gov_queue_work(dbs_data, policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex); + + if (modify_all) + dbs_check_cpu(dbs_data, cdbs->ccdbs->policy->cpu); + + return delay_for_sampling_rate(cs_tuners->sampling_rate); }

static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 72a92fa71ba5..441150dea078 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -207,8 +207,8 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, }

/* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_common_dbs_info *ccdbs, - unsigned int sampling_rate) +static bool need_load_eval(struct cpu_common_dbs_info *ccdbs, + unsigned int sampling_rate) { if (policy_is_shared(ccdbs->policy)) { ktime_t time_now = ktime_get(); @@ -223,7 +223,37 @@ bool need_load_eval(struct cpu_common_dbs_info *ccdbs,

return true; } -EXPORT_SYMBOL_GPL(need_load_eval); + +static void dbs_timer(struct work_struct *work) +{ + struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info, + dwork.work); + struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs; + struct cpufreq_policy *policy = ccdbs->policy; + struct dbs_data *dbs_data = policy->governor_data; + unsigned int sampling_rate, delay; + bool modify_all = true; + + mutex_lock(&ccdbs->timer_mutex); + + if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { + struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; + + sampling_rate = cs_tuners->sampling_rate; + } else { + struct od_dbs_tuners *od_tuners = dbs_data->tuners; + + sampling_rate = od_tuners->sampling_rate; + } + + if (!need_load_eval(cdbs->ccdbs, sampling_rate)) + modify_all = false; + + delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all); + gov_queue_work(dbs_data, policy, delay, modify_all); + + mutex_unlock(&ccdbs->timer_mutex); +}

static void set_sampling_rate(struct dbs_data *dbs_data, unsigned int sampling_rate) @@ -411,7 +441,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];

- INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); + INIT_DEFERRABLE_WORK(&j_cdbs->dwork, dbs_timer); }

if (cdata->governor == GOV_CONSERVATIVE) { diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 153412cafb05..2125c299c602 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -132,8 +132,8 @@ static void *get_cpu_dbs_info_s(int cpu) \ struct cpu_common_dbs_info { struct cpufreq_policy *policy; /* - * percpu mutex that serializes governor limit change with gov_dbs_timer - * invocation. We do not want gov_dbs_timer to run when user is changing + * percpu mutex that serializes governor limit change with dbs_timer + * invocation. We do not want dbs_timer to run when user is changing * the governor or limits. */ struct mutex timer_mutex; @@ -210,7 +210,9 @@ struct common_dbs_data {

struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu); void *(*get_cpu_dbs_info_s)(int cpu); - void (*gov_dbs_timer)(struct work_struct *work); + unsigned int (*gov_dbs_timer)(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, + bool modify_all); void (*gov_check_cpu)(int cpu, unsigned int load); int (*init)(struct dbs_data *dbs_data, bool notify); void (*exit)(struct dbs_data *dbs_data, bool notify); @@ -269,8 +271,6 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock;

void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_common_dbs_info *ccdbs, - unsigned int sampling_rate); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 651677cfa581..11db20079fc6 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -191,48 +191,40 @@ static void od_check_cpu(int cpu, unsigned int load) } }

-static void od_dbs_timer(struct work_struct *work) +static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, bool modify_all) { - struct od_cpu_dbs_info_s *dbs_info = - container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); - struct cpufreq_policy *policy = dbs_info->cdbs.ccdbs->policy; + struct cpufreq_policy *policy = cdbs->ccdbs->policy; unsigned int cpu = policy->cpu; - struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, + struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; - int delay = 0, sample_type = core_dbs_info->sample_type; - bool modify_all = true; + int delay = 0, sample_type = dbs_info->sample_type;

- mutex_lock(&core_dbs_info->cdbs.ccdbs->timer_mutex); - if (!need_load_eval(core_dbs_info->cdbs.ccdbs, - od_tuners->sampling_rate)) { - modify_all = false; + if (!modify_all) goto max_delay; - }

/* Common NORMAL_SAMPLE setup */ - core_dbs_info->sample_type = OD_NORMAL_SAMPLE; + dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { - delay = core_dbs_info->freq_lo_jiffies; - __cpufreq_driver_target(policy, core_dbs_info->freq_lo, + delay = dbs_info->freq_lo_jiffies; + __cpufreq_driver_target(policy, dbs_info->freq_lo, CPUFREQ_RELATION_H); } else { dbs_check_cpu(dbs_data, cpu); - if (core_dbs_info->freq_lo) { + if (dbs_info->freq_lo) { /* Setup timer for SUB_SAMPLE */ - core_dbs_info->sample_type = OD_SUB_SAMPLE; - delay = core_dbs_info->freq_hi_jiffies; + dbs_info->sample_type = OD_SUB_SAMPLE; + delay = dbs_info->freq_hi_jiffies; } }

max_delay: if (!delay) delay = delay_for_sampling_rate(od_tuners->sampling_rate - * core_dbs_info->rate_mult); + * dbs_info->rate_mult);

- gov_queue_work(dbs_data, policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.ccdbs->timer_mutex); + return delay; }

/************************** sysfs interface ************************/

-- 2.4.0

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 08/10] cpufreq: governor: Avoid invalid states with additional checks

There can be races where the request has come to a wrong state. For example INIT followed by STOP (instead of START) or START followed by EXIT (instead of STOP).

Tackle these races by comparing making sure the state-machine never gets into any invalid state. And return error if an invalid state-transition is requested.

Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 46 +++++++++++++++++++++++++++++--------- 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 441150dea078..8d4c86fa43ec 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -305,6 +305,10 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, unsigned int latency; int ret;

+ /* State should be equivalent to EXIT */ + if (policy->governor_data) + return -EBUSY; + if (dbs_data) { if (WARN_ON(have_governor_per_policy())) return -EINVAL; @@ -375,10 +379,15 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, return ret; }

-static void cpufreq_governor_exit(struct cpufreq_policy *policy, - struct dbs_data *dbs_data) +static int cpufreq_governor_exit(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) { struct common_dbs_data *cdata = dbs_data->cdata; + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu); + + /* State should be equivalent to INIT */ + if (!cdbs->ccdbs || cdbs->ccdbs->policy) + return -EBUSY;

policy->governor_data = NULL; if (!--dbs_data->usage_count) { @@ -395,6 +404,7 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, }

free_ccdbs(policy, cdata); + return 0; }

static int cpufreq_governor_start(struct cpufreq_policy *policy, @@ -409,6 +419,10 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (!policy->cur) return -EINVAL;

+ /* State should be equivalent to INIT */ + if (!ccdbs || ccdbs->policy) + return -EBUSY; + if (cdata->governor == GOV_CONSERVATIVE) { struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

@@ -465,14 +479,18 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, return 0; }

-static void cpufreq_governor_stop(struct cpufreq_policy *policy, - struct dbs_data *dbs_data) +static int cpufreq_governor_stop(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); struct cpu_common_dbs_info *ccdbs = cdbs->ccdbs;

+ /* State should be equivalent to START */ + if (!ccdbs || !ccdbs->policy) + return -EBUSY; + gov_cancel_work(dbs_data, policy);

if (cdata->governor == GOV_CONSERVATIVE) { @@ -484,17 +502,19 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,

ccdbs->policy = NULL; mutex_destroy(&ccdbs->timer_mutex); + return 0; }

-static void cpufreq_governor_limits(struct cpufreq_policy *policy, - struct dbs_data *dbs_data) +static int cpufreq_governor_limits(struct cpufreq_policy *policy, + struct dbs_data *dbs_data) { struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu);

+ /* State should be equivalent to START */ if (!cdbs->ccdbs || !cdbs->ccdbs->policy) - return; + return -EBUSY;

mutex_lock(&cdbs->ccdbs->timer_mutex); if (policy->max < cdbs->ccdbs->policy->cur) @@ -505,13 +525,15 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); mutex_unlock(&cdbs->ccdbs->timer_mutex); + + return 0; }

int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event) { struct dbs_data *dbs_data; - int ret = 0; + int ret;

/* Lock governor to block concurrent initialization of governor */ mutex_lock(&cdata->mutex); @@ -531,17 +553,19 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, ret = cpufreq_governor_init(policy, dbs_data, cdata); break; case CPUFREQ_GOV_POLICY_EXIT: - cpufreq_governor_exit(policy, dbs_data); + ret = cpufreq_governor_exit(policy, dbs_data); break; case CPUFREQ_GOV_START: ret = cpufreq_governor_start(policy, dbs_data); break; case CPUFREQ_GOV_STOP: - cpufreq_governor_stop(policy, dbs_data); + ret = cpufreq_governor_stop(policy, dbs_data); break; case CPUFREQ_GOV_LIMITS: - cpufreq_governor_limits(policy, dbs_data); + ret = cpufreq_governor_limits(policy, dbs_data); break; + default: + ret = -EINVAL; }

unlock:

-- 2.4.0

Preeti U Murthy

6:04 p.m.

New subject: [PATCH V2 08/10] cpufreq: governor: Avoid invalid states with additional checks

On 06/19/2015 05:18 PM, Viresh Kumar wrote:

...

There can be races where the request has come to a wrong state. For example INIT followed by STOP (instead of START) or START followed by EXIT (instead of STOP).

Tackle these races by comparing making sure the state-machine never gets into any invalid state. And return error if an invalid state-transition is requested.

Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 09/10] cpufreq: governor: Don't WARN on invalid states

With previous commit, governors have started to return errors on invalid state-transition requests. We already have a WARN for an invalid state-transition request in cpufreq_governor_dbs(). This does trigger today, as the sequence of events isn't guaranteed by cpufreq core.

Lets stop warning on that for now, and make sure we don't enter an invalid state.

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 8d4c86fa43ec..af63402a94a9 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -543,7 +543,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, else dbs_data = cdata->gdbs_data;

- if (WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))) { + if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) { ret = -EINVAL; goto unlock; }

-- 2.4.0

Viresh Kumar

11:48 a.m.

New subject: [PATCH V2 10/10] cpufreq: propagate errors returned from __cpufreq_governor()

Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems: - wrong errors propagated to sysfs - we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that.

Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b612411655f9..da672b910760 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2284,16 +2284,20 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) { - __cpufreq_governor(policy, CPUFREQ_GOV_STOP); - up_write(&policy->rwsem); - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); + if(!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP))) { + up_write(&policy->rwsem); + ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); + down_write(&policy->rwsem); + } + + if (ret) + return ret; }

/* start new governor */ policy->governor = new_policy->governor; - if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) { - if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) + if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))) { + if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_START))) goto out;

up_write(&policy->rwsem); @@ -2305,11 +2309,11 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov; - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT); - __cpufreq_governor(policy, CPUFREQ_GOV_START); + if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) + __cpufreq_governor(policy, CPUFREQ_GOV_START); }

- return -EINVAL; + return ret;

out: pr_debug("governor: change or update limits\n");

-- 2.4.0

Preeti U Murthy

6:02 p.m.

New subject: [PATCH V2 10/10] cpufreq: propagate errors returned from __cpufreq_governor()

On 06/19/2015 05:18 PM, Viresh Kumar wrote:

...

Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems:

wrong errors propagated to sysfs

we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that.

Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

Minor points below.

...

drivers/cpufreq/cpufreq.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b612411655f9..da672b910760 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2284,16 +2284,20 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) {
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
up_write(&policy->rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
down_write(&policy->rwsem);
if(!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP))) {
	up_write(&policy->rwsem);
	ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
	down_write(&policy->rwsem);
}
if (ret)
	return ret;

Some debug prints help here? Like "Failed to stop govenor".

...

}

/* start new governor */ policy->governor = new_policy->governor;
if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))) {
if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_START)))
goto out;
up_write(&policy->rwsem);
@@ -2305,11 +2309,11 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov;
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
__cpufreq_governor(policy, CPUFREQ_GOV_START);
if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
	__cpufreq_governor(policy, CPUFREQ_GOV_START);

If INIT itself fails, we need to set policy->governor to NULL. I included this in the testing of the patchset.

Regards Preeti U Murthy

Viresh Kumar

20 Jun 20 Jun

3:12 a.m.

New subject: [PATCH V2 10/10] cpufreq: propagate errors returned from __cpufreq_governor()

On 19-06-15, 23:32, Preeti U Murthy wrote:

...

If INIT itself fails, we need to set policy->governor to NULL. I included this in the testing of the patchset.

Diff from earlier patch:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index da672b910760..59fa0c1b7922 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2284,14 +2284,23 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) { - if(!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP))) { + ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); + if (ret) { + /* This can happen due to race with other operations */ + pr_debug("%s: Failed to Stop Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } else { up_write(&policy->rwsem); ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); down_write(&policy->rwsem); - }

- if (ret) - return ret; + if (ret) { + pr_err("%s: Failed to Exit Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } + } }

/* start new governor */ @@ -2309,7 +2318,9 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov; - if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) + if (__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) + policy->governor = NULL; + else __cpufreq_governor(policy, CPUFREQ_GOV_START); }

New-patch:

--------------8<-------------------

From: Viresh Kumar viresh.kumar@linaro.org Date: Thu, 11 Sep 2014 10:50:48 +0530 Subject: [PATCH] cpufreq: propagate errors returned from __cpufreq_governor()

Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems: - wrong errors propagated to sysfs - we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that.

Reviewed-and-Tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b612411655f9..59fa0c1b7922 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2284,16 +2284,29 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) { - __cpufreq_governor(policy, CPUFREQ_GOV_STOP); - up_write(&policy->rwsem); - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); + ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); + if (ret) { + /* This can happen due to race with other operations */ + pr_debug("%s: Failed to Stop Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } else { + up_write(&policy->rwsem); + ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); + down_write(&policy->rwsem); + + if (ret) { + pr_err("%s: Failed to Exit Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } + } }

up_write(&policy->rwsem); @@ -2305,11 +2318,13 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov; - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT); - __cpufreq_governor(policy, CPUFREQ_GOV_START); + if (__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) + policy->governor = NULL; + else + __cpufreq_governor(policy, CPUFREQ_GOV_START); }

- return -EINVAL; + return ret;

out: pr_debug("governor: change or update limits\n");

-- viresh

Preeti U Murthy

22 Jun 22 Jun

4:41 a.m.

New subject: [PATCH V2 10/10] cpufreq: propagate errors returned from __cpufreq_governor()

On 06/20/2015 08:42 AM, Viresh Kumar wrote:

...

On 19-06-15, 23:32, Preeti U Murthy wrote:

...
If INIT itself fails, we need to set policy->governor to NULL. I included this in the testing of the patchset.

Diff from earlier patch:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index da672b910760..59fa0c1b7922 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2284,14 +2284,23 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) {
if(!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP))) {
ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
if (ret) {
	/* This can happen due to race with other operations */
	pr_debug("%s: Failed to Stop Governor: %s (%d)\n",
		 __func__, old_gov->name, ret);
	return ret;
} else {
up_write(&policy->rwsem);
ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
down_write(&policy->rwsem);
}
if (ret)
	return ret;
	if (ret) {
		pr_err("%s: Failed to Exit Governor: %s (%d)\n",
		       __func__, old_gov->name, ret);
		return ret;
	}
}
}

/* start new governor */
@@ -2309,7 +2318,9 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov;
if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
if (__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
	policy->governor = NULL;
else
__cpufreq_governor(policy, CPUFREQ_GOV_START);
}
New-patch:

--------------8<-------------------

From: Viresh Kumar viresh.kumar@linaro.org Date: Thu, 11 Sep 2014 10:50:48 +0530 Subject: [PATCH] cpufreq: propagate errors returned from __cpufreq_governor()

Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems:

wrong errors propagated to sysfs

we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that.

Reviewed-and-Tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

drivers/cpufreq/cpufreq.c | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b612411655f9..59fa0c1b7922 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2284,16 +2284,29 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) {
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
up_write(&policy->rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
down_write(&policy->rwsem);
ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
if (ret) {
	/* This can happen due to race with other operations */
	pr_debug("%s: Failed to Stop Governor: %s (%d)\n",
		 __func__, old_gov->name, ret);
	return ret;
} else {
	up_write(&policy->rwsem);
	ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
	down_write(&policy->rwsem);
	if (ret) {
		pr_err("%s: Failed to Exit Governor: %s (%d)\n",
		       __func__, old_gov->name, ret);
		return ret;
	}
}
}

/* start new governor */ policy->governor = new_policy->governor;
if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))) {
if (!(ret = __cpufreq_governor(policy, CPUFREQ_GOV_START)))
goto out;
up_write(&policy->rwsem);
@@ -2305,11 +2318,13 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov;
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
__cpufreq_governor(policy, CPUFREQ_GOV_START);
if (__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
	policy->governor = NULL;
else
	__cpufreq_governor(policy, CPUFREQ_GOV_START);
}
return -EINVAL;

return ret;

out: pr_debug("governor: change or update limits\n");

There are some checkpatch errors on this patch.

Regards Preeti U Murthy

Viresh Kumar

5:21 a.m.

New subject: [PATCH V2 10/10] cpufreq: propagate errors returned from __cpufreq_governor()

On 22-06-15, 10:11, Preeti U Murthy wrote:

...

There are some checkpatch errors on this patch.

--------------8<------------------

From: Viresh Kumar viresh.kumar@linaro.org Date: Thu, 11 Sep 2014 10:50:48 +0530 Subject: [PATCH] cpufreq: propagate errors returned from __cpufreq_governor()

Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems: - wrong errors propagated to sysfs - we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that.

Reviewed-and-tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b612411655f9..0b3c60861bdf 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2284,16 +2284,31 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) { - __cpufreq_governor(policy, CPUFREQ_GOV_STOP); + ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); + if (ret) { + /* This can happen due to race with other operations */ + pr_debug("%s: Failed to Stop Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } + up_write(&policy->rwsem); - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); + ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); down_write(&policy->rwsem); + + if (ret) { + pr_err("%s: Failed to Exit Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } }

/* start new governor */ policy->governor = new_policy->governor; - if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) { - if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) + ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT); + if (!ret) { + ret = __cpufreq_governor(policy, CPUFREQ_GOV_START); + if (!ret) goto out;

up_write(&policy->rwsem); @@ -2305,11 +2320,13 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov; - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT); - __cpufreq_governor(policy, CPUFREQ_GOV_START); + if (__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) + policy->governor = NULL; + else + __cpufreq_governor(policy, CPUFREQ_GOV_START); }

- return -EINVAL; + return ret;

out: pr_debug("governor: change or update limits\n");

Preeti U Murthy

19 Jun 19 Jun

6:09 p.m.

New subject: [PATCH V2 10/10] cpufreq: propagate errors returned from __cpufreq_governor()

On 06/19/2015 05:18 PM, Viresh Kumar wrote:

...

Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems:

wrong errors propagated to sysfs

we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that.

Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

Incorporating the nit in my previous reply to this patch would mean,

Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com

Preeti U Murthy

6:08 p.m.

On 06/19/2015 05:18 PM, Viresh Kumar wrote:

...

Hi Rafael/Preeti,

This is another attempt to fix the crashes reported by Preeti. They work quite well for me now, and I hope they would work well for Preeti as well :)

So, patches [1-7,9] are already Reviewed by Preeti.

The first 5 patches are minor cleanups, 6th & 7th try to optimize few things to make code less complex.

Patches 8-10 actually solve (or try to solve :)) the synchronization problems, or the crashes I was getting.

This series has been tested on powerpc, by running governor switching in parallel and another test where hotplug, governor switching were run in parallel.So,

Tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com

...

V1->V2:

7/11 is dropped and only 8/11 is updated, which is 8/10 now.

Avoid taking the same mutex in both cpufreq_governor_dbs() and work-handler as that has given us some locdeps, classic ABBA stuff.

And so timer_mutex, responsible for synchronizing governor work-handlers is kept as is.

Later patches are almost same with minor updates.

Invalid state-transitions are sensed better now with improved checks.

I have run enough tests on my exynos dual core board and failed to crash at all. Not that I wanted to crash :)

Rebased over pm/bleeeding-edge.

Viresh Kumar (10): cpufreq: governor: Name delayed-work as dwork cpufreq: governor: Drop unused field 'cpu' cpufreq: governor: Rename 'cpu_dbs_common_info' to 'cpu_dbs_info' cpufreq: governor: name pointer to cpu_dbs_info as 'cdbs' cpufreq: governor: rename cur_policy as policy cpufreq: governor: Keep single copy of information common to policy->cpus cpufreq: governor: split out common part of {cs|od}_dbs_timer() cpufreq: governor: Avoid invalid states with additional checks cpufreq: governor: Don't WARN on invalid states cpufreq: propagate errors returned from __cpufreq_governor()

drivers/cpufreq/cpufreq.c | 22 ++-- drivers/cpufreq/cpufreq_conservative.c | 25 ++--- drivers/cpufreq/cpufreq_governor.c | 196 ++++++++++++++++++++++++--------- drivers/cpufreq/cpufreq_governor.h | 40 ++++--- drivers/cpufreq/cpufreq_ondemand.c | 67 ++++++----- 5 files changed, 220 insertions(+), 130 deletions(-)

Rafael J. Wysocki

11:16 p.m.

On Friday, June 19, 2015 05:18:00 PM Viresh Kumar wrote:

...

Hi Rafael/Preeti,

This is another attempt to fix the crashes reported by Preeti. They work quite well for me now, and I hope they would work well for Preeti as well :)

So, patches [1-7,9] are already Reviewed by Preeti.

The first 5 patches are minor cleanups, 6th & 7th try to optimize few things to make code less complex.

Patches 8-10 actually solve (or try to solve :)) the synchronization problems, or the crashes I was getting.

V1->V2:

7/11 is dropped and only 8/11 is updated, which is 8/10 now.

Avoid taking the same mutex in both cpufreq_governor_dbs() and work-handler as that has given us some locdeps, classic ABBA stuff.

And so timer_mutex, responsible for synchronizing governor work-handlers is kept as is.

Later patches are almost same with minor updates.

Invalid state-transitions are sensed better now with improved checks.

I have run enough tests on my exynos dual core board and failed to crash at all. Not that I wanted to crash :)

Rebased over pm/bleeeding-edge.

Viresh Kumar (10): cpufreq: governor: Name delayed-work as dwork cpufreq: governor: Drop unused field 'cpu' cpufreq: governor: Rename 'cpu_dbs_common_info' to 'cpu_dbs_info' cpufreq: governor: name pointer to cpu_dbs_info as 'cdbs' cpufreq: governor: rename cur_policy as policy cpufreq: governor: Keep single copy of information common to policy->cpus cpufreq: governor: split out common part of {cs|od}_dbs_timer() cpufreq: governor: Avoid invalid states with additional checks cpufreq: governor: Don't WARN on invalid states cpufreq: propagate errors returned from __cpufreq_governor()

drivers/cpufreq/cpufreq.c | 22 ++-- drivers/cpufreq/cpufreq_conservative.c | 25 ++--- drivers/cpufreq/cpufreq_governor.c | 196 ++++++++++++++++++++++++--------- drivers/cpufreq/cpufreq_governor.h | 40 ++++--- drivers/cpufreq/cpufreq_ondemand.c | 67 ++++++----- 5 files changed, 220 insertions(+), 130 deletions(-)

So as I said before, I'm not regarding this as 4.2 material.

The fact that you manage to send your stuff before the merge window opens doesn't automatically mean that it has to go in during that merge window.

-- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.

Viresh Kumar

20 Jun 20 Jun

2:36 a.m.

On 20-06-15, 01:16, Rafael J. Wysocki wrote:

...

So as I said before, I'm not regarding this as 4.2 material.

Okay.

...

The fact that you manage to send your stuff before the merge window opens doesn't automatically mean that it has to go in during that merge window.

Yeah, that wasn't the intention. There were few reasons for me to hurry up: - It was related to crashes that we are aware of since few kernel releases now. And these aren't new features, but bug fixes, ofcourse not for bugs that are introduced in 4.2 material :) - I wanted Preeti to test this stuff, as she wouldn't be working on this stuff anymore after this month.

Anyway, its fine to me even if this goes in the next cycle.

-- viresh

Rafael J. Wysocki

17 Jul 17 Jul

10:15 p.m.

On Friday, June 19, 2015 05:18:00 PM Viresh Kumar wrote:

...

Hi Rafael/Preeti,

This is another attempt to fix the crashes reported by Preeti. They work quite well for me now, and I hope they would work well for Preeti as well :)

So, patches [1-7,9] are already Reviewed by Preeti.

The first 5 patches are minor cleanups, 6th & 7th try to optimize few things to make code less complex.

Patches 8-10 actually solve (or try to solve :)) the synchronization problems, or the crashes I was getting.

V1->V2:

7/11 is dropped and only 8/11 is updated, which is 8/10 now.

Avoid taking the same mutex in both cpufreq_governor_dbs() and work-handler as that has given us some locdeps, classic ABBA stuff.

And so timer_mutex, responsible for synchronizing governor work-handlers is kept as is.

Later patches are almost same with minor updates.

Invalid state-transitions are sensed better now with improved checks.

I have run enough tests on my exynos dual core board and failed to crash at all. Not that I wanted to crash :)

Rebased over pm/bleeeding-edge.

Viresh Kumar (10): cpufreq: governor: Name delayed-work as dwork cpufreq: governor: Drop unused field 'cpu' cpufreq: governor: Rename 'cpu_dbs_common_info' to 'cpu_dbs_info' cpufreq: governor: name pointer to cpu_dbs_info as 'cdbs' cpufreq: governor: rename cur_policy as policy

The above are all fine by me, so I'm queuing them up for 4.3.

The one below needs a small update in my view which will propagate to the next ones.

...

cpufreq: governor: Keep single copy of information common to policy->cpus cpufreq: governor: split out common part of {cs|od}_dbs_timer() cpufreq: governor: Avoid invalid states with additional checks cpufreq: governor: Don't WARN on invalid states cpufreq: propagate errors returned from __cpufreq_governor()

Thanks, Rafael

Viresh Kumar

18 Jul 18 Jul

6 a.m.

New subject: [PATCH V3 0/5] cpufreq: governor: Avoid invalid state-transitions

Hi Rafael,

These are rest of the patches after replacing 'ccdbs' with 'shared' as you suggested.

Viresh Kumar (5): cpufreq: governor: Keep single copy of information common to policy->cpus cpufreq: governor: split out common part of {cs|od}_dbs_timer() cpufreq: governor: Avoid invalid states with additional checks cpufreq: governor: Don't WARN on invalid states cpufreq: propagate errors returned from __cpufreq_governor()

drivers/cpufreq/cpufreq.c | 31 ++++-- drivers/cpufreq/cpufreq_conservative.c | 25 ++--- drivers/cpufreq/cpufreq_governor.c | 174 ++++++++++++++++++++++++++------- drivers/cpufreq/cpufreq_governor.h | 26 +++-- drivers/cpufreq/cpufreq_ondemand.c | 58 +++++------ 5 files changed, 210 insertions(+), 104 deletions(-)

-- 2.4.0

Viresh Kumar

6 a.m.

New subject: [PATCH V3 1/5] cpufreq: governor: Keep single copy of information common to policy->cpus

The memory for cpu_common_dbs_info is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early.

Reviewed-and-tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 18 ++++--- drivers/cpufreq/cpufreq_governor.c | 92 +++++++++++++++++++++++++--------- drivers/cpufreq/cpufreq_governor.h | 24 +++++---- drivers/cpufreq/cpufreq_ondemand.c | 38 +++++++------- 4 files changed, 114 insertions(+), 58 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index af47d322679e..d21c3cff9056 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -47,7 +47,7 @@ static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners, static void cs_check_cpu(int cpu, unsigned int load) { struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu); - struct cpufreq_policy *policy = dbs_info->cdbs.policy; + struct cpufreq_policy *policy = dbs_info->cdbs.shared->policy; struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

@@ -106,22 +106,24 @@ static void cs_dbs_timer(struct work_struct *work) { struct cs_cpu_dbs_info_s *dbs_info = container_of(work, struct cs_cpu_dbs_info_s, cdbs.dwork.work); - unsigned int cpu = dbs_info->cdbs.policy->cpu; + struct cpufreq_policy *policy = dbs_info->cdbs.shared->policy; + unsigned int cpu = policy->cpu; struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data; + struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); bool modify_all = true;

- mutex_lock(&core_dbs_info->cdbs.timer_mutex); - if (!need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate)) + mutex_lock(&core_dbs_info->cdbs.shared->timer_mutex); + if (!need_load_eval(core_dbs_info->cdbs.shared, + cs_tuners->sampling_rate)) modify_all = false; else dbs_check_cpu(dbs_data, cpu);

- policy = dbs_info->cdbs.policy; + policy = dbs_info->cdbs.shared->policy;

/* * we only care if our internally tracked freq moves outside the 'valid' diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index c0566f86caed..b01cb729104b 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -35,7 +35,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); struct od_dbs_tuners *od_tuners = dbs_data->tuners; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; - struct cpufreq_policy *policy; + struct cpufreq_policy *policy = cdbs->shared->policy; unsigned int sampling_rate; unsigned int max_load = 0; unsigned int ignore_nice; @@ -60,8 +60,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) ignore_nice = cs_tuners->ignore_nice_load; }

- policy = cdbs->policy; - /* Get Absolute Load */ for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs; @@ -209,17 +207,18 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, }

/* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate) +bool need_load_eval(struct cpu_common_dbs_info *shared, + unsigned int sampling_rate) { - if (policy_is_shared(cdbs->policy)) { + if (policy_is_shared(shared->policy)) { ktime_t time_now = ktime_get(); - s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp); + s64 delta_us = ktime_us_delta(time_now, shared->time_stamp);

/* Do nothing if we recently have sampled */ if (delta_us < (s64)(sampling_rate / 2)) return false; else - cdbs->time_stamp = time_now; + shared->time_stamp = time_now; }

return true; @@ -238,6 +237,37 @@ static void set_sampling_rate(struct dbs_data *dbs_data, } }

+static int alloc_common_dbs_info(struct cpufreq_policy *policy, + struct common_dbs_data *cdata) +{ + struct cpu_common_dbs_info *shared; + int j; + + /* Allocate memory for the common information for policy->cpus */ + shared = kzalloc(sizeof(*shared), GFP_KERNEL); + if (!shared) + return -ENOMEM; + + /* Set shared for all CPUs, online+offline */ + for_each_cpu(j, policy->related_cpus) + cdata->get_cpu_cdbs(j)->shared = shared; + + return 0; +} + +static void free_common_dbs_info(struct cpufreq_policy *policy, + struct common_dbs_data *cdata) +{ + struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(policy->cpu); + struct cpu_common_dbs_info *shared = cdbs->shared; + int j; + + for_each_cpu(j, policy->cpus) + cdata->get_cpu_cdbs(j)->shared = NULL; + + kfree(shared); +} + static int cpufreq_governor_init(struct cpufreq_policy *policy, struct dbs_data *dbs_data, struct common_dbs_data *cdata) @@ -248,6 +278,11 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (dbs_data) { if (WARN_ON(have_governor_per_policy())) return -EINVAL; + + ret = alloc_common_dbs_info(policy, cdata); + if (ret) + return ret; + dbs_data->usage_count++; policy->governor_data = dbs_data; return 0; @@ -257,12 +292,16 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, if (!dbs_data) return -ENOMEM;

+ ret = alloc_common_dbs_info(policy, cdata); + if (ret) + goto free_dbs_data; + dbs_data->cdata = cdata; dbs_data->usage_count = 1;

ret = cdata->init(dbs_data, !policy->governor->initialized); if (ret) - goto free_dbs_data; + goto free_common_dbs_info;

/* policy latency is in ns. Convert it to us first */ latency = policy->cpuinfo.transition_latency / 1000; @@ -299,6 +338,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, } cdata_exit: cdata->exit(dbs_data, !policy->governor->initialized); +free_common_dbs_info: + free_common_dbs_info(policy, cdata); free_dbs_data: kfree(dbs_data); return ret; @@ -322,6 +363,8 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, cdata->exit(dbs_data, policy->governor->initialized == 1); kfree(dbs_data); } + + free_common_dbs_info(policy, cdata); }

if (!policy->cur) @@ -348,11 +392,14 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, io_busy = od_tuners->io_is_busy; }

+ shared->policy = policy; + shared->time_stamp = ktime_get(); + mutex_init(&shared->timer_mutex); + for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j); unsigned int prev_load;

- j_cdbs->policy = policy; j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);

@@ -364,7 +411,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];

- mutex_init(&j_cdbs->timer_mutex); INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); }

@@ -384,9 +430,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, od_ops->powersave_bias_init_cpu(cpu); }

- /* Initiate timer time stamp */ - cdbs->time_stamp = ktime_get(); - gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate), true); return 0; @@ -398,6 +441,9 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, struct common_dbs_data *cdata = dbs_data->cdata; unsigned int cpu = policy->cpu; struct cpu_dbs_info *cdbs = cdata->get_cpu_cdbs(cpu); + struct cpu_common_dbs_info *shared = cdbs->shared; + + gov_cancel_work(dbs_data, policy);

if (cdata->governor == GOV_CONSERVATIVE) { struct cs_cpu_dbs_info_s *cs_dbs_info = @@ -406,10 +452,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy, cs_dbs_info->enable = 0; }

- gov_cancel_work(dbs_data, policy); - - mutex_destroy(&cdbs->timer_mutex); - cdbs->policy = NULL; + shared->policy = NULL; + mutex_destroy(&shared->timer_mutex); }

- if (!cdbs->policy) + if (!cdbs->shared || !cdbs->shared->policy) return;

- mutex_lock(&cdbs->timer_mutex); - if (policy->max < cdbs->policy->cur) - __cpufreq_driver_target(cdbs->policy, policy->max, + mutex_lock(&cdbs->shared->timer_mutex); + if (policy->max < cdbs->shared->policy->cur) + __cpufreq_driver_target(cdbs->shared->policy, policy->max, CPUFREQ_RELATION_H); - else if (policy->min > cdbs->policy->cur) - __cpufreq_driver_target(cdbs->policy, policy->min, + else if (policy->min > cdbs->shared->policy->cur) + __cpufreq_driver_target(cdbs->shared->policy, policy->min, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); - mutex_unlock(&cdbs->timer_mutex); + mutex_unlock(&cdbs->shared->timer_mutex); }

int cpufreq_governor_dbs(struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index a0f8eb79ee6d..8e4a25f0730c 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -128,6 +128,18 @@ static void *get_cpu_dbs_info_s(int cpu) \ * cs_*: Conservative governor */

struct od_cpu_dbs_info_s { @@ -264,7 +269,8 @@ static ssize_t show_sampling_rate_min_gov_pol \ extern struct mutex cpufreq_governor_lock;

void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_dbs_info *cdbs, unsigned int sampling_rate); +bool need_load_eval(struct cpu_common_dbs_info *shared, + unsigned int sampling_rate); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index d29c6f9c6e3e..14d7e86e7f94 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -155,7 +155,7 @@ static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq) static void od_check_cpu(int cpu, unsigned int load) { struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct cpufreq_policy *policy = dbs_info->cdbs.policy; + struct cpufreq_policy *policy = dbs_info->cdbs.shared->policy; struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners;

@@ -195,16 +195,18 @@ static void od_dbs_timer(struct work_struct *work) { struct od_cpu_dbs_info_s *dbs_info = container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); - unsigned int cpu = dbs_info->cdbs.policy->cpu; + struct cpufreq_policy *policy = dbs_info->cdbs.shared->policy; + unsigned int cpu = policy->cpu; struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = dbs_info->cdbs.policy->governor_data; + struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; int delay = 0, sample_type = core_dbs_info->sample_type; bool modify_all = true;

- mutex_lock(&core_dbs_info->cdbs.timer_mutex); - if (!need_load_eval(&core_dbs_info->cdbs, od_tuners->sampling_rate)) { + mutex_lock(&core_dbs_info->cdbs.shared->timer_mutex); + if (!need_load_eval(core_dbs_info->cdbs.shared, + od_tuners->sampling_rate)) { modify_all = false; goto max_delay; } @@ -213,8 +215,7 @@ static void od_dbs_timer(struct work_struct *work) core_dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { delay = core_dbs_info->freq_lo_jiffies; - __cpufreq_driver_target(core_dbs_info->cdbs.policy, - core_dbs_info->freq_lo, + __cpufreq_driver_target(policy, core_dbs_info->freq_lo, CPUFREQ_RELATION_H); } else { dbs_check_cpu(dbs_data, cpu); @@ -230,8 +231,8 @@ static void od_dbs_timer(struct work_struct *work) delay = delay_for_sampling_rate(od_tuners->sampling_rate * core_dbs_info->rate_mult);

- mutex_lock(&dbs_info->cdbs.timer_mutex); + mutex_lock(&dbs_info->cdbs.shared->timer_mutex);

if (!delayed_work_pending(&dbs_info->cdbs.dwork)) { - mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.shared->timer_mutex); continue; }

@@ -286,15 +287,15 @@ static void update_sampling_rate(struct dbs_data *dbs_data,

if (time_before(next_sampling, appointed_at)) {

- mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.shared->timer_mutex); cancel_delayed_work_sync(&dbs_info->cdbs.dwork); - mutex_lock(&dbs_info->cdbs.timer_mutex); + mutex_lock(&dbs_info->cdbs.shared->timer_mutex);

- gov_queue_work(dbs_data, dbs_info->cdbs.policy, + gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate), true);

} - mutex_unlock(&dbs_info->cdbs.timer_mutex); + mutex_unlock(&dbs_info->cdbs.shared->timer_mutex); } }

@@ -557,13 +558,16 @@ static void od_set_powersave_bias(unsigned int powersave_bias)

get_online_cpus(); for_each_online_cpu(cpu) { + struct cpu_common_dbs_info *shared; + if (cpumask_test_cpu(cpu, &done)) continue;

- policy = per_cpu(od_cpu_dbs_info, cpu).cdbs.policy; - if (!policy) + shared = per_cpu(od_cpu_dbs_info, cpu).cdbs.shared; + if (!shared) continue;

+ policy = shared->policy; cpumask_or(&done, &done, policy->cpus);

if (policy->governor != &cpufreq_gov_ondemand)

-- 2.4.0

Viresh Kumar

6:01 a.m.

New subject: [PATCH V3 2/5] cpufreq: governor: split out common part of {cs|od}_dbs_timer()

Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is unnecessarily duplicated.

Create the real work-handler in cpufreq_governor.c and put the common code in this routine (dbs_timer()).

Shouldn't make any functional change.

Reviewed-and-tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_conservative.c | 27 +++++++----------------- drivers/cpufreq/cpufreq_governor.c | 38 ++++++++++++++++++++++++++++++---- drivers/cpufreq/cpufreq_governor.h | 10 ++++----- drivers/cpufreq/cpufreq_ondemand.c | 36 +++++++++++++------------------- 4 files changed, 60 insertions(+), 51 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index d21c3cff9056..84a1506950a7 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -102,28 +102,15 @@ static void cs_check_cpu(int cpu, unsigned int load) } }

-static void cs_dbs_timer(struct work_struct *work) +static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, bool modify_all) { - struct cs_cpu_dbs_info_s *dbs_info = container_of(work, - struct cs_cpu_dbs_info_s, cdbs.dwork.work); - struct cpufreq_policy *policy = dbs_info->cdbs.shared->policy; - unsigned int cpu = policy->cpu; - struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info, - cpu); - struct dbs_data *dbs_data = policy->governor_data; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; - int delay = delay_for_sampling_rate(cs_tuners->sampling_rate); - bool modify_all = true; - - mutex_lock(&core_dbs_info->cdbs.shared->timer_mutex); - if (!need_load_eval(core_dbs_info->cdbs.shared, - cs_tuners->sampling_rate)) - modify_all = false; - else - dbs_check_cpu(dbs_data, cpu); - - gov_queue_work(dbs_data, policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.shared->timer_mutex); + + if (modify_all) + dbs_check_cpu(dbs_data, cdbs->shared->policy->cpu); + + return delay_for_sampling_rate(cs_tuners->sampling_rate); }

static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index b01cb729104b..7ed0ec2ac853 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -207,8 +207,8 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data, }

/* Will return if we need to evaluate cpu load again or not */ -bool need_load_eval(struct cpu_common_dbs_info *shared, - unsigned int sampling_rate) +static bool need_load_eval(struct cpu_common_dbs_info *shared, + unsigned int sampling_rate) { if (policy_is_shared(shared->policy)) { ktime_t time_now = ktime_get(); @@ -223,7 +223,37 @@ bool need_load_eval(struct cpu_common_dbs_info *shared,

return true; } -EXPORT_SYMBOL_GPL(need_load_eval); + +static void dbs_timer(struct work_struct *work) +{ + struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info, + dwork.work); + struct cpu_common_dbs_info *shared = cdbs->shared; + struct cpufreq_policy *policy = shared->policy; + struct dbs_data *dbs_data = policy->governor_data; + unsigned int sampling_rate, delay; + bool modify_all = true; + + mutex_lock(&shared->timer_mutex); + + if (dbs_data->cdata->governor == GOV_CONSERVATIVE) { + struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; + + sampling_rate = cs_tuners->sampling_rate; + } else { + struct od_dbs_tuners *od_tuners = dbs_data->tuners; + + sampling_rate = od_tuners->sampling_rate; + } + + if (!need_load_eval(cdbs->shared, sampling_rate)) + modify_all = false; + + delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all); + gov_queue_work(dbs_data, policy, delay, modify_all); + + mutex_unlock(&shared->timer_mutex); +}

- INIT_DEFERRABLE_WORK(&j_cdbs->dwork, cdata->gov_dbs_timer); + INIT_DEFERRABLE_WORK(&j_cdbs->dwork, dbs_timer); }

if (cdata->governor == GOV_CONSERVATIVE) { diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 8e4a25f0730c..50f171796632 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -132,8 +132,8 @@ static void *get_cpu_dbs_info_s(int cpu) \ struct cpu_common_dbs_info { struct cpufreq_policy *policy; /* - * percpu mutex that serializes governor limit change with gov_dbs_timer - * invocation. We do not want gov_dbs_timer to run when user is changing + * percpu mutex that serializes governor limit change with dbs_timer + * invocation. We do not want dbs_timer to run when user is changing * the governor or limits. */ struct mutex timer_mutex; @@ -210,7 +210,9 @@ struct common_dbs_data {

void dbs_check_cpu(struct dbs_data *dbs_data, int cpu); -bool need_load_eval(struct cpu_common_dbs_info *shared, - unsigned int sampling_rate); int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event); void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 14d7e86e7f94..1fa9088c84a8 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -191,48 +191,40 @@ static void od_check_cpu(int cpu, unsigned int load) } }

-static void od_dbs_timer(struct work_struct *work) +static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs, + struct dbs_data *dbs_data, bool modify_all) { - struct od_cpu_dbs_info_s *dbs_info = - container_of(work, struct od_cpu_dbs_info_s, cdbs.dwork.work); - struct cpufreq_policy *policy = dbs_info->cdbs.shared->policy; + struct cpufreq_policy *policy = cdbs->shared->policy; unsigned int cpu = policy->cpu; - struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info, + struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu); - struct dbs_data *dbs_data = policy->governor_data; struct od_dbs_tuners *od_tuners = dbs_data->tuners; - int delay = 0, sample_type = core_dbs_info->sample_type; - bool modify_all = true; + int delay = 0, sample_type = dbs_info->sample_type;

- mutex_lock(&core_dbs_info->cdbs.shared->timer_mutex); - if (!need_load_eval(core_dbs_info->cdbs.shared, - od_tuners->sampling_rate)) { - modify_all = false; + if (!modify_all) goto max_delay; - }

max_delay: if (!delay) delay = delay_for_sampling_rate(od_tuners->sampling_rate - * core_dbs_info->rate_mult); + * dbs_info->rate_mult);

- gov_queue_work(dbs_data, policy, delay, modify_all); - mutex_unlock(&core_dbs_info->cdbs.shared->timer_mutex); + return delay; }

/************************** sysfs interface ************************/

-- 2.4.0

Viresh Kumar

6:01 a.m.

New subject: [PATCH V3 3/5] cpufreq: governor: Avoid invalid states with additional checks

There can be races where the request has come to a wrong state. For example INIT followed by STOP (instead of START) or START followed by EXIT (instead of STOP).

Tackle these races by comparing making sure the state-machine never gets into any invalid state. And return error if an invalid state-transition is requested.

Reviewed-and-tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 46 +++++++++++++++++++++++++++++--------- 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 7ed0ec2ac853..f225dc975415 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -305,6 +305,10 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy, unsigned int latency; int ret;

policy->governor_data = NULL; if (!--dbs_data->usage_count) { @@ -395,6 +404,7 @@ static void cpufreq_governor_exit(struct cpufreq_policy *policy, }

free_common_dbs_info(policy, cdata); + return 0; }

static int cpufreq_governor_start(struct cpufreq_policy *policy, @@ -409,6 +419,10 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, if (!policy->cur) return -EINVAL;

+ /* State should be equivalent to INIT */ + if (!shared || shared->policy) + return -EBUSY; + if (cdata->governor == GOV_CONSERVATIVE) { struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;

@@ -465,14 +479,18 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy, return 0; }

+ /* State should be equivalent to START */ + if (!shared || !shared->policy) + return -EBUSY; + gov_cancel_work(dbs_data, policy);

if (cdata->governor == GOV_CONSERVATIVE) { @@ -484,17 +502,19 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,

shared->policy = NULL; mutex_destroy(&shared->timer_mutex); + return 0; }

+ /* State should be equivalent to START */ if (!cdbs->shared || !cdbs->shared->policy) - return; + return -EBUSY;

mutex_lock(&cdbs->shared->timer_mutex); if (policy->max < cdbs->shared->policy->cur) @@ -505,13 +525,15 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy, CPUFREQ_RELATION_L); dbs_check_cpu(dbs_data, cpu); mutex_unlock(&cdbs->shared->timer_mutex); + + return 0; }

int cpufreq_governor_dbs(struct cpufreq_policy *policy, struct common_dbs_data *cdata, unsigned int event) { struct dbs_data *dbs_data; - int ret = 0; + int ret;

unlock:

-- 2.4.0

Viresh Kumar

6:01 a.m.

New subject: [PATCH V3 4/5] cpufreq: governor: Don't WARN on invalid states

Lets stop warning on that for now, and make sure we don't enter an invalid state.

Reviewed-and-tested-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_governor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index f225dc975415..939197ffa4ac 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -543,7 +543,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, else dbs_data = cdata->gdbs_data;

- if (WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))) { + if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) { ret = -EINVAL; goto unlock; }

-- 2.4.0

Viresh Kumar

6:01 a.m.

New subject: [PATCH V3 5/5] cpufreq: propagate errors returned from __cpufreq_governor()

Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems: - wrong errors propagated to sysfs - we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that.

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index a7b6ac6e048e..a3e8fb61cbcc 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2295,16 +2295,31 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, old_gov = policy->governor; /* end old governor */ if (old_gov) { - __cpufreq_governor(policy, CPUFREQ_GOV_STOP); + ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); + if (ret) { + /* This can happen due to race with other operations */ + pr_debug("%s: Failed to Stop Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } + up_write(&policy->rwsem); - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); + ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); down_write(&policy->rwsem); + + if (ret) { + pr_err("%s: Failed to Exit Governor: %s (%d)\n", + __func__, old_gov->name, ret); + return ret; + } }

/* start new governor */ policy->governor = new_policy->governor; - if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) { - if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) + ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT); + if (!ret) { + ret = __cpufreq_governor(policy, CPUFREQ_GOV_START); + if (!ret) goto out;

up_write(&policy->rwsem); @@ -2316,11 +2331,13 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, pr_debug("starting governor %s failed\n", policy->governor->name); if (old_gov) { policy->governor = old_gov; - __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT); - __cpufreq_governor(policy, CPUFREQ_GOV_START); + if (__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) + policy->governor = NULL; + else + __cpufreq_governor(policy, CPUFREQ_GOV_START); }

- return -EINVAL; + return ret;

out: pr_debug("governor: change or update limits\n");

-- 2.4.0

Rafael J. Wysocki

23 Jul 23 Jul

9 p.m.

New subject: [PATCH V3 0/5] cpufreq: governor: Avoid invalid state-transitions

On Saturday, July 18, 2015 11:30:58 AM Viresh Kumar wrote:

...

Hi Rafael,

These are rest of the patches after replacing 'ccdbs' with 'shared' as you suggested.

Viresh Kumar (5): cpufreq: governor: Keep single copy of information common to policy->cpus cpufreq: governor: split out common part of {cs|od}_dbs_timer() cpufreq: governor: Avoid invalid states with additional checks cpufreq: governor: Don't WARN on invalid states cpufreq: propagate errors returned from __cpufreq_governor()

drivers/cpufreq/cpufreq.c | 31 ++++-- drivers/cpufreq/cpufreq_conservative.c | 25 ++--- drivers/cpufreq/cpufreq_governor.c | 174 ++++++++++++++++++++++++++------- drivers/cpufreq/cpufreq_governor.h | 26 +++-- drivers/cpufreq/cpufreq_ondemand.c | 58 +++++------ 5 files changed, 210 insertions(+), 104 deletions(-)

All queued up for 4.3, thanks!

-- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.

3602

days inactive

3769

days old

linaro-kernel@lists.linaro.org

36 comments

participants

tags (0)

participants (6)

Lucas Stach
Preeti U Murthy
Rafael J. Wysocki
Robert Schöne
Saravana Kannan
Viresh Kumar