Hello Rafael,
this is v7 of the cpufreq SW coordinated CPU bug fix series, developed after the discussion with Viresh about some issues that appeared after both my patches and his were merged.
Changes from v6:
* Dropped the cpu hotplug patch, as it solved part of the problem already addressed by Viresh's patch:
dbcb634 cpufreq: Notify governors when cpus are hot-[un]plugged
but was behaving erratically when both patches were applied together.
* In the initial patch, cleaned up cpufreq_governor_dbs() and dbs_timer_{init,exit}() to remove unnecessary variables and special-case initialization, as the additional data handled by that code has not been needed since v6. The code is a bit cleaner now.
Would you consider replacing my patches currently on your pm-cpufreq-next branch with this series? There should not be any conflicts with existing patches. Let me know if you prefer incremental patches instead.
Thanks, Fabio
Fabio Baltieri (3):
  cpufreq: ondemand: call dbs_check_cpu only when necessary
  cpufreq: conservative: call dbs_check_cpu only when necessary
  cpufreq: ondemand: use all CPUs in update_sampling_rate

Rickard Andersson (1):
  cpufreq: handle SW coordinated CPUs

 drivers/cpufreq/cpufreq_conservative.c | 46 ++++++++++++++++++++++---
 drivers/cpufreq/cpufreq_governor.c     | 34 ++++++++++++++-----
 drivers/cpufreq/cpufreq_governor.h     |  2 ++
 drivers/cpufreq/cpufreq_ondemand.c     | 62 +++++++++++++++++++++++++++-------
 4 files changed, 119 insertions(+), 25 deletions(-)
From: Rickard Andersson <rickard.andersson@stericsson.com>
This patch fixes a bug that occurred when we had load on a secondary CPU and the primary CPU was sleeping. Only one sampling timer was spawned and it was spawned as a deferred timer on the primary CPU, so when a secondary CPU had a change in load this was not detected by the cpufreq governor (both ondemand and conservative).
This patch makes sure that deferred timers are run on all CPUs in the case of software controlled CPUs that run on the same frequency.
Signed-off-by: Rickard Andersson <rickard.andersson@stericsson.com>
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c |  3 ++-
 drivers/cpufreq/cpufreq_governor.c     | 31 +++++++++++++++++++++++--------
 drivers/cpufreq/cpufreq_governor.h     |  1 +
 drivers/cpufreq/cpufreq_ondemand.c     |  3 ++-
 4 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 64ef737..b9d7f14 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -122,7 +122,8 @@ static void cs_dbs_timer(struct work_struct *work)
 
         dbs_check_cpu(&cs_dbs_data, cpu);
 
-        schedule_delayed_work_on(cpu, &dbs_info->cdbs.work, delay);
+        schedule_delayed_work_on(smp_processor_id(), &dbs_info->cdbs.work,
+                        delay);
         mutex_unlock(&dbs_info->cdbs.timer_mutex);
 }
 
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 6c5f1d3..8393d6e 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -161,17 +161,27 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
-static inline void dbs_timer_init(struct dbs_data *dbs_data,
-                struct cpu_dbs_common_info *cdbs, unsigned int sampling_rate)
+bool dbs_sw_coordinated_cpus(struct cpu_dbs_common_info *cdbs)
+{
+        struct cpufreq_policy *policy = cdbs->cur_policy;
+
+        return cpumask_weight(policy->cpus) > 1;
+}
+EXPORT_SYMBOL_GPL(dbs_sw_coordinated_cpus);
+
+static inline void dbs_timer_init(struct dbs_data *dbs_data, int cpu,
+                unsigned int sampling_rate)
 {
         int delay = delay_for_sampling_rate(sampling_rate);
+        struct cpu_dbs_common_info *cdbs = dbs_data->get_cpu_cdbs(cpu);
 
-        INIT_DEFERRABLE_WORK(&cdbs->work, dbs_data->gov_dbs_timer);
-        schedule_delayed_work_on(cdbs->cpu, &cdbs->work, delay);
+        schedule_delayed_work_on(cpu, &cdbs->work, delay);
 }
 
-static inline void dbs_timer_exit(struct cpu_dbs_common_info *cdbs)
+static inline void dbs_timer_exit(struct dbs_data *dbs_data, int cpu)
 {
+        struct cpu_dbs_common_info *cdbs = dbs_data->get_cpu_cdbs(cpu);
+
         cancel_delayed_work_sync(&cdbs->work);
 }
 
@@ -217,6 +227,10 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
                         if (ignore_nice)
                                 j_cdbs->prev_cpu_nice =
                                         kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+
+                        mutex_init(&j_cdbs->timer_mutex);
+                        INIT_DEFERRABLE_WORK(&j_cdbs->work,
+                                             dbs_data->gov_dbs_timer);
                 }
 
                 /*
@@ -275,15 +289,16 @@ second_time:
                 }
                 mutex_unlock(&dbs_data->mutex);
 
-                mutex_init(&cpu_cdbs->timer_mutex);
-                dbs_timer_init(dbs_data, cpu_cdbs, *sampling_rate);
+                for_each_cpu(j, policy->cpus)
+                        dbs_timer_init(dbs_data, j, *sampling_rate);
                 break;
 
         case CPUFREQ_GOV_STOP:
                 if (dbs_data->governor == GOV_CONSERVATIVE)
                         cs_dbs_info->enable = 0;
 
-                dbs_timer_exit(cpu_cdbs);
+                for_each_cpu(j, policy->cpus)
+                        dbs_timer_exit(dbs_data, j);
 
                 mutex_lock(&dbs_data->mutex);
                 mutex_destroy(&cpu_cdbs->timer_mutex);
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index f661654..5bf6fb8 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -171,6 +171,7 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 
 u64 get_cpu_idle_time(unsigned int cpu, u64 *wall);
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
+bool dbs_sw_coordinated_cpus(struct cpu_dbs_common_info *cdbs);
 int cpufreq_governor_dbs(struct dbs_data *dbs_data,
                 struct cpufreq_policy *policy, unsigned int event);
 #endif /* _CPUFREQ_GOVERNER_H */
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 7731f7c..93bb56d 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -243,7 +243,8 @@ static void od_dbs_timer(struct work_struct *work)
                 }
         }
 
-        schedule_delayed_work_on(cpu, &dbs_info->cdbs.work, delay);
+        schedule_delayed_work_on(smp_processor_id(), &dbs_info->cdbs.work,
+                        delay);
         mutex_unlock(&dbs_info->cdbs.timer_mutex);
 }
Hi Fabio,
I know Rafael has asked you to send only the incremental patch, but I am interested in reviewing the whole series again, as I haven't reviewed the earlier stuff :)
I believe you are required to send patches to patches@linaro.org too :)
On 30 January 2013 18:30, Fabio Baltieri fabio.baltieri@linaro.org wrote:
From: Rickard Andersson rickard.andersson@stericsson.com
This patch fixes a bug that occurred when we had load on a secondary CPU and the primary CPU was sleeping. Only one sampling timer was spawned and it was spawned as a deferred timer on the primary CPU, so when a secondary CPU had a change in load this was not detected by the cpufreq governor (both ondemand and conservative).
This patch make sure that deferred timers are run on all CPUs in the case of software controlled CPUs that run on the same frequency.
It's really a good point. I wondered earlier why the interactive governor has a per-cpu timer. BTW, you can check how the interactive governor handles this requirement; that might improve these drivers too.
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c +bool dbs_sw_coordinated_cpus(struct cpu_dbs_common_info *cdbs) +{
struct cpufreq_policy *policy = cdbs->cur_policy;
return cpumask_weight(policy->cpus) > 1;
+} +EXPORT_SYMBOL_GPL(dbs_sw_coordinated_cpus);
I believe this routine should rather be present in the cpufreq core code, as there might be other users of it. It's really not related to dbs or the governors.
My ideas about the name of routine then:
- policy_is_shared()
- or something better you have :)
@@ -217,6 +227,10 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
mutex_init(&j_cdbs->timer_mutex);
INIT_DEFERRABLE_WORK(&j_cdbs->work,
dbs_data->gov_dbs_timer);
That doesn't look like the right place for doing it. With this change we will initialize the mutex and work for all cpus whenever cpufreq_governor_dbs(GOV_START) is called. We really want to do it just before the second_time: label, which will do it only when this is called for the very first time for a cpu.
That's what I thought initially :)
Then I looked at my own work and realized something else :) ... Now, because we stop/start governors on every cpu movement, this routine is called only once.
What you can do:
- Create a single for_each_cpu() in GOV_START path
- Get rid of dbs_data->enable routine as we don't need to store the
  number of calls for GOV_START.
} /*
@@ -275,15 +289,16 @@ second_time: } mutex_unlock(&dbs_data->mutex);
mutex_init(&cpu_cdbs->timer_mutex);
dbs_timer_init(dbs_data, cpu_cdbs, *sampling_rate);
for_each_cpu(j, policy->cpus)
dbs_timer_init(dbs_data, j, *sampling_rate);
Keep this too in the same for_each_cpu() loop and hence get back to the older param types of dbs_timer_init(), i.e. don't pass the cpu as we already have j_cdbs in our loop.
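Something like this untested sketch of a single GOV_START loop (with dbs_data->enable gone, and assuming ->cpu then holds the local cpu in every cdbs):

        for_each_cpu(j, policy->cpus) {
                struct cpu_dbs_common_info *j_cdbs = dbs_data->get_cpu_cdbs(j);

                j_cdbs->cpu = j;
                j_cdbs->cur_policy = policy;
                j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
                                &j_cdbs->prev_cpu_wall);
                if (ignore_nice)
                        j_cdbs->prev_cpu_nice =
                                kcpustat_cpu(j).cpustat[CPUTIME_NICE];

                mutex_init(&j_cdbs->timer_mutex);
                INIT_DEFERRABLE_WORK(&j_cdbs->work, dbs_data->gov_dbs_timer);

                /* back to the older param types: pass the cdbs, not the cpu */
                dbs_timer_init(dbs_data, j_cdbs, *sampling_rate);
        }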
break; case CPUFREQ_GOV_STOP: if (dbs_data->governor == GOV_CONSERVATIVE) cs_dbs_info->enable = 0;
dbs_timer_exit(cpu_cdbs);
for_each_cpu(j, policy->cpus)
dbs_timer_exit(dbs_data, j); mutex_lock(&dbs_data->mutex); mutex_destroy(&cpu_cdbs->timer_mutex);
Hi Viresh,
On Wed, Jan 30, 2013 at 08:32:38PM +0530, Viresh Kumar wrote:
Hi Fabio,
I know Rafael has asked you to send only the incremental patch, but i am interested in reviewing whole series again, as i haven't reviewed earlier stuff :)
Sure, take your chance.
[internal note]
I believe you are required to send patches to patches@linaro.org too :)
I already have patches@linaro.org in BCC, all my patches are there. [/internal note]
On 30 January 2013 18:30, Fabio Baltieri fabio.baltieri@linaro.org wrote:
From: Rickard Andersson rickard.andersson@stericsson.com
This patch fixes a bug that occurred when we had load on a secondary CPU and the primary CPU was sleeping. Only one sampling timer was spawned and it was spawned as a deferred timer on the primary CPU, so when a secondary CPU had a change in load this was not detected by the cpufreq governor (both ondemand and conservative).
This patch make sure that deferred timers are run on all CPUs in the case of software controlled CPUs that run on the same frequency.
Its really a good point. I wondered earlier why does interactive governor has per-cpu timer. BTW, you can check how interactive governor is handling this requirement. That might improve these drivers too.
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c +bool dbs_sw_coordinated_cpus(struct cpu_dbs_common_info *cdbs) +{
struct cpufreq_policy *policy = cdbs->cur_policy;
return cpumask_weight(policy->cpus) > 1;
+} +EXPORT_SYMBOL_GPL(dbs_sw_coordinated_cpus);
I believe this routine should be rather present in cpufreq core code, as their might be other users of it. Its really not related to dbs or governors.
My ideas about the name of routine then:
- policy_is_shared()
- or something better you have :)
So you are suggesting to rethink this function to be related to policy rather than dbs... this may as well become an inline in cpufreq.h, as:
static inline bool policy_is_shared(struct cpufreq_policy *policy)
{
	return cpumask_weight(policy->cpus) > 1;
}
I'm not sure about the name though, I like mentioning sw coordination in it because that's the comment in the declaration:
	cpumask_var_t		cpus;	/* CPUs requiring sw coordination */
	cpumask_var_t		related_cpus; /* CPUs with any coordination */
And those two are already confusing as a starting point.
Anyway, this sounds fine to me. If you think this is useful I can send a patch, or feel free to include it in your patches if you plan to do further cleanup work on this code.
/me tries to also keep that ->cpu field in mind.
@@ -217,6 +227,10 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data, if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
mutex_init(&j_cdbs->timer_mutex);
INIT_DEFERRABLE_WORK(&j_cdbs->work,
dbs_data->gov_dbs_timer);
That doesn't look the right place for doing it. With this change we will initialize mutex and work for all cpus whenever cpufreq_governor_dbs(GOV_START) is called. We really want to do it just before second_time: label, which will do it only when this is called for the very first time or cpu.
That's what i thought initially :)
Then i looked at my own work and realized something else :).. Now, because we stop/start governors on every cpu movement, this routine is called only once.
What you can do:
- Create a single for_each_cpu() in GOV_START path
- Get rid of dbs_data->enable routine as we don't need to store the
number of calls for GOV_START.
} /*
@@ -275,15 +289,16 @@ second_time: } mutex_unlock(&dbs_data->mutex);
mutex_init(&cpu_cdbs->timer_mutex);
dbs_timer_init(dbs_data, cpu_cdbs, *sampling_rate);
for_each_cpu(j, policy->cpus)
dbs_timer_init(dbs_data, j, *sampling_rate);
Keep this too in the same for_each_cpu() loop and hence get to older param types of dbs_timer_init(), i.e. don't pass cpu as we already have j_cdbs in our loop.
Ok, I can try to implement this if you help testing the patch on your bL system. :-)
Fabio
break; case CPUFREQ_GOV_STOP: if (dbs_data->governor == GOV_CONSERVATIVE) cs_dbs_info->enable = 0;
dbs_timer_exit(cpu_cdbs);
for_each_cpu(j, policy->cpus)
dbs_timer_exit(dbs_data, j); mutex_lock(&dbs_data->mutex); mutex_destroy(&cpu_cdbs->timer_mutex);
On 30 January 2013 21:53, Fabio Baltieri fabio.baltieri@linaro.org wrote:
On Wed, Jan 30, 2013 at 08:32:38PM +0530, Viresh Kumar wrote:
I believe this routine should be rather present in cpufreq core code, as their might be other users of it. Its really not related to dbs or governors.
My ideas about the name of routine then:
- policy_is_shared()
- or something better you have :)
So you are suggesting to rethink this function to be related to policy rather than dbs... this may as well become an inline in cpufreq.h, as:
static inline bool policy_is_shared(struct cpufreq_policy *policy)
{
	return cpumask_weight(policy->cpus) > 1;
}
Perfect.
I'm not sure about the name through, I like mentioning sw coordination in it because that's the comment in the declaration:
cpumask_var_t cpus; /* CPUs requiring sw coordination */ cpumask_var_t related_cpus; /* CPUs with any coordination */
And those two are already confusing as a starting point.
I will fix these comments with a patch of mine.
Anyway, this sounds fine to me. If you think this is useful I can send a patch, or feel free to include it in your patches if you plan to do further cleanup work on this code.
/me tries to also keep that ->cpu field in mind.
You can make it part of your patchset's v8.
On Wed, Jan 30, 2013 at 10:21:03PM +0530, Viresh Kumar wrote:
I'm not sure about the name through, I like mentioning sw coordination in it because that's the comment in the declaration:
cpumask_var_t cpus; /* CPUs requiring sw coordination */ cpumask_var_t related_cpus; /* CPUs with any coordination */
And those two are already confusing as a starting point.
I will fix these comments with a patch of mine.
Great!
Anyway, this sounds fine to me. If you think this is useful I can send a patch, or feel free to include it in your patches if you plan to do further cleanup work on this code.
/me tries to also keep that ->cpu field in mind.
You can make it part of your patchsets v8.
I'm not sending a v8 as Rafael already asked for the incremental patch, but I'll send a patch with that soon.
Thanks, Fabio
On Wednesday, January 30, 2013 05:57:39 PM Fabio Baltieri wrote:
On Wed, Jan 30, 2013 at 10:21:03PM +0530, Viresh Kumar wrote:
I'm not sure about the name through, I like mentioning sw coordination in it because that's the comment in the declaration:
cpumask_var_t cpus; /* CPUs requiring sw coordination */ cpumask_var_t related_cpus; /* CPUs with any coordination */
And those two are already confusing as a starting point.
I will fix these comments with a patch of mine.
Great!
Anyway, this sounds fine to me. If you think this is useful I can send a patch, or feel free to include it in your patches if you plan to do further cleanup work on this code.
/me tries to also keep that ->cpu field in mind.
You can make it part of your patchsets v8.
I'm not sending a v8 as Rafael already asked for incremental, but I'll send a patch with that soon.
Yes, please.
Thanks, Rafael
Modify ondemand timer to not resample CPU utilization if recently sampled from another SW coordinated core.
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c |  3 ++
 drivers/cpufreq/cpufreq_governor.h |  1 +
 drivers/cpufreq/cpufreq_ondemand.c | 58 +++++++++++++++++++++++++++++++-------
 3 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 8393d6e..46f96a4 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -289,6 +289,9 @@ second_time:
                 }
                 mutex_unlock(&dbs_data->mutex);
 
+                /* Initiate timer time stamp */
+                cpu_cdbs->time_stamp = ktime_get();
+
                 for_each_cpu(j, policy->cpus)
                         dbs_timer_init(dbs_data, j, *sampling_rate);
                 break;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 5bf6fb8..aaf073d 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -82,6 +82,7 @@ struct cpu_dbs_common_info {
          * the governor or limits.
          */
         struct mutex timer_mutex;
+        ktime_t time_stamp;
 };
 
 struct od_cpu_dbs_info_s {
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 93bb56d..13ceb3c 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -216,23 +216,23 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
         }
 }
 
-static void od_dbs_timer(struct work_struct *work)
+static void od_timer_update(struct od_cpu_dbs_info_s *dbs_info, bool sample,
+                            struct delayed_work *dw)
 {
-        struct od_cpu_dbs_info_s *dbs_info =
-                container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work);
         unsigned int cpu = dbs_info->cdbs.cpu;
         int delay, sample_type = dbs_info->sample_type;
 
-        mutex_lock(&dbs_info->cdbs.timer_mutex);
-
         /* Common NORMAL_SAMPLE setup */
         dbs_info->sample_type = OD_NORMAL_SAMPLE;
         if (sample_type == OD_SUB_SAMPLE) {
                 delay = dbs_info->freq_lo_jiffies;
-                __cpufreq_driver_target(dbs_info->cdbs.cur_policy,
-                        dbs_info->freq_lo, CPUFREQ_RELATION_H);
+                if (sample)
+                        __cpufreq_driver_target(dbs_info->cdbs.cur_policy,
+                                                dbs_info->freq_lo,
+                                                CPUFREQ_RELATION_H);
         } else {
-                dbs_check_cpu(&od_dbs_data, cpu);
+                if (sample)
+                        dbs_check_cpu(&od_dbs_data, cpu);
                 if (dbs_info->freq_lo) {
                         /* Setup timer for SUB_SAMPLE */
                         dbs_info->sample_type = OD_SUB_SAMPLE;
@@ -243,11 +243,49 @@ static void od_dbs_timer(struct work_struct *work)
                 }
         }
 
-        schedule_delayed_work_on(smp_processor_id(), &dbs_info->cdbs.work,
-                        delay);
+        schedule_delayed_work_on(smp_processor_id(), dw, delay);
+}
+
+static void od_timer_coordinated(struct od_cpu_dbs_info_s *dbs_info_local,
+                                 struct delayed_work *dw)
+{
+        struct od_cpu_dbs_info_s *dbs_info;
+        ktime_t time_now;
+        s64 delta_us;
+        bool sample = true;
+
+        /* use leader CPU's dbs_info */
+        dbs_info = &per_cpu(od_cpu_dbs_info, dbs_info_local->cdbs.cpu);
+        mutex_lock(&dbs_info->cdbs.timer_mutex);
+
+        time_now = ktime_get();
+        delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);
+
+        /* Do nothing if we recently have sampled */
+        if (delta_us < (s64)(od_tuners.sampling_rate / 2))
+                sample = false;
+        else
+                dbs_info->cdbs.time_stamp = time_now;
+
+        od_timer_update(dbs_info, sample, dw);
         mutex_unlock(&dbs_info->cdbs.timer_mutex);
 }
 
+static void od_dbs_timer(struct work_struct *work)
+{
+        struct delayed_work *dw = to_delayed_work(work);
+        struct od_cpu_dbs_info_s *dbs_info =
+                container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work);
+
+        if (dbs_sw_coordinated_cpus(&dbs_info->cdbs)) {
+                od_timer_coordinated(dbs_info, dw);
+        } else {
+                mutex_lock(&dbs_info->cdbs.timer_mutex);
+                od_timer_update(dbs_info, true, dw);
+                mutex_unlock(&dbs_info->cdbs.timer_mutex);
+        }
+}
+
 /************************** sysfs interface ************************/
static ssize_t show_sampling_rate_min(struct kobject *kobj,
On 30 January 2013 18:30, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Modify ondemand timer to not resample CPU utilization if recently sampled from another SW coordinated core.
Signed-off-by: Fabio Baltieri fabio.baltieri@linaro.org
I might be wrong but I have some real concerns over this patch.
This is what I believe is your idea:
- because a cpu can sleep, create a timer per cpu
- then check load again only when no other cpu in policy->cpus has
  checked load within the sampling time interval.
Correct?
drivers/cpufreq/cpufreq_governor.c | 3 ++ drivers/cpufreq/cpufreq_governor.h | 1 + drivers/cpufreq/cpufreq_ondemand.c | 58 +++++++++++++++++++++++++++++++------- 3 files changed, 52 insertions(+), 10 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 8393d6e..46f96a4 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -289,6 +289,9 @@ second_time: } mutex_unlock(&dbs_data->mutex);
/* Initiate timer time stamp */
cpu_cdbs->time_stamp = ktime_get();
We have updated time_stamp only for policy->cpu's cdbs.
for_each_cpu(j, policy->cpus) dbs_timer_init(dbs_data, j, *sampling_rate); break;
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 93bb56d..13ceb3c 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -216,23 +216,23 @@ static void od_check_cpu(int cpu, unsigned int load_freq) } }
-static void od_dbs_timer(struct work_struct *work) +static void od_timer_update(struct od_cpu_dbs_info_s *dbs_info, bool sample,
struct delayed_work *dw)
{
struct od_cpu_dbs_info_s *dbs_info =
container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work); unsigned int cpu = dbs_info->cdbs.cpu; int delay, sample_type = dbs_info->sample_type;
mutex_lock(&dbs_info->cdbs.timer_mutex);
/* Common NORMAL_SAMPLE setup */ dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { delay = dbs_info->freq_lo_jiffies;
__cpufreq_driver_target(dbs_info->cdbs.cur_policy,
dbs_info->freq_lo, CPUFREQ_RELATION_H);
if (sample)
__cpufreq_driver_target(dbs_info->cdbs.cur_policy,
dbs_info->freq_lo,
CPUFREQ_RELATION_H); } else {
dbs_check_cpu(&od_dbs_data, cpu);
if (sample)
dbs_check_cpu(&od_dbs_data, cpu); if (dbs_info->freq_lo) { /* Setup timer for SUB_SAMPLE */ dbs_info->sample_type = OD_SUB_SAMPLE;
@@ -243,11 +243,49 @@ static void od_dbs_timer(struct work_struct *work) } }
schedule_delayed_work_on(smp_processor_id(), &dbs_info->cdbs.work,
delay);
schedule_delayed_work_on(smp_processor_id(), dw, delay);
+}
All good until now.
+static void od_timer_coordinated(struct od_cpu_dbs_info_s *dbs_info_local,
struct delayed_work *dw)
+{
struct od_cpu_dbs_info_s *dbs_info;
ktime_t time_now;
s64 delta_us;
bool sample = true;
--start--
/* use leader CPU's dbs_info */
dbs_info = &per_cpu(od_cpu_dbs_info, dbs_info_local->cdbs.cpu);
mutex_lock(&dbs_info->cdbs.timer_mutex);
time_now = ktime_get();
delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);
/* Do nothing if we recently have sampled */
if (delta_us < (s64)(od_tuners.sampling_rate / 2))
sample = false;
else
dbs_info->cdbs.time_stamp = time_now;
--end--
Instead of two routines (this and the one below), we can have one and put the above code in the if (coordinated cpus) case. That will save some redundant code.
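E.g. something like this (untested, just your two routines folded back into one, same names as in the patch):

static void od_dbs_timer(struct work_struct *work)
{
        struct delayed_work *dw = to_delayed_work(work);
        struct od_cpu_dbs_info_s *dbs_info =
                container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work);
        bool sample = true;

        if (dbs_sw_coordinated_cpus(&dbs_info->cdbs)) {
                ktime_t time_now;
                s64 delta_us;

                /* use leader CPU's dbs_info */
                dbs_info = &per_cpu(od_cpu_dbs_info, dbs_info->cdbs.cpu);
                mutex_lock(&dbs_info->cdbs.timer_mutex);

                time_now = ktime_get();
                delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);

                /* Do nothing if we recently have sampled */
                if (delta_us < (s64)(od_tuners.sampling_rate / 2))
                        sample = false;
                else
                        dbs_info->cdbs.time_stamp = time_now;
        } else {
                mutex_lock(&dbs_info->cdbs.timer_mutex);
        }

        od_timer_update(dbs_info, sample, dw);
        mutex_unlock(&dbs_info->cdbs.timer_mutex);
}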
Another issue that I see: the current routine will be called from the timer handler of every cpu, so the above calculation will happen for every cpu. Here you are taking the diff of the last timestamp of cpu "x" against cpu "x"'s current timestamp, but what you should really do is take the diff between the current timestamp and the timestamp of the last change on any cpu in policy->cpus.
On top of that, the timestamp isn't initialized for all cpus early in the code at GOV_START.
Am I correct?
od_timer_update(dbs_info, sample, dw); mutex_unlock(&dbs_info->cdbs.timer_mutex);
}
+static void od_dbs_timer(struct work_struct *work) +{
struct delayed_work *dw = to_delayed_work(work);
struct od_cpu_dbs_info_s *dbs_info =
container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work);
if (dbs_sw_coordinated_cpus(&dbs_info->cdbs)) {
od_timer_coordinated(dbs_info, dw);
} else {
mutex_lock(&dbs_info->cdbs.timer_mutex);
od_timer_update(dbs_info, true, dw);
mutex_unlock(&dbs_info->cdbs.timer_mutex);
}
+}
On Wed, Jan 30, 2013 at 09:21:41PM +0530, Viresh Kumar wrote:
On 30 January 2013 18:30, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Modify ondemand timer to not resample CPU utilization if recently sampled from another SW coordinated core.
Signed-off-by: Fabio Baltieri fabio.baltieri@linaro.org
I might be wrong but i have some real concerns over this patch.
This is what i believe is your idea:
- because a cpu can sleep, create timer per cpu
- then check load again only when no other cpu in policy->cpus has checked load within sampling time interval.
Correct?
Yes.
drivers/cpufreq/cpufreq_governor.c | 3 ++ drivers/cpufreq/cpufreq_governor.h | 1 + drivers/cpufreq/cpufreq_ondemand.c | 58 +++++++++++++++++++++++++++++++------- 3 files changed, 52 insertions(+), 10 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 8393d6e..46f96a4 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -289,6 +289,9 @@ second_time: } mutex_unlock(&dbs_data->mutex);
/* Initiate timer time stamp */
cpu_cdbs->time_stamp = ktime_get();
We have updated time_stamp only for policy->cpu's cdbs.
Yes, see below.
for_each_cpu(j, policy->cpus) dbs_timer_init(dbs_data, j, *sampling_rate); break;
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 93bb56d..13ceb3c 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -216,23 +216,23 @@ static void od_check_cpu(int cpu, unsigned int load_freq) } }
-static void od_dbs_timer(struct work_struct *work) +static void od_timer_update(struct od_cpu_dbs_info_s *dbs_info, bool sample,
struct delayed_work *dw)
{
struct od_cpu_dbs_info_s *dbs_info =
container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work); unsigned int cpu = dbs_info->cdbs.cpu; int delay, sample_type = dbs_info->sample_type;
mutex_lock(&dbs_info->cdbs.timer_mutex);
/* Common NORMAL_SAMPLE setup */ dbs_info->sample_type = OD_NORMAL_SAMPLE; if (sample_type == OD_SUB_SAMPLE) { delay = dbs_info->freq_lo_jiffies;
__cpufreq_driver_target(dbs_info->cdbs.cur_policy,
dbs_info->freq_lo, CPUFREQ_RELATION_H);
if (sample)
__cpufreq_driver_target(dbs_info->cdbs.cur_policy,
dbs_info->freq_lo,
CPUFREQ_RELATION_H); } else {
dbs_check_cpu(&od_dbs_data, cpu);
if (sample)
dbs_check_cpu(&od_dbs_data, cpu); if (dbs_info->freq_lo) { /* Setup timer for SUB_SAMPLE */ dbs_info->sample_type = OD_SUB_SAMPLE;
@@ -243,11 +243,49 @@ static void od_dbs_timer(struct work_struct *work) } }
schedule_delayed_work_on(smp_processor_id(), &dbs_info->cdbs.work,
delay);
schedule_delayed_work_on(smp_processor_id(), dw, delay);
+}
All good until now.
+static void od_timer_coordinated(struct od_cpu_dbs_info_s *dbs_info_local,
struct delayed_work *dw)
+{
struct od_cpu_dbs_info_s *dbs_info;
ktime_t time_now;
s64 delta_us;
bool sample = true;
--start--
/* use leader CPU's dbs_info */
dbs_info = &per_cpu(od_cpu_dbs_info, dbs_info_local->cdbs.cpu);
mutex_lock(&dbs_info->cdbs.timer_mutex);
time_now = ktime_get();
delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);
/* Do nothing if we recently have sampled */
if (delta_us < (s64)(od_tuners.sampling_rate / 2))
sample = false;
else
dbs_info->cdbs.time_stamp = time_now;
--end--
Instead of two routines (this and the one below), we can have one and can put the above code in the if (coordinated cpus case). That will save some redundant code.
Ok, but then you have two dbs_infos mixing in the same code and it starts to become harder to read (the first version was with a single function, I think).
Another issue that i see is: Current routine will be called from timer handler of every cpu and so above calculations will happen for every cpu. Here, you are taking the diff of last timestamp of cpu "x" with cpu "x" current timestamp, but what you should have really done is the diff b/w current timestamp with the timestamp of last change on any cpu in policy->cpus.
Isn't that how it works now? The current cpu's ktime is not checked against its own timestamp, but against the "leader" cpu's (dbs_info_local->cdbs.cpu); that's why it's initialized only for the first one.
Maybe I should have used dbs_info_leader/dbs_info instead of dbs_info_local/dbs_info.
Over that, timestamp for all cpu's isn't initialized early in the code at GOV_START.
Am i correct?
As above.
Fabio
od_timer_update(dbs_info, sample, dw); mutex_unlock(&dbs_info->cdbs.timer_mutex);
}
+static void od_dbs_timer(struct work_struct *work) +{
struct delayed_work *dw = to_delayed_work(work);
struct od_cpu_dbs_info_s *dbs_info =
container_of(work, struct od_cpu_dbs_info_s, cdbs.work.work);
if (dbs_sw_coordinated_cpus(&dbs_info->cdbs)) {
od_timer_coordinated(dbs_info, dw);
} else {
mutex_lock(&dbs_info->cdbs.timer_mutex);
od_timer_update(dbs_info, true, dw);
mutex_unlock(&dbs_info->cdbs.timer_mutex);
}
+}
On 30 January 2013 22:16, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Isn't that how it works now? The current cpu ktime is not checked against its own, but against the "leader" cpu (dbs_info_local->cdbs.cpu), that's why it's initialized only for the first.
Maybe I should have used dbs_info_leader/dbs_info instead of dbs_info_local/dbs_info.
This routine is called as the wq handler, which will recover dbs_info from the work using container_of(), giving dbs_info_local for cpu j.
Then we will execute the code below.
+ /* use leader CPU's dbs_info */ + dbs_info = &per_cpu(od_cpu_dbs_info, dbs_info_local->cdbs.cpu);
dbs_info_local->cdbs.cpu was uninitialized for all cpus except policy->cpu, and so it might be initialized to 0 as it's a global variable... But if you offline cpu 0 and online it back, then policy->cpu would be 1 and this logic, which worked by mistake, will fail.
+ mutex_lock(&dbs_info->cdbs.timer_mutex); + + time_now = ktime_get(); + delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);
and so is this.
Correct?
Hello Viresh,
On Wed, Jan 30, 2013 at 10:41:08PM +0530, Viresh Kumar wrote:
On 30 January 2013 22:16, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Isn't that how it works now? The current cpu ktime is not checked against its own, but against the "leader" cpu (dbs_info_local->cdbs.cpu), that's why it's initialized only for the first.
Maybe I should have used dbs_info_leader/dbs_info instead of dbs_info_local/dbs_info.
This routine is called as wq handler. Which will recover dbs_info from work using container_of. Which would give dbs_info_local for the cpu j.
Then we will execute below code.
/* use leader CPU's dbs_info */
dbs_info = &per_cpu(od_cpu_dbs_info, dbs_info_local->cdbs.cpu);
dbs_info_local->cdbs.cpu was uninitialized for all cpus except policy->cpu.
Ok, now I see the problem: cdbs->cpu is initialized only on the leader cpu and is working by coincidence on my system, while cdbs->time_stamp is also initialized only on the leader cpu, but that should be correct even with cpu hotplugging as it is reinitialized every time.
That's a fix, so I'll send a patch just to set ->cpu in the for_each_cpu cycle.
Thanks, Fabio
And so, might be initialized with 0 as its a global variable... But if you offline cpu 0 and online it back, then policy->cpu would be 1 and this logic, which worked by mistake will fail.
mutex_lock(&dbs_info->cdbs.timer_mutex);
time_now = ktime_get();
delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);
and so as this.
Correct?
On 31 January 2013 14:09, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Ok, now I see the problem: cdbs->cpu is initialized only on the leader cpu and this is working by coincidence on my system, while cdbs->time_stamp is initialized only on the leader cpu, and that should be correct even when cpu hotplugging as that's reinitialized every time.
That's a fix so I'll send a patch just to set ->cpu into the for_each_cpu cycle.
That's not enough. You need to set ->cpu to j and not to policy->cpu, as we discussed earlier (params to timer init and exit).
And so, we need to get policy->cpu somehow and get its timestamp.
On Thu, Jan 31, 2013 at 02:12:29PM +0530, Viresh Kumar wrote:
On 31 January 2013 14:09, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Ok, now I see the problem: cdbs->cpu is initialized only on the leader cpu and this is working by coincidence on my system, while cdbs->time_stamp is initialized only on the leader cpu, and that should be correct even when cpu hotplugging as that's reinitialized every time.
That's a fix so I'll send a patch just to set ->cpu into the for_each_cpu cycle.
That's not enough. You need to set ->cpu to j and not policy->cpu as we discussed it earlier (params to timer init and exit.).
And so, we need to get policy->cpu somehow and get its timestamp.
The current code uses ->cpu to track policy->cpu and get the timestamp. I can modify it to point to the current cpu and use it in timer_init *and* add a new field just to track the leader cpu, but I don't see the benefit. Am I missing something?
Fabio
On 31 January 2013 14:36, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Current code uses ->cpu to track policy->cpu and get the timestamp, I can modify it to point to the current cpu and use it in timer_init *and* add a new field just to track leader_cpu but I don't see the benefits. Am I missing something?
The current code doesn't use cdbs->cpu for any cpu other than the leader, so we can use that field to keep the local cpu. And for any cpu we can get the updated leader cpu with the following:
j_cdbs->cur_policy->cpu :)
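i.e. an untested sketch of how the ondemand side could then look (assuming the GOV_START loop sets j_cdbs->cpu = j):

static void od_timer_coordinated(struct od_cpu_dbs_info_s *dbs_info_local,
                                 struct delayed_work *dw)
{
        /* the leader comes from the policy, cdbs->cpu stays local */
        unsigned int leader_cpu = dbs_info_local->cdbs.cur_policy->cpu;
        struct od_cpu_dbs_info_s *dbs_info =
                &per_cpu(od_cpu_dbs_info, leader_cpu);
        ktime_t time_now;
        s64 delta_us;
        bool sample = true;

        mutex_lock(&dbs_info->cdbs.timer_mutex);

        time_now = ktime_get();
        delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);

        /* Do nothing if we recently have sampled */
        if (delta_us < (s64)(od_tuners.sampling_rate / 2))
                sample = false;
        else
                dbs_info->cdbs.time_stamp = time_now;

        od_timer_update(dbs_info, sample, dw);
        mutex_unlock(&dbs_info->cdbs.timer_mutex);
}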
Modify conservative timer to not resample CPU utilization if recently sampled from another SW coordinated core.
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 47 +++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index b9d7f14..5d8e894 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -111,22 +111,57 @@ static void cs_check_cpu(int cpu, unsigned int load)
         }
 }
 
-static void cs_dbs_timer(struct work_struct *work)
+static void cs_timer_update(struct cs_cpu_dbs_info_s *dbs_info, bool sample,
+                            struct delayed_work *dw)
 {
-        struct cs_cpu_dbs_info_s *dbs_info = container_of(work,
-                        struct cs_cpu_dbs_info_s, cdbs.work.work);
         unsigned int cpu = dbs_info->cdbs.cpu;
         int delay = delay_for_sampling_rate(cs_tuners.sampling_rate);
 
+        if (sample)
+                dbs_check_cpu(&cs_dbs_data, cpu);
+
+        schedule_delayed_work_on(smp_processor_id(), dw, delay);
+}
+
+static void cs_timer_coordinated(struct cs_cpu_dbs_info_s *dbs_info_local,
+                                 struct delayed_work *dw)
+{
+        struct cs_cpu_dbs_info_s *dbs_info;
+        ktime_t time_now;
+        s64 delta_us;
+        bool sample = true;
+
+        /* use leader CPU's dbs_info */
+        dbs_info = &per_cpu(cs_cpu_dbs_info, dbs_info_local->cdbs.cpu);
         mutex_lock(&dbs_info->cdbs.timer_mutex);
 
-        dbs_check_cpu(&cs_dbs_data, cpu);
+        time_now = ktime_get();
+        delta_us = ktime_us_delta(time_now, dbs_info->cdbs.time_stamp);
 
-        schedule_delayed_work_on(smp_processor_id(), &dbs_info->cdbs.work,
-                        delay);
+        /* Do nothing if we recently have sampled */
+        if (delta_us < (s64)(cs_tuners.sampling_rate / 2))
+                sample = false;
+        else
+                dbs_info->cdbs.time_stamp = time_now;
+
+        cs_timer_update(dbs_info, sample, dw);
         mutex_unlock(&dbs_info->cdbs.timer_mutex);
 }
 
+static void cs_dbs_timer(struct work_struct *work)
+{
+        struct delayed_work *dw = to_delayed_work(work);
+        struct cs_cpu_dbs_info_s *dbs_info = container_of(work,
+                        struct cs_cpu_dbs_info_s, cdbs.work.work);
+
+        if (dbs_sw_coordinated_cpus(&dbs_info->cdbs)) {
+                cs_timer_coordinated(dbs_info, dw);
+        } else {
+                mutex_lock(&dbs_info->cdbs.timer_mutex);
+                cs_timer_update(dbs_info, true, dw);
+                mutex_unlock(&dbs_info->cdbs.timer_mutex);
+        }
+}
+
 static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
                 void *data)
 {
On 30 January 2013 18:30, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Modify conservative timer to not resample CPU utilization if recently sampled from another SW coordinated core.
Signed-off-by: Fabio Baltieri fabio.baltieri@linaro.org
We are again making the same mistake which I fixed with:
commit 4471a34f9a1f2da220272e823bdb8e8fa83a7661
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Fri Oct 26 00:47:42 2012 +0200
cpufreq: governors: remove redundant code
Initially ondemand governor was written and then using its code conservative governor is written. It used a lot of code from ondemand governor, but copy of code was created instead of using the same routines from both governors. Which increased code redundancy, which is difficult to manage.
This patch is an attempt to move common part of both the governors to cpufreq_governor.c file to come over above mentioned issues.
On Wed, Jan 30, 2013 at 09:23:22PM +0530, Viresh Kumar wrote:
On 30 January 2013 18:30, Fabio Baltieri fabio.baltieri@linaro.org wrote:
Modify conservative timer to not resample CPU utilization if recently sampled from another SW coordinated core.
Signed-off-by: Fabio Baltieri fabio.baltieri@linaro.org
We are again doing the same mistake which i fixed with:
commit 4471a34f9a1f2da220272e823bdb8e8fa83a7661
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Fri Oct 26 00:47:42 2012 +0200
cpufreq: governors: remove redundant code
Can't argue with this, but the two had some subtle differences (namely the two dbs_info structures) and I opted not to mess things up by forcing some non-obvious common code.
Feel free to suggest a strategy.
Thanks, Fabio
On 30 January 2013 22:23, Fabio Baltieri fabio.baltieri@linaro.org wrote:
On Wed, Jan 30, 2013 at 09:23:22PM +0530, Viresh Kumar wrote:
Can't argue with this, but the two had some subtle differences (namely the two dbs_info structures) and I opted not to mess things up by forcing some non-obvious common code.
Feel free to suggest a strategy.
Just check the patch I mentioned and go ahead with something similar. There is a lot of code in my original patch which removed redundancy to a large extent.
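For instance, the timestamp check itself could move to cpufreq_governor.c as a small helper that both governors call with the leader's cdbs and their own sampling rate (untested sketch, name made up; the caller is expected to hold the leader's timer_mutex):

bool dbs_need_load_eval(struct cpu_dbs_common_info *cdbs,
                        unsigned int sampling_rate)
{
        if (dbs_sw_coordinated_cpus(cdbs)) {
                ktime_t time_now = ktime_get();
                s64 delta_us = ktime_us_delta(time_now, cdbs->time_stamp);

                /* Do nothing if we recently have sampled */
                if (delta_us < (s64)(sampling_rate / 2))
                        return false;

                cdbs->time_stamp = time_now;
        }

        return true;
}
EXPORT_SYMBOL_GPL(dbs_need_load_eval);

cs_dbs_timer() and od_dbs_timer() would then keep only their governor specific bits.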
Modify update_sampling_rate() to check, and if necessary immediately schedule, every CPU's do_dbs_timer() delayed work.
This is required in case of software coordinated CPUs, as we now have a separate delayed work for each CPU.
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 13ceb3c..1017b90 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -326,7 +326,7 @@ static void update_sampling_rate(unsigned int new_rate)
                         cpufreq_cpu_put(policy);
                         continue;
                 }
-                dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
+                dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
                 cpufreq_cpu_put(policy);
 
                 mutex_lock(&dbs_info->cdbs.timer_mutex);
@@ -345,8 +345,7 @@ static void update_sampling_rate(unsigned int new_rate)
                         cancel_delayed_work_sync(&dbs_info->cdbs.work);
                         mutex_lock(&dbs_info->cdbs.timer_mutex);
 
-                        schedule_delayed_work_on(dbs_info->cdbs.cpu,
-                                        &dbs_info->cdbs.work,
+                        schedule_delayed_work_on(cpu, &dbs_info->cdbs.work,
                                         usecs_to_jiffies(new_rate));
                 }