Quoting Morten Rasmussen (2015-04-27 06:52:10)
On Fri, Apr 24, 2015 at 06:32:02PM +0100, Wu, Junjie wrote:
On 4/24/2015 7:50, Morten Rasmussen wrote:
On Mon, Apr 20, 2015 at 12:54:15PM +0100, Morten Rasmussen wrote:
On Fri, Apr 17, 2015 at 10:34:21PM +0100, Wu, Junjie wrote:
On 4/15/2015 22:29, Michael Turquette wrote:
From: Morten Rasmussen Morten.Rasmussen@arm.com
Architectures that don't have any other means for tracking cpu frequency changes need a callback from cpufreq to implement a scaling factor to enable scale-invariant per-entity load-tracking in the scheduler.
To compute the scale invariance correction factor the architecture would need to know both the max frequency and the current frequency. This patch defines weak functions for setting both from cpufreq.
Related architecture specific functions use weak function definitions. The same approach is followed here.
These callbacks can be used to implement frequency scaling of cpu capacity later.
Cc: Rafael J. Wysocki rjw@rjwysocki.net Cc: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Morten Rasmussen morten.rasmussen@arm.com
drivers/cpufreq/cpufreq.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 28e59a4..3c6398a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -280,6 +280,10 @@ static void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci) #endif }
+void __weak arch_scale_set_curr_freq(int cpu, unsigned long freq) {}
+void __weak arch_scale_set_max_freq(int cpu, unsigned long freq) {}
- static void __cpufreq_notify_transition(struct cpufreq_policy *policy, struct cpufreq_freqs *freqs, unsigned int state) {
@@ -317,6 +321,7 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy, pr_debug("FREQ: %lu - CPU: %lu\n", (unsigned long)freqs->new, (unsigned long)freqs->cpu); trace_cpu_frequency(freqs->new, freqs->cpu);
arch_scale_set_curr_freq(freqs->cpu, freqs->new); srcu_notifier_call_chain(&cpufreq_transition_notifier_list, CPUFREQ_POSTCHANGE, freqs); if (likely(policy) && likely(policy->cpu == freqs->cpu))
@@ -2148,7 +2153,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, struct cpufreq_policy *new_policy) { struct cpufreq_governor *old_gov;
int ret;
int ret, cpu; pr_debug("setting new policy for CPU %u: %u - %u kHz\n", new_policy->cpu, new_policy->min, new_policy->max);
@@ -2186,6 +2191,12 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, policy->min = new_policy->min; policy->max = new_policy->max;
for_each_cpu(cpu, policy->cpus) {
arch_scale_set_max_freq(cpu, policy->max);
/* Workaround for corner cases where notifiers don't fire */
arch_scale_set_curr_freq(cpu, policy->cur);
}
pr_debug("new min and max freqs are %u - %u kHz\n", policy->min, policy->max);
Just curious why we need these new callbacks? I don't think they are providing any new information not already given by existing cpufreq policy and transition notifications.
Right. They are providing the same information I think. I went with the __weak function callbacks as this method is used for the existing scaling functions used by the scheduler (arch_scale_{cpu,freq}_capacity()). However, that is changing now so I should give it a try and see if we can use the notifiers instead. There might be some initialization problems though as the notifier has to be registrered before cpufreq initializes the first (default) policy for us to know what the max frequency is.
I think the patch below gives us what we need using the cpufreq notifiers instead. It is just a single patch, no neeed to touch cpufreq, so the patch below replaces both patch 1 and 2.
I haven't found any initialization problems on TC2. I'm not 100% sure that policy->cur is set at initilization of all cpufreq drivers. Drivers are not required to have a get-function which seems required for policy->cur to be set.
Even if cpufreq driver's init doesn't fill in policy->cur, cpufreq_init_policy() would call cpufreq_set_policy() that sets frequency. You would receive both policy and transition notifications where the POSTCHANGE would have the right frequency information. I think your patch would be fine.
AFAICT, you are not guaranteed a transition notification during init. I don't get one on TC2 if I use the userspace governor as default. But Viresh has confirmed that somebody will set policy->cur in all cases.
[...]
@@ -709,11 +734,35 @@ static struct notifier_block cpufreq_notifier = { .notifier_call = cpufreq_callback, };
+static int cpufreq_policy_callback(struct notifier_block *nb,
unsigned long val, void *data)
+{
- struct cpufreq_policy *policy = data;
- int i;
- for_each_cpu(i, policy->cpus) {
scale_freq_capacity(i, policy->cur, policy->max);
atomic_long_set(&per_cpu(cpu_max_freq, i), policy->max);
- }
- return NOTIFY_OK;
+}
You need to skip all other events except CPUFREQ_NOTIFY. CPUFREQ_ADJUST and CPUFREQ_INCOMPATIBLE are meant for drivers to change policy->min/max, and it not uncommon to have policy->max adjusted by kernel thermal drivers. CPUFREQ_NOTIFY is the final result of new limits.
Good point, thanks. No reason to let all the noise pass through to the scheduler. I have added:
if (val != CPUFREQ_NOTIFY)
return NOTIFY_OK;
before the loop above.
Mike: I have force-updated the branch to contain this fix.
Morten,
Thanks. I've gone ahead and pushed my RFC v2 to eas-dev yesterday. I'll rebase it to your patch after we complete this comments/review cycle.
Regards, Mike
Thanks, Morten