Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE or POSTCHANGE notifications shouldn't be sent twice in a row. The following examples show why this is important:
Scenario 1:
-----------
One thread reads the value of cpuinfo_cur_freq, which calls __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()..
At the same time, the ondemand governor tries to change the frequency of the CPU and hence sends notifications via ->target()..
The notifiers are not serialized, and suppose this is what happens:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A
Now the last POSTCHANGE Notification happened for freq A, while the hardware is actually running at freq B :)
Where would we break then? adjust_jiffies() in cpufreq.c and cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts jiffies).. All loops_per_jiffy based calculations would be broken..
Scenario 2:
-----------
A governor is changing the frequency and has called __cpufreq_driver_target(). At the same time, we change scaling_{min|max}_freq from sysfs, which eventually ends up calling the governor's CPUFREQ_GOV_LIMITS notification, which in turn also calls __cpufreq_driver_target().. And hence we get concurrent calls to ->target()..
And platforms have something like this in their ->target() (e.g. cpufreq-cpu0, omap, exynos, etc.):
A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: Decrease voltage
Now, the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it..
And this is the sequence that follows due to the race:
X.A: voltage increased for larger freq
Y.A: nothing happened here
Y.B: freq decreased
Y.C: voltage decreased
X.B: freq increased
X.C: nothing happened
We end up setting a freq which is not supported by the voltage we have set.. That will probably make the clock to the CPU unstable, and the system might not be workable anymore...
This patch adds protection in cpufreq_notify_transition() to serialize transitions. It WARN()s if a POSTCHANGE notification is sent while we are not in the middle of a transition, and it forces a PRECHANGE notification to wait while the current transition is in progress.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
This was discussed earlier here: https://lkml.org/lkml/2013/9/25/402
That is where Rafael asked for a better fix, calling the V1 fixes "quick and dirty". This is another approach, much simpler than the previous one. Please see if it looks fine. There is a TODO note in there, as I wanted some suggestions on how exactly we should wait for a transition to finish.
 drivers/cpufreq/cpufreq.c | 39 +++++++++++++++++++++++++++++++++++++--
 include/linux/cpufreq.h   |  2 ++
 2 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2677ff1..66bdfff 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -324,6 +324,13 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 	}
 }
 
+static void notify_transition_for_each_cpu(struct cpufreq_policy *policy,
+		struct cpufreq_freqs *freqs, unsigned int state)
+{
+	for_each_cpu(freqs->cpu, policy->cpus)
+		__cpufreq_notify_transition(policy, freqs, state);
+}
+
 /**
  * cpufreq_notify_transition - call notifier chain and adjust_jiffies
  * on frequency transition.
@@ -335,8 +342,35 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
-	for_each_cpu(freqs->cpu, policy->cpus)
-		__cpufreq_notify_transition(policy, freqs, state);
+	if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
+		return notify_transition_for_each_cpu(policy, freqs, state);
+
+	/* Serialize pre-post notifications */
+	mutex_lock(&policy->transition_lock);
+	if (unlikely(WARN_ON(!policy->transition_ongoing &&
+				(state == CPUFREQ_POSTCHANGE)))) {
+		mutex_unlock(&policy->transition_lock);
+		return;
+	}
+
+	if (state == CPUFREQ_PRECHANGE) {
+		while (policy->transition_ongoing) {
+			mutex_unlock(&policy->transition_lock);
+			/* TODO: Can we do something better here? */
+			cpu_relax();
+			mutex_lock(&policy->transition_lock);
+		}
+
+		policy->transition_ongoing = true;
+		mutex_unlock(&policy->transition_lock);
+	}
+
+	notify_transition_for_each_cpu(policy, freqs, state);
+
+	if (state == CPUFREQ_POSTCHANGE) {
+		policy->transition_ongoing = false;
+		mutex_unlock(&policy->transition_lock);
+	}
 }
 EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
 
@@ -983,6 +1017,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
 
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
 
 	return policy;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 31c431e..e5cebce 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -104,6 +104,8 @@ struct cpufreq_policy {
 	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
+	bool			transition_ongoing; /* Tracks transition status */
+	struct mutex		transition_lock;
 };
 
 /* Only for ACPI */
On 03/14/2014 01:13 PM, Viresh Kumar wrote:
Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE or POSTCHANGE notifications shouldn't be sent twice in a row. The following examples show why this is important:
[...]
This was discussed earlier here: https://lkml.org/lkml/2013/9/25/402
That is where Rafael asked for a better fix, calling the V1 fixes "quick and dirty". This is another approach, much simpler than the previous one. Please see if it looks fine. There is a TODO note in there, as I wanted some suggestions on how exactly we should wait for a transition to finish.
 drivers/cpufreq/cpufreq.c | 39 +++++++++++++++++++++++++++++++++++++--
 include/linux/cpufreq.h   |  2 ++
 2 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2677ff1..66bdfff 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -324,6 +324,13 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 	}
 }
 
+static void notify_transition_for_each_cpu(struct cpufreq_policy *policy,
+		struct cpufreq_freqs *freqs, unsigned int state)
+{
+	for_each_cpu(freqs->cpu, policy->cpus)
+		__cpufreq_notify_transition(policy, freqs, state);
+}
+
 /**
  * cpufreq_notify_transition - call notifier chain and adjust_jiffies
  * on frequency transition.
@@ -335,8 +342,35 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
-	for_each_cpu(freqs->cpu, policy->cpus)
-		__cpufreq_notify_transition(policy, freqs, state);
+	if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
Wait a min, when is this condition ever true? I mean, what else can 'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?
+		return notify_transition_for_each_cpu(policy, freqs, state);
+
+	/* Serialize pre-post notifications */
+	mutex_lock(&policy->transition_lock);
Nope, this is definitely not the way to go, IMHO. We should enforce that the *callers* serialize the transitions, something like this:
cpufreq_transition_lock();
cpufreq_notify_transition(CPUFREQ_PRECHANGE);
//Perform the frequency change
cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
cpufreq_transition_unlock();
That's it!
[ We can either introduce a new "transition" lock or perhaps even reuse the cpufreq_driver_lock if it fits... but the point is, the _caller_ has to perform the locking; trying to be smart inside cpufreq_notify_transition() is a recipe for headache :-( ]
Is there any problem with this approach due to which you didn't take this route?
+	if (unlikely(WARN_ON(!policy->transition_ongoing &&
+				(state == CPUFREQ_POSTCHANGE)))) {
+		mutex_unlock(&policy->transition_lock);
+		return;
+	}
+
+	if (state == CPUFREQ_PRECHANGE) {
+		while (policy->transition_ongoing) {
+			mutex_unlock(&policy->transition_lock);
+			/* TODO: Can we do something better here? */
+			cpu_relax();
+			mutex_lock(&policy->transition_lock);
If the caller takes care of the synchronization, we can avoid these sorts of acrobatics ;-)
Regards, Srivatsa S. Bhat
+		}
+
+		policy->transition_ongoing = true;
+		mutex_unlock(&policy->transition_lock);
+	}
+
+	notify_transition_for_each_cpu(policy, freqs, state);
+
+	if (state == CPUFREQ_POSTCHANGE) {
+		policy->transition_ongoing = false;
+		mutex_unlock(&policy->transition_lock);
+	}
 }
 EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
@@ -983,6 +1017,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
 
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
 
 	return policy;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 31c431e..e5cebce 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -104,6 +104,8 @@ struct cpufreq_policy {
 	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
+	bool			transition_ongoing; /* Tracks transition status */
+	struct mutex		transition_lock;
 };
 
 /* Only for ACPI */
On 18 March 2014 18:20, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
On 03/14/2014 01:13 PM, Viresh Kumar wrote:
if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
Wait a min, when is this condition ever true? I mean, what else can 'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?
There were two more 'unused' states available: CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE
I have sent a patch to remove them now and this code would go away..
return notify_transition_for_each_cpu(policy, freqs, state);
/* Serialize pre-post notifications */
mutex_lock(&policy->transition_lock);
Nope, this is definitely not the way to go, IMHO. We should enforce that the *callers* serialize the transitions, something like this:
cpufreq_transition_lock(); cpufreq_notify_transition(CPUFREQ_PRECHANGE); //Perform the frequency change cpufreq_notify_transition(CPUFREQ_POSTCHANGE); cpufreq_transition_unlock();
That's it!
[ We can either introduce a new "transition" lock or perhaps even reuse the cpufreq_driver_lock if it fits... but the point is, the _caller_ has to perform the locking; trying to be smart inside cpufreq_notify_transition() is a recipe for headache :-( ]
Is there any problem with this approach due to which you didn't take this route?
I didn't want drivers to handle this, as the core must make sure things are in order. On top of that, it helps by not pasting redundant code everywhere..
Drivers are anyway going to call cpufreq_notify_transition(), why increase burden on them?
if (unlikely(WARN_ON(!policy->transition_ongoing &&
(state == CPUFREQ_POSTCHANGE)))) {
mutex_unlock(&policy->transition_lock);
return;
}
if (state == CPUFREQ_PRECHANGE) {
while (policy->transition_ongoing) {
mutex_unlock(&policy->transition_lock);
/* TODO: Can we do something better here? */
cpu_relax();
mutex_lock(&policy->transition_lock);
If the caller takes care of the synchronization, we can avoid these sorts of acrobatics ;-)
If we are fine with taking a mutex for the entire transition, then we can avoid the above kind of acrobatics by just taking the mutex at PRECHANGE and releasing it at POSTCHANGE..
It will look like this then, hope this looks fine :)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2677ff1..3b9eac4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -335,8 +335,15 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
+	if (state == CPUFREQ_PRECHANGE)
+		mutex_lock(&policy->transition_lock);
+
+	/* Send notifications */
 	for_each_cpu(freqs->cpu, policy->cpus)
 		__cpufreq_notify_transition(policy, freqs, state);
+
+	if (state == CPUFREQ_POSTCHANGE)
+		mutex_unlock(&policy->transition_lock);
 }
 EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
 
@@ -983,6 +990,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
 
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
 
 	return policy;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 31c431e..5f9209a 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -104,6 +104,7 @@ struct cpufreq_policy {
 	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
+	struct mutex		transition_lock;
 };
On 03/19/2014 11:38 AM, Viresh Kumar wrote:
On 18 March 2014 18:20, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
On 03/14/2014 01:13 PM, Viresh Kumar wrote:
if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
Wait a min, when is this condition ever true? I mean, what else can 'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?
There were two more 'unused' states available: CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE
I have sent a patch to remove them now and this code would go away..
Ok..
return notify_transition_for_each_cpu(policy, freqs, state);
/* Serialize pre-post notifications */
mutex_lock(&policy->transition_lock);
Nope, this is definitely not the way to go, IMHO. We should enforce that the *callers* serialize the transitions, something like this:
cpufreq_transition_lock(); cpufreq_notify_transition(CPUFREQ_PRECHANGE); //Perform the frequency change cpufreq_notify_transition(CPUFREQ_POSTCHANGE); cpufreq_transition_unlock();
That's it!
[ We can either introduce a new "transition" lock or perhaps even reuse the cpufreq_driver_lock if it fits... but the point is, the _caller_ has to perform the locking; trying to be smart inside cpufreq_notify_transition() is a recipe for headache :-( ]
Is there any problem with this approach due to which you didn't take this route?
Wait, I think I remember. The problem was about dealing with drivers that do asynchronous notification (those that have the ASYNC_NOTIFICATION flag set). In particular, exynos-5440 driver sends out the POSTCHANGE notification from a workqueue worker, much later than sending the PRECHANGE notification.
From what I saw, this is how the exynos-5440 driver works:
1. ->target() is invoked, and the driver writes to a register and returns to its caller.
2. An interrupt occurs that indicates that the frequency was changed.
3. The interrupt handler kicks off a worker thread which then sends out the POSTCHANGE notification.
So the important question here is, how does the exynos-5440 driver protect itself from say 2 ->target() calls which occur in close sequence (before allowing the entire chain for the first call to complete)?
As far as I can see there is no such synchronization in the driver at the moment. Adding Amit to CC for his comments.
Regards, Srivatsa S. Bhat
I didn't want drivers to handle this, as the core must make sure things are in order. On top of that, it helps by not pasting redundant code everywhere..
Drivers are anyway going to call cpufreq_notify_transition(), why increase burden on them?
if (unlikely(WARN_ON(!policy->transition_ongoing &&
(state == CPUFREQ_POSTCHANGE)))) {
mutex_unlock(&policy->transition_lock);
return;
}
if (state == CPUFREQ_PRECHANGE) {
while (policy->transition_ongoing) {
mutex_unlock(&policy->transition_lock);
/* TODO: Can we do something better here? */
cpu_relax();
mutex_lock(&policy->transition_lock);
If the caller takes care of the synchronization, we can avoid these sorts of acrobatics ;-)
If we are fine with taking a mutex for the entire transition, then we can avoid above kind of acrobatics by just taking the mutex from PRECHANGE and leaving it at POSTCHANGE..
It will look like this then, hope this looks fine :)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2677ff1..3b9eac4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -335,8 +335,15 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
+	if (state == CPUFREQ_PRECHANGE)
+		mutex_lock(&policy->transition_lock);
+
+	/* Send notifications */
 	for_each_cpu(freqs->cpu, policy->cpus)
 		__cpufreq_notify_transition(policy, freqs, state);
+
+	if (state == CPUFREQ_POSTCHANGE)
+		mutex_unlock(&policy->transition_lock);
 }
 EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
 
@@ -983,6 +990,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
 
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
 
 	return policy;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 31c431e..5f9209a 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -104,6 +104,7 @@ struct cpufreq_policy {
 	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
+	struct mutex		transition_lock;
 };
On 19 March 2014 14:47, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
Wait, I think I remember. The problem was about dealing with drivers that do asynchronous notification (those that have the ASYNC_NOTIFICATION flag set). In particular, exynos-5440 driver sends out the POSTCHANGE notification from a workqueue worker, much later than sending the PRECHANGE notification.
From what I saw, this is how the exynos-5440 driver works:
->target() is invoked, and the driver writes to a register and returns to its caller.
An interrupt occurs that indicates that the frequency was changed.
The interrupt handler kicks off a worker thread which then sends out the POSTCHANGE notification.
Correct!!
So the important question here is, how does the exynos-5440 driver protect itself from say 2 ->target() calls which occur in close sequence (before allowing the entire chain for the first call to complete)?
As far as I can see there is no such synchronization in the driver at the moment. Adding Amit to CC for his comments.
Yes, and that's what my patch is trying to fix. Where is the confusion?
On 03/19/2014 02:50 PM, Viresh Kumar wrote:
On 19 March 2014 14:47, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
Wait, I think I remember. The problem was about dealing with drivers that do asynchronous notification (those that have the ASYNC_NOTIFICATION flag set). In particular, exynos-5440 driver sends out the POSTCHANGE notification from a workqueue worker, much later than sending the PRECHANGE notification.
From what I saw, this is how the exynos-5440 driver works:
->target() is invoked, and the driver writes to a register and returns to its caller.
An interrupt occurs that indicates that the frequency was changed.
The interrupt handler kicks off a worker thread which then sends out the POSTCHANGE notification.
Correct!!
So the important question here is, how does the exynos-5440 driver protect itself from say 2 ->target() calls which occur in close sequence (before allowing the entire chain for the first call to complete)?
As far as I can see there is no such synchronization in the driver at the moment. Adding Amit to CC for his comments.
Yes, and that's what my patch is trying to fix. Where is the confusion?
Sorry, for a moment I got confused and thought that your patch addresses the race conditions present in normal drivers alone, and not ASYNC_NOTIFICATION drivers. But now I understand that your patch intends to fix both the problems at once. I'll share my thoughts about the design in a separate reply.
Regards, Srivatsa S. Bhat
On 03/19/2014 11:38 AM, Viresh Kumar wrote:
On 18 March 2014 18:20, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
On 03/14/2014 01:13 PM, Viresh Kumar wrote:
if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
Wait a min, when is this condition ever true? I mean, what else can 'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?
There were two more 'unused' states available: CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE
I have sent a patch to remove them now and this code would go away..
return notify_transition_for_each_cpu(policy, freqs, state);
/* Serialize pre-post notifications */
mutex_lock(&policy->transition_lock);
Nope, this is definitely not the way to go, IMHO. We should enforce that the *callers* serialize the transitions, something like this:
cpufreq_transition_lock(); cpufreq_notify_transition(CPUFREQ_PRECHANGE); //Perform the frequency change cpufreq_notify_transition(CPUFREQ_POSTCHANGE); cpufreq_transition_unlock();
That's it!
[ We can either introduce a new "transition" lock or perhaps even reuse the cpufreq_driver_lock if it fits... but the point is, the _caller_ has to perform the locking; trying to be smart inside cpufreq_notify_transition() is a recipe for headache :-( ]
Is there any problem with this approach due to which you didn't take this route?
I didn't want drivers to handle this, as the core must make sure things are in order. On top of that, it helps by not pasting redundant code everywhere..
Drivers are anyway going to call cpufreq_notify_transition(), why increase burden on them?
No, it's not about burden. It's about the elegance of the design. We should not be overly "smart" in the cpufreq core. Hiding the synchronization inside the cpufreq core only encourages people to write buggy code in their drivers.
Why don't we go with what Rafael suggested? We can have dedicated begin_transition() and end_transition() calls to demarcate the frequency transitions. That way, it makes it very clear how the synchronization is done. Of course, these functions would be provided (exported) by the cpufreq core, by implementing them using locks/counters/whatever.
Basically what I'm arguing against, is the idea of having the cpufreq core figure out what the driver _intended_ to do, from inside the cpufreq_notify_transition() call.
What I would prefer instead is to have the cpufreq driver do something like this:
cpufreq_freq_transition_begin();
cpufreq_notify_transition(CPUFREQ_PRECHANGE);
//perform the frequency change
cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
cpufreq_freq_transition_end();
[ASYNC_NOTIFICATION drivers will invoke the last two functions in a separate context/thread.]
Regards, Srivatsa S. Bhat
if (unlikely(WARN_ON(!policy->transition_ongoing &&
(state == CPUFREQ_POSTCHANGE)))) {
mutex_unlock(&policy->transition_lock);
return;
}
if (state == CPUFREQ_PRECHANGE) {
while (policy->transition_ongoing) {
mutex_unlock(&policy->transition_lock);
/* TODO: Can we do something better here? */
cpu_relax();
mutex_lock(&policy->transition_lock);
If the caller takes care of the synchronization, we can avoid these sorts of acrobatics ;-)
If we are fine with taking a mutex for the entire transition, then we can avoid above kind of acrobatics by just taking the mutex from PRECHANGE and leaving it at POSTCHANGE..
It will look like this then, hope this looks fine :)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2677ff1..3b9eac4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -335,8 +335,15 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
+	if (state == CPUFREQ_PRECHANGE)
+		mutex_lock(&policy->transition_lock);
+
+	/* Send notifications */
 	for_each_cpu(freqs->cpu, policy->cpus)
 		__cpufreq_notify_transition(policy, freqs, state);
+
+	if (state == CPUFREQ_POSTCHANGE)
+		mutex_unlock(&policy->transition_lock);
 }
 EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
 
@@ -983,6 +990,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
 
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
 
 	return policy;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 31c431e..5f9209a 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -104,6 +104,7 @@ struct cpufreq_policy {
 	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
+	struct mutex		transition_lock;
 };
On 19 March 2014 15:20, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
No, its not about burden. Its about the elegance of the design. We should not be overly "smart" in the cpufreq core. Hiding the synchronization inside the cpufreq core only encourages people to write buggy code in their drivers.
What kind of buggy code can be there? They are supposed to call notifiers in the order mentioned and so it shouldn't be a problem at all.. Don't know..
Why don't we go with what Rafael suggested? We can have dedicated begin_transition() and end_transition() calls to demarcate the frequency transitions. That way, it makes it very clear how the synchronization is done. Of course, these functions would be provided (exported) by the cpufreq core, by implementing them using locks/counters/whatever.
Basically what I'm arguing against, is the idea of having the cpufreq core figure out what the driver _intended_ to do, from inside the cpufreq_notify_transition() call.
What I would prefer instead is to have the cpufreq driver do something like this:
cpufreq_freq_transition_begin();
cpufreq_notify_transition(CPUFREQ_PRECHANGE);
Why do we need two routines then? What about doing notification from inside cpufreq_freq_transition_begin()?
This is a burden for driver writers, who don't normally understand the relevance of these calls in detail and may think only the first one is enough, or only the second one is..
It's better if they simply tell the core that they are starting to do a transition, i.e. via cpufreq_freq_transition_begin(), and then the core should send the notifications.
//perform the frequency change
cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
cpufreq_freq_transition_end();
[ASYNC_NOTIFICATION drivers will invoke the last two functions in a separate context/thread.]
Same for the last two routines and yes they would be called from separate thread for ASYNC_NOTIFICATION drivers..
On 03/19/2014 03:39 PM, Viresh Kumar wrote:
On 19 March 2014 15:20, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
No, its not about burden. Its about the elegance of the design. We should not be overly "smart" in the cpufreq core. Hiding the synchronization inside the cpufreq core only encourages people to write buggy code in their drivers.
What kind of buggy code can be there? They are supposed to call notifiers in the order mentioned and so it shouldn't be a problem at all.. Don't know..
Why don't we go with what Rafael suggested? We can have dedicated begin_transition() and end_transition() calls to demarcate the frequency transitions. That way, it makes it very clear how the synchronization is done. Of course, these functions would be provided (exported) by the cpufreq core, by implementing them using locks/counters/whatever.
Basically what I'm arguing against, is the idea of having the cpufreq core figure out what the driver _intended_ to do, from inside the cpufreq_notify_transition() call.
What I would prefer instead is to have the cpufreq driver do something like this:
cpufreq_freq_transition_begin();
cpufreq_notify_transition(CPUFREQ_PRECHANGE);
Why do we need two routines then? What about doing notification from inside cpufreq_freq_transition_begin()?
Hmm, that's a good idea. I thought about ways to simplify the synchronization and this is what I came up with. Its completely untested though. Let me know what you think!
---------------------------------------------------------------------------
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH v3] cpufreq: Make sure frequency transitions are serialized
Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers should strictly alternate, thereby preventing two different sets of PRECHANGE or POSTCHANGE notifiers from interleaving arbitrarily.
The following examples illustrate why this is important:
Scenario 1:
-----------
A thread reading the value of cpuinfo_cur_freq will call __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition().
The ondemand governor can decide to change the frequency of the CPU at the same time and hence it can end up sending the notifications via ->target().
If the notifiers are not serialized, the following sequence can occur:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A
We can see from the above that the last POSTCHANGE Notification happens for freq A but the hardware is set to run at freq B.
Where would we break then? adjust_jiffies() in cpufreq.c and cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the loops_per_jiffy calculations will get messed up.
Scenario 2:
-----------
The governor calls __cpufreq_driver_target() to change the frequency. At the same time, if we change scaling_{min|max}_freq from sysfs, it will end up calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call __cpufreq_driver_target(). And hence we end up issuing concurrent calls to ->target().
Typically, platforms have the following logic in their ->target() routines: (Eg: cpufreq-cpu0, omap, exynos, etc)
A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage
Now, if the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it, we get the following race condition:
X.A: voltage gets increased for larger freq
Y.A: nothing happens
Y.B: freq gets decreased
Y.C: voltage gets decreased
X.B: freq gets increased
X.C: nothing happens
Thus we can end up setting a freq which is not supported by the voltage we have set. That will probably make the clock to the CPU unstable and the system might not work properly anymore.
This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below:
cpufreq_freq_transition_begin();
//Perform the frequency change
cpufreq_freq_transition_end();
The _begin() call sends the PRECHANGE notification whereas the _end() call sends the POSTCHANGE notification. Also, all the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can also use these APIs for performing frequency transitions (ie., you can call _begin() from one task, and call the corresponding _end() from a different task).
The actual synchronization underneath is not that complicated:
The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface).
To achieve this, a 'transition_ongoing' flag, a 'transition_lock' mutex and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The mutex-lock is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides us all the necessary synchronization.
Based-on-patch-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
 drivers/cpufreq/cpufreq.c | 34 ++++++++++++++++++++++++++++++++++
 include/linux/cpufreq.h   |  5 +++++
 2 files changed, 39 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 199b52b..e90388f 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -349,6 +349,38 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy, EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
+void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
+		struct cpufreq_freqs *freqs, unsigned int state)
+{
+wait:
+	wait_event(&policy->transition_wait, !policy->transition_ongoing);
+
+	if (!mutex_trylock(&policy->transition_lock))
+		goto wait;
+
+	policy->transition_ongoing++;
+
+	cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
+
+	mutex_unlock(&policy->transition_lock);
+}
+
+void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
+		struct cpufreq_freqs *freqs, unsigned int state)
+{
+	cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
+
+	/*
+	 * We don't need to take any locks for this update, since only
+	 * one POSTCHANGE notification can be pending at any time, and
+	 * at the moment, that's us :-)
+	 */
+	policy->transition_ongoing = false;
+
+	wake_up(&policy->transition_wait);
+}
+
+
 /*********************************************************************
  *                          SYSFS INTERFACE                          *
  *********************************************************************/
@@ -968,6 +1000,8 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
+	init_waitqueue_head(&policy->transition_wait);
return policy;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 4d89e0e..8bded24 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -101,6 +101,11 @@ struct cpufreq_policy {
 	 * __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore rwsem;
+
+	/* Synchronization for frequency transitions */
+	bool transition_ongoing;	/* Tracks transition status */
+	struct mutex transition_lock;
+	wait_queue_head_t transition_wait;
 };
/* Only for ACPI */
On 19 March 2014 17:45, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
bool transition_ongoing; /* Tracks transition status */
struct mutex transition_lock;
wait_queue_head_t transition_wait;
Similar to what I did in my last version: why do you need transition_ongoing and transition_wait? Why not simply work with transition_lock, i.e., acquire it for the complete transition sequence?
On 03/19/2014 07:05 PM, Viresh Kumar wrote:
On 19 March 2014 17:45, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
bool transition_ongoing; /* Tracks transition status */
struct mutex transition_lock;
wait_queue_head_t transition_wait;
Similar to what I did in my last version: why do you need transition_ongoing and transition_wait? Why not simply work with transition_lock, i.e., acquire it for the complete transition sequence?
We *can't* acquire it for the complete transition sequence in the case of drivers that do asynchronous notification, because PRECHANGE is done in one thread and POSTCHANGE is done in a totally different thread! You can't acquire a lock in one task and release it in a different task; that would be a fundamental violation of locking.
That's why I introduced the wait queue to help us create a "flow" which encompasses 2 different, but co-ordinating tasks. You simply can't do that elegantly by using plain locks alone.
Regards, Srivatsa S. Bhat
On 03/19/2014 08:18 PM, Srivatsa S. Bhat wrote:
On 03/19/2014 07:05 PM, Viresh Kumar wrote:
On 19 March 2014 17:45, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
bool transition_ongoing; /* Tracks transition status */
struct mutex transition_lock;
wait_queue_head_t transition_wait;
Similar to what I did in my last version: why do you need transition_ongoing and transition_wait? Why not simply work with transition_lock, i.e., acquire it for the complete transition sequence?
We *can't* acquire it for the complete transition sequence in the case of drivers that do asynchronous notification, because PRECHANGE is done in one thread and POSTCHANGE is done in a totally different thread! You can't acquire a lock in one task and release it in a different task; that would be a fundamental violation of locking.
That's why I introduced the wait queue to help us create a "flow" which encompasses 2 different, but co-ordinating tasks. You simply can't do that elegantly by using plain locks alone.
By the way, note the updated changelog in my patch. It includes a brief overview of the synchronization design, which is copy-pasted below for reference. I forgot to mention this earlier!
-----
This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below:
cpufreq_freq_transition_begin();
//Perform the frequency change
cpufreq_freq_transition_end();
The _begin() call sends the PRECHANGE notification, whereas the _end() call sends the POSTCHANGE notification. All the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can use these APIs for performing frequency transitions (i.e., you can call _begin() from one task, and call the corresponding _end() from a different task).
The actual synchronization underneath is not that complicated:
The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface).
To achieve this, a 'transition_ongoing' flag, a 'transition_lock' mutex and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The mutex is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides all the necessary synchronization.
Regards, Srivatsa S. Bhat
On 19 March 2014 17:45, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199b52b..e90388f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -349,6 +349,38 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
+void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, unsigned int state)
+{
+wait:
wait_event(&policy->transition_wait, !policy->transition_ongoing);
I think it's broken here. At this point another thread can come in, take the lock, update transition_ongoing, send its notifications and finally unlock..
And after that we can take the lock and send another notification..
Correct?
if (!mutex_trylock(&policy->transition_lock))
goto wait;
policy->transition_ongoing++;
s/++/ = true
cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
mutex_unlock(&policy->transition_lock);
We can release the lock before sending notifications; it's there just to protect transition_ongoing.
On 03/20/2014 10:09 AM, Viresh Kumar wrote:
On 19 March 2014 17:45, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199b52b..e90388f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -349,6 +349,38 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
+void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, unsigned int state)
+{
+wait:
wait_event(&policy->transition_wait, !policy->transition_ongoing);
I think it's broken here. At this point another thread can come in, take the lock, update transition_ongoing, send its notifications and finally unlock..
And after that we can take the lock and send another notification..
Correct?
Good catch! I missed that yesterday. Please find the updated patch below, with all your suggestions incorporated. Does this version look any better?
------------------------------------------------------------------------
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199b52b..5283f10 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -349,6 +349,39 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
+void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
+		struct cpufreq_freqs *freqs, unsigned int state)
+{
+wait:
+	wait_event(&policy->transition_wait, !policy->transition_ongoing);
+
+	mutex_lock(&policy->transition_lock);
+
+	if (policy->transition_ongoing) {
+		mutex_unlock(&policy->transition_lock);
+		goto wait;
+	}
+
+	policy->transition_ongoing = true;
+
+	mutex_unlock(&policy->transition_lock);
+
+	cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
+}
+
+void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
+		struct cpufreq_freqs *freqs, unsigned int state)
+{
+	cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
+
+	mutex_lock(&policy->transition_lock);
+	policy->transition_ongoing = false;
+	mutex_unlock(&policy->transition_lock);
+
+	wake_up(&policy->transition_wait);
+}
+
+
 /*********************************************************************
  *                          SYSFS INTERFACE                          *
  *********************************************************************/
@@ -968,6 +1001,8 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
+	init_waitqueue_head(&policy->transition_wait);
return policy;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 4d89e0e..8bded24 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -101,6 +101,11 @@ struct cpufreq_policy {
 	 * __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore rwsem;
+
+	/* Synchronization for frequency transitions */
+	bool transition_ongoing;	/* Tracks transition status */
+	struct mutex transition_lock;
+	wait_queue_head_t transition_wait;
 };
/* Only for ACPI */
On 20 March 2014 14:02, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199b52b..5283f10 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -349,6 +349,39 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
+void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, unsigned int state)
+{
+wait:
wait_event(&policy->transition_wait, !policy->transition_ongoing);
mutex_lock(&policy->transition_lock);
if (policy->transition_ongoing) {
mutex_unlock(&policy->transition_lock);
goto wait;
}
policy->transition_ongoing = true;
mutex_unlock(&policy->transition_lock);
cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
+}
+void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, unsigned int state)
+{
cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
mutex_lock(&policy->transition_lock);
Why do we need locking here? You explained that earlier :)
Also, I would like to add this here:
WARN_ON(policy->transition_ongoing);
policy->transition_ongoing = false;
mutex_unlock(&policy->transition_lock);
wake_up(&policy->transition_wait);
+}
On 03/20/2014 02:07 PM, Viresh Kumar wrote:
On 20 March 2014 14:02, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199b52b..5283f10 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -349,6 +349,39 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
+void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, unsigned int state)
+{
+wait:
wait_event(&policy->transition_wait, !policy->transition_ongoing);
mutex_lock(&policy->transition_lock);
if (policy->transition_ongoing) {
mutex_unlock(&policy->transition_lock);
goto wait;
}
policy->transition_ongoing = true;
mutex_unlock(&policy->transition_lock);
cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
+}
+void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, unsigned int state)
+{
cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
mutex_lock(&policy->transition_lock);
Why do we need locking here? You explained that earlier :)
Hmm.. I had thought of some complex race condition that would make tasks miss the wake-up event and sleep forever, and hence added the locking there to prevent that. But now that I think about it more closely, I'm not able to recall that race... I will give it some more thought, and if I can't find any loopholes in doing the second update to the ongoing flag without locks, then I'll post the patchset with that lockless version itself.
Also, I would like to add this here:
WARN_ON(policy->transition_ongoing);
Hmm? Won't it always be true? We are the ones who set that flag to true earlier, right? I guess you meant WARN_ON(!policy->transition_ongoing) perhaps? I'm not sure whether it's really worth it, because it looks kind of obvious. Not sure what kind of bugs it would catch; I can't think of any such scenario :-(
policy->transition_ongoing = false;
mutex_unlock(&policy->transition_lock);
wake_up(&policy->transition_wait);
+}
Regards, Srivatsa S. Bhat
On 20 March 2014 14:54, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
On 03/20/2014 02:07 PM, Viresh Kumar wrote:
WARN_ON(policy->transition_ongoing);
I guess you meant WARN_ON(!policy->transition_ongoing) perhaps?
Ooops!!
I'm not sure whether it's really worth it, because it looks kind of obvious. Not sure what kind of bugs it would catch; I can't think of any such scenario :-(
Just to catch if somebody sends a POSTCHANGE notification without first sending a PRECHANGE one.. Just another check to make sure things are in order.
On 03/20/2014 03:03 PM, Viresh Kumar wrote:
On 20 March 2014 14:54, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
On 03/20/2014 02:07 PM, Viresh Kumar wrote:
WARN_ON(policy->transition_ongoing);
I guess you meant WARN_ON(!policy->transition_ongoing) perhaps?
Ooops!!
I'm not sure whether it's really worth it, because it looks kind of obvious. Not sure what kind of bugs it would catch; I can't think of any such scenario :-(
Just to catch if somebody sends a POSTCHANGE notification without first sending a PRECHANGE one.. Just another check to make sure things are in order.
Well, that's unlikely, since they would have to call _end() before _begin() :-) That's the power of having great function names - they make it impossible to use them incorrectly ;-) But anyway, I can add the check, just in case somebody misses even such an obvious cue! :-)
By the way, I'm also thinking of using a spinlock instead of a mutex. The critical section is tiny and we don't sleep inside it - sounds like the perfect case for a spinlock.
Regards, Srivatsa S. Bhat
On 20 March 2014 15:15, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
By the way, I'm also thinking of using a spinlock instead of a mutex. The critical section is tiny and we don't sleep inside the critical section - sounds like the perfect case for a spinlock.
Probably yes.
On Wednesday, March 19, 2014 03:39:16 PM Viresh Kumar wrote:
On 19 March 2014 15:20, Srivatsa S. Bhat srivatsa.bhat@linux.vnet.ibm.com wrote:
No, it's not about burden. It's about the elegance of the design. We should not be overly "smart" in the cpufreq core. Hiding the synchronization inside the cpufreq core only encourages people to write buggy code in their drivers.
What kind of buggy code could there be? They are supposed to call the notifiers in the order mentioned, so it shouldn't be a problem at all.. I don't know..
Why don't we go with what Rafael suggested? We can have dedicated begin_transition() and end_transition() calls to demarcate the frequency transitions. That way, it makes it very clear how the synchronization is done. Of course, these functions would be provided (exported) by the cpufreq core, by implementing them using locks/counters/whatever.
Basically what I'm arguing against, is the idea of having the cpufreq core figure out what the driver _intended_ to do, from inside the cpufreq_notify_transition() call.
What I would prefer instead is to have the cpufreq driver do something like this:
cpufreq_freq_transition_begin();
cpufreq_notify_transition(CPUFREQ_PRECHANGE);
Why do we need two routines then? What about doing notification from inside cpufreq_freq_transition_begin()?
We can do that in my opinion.
This is a burden for driver writers, who don't normally understand the relevance of these calls in detail and may think that only the first one is enough, or only the second one..
It's better if they simply let the core know that they are starting to do transitions, i.e. call cpufreq_freq_transition_begin(), and then have the core send the notifications.
//perform the frequency change
cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
cpufreq_freq_transition_end();
[ASYNC_NOTIFICATION drivers will invoke the last two functions in a separate context/thread.]
Same for the last two routines and yes they would be called from separate thread for ASYNC_NOTIFICATION drivers..
That'd be fine by me in principle.