This commit was earlier commited in kernel as: 19c7630 cpufreq: serialize calls to __cpufreq_governor()
and was later reverted by Srivatsa: 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
When this commit was first introduced it was written for races during hotplug and because we got some other solution to take care of the races with hotplug we reverted it.
But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61) there are more cases where this is required.
Recently Robert shown an instance where changing governors with multiple threads leads to following warnings:
------------[ cut here ]------------ WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740() CPU: 1 PID: 2458 Comm: tee Tainted: G OE 3.16.0-rc6+ #1 Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011 0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000 ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000 0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0 Call Trace: [<ffffffff8173b0bf>] dump_stack+0x45/0x56 [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20 [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740 [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70 [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20 [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0 [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0 [<ffffffff815df796>] store_scaling_governor+0x96/0xf0 [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0 [<ffffffff815de3c9>] store+0x79/0xc0 [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50 [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160 [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0 [<ffffffff811d0c76>] SyS_write+0x46/0xb0 [<ffffffff817439ff>] tracesys+0xe1/0xe6 ---[ end trace a2dad7e42b22c796 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740 PGD 36a05067 PUD b47df067 PMD 0 Oops: 0000 [#1] SMP
Robert also provided a small script to reproduce it: crash_governor.sh: for I in `seq 1000` do echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor done
runme.sh: for I in `seq 8` do ./crash_governor.sh & done
Just run runme.sh to crash your system :)
Introduce an additional variable which would guarantee serialization of __cpufreq_governor() routine.
Reported-and-tested-by: Robert Schöne robert.schoene@tu-dresden.de Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Hi Rafael,
These fixes the issues reported by Robert. There is slight change after Robert tested my initial patch, 'bool' is replaced by 'int' for 'governor_state'.
Regardingn stable trees, I am not too sure. The first patch of this series was earlier applied on 3.12 and then was reverted quickly in the same release.
So, the best we can do is 3.12+.
drivers/cpufreq/cpufreq.c | 7 ++++++- include/linux/cpufreq.h | 1 + 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9fdedd..a7ceae3 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2012,13 +2012,15 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, policy->cpu, event);
mutex_lock(&cpufreq_governor_lock); - if ((policy->governor_enabled && event == CPUFREQ_GOV_START) + if (policy->governor_busy + || (policy->governor_enabled && event == CPUFREQ_GOV_START) || (!policy->governor_enabled && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) { mutex_unlock(&cpufreq_governor_lock); return -EBUSY; }
+ policy->governor_busy = true; if (event == CPUFREQ_GOV_STOP) policy->governor_enabled = false; else if (event == CPUFREQ_GOV_START) @@ -2047,6 +2049,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret)) module_put(policy->governor->owner);
+ mutex_lock(&cpufreq_governor_lock); + policy->governor_busy = false; + mutex_unlock(&cpufreq_governor_lock); return ret; }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 7d1955a..c7aa96b 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -82,6 +82,7 @@ struct cpufreq_policy { struct cpufreq_governor *governor; /* see below */ void *governor_data; bool governor_enabled; /* governor start/stop flag */ + bool governor_busy;
struct work_struct update; /* if update_policy() needs to be * called, but you're in IRQ context */
Even after serializing calls to __cpufreq_governor() there are some races left. The races are around doing the invalid operation during some state of cpufreq governors. For example, while the governor is in CPUFREQ_GOV_POLICY_EXIT state, we can't do CPUFREQ_GOV_START without first doing CPUFREQ_GOV_POLICY_INIT.
All these cases weren't handled elegantly in __cpufreq_governor() and so there were enough chances that things may go wrong when governors are changed with multiple thread in parallel.
This patch renames an existing field 'policy->governor_enabled' as 'policy->governor_state' which can have values other than 0 & 1 now. Its type is also changed to 'int' from 'bool'.
We maintain the current state of governors for each policy now and reject any invalid operation.
Reported-and-tested-by: Robert Schöne robert.schoene@tu-dresden.de Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 26 +++++++++++++------------- drivers/cpufreq/cpufreq_governor.c | 2 +- include/linux/cpufreq.h | 2 +- 3 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index a7ceae3..c597361 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -935,6 +935,7 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy) struct cpufreq_policy new_policy; int ret = 0;
+ policy->governor_state = CPUFREQ_GOV_POLICY_EXIT; memcpy(&new_policy, policy, sizeof(*policy));
/* Update governor of new_policy to the governor used before hotplug */ @@ -1976,7 +1977,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_target); static int __cpufreq_governor(struct cpufreq_policy *policy, unsigned int event) { - int ret; + int ret, state;
/* Only must be defined when default governor is known to have latency restrictions, like e.g. conservative or ondemand. @@ -2012,19 +2013,21 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, policy->cpu, event);
mutex_lock(&cpufreq_governor_lock); + state = policy->governor_state; + + /* Check if operation is permitted or not */ if (policy->governor_busy - || (policy->governor_enabled && event == CPUFREQ_GOV_START) - || (!policy->governor_enabled - && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) { + || (state == CPUFREQ_GOV_START && event != CPUFREQ_GOV_LIMITS && event != CPUFREQ_GOV_STOP) + || (state == CPUFREQ_GOV_STOP && event != CPUFREQ_GOV_START && event != CPUFREQ_GOV_POLICY_EXIT) + || (state == CPUFREQ_GOV_POLICY_INIT && event != CPUFREQ_GOV_START && event != CPUFREQ_GOV_POLICY_EXIT) + || (state == CPUFREQ_GOV_POLICY_EXIT && event != CPUFREQ_GOV_POLICY_INIT)) { mutex_unlock(&cpufreq_governor_lock); return -EBUSY; }
policy->governor_busy = true; - if (event == CPUFREQ_GOV_STOP) - policy->governor_enabled = false; - else if (event == CPUFREQ_GOV_START) - policy->governor_enabled = true; + if (event != CPUFREQ_GOV_LIMITS) + policy->governor_state = event;
mutex_unlock(&cpufreq_governor_lock);
@@ -2035,13 +2038,10 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, policy->governor->initialized++; else if (event == CPUFREQ_GOV_POLICY_EXIT) policy->governor->initialized--; - } else { + } else if (event != CPUFREQ_GOV_LIMITS) { /* Restore original values */ mutex_lock(&cpufreq_governor_lock); - if (event == CPUFREQ_GOV_STOP) - policy->governor_enabled = true; - else if (event == CPUFREQ_GOV_START) - policy->governor_enabled = false; + policy->governor_state = state; mutex_unlock(&cpufreq_governor_lock); }
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 1b44496..d173181 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -174,7 +174,7 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, int i;
mutex_lock(&cpufreq_governor_lock); - if (!policy->governor_enabled) + if (policy->governor_state != CPUFREQ_GOV_START) goto out_unlock;
if (!all_cpus) { diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index c7aa96b..fb20fc5 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -81,7 +81,7 @@ struct cpufreq_policy { unsigned int policy; /* see above */ struct cpufreq_governor *governor; /* see below */ void *governor_data; - bool governor_enabled; /* governor start/stop flag */ + int governor_state; /* Governor's current state */ bool governor_busy;
struct work_struct update; /* if update_policy() needs to be
Even though the test fixed my initital problem, my test system (which runs the kernel with this patch applied) just crashed when changing frequencies. This might be connected to the patch.
I'll have another look into it and let you know.
Am Dienstag, den 09.09.2014, 09:46 +0530 schrieb Viresh Kumar:
This commit was earlier commited in kernel as: 19c7630 cpufreq: serialize calls to __cpufreq_governor()
and was later reverted by Srivatsa: 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
When this commit was first introduced it was written for races during hotplug and because we got some other solution to take care of the races with hotplug we reverted it.
But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61) there are more cases where this is required.
On 9 September 2014 12:59, Robert Schöne robert.schoene@tu-dresden.de wrote:
Even though the test fixed my initital problem, my test system (which runs the kernel with this patch applied) just crashed when changing
Only this patch ? or both 1/2 & 2/2 ??
frequencies. This might be connected to the patch.
I'll have another look into it and let you know.
Hmm, that's strange. Probably some other (existing) bug woken up with this change.. Also this patch will come into picture while changing governor tunables and not during frequency switches.. So they looked a bit unrelated to me..
Yeah, please let us know if this breaks anything.
On Tuesday, September 09, 2014 09:46:39 AM Viresh Kumar wrote:
This commit was earlier commited in kernel as: 19c7630 cpufreq: serialize calls to __cpufreq_governor()
and was later reverted by Srivatsa: 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
When this commit was first introduced it was written for races during hotplug and because we got some other solution to take care of the races with hotplug we reverted it.
But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61) there are more cases where this is required.
Recently Robert shown an instance where changing governors with multiple threads leads to following warnings:
------------[ cut here ]------------ WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740() CPU: 1 PID: 2458 Comm: tee Tainted: G OE 3.16.0-rc6+ #1 Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011 0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000 ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000 0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0 Call Trace: [<ffffffff8173b0bf>] dump_stack+0x45/0x56 [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20 [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740 [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70 [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20 [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0 [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0 [<ffffffff815df796>] store_scaling_governor+0x96/0xf0 [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0 [<ffffffff815de3c9>] store+0x79/0xc0 [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50 [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160 [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0 [<ffffffff811d0c76>] SyS_write+0x46/0xb0 [<ffffffff817439ff>] tracesys+0xe1/0xe6 ---[ end trace a2dad7e42b22c796 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740 PGD 36a05067 PUD b47df067 PMD 0 Oops: 0000 [#1] SMP
Robert also provided a small script to reproduce it: crash_governor.sh: for I in `seq 1000` do echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor done
runme.sh: for I in `seq 8` do ./crash_governor.sh & done
Just run runme.sh to crash your system :)
Introduce an additional variable which would guarantee serialization of __cpufreq_governor() routine.
Reported-and-tested-by: Robert Schöne robert.schoene@tu-dresden.de Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Hi Rafael,
These fixes the issues reported by Robert. There is slight change after Robert tested my initial patch, 'bool' is replaced by 'int' for 'governor_state'.
Regardingn stable trees, I am not too sure. The first patch of this series was earlier applied on 3.12 and then was reverted quickly in the same release.
So, the best we can do is 3.12+.
What's the current status of this series?
What's the current status of this series?
We had some iterations of patches, but the only solution that works for me is the patch with the coarse-grained lock that I sent at Mon, 08 Sep 2014 10:16:48 CEST [1] Viresh is pretty occupied lately, but he told me that he might do the tests himself when the current period of busyness is over as he is supplied with a test system. I'm not sure about his current status (busy or testing).
[1] http://permalink.gmane.org/gmane.linux.power-management.general/49272
On 24 September 2014 16:46, Rafael J. Wysocki rjw@rjwysocki.net wrote:
What's the current status of this series?
Started working again after two weeks today only and so it might take some days for things to get stable at my end. Should be able to test things after that..
On 09/09/2014 12:16 AM, Viresh Kumar wrote:
This commit was earlier commited in kernel as: 19c7630 cpufreq: serialize calls to __cpufreq_governor()
and was later reverted by Srivatsa: 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
When this commit was first introduced it was written for races during hotplug and because we got some other solution to take care of the races with hotplug we reverted it.
But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61) there are more cases where this is required.
Recently Robert shown an instance where changing governors with multiple threads leads to following warnings:
------------[ cut here ]------------ WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740() CPU: 1 PID: 2458 Comm: tee Tainted: G OE 3.16.0-rc6+ #1 Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011 0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000 ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000 0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0 Call Trace: [<ffffffff8173b0bf>] dump_stack+0x45/0x56 [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20 [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740 [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70 [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20 [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0 [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0 [<ffffffff815df796>] store_scaling_governor+0x96/0xf0 [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0 [<ffffffff815de3c9>] store+0x79/0xc0 [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50 [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160 [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0 [<ffffffff811d0c76>] SyS_write+0x46/0xb0 [<ffffffff817439ff>] tracesys+0xe1/0xe6 ---[ end trace a2dad7e42b22c796 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740 PGD 36a05067 PUD b47df067 PMD 0 Oops: 0000 [#1] SMP
Robert also provided a small script to reproduce it: crash_governor.sh: for I in `seq 1000` do echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor done
runme.sh: for I in `seq 8` do ./crash_governor.sh & done
Just run runme.sh to crash your system :)
This is exactly the same issue I mentioned a few weeks ago and traced back to 955ef4833574636819cd269cfbae12f79cbde63a which drops the lock around the CPUFREQ_GOV_POLICY_EXIT __cpufreq_governor() call.
Just my two cents -- I don't think that adding a new lock/locking scheme is the way to fix this.
P.
Introduce an additional variable which would guarantee serialization of __cpufreq_governor() routine.
Reported-and-tested-by: Robert Schöne robert.schoene@tu-dresden.de Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Hi Rafael,
These fixes the issues reported by Robert. There is slight change after Robert tested my initial patch, 'bool' is replaced by 'int' for 'governor_state'.
Regardingn stable trees, I am not too sure. The first patch of this series was earlier applied on 3.12 and then was reverted quickly in the same release.
So, the best we can do is 3.12+.
drivers/cpufreq/cpufreq.c | 7 ++++++- include/linux/cpufreq.h | 1 + 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9fdedd..a7ceae3 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2012,13 +2012,15 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, policy->cpu, event); mutex_lock(&cpufreq_governor_lock);
- if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
- if (policy->governor_busy
mutex_unlock(&cpufreq_governor_lock); return -EBUSY; }|| (policy->governor_enabled && event == CPUFREQ_GOV_START) || (!policy->governor_enabled && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
- policy->governor_busy = true; if (event == CPUFREQ_GOV_STOP) policy->governor_enabled = false; else if (event == CPUFREQ_GOV_START)
@@ -2047,6 +2049,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret)) module_put(policy->governor->owner);
- mutex_lock(&cpufreq_governor_lock);
- policy->governor_busy = false;
- mutex_unlock(&cpufreq_governor_lock); return ret;
} diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 7d1955a..c7aa96b 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -82,6 +82,7 @@ struct cpufreq_policy { struct cpufreq_governor *governor; /* see below */ void *governor_data; bool governor_enabled; /* governor start/stop flag */
- bool governor_busy;
struct work_struct update; /* if update_policy() needs to be * called, but you're in IRQ context */
On 29 September 2014 16:59, Prarit Bhargava prarit@redhat.com wrote:
This is exactly the same issue I mentioned a few weeks ago and traced back to 955ef4833574636819cd269cfbae12f79cbde63a which drops the lock around the CPUFREQ_GOV_POLICY_EXIT __cpufreq_governor() call.
Just my two cents -- I don't think that adding a new lock/locking scheme is the way to fix this.
Me and Robert are just inches away from fixing it. Just that the remote testing by Robert and patches from me aren't working well together.. I need to do this myself and have a board to reproduce it now.. But would take some time to get going...
And yes, I am also against another lock here :)
On 09/29/2014 07:38 AM, Viresh Kumar wrote:
On 29 September 2014 16:59, Prarit Bhargava prarit@redhat.com wrote:
This is exactly the same issue I mentioned a few weeks ago and traced back to 955ef4833574636819cd269cfbae12f79cbde63a which drops the lock around the CPUFREQ_GOV_POLICY_EXIT __cpufreq_governor() call.
Just my two cents -- I don't think that adding a new lock/locking scheme is the way to fix this.
Me and Robert are just inches away from fixing it. Just that the remote testing by Robert and patches from me aren't working well together.. I need to do this myself and have a board to reproduce it now.. But would take some time to get going...
And yes, I am also against another lock here :)
Send me what you have in mind -- I can always take a look and put it through tests as well.
P.
On 29 September 2014 17:20, Prarit Bhargava prarit@redhat.com wrote:
Send me what you have in mind -- I can always take a look and put it through tests as well.
The last set of changes I gave to Robert:
git://git.linaro.org/people/viresh.kumar/linux.git /cpufreq/governor-fixes
linaro-kernel@lists.linaro.org