This commit was earlier commited in kernel as: 19c7630 cpufreq: serialize calls to __cpufreq_governor()
and was later reverted by Srivatsa: 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
When this commit was first introduced it was written for races during hotplug and because we got some other solution to take care of the races with hotplug we reverted it.
But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61) there are more cases where this is required.
Recently Robert shown an instance where changing governors with multiple threads leads to following warnings:
------------[ cut here ]------------ WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740() CPU: 1 PID: 2458 Comm: tee Tainted: G OE 3.16.0-rc6+ #1 Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011 0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000 ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000 0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0 Call Trace: [<ffffffff8173b0bf>] dump_stack+0x45/0x56 [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20 [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740 [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70 [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20 [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0 [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0 [<ffffffff815df796>] store_scaling_governor+0x96/0xf0 [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0 [<ffffffff815de3c9>] store+0x79/0xc0 [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50 [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160 [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0 [<ffffffff811d0c76>] SyS_write+0x46/0xb0 [<ffffffff817439ff>] tracesys+0xe1/0xe6 ---[ end trace a2dad7e42b22c796 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740 PGD 36a05067 PUD b47df067 PMD 0 Oops: 0000 [#1] SMP
Robert also provided a small script to reproduce it: crash_governor.sh: for I in `seq 1000` do echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor done
runme.sh: for I in `seq 8` do ./crash_governor.sh & done
Just run runme.sh to crash your system :)
Introduce an additional variable which would guarantee serialization of __cpufreq_governor() routine.
Reported-and-tested-by: Robert Schöne robert.schoene@tu-dresden.de Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Hi Rafael,
These fixes the issues reported by Robert. There is slight change after Robert tested my initial patch, 'bool' is replaced by 'int' for 'governor_state'.
Regardingn stable trees, I am not too sure. The first patch of this series was earlier applied on 3.12 and then was reverted quickly in the same release.
So, the best we can do is 3.12+.
drivers/cpufreq/cpufreq.c | 7 ++++++- include/linux/cpufreq.h | 1 + 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9fdedd..a7ceae3 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2012,13 +2012,15 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, policy->cpu, event);
mutex_lock(&cpufreq_governor_lock); - if ((policy->governor_enabled && event == CPUFREQ_GOV_START) + if (policy->governor_busy + || (policy->governor_enabled && event == CPUFREQ_GOV_START) || (!policy->governor_enabled && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) { mutex_unlock(&cpufreq_governor_lock); return -EBUSY; }
+ policy->governor_busy = true; if (event == CPUFREQ_GOV_STOP) policy->governor_enabled = false; else if (event == CPUFREQ_GOV_START) @@ -2047,6 +2049,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, ((event == CPUFREQ_GOV_POLICY_EXIT) && !ret)) module_put(policy->governor->owner);
+ mutex_lock(&cpufreq_governor_lock); + policy->governor_busy = false; + mutex_unlock(&cpufreq_governor_lock); return ret; }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 7d1955a..c7aa96b 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -82,6 +82,7 @@ struct cpufreq_policy { struct cpufreq_governor *governor; /* see below */ void *governor_data; bool governor_enabled; /* governor start/stop flag */ + bool governor_busy;
struct work_struct update; /* if update_policy() needs to be * called, but you're in IRQ context */