Hi Poynor and all cpufreq owners,
I found a kernel panic caused by the interactive governor when both a hotplug governor and the linked cpufreq feature are enabled.
Below are the background and root cause. We are using the latest interactive governor on a dual Cortex-A9 SMP system, with the linked cpufreq feature enabled (cpumask_setall(policy->cpus) for each CPU). With it, the cpufreq governor can consider the workload of both CPUs when making a decision for the SMP cores. We also have a hotplug governor that monitors the system workload to decide when cpu1 can be hot-unplugged.
1. The default cpufreq governor during boot is userspace.
2. After the system boots, the hotplug governor finds that the system has no workload, and cpu1 is hot-unplugged. The hotplug notifier sets cpu0's policy->cpus to contain only cpu0.
3. After that, the cpufreq governor is switched to interactive, and its CPUFREQ_GOV_START only initializes cpu0's info struct (pcpu->policy, governor_enabled, etc.).
4. A boost event arrives, such as a touch on the screen. It first causes cpu1 to be plugged in, which means cpu0's policy->cpus is linked to cpu0 and cpu1 again, but the interactive governor's CPUFREQ_GOV_START is not called to initialize cpu1's info struct.
5. After that, cpu0's frequency changes and the interactive governor is notified. The notifier function (cpufreq_interactive_notifier) calls update_load and accesses cpu1's pcpu->policy->cur, which was never initialized, leading to a kernel panic.
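The five steps above can be replayed in a small user-space model. This is only a sketch: the structures are simplified stand-ins for the kernel ones, policy->cpus is modeled as a plain bitmask, the frequency value is arbitrary, and replay_report_sequence() and count_uninitialized() are hypothetical helpers that do not exist in the governor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2

/* Simplified stand-ins for the kernel structures named in the report. */
struct policy {
	unsigned int cpus;	/* bitmask of linked CPUs (models policy->cpus) */
	unsigned int cur;	/* current frequency in kHz */
};

struct cpuinfo {
	struct policy *policy;	/* NULL until governor start runs for this CPU */
	bool governor_enabled;
};

static struct cpuinfo cpuinfo[NR_CPUS];

/* Models CPUFREQ_GOV_START: initializes only the CPUs in policy->cpus. */
void gov_start(struct policy *p)
{
	assert(p != NULL);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (p->cpus & (1u << cpu)) {
			cpuinfo[cpu].policy = p;
			cpuinfo[cpu].governor_enabled = true;
		}
	}
}

/*
 * Models the walk over policy->cpus in the frequency-change notifier.
 * Returns how many linked CPUs have a policy pointer that was never set,
 * i.e. the CPUs for which reading pcpu->policy->cur would be invalid.
 */
int count_uninitialized(const struct policy *p)
{
	int bad = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((p->cpus & (1u << cpu)) && cpuinfo[cpu].policy == NULL)
			bad++;
	return bad;
}

/* Replays steps 2 to 5 and reports how many CPUs would be hit. */
int replay_report_sequence(void)
{
	static struct policy pol;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpuinfo[cpu] = (struct cpuinfo){ NULL, false };
	pol.cpus = 1u << 0;	/* step 2: cpu1 unplugged, only cpu0 linked */
	pol.cur = 800000;	/* arbitrary value, for illustration only */
	gov_start(&pol);	/* step 3: governor started, cpu0 only */
	pol.cpus |= 1u << 1;	/* step 4: cpu1 plugged in, no gov_start */
	return count_uninitialized(&pol);	/* step 5 */
}
```

In this model replay_report_sequence() counts one CPU (cpu1) that is in the mask but was never initialized, which is the access that panics in the real notifier.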
In the current interactive governor, if the governor is started after cpu1 is plugged out, then even after cpu1 is plugged back in, the governor still does not take cpu1's status into account: it sees cpu1's governor_enabled as 0, so cpu1's profiling timer simply returns, because the governor appears not to be enabled on cpu1. So the linked feature cannot ensure that decisions are based on the max load of cpu0 and cpu1 on SMP. This is not what we expected.
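To illustrate the paragraph above, here is a rough sketch of a max-load decision that skips any CPU whose governor_enabled flag is clear. The struct cpu_load type and max_linked_load() are hypothetical names for illustration, not the governor's real code, and the load values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical per-CPU record; field names loosely follow the report. */
struct cpu_load {
	bool governor_enabled;	/* set only by governor start */
	unsigned int load_pct;	/* most recent load sample, 0..100 */
};

/*
 * Returns the highest load among the linked CPUs, skipping any CPU whose
 * governor_enabled flag is clear.
 */
unsigned int max_linked_load(const struct cpu_load *cpus,
			     unsigned int linked_mask)
{
	unsigned int max = 0;

	assert(cpus != NULL);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(linked_mask & (1u << cpu)))
			continue;
		if (!cpus[cpu].governor_enabled)
			continue;	/* this CPU's sample is ignored */
		if (cpus[cpu].load_pct > max)
			max = cpus[cpu].load_pct;
	}
	return max;
}
```

With cpu0 at 10% and an un-initialized cpu1 at 90%, this model returns 10 for the linked pair, so the busy CPU never influences the decision until governor start has run for it.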
There is no such issue on b.L systems, as the CPUs there have no link relationship. It looks like the interactive governor does not consider SMP much.
I made a draft patch, below, that solves the panic, but it still cannot solve the issue when interactive is enabled after cpu1 is hot-unplugged. Could you please take a look at how to solve this interactive governor issue for SMP?
Thanks.
Date: Thu, 28 Mar 2013 14:03:40 +0800
Subject: [PATCH] cpufreq: interactive: fix race condition of hotplug and linked cpufreq
Signed-off-by: Zhoujie Wu <zhoujiewu@gmail.com>
---
drivers/cpufreq/cpufreq_interactive.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_interactive.c b/drivers/cpufreq/cpufreq_interactive.c
index 7d1952c..6941201 100644
--- a/drivers/cpufreq/cpufreq_interactive.c
+++ b/drivers/cpufreq/cpufreq_interactive.c
@@ -565,6 +565,8 @@ static int cpufreq_interactive_notifier(
 		for_each_cpu(cpu, pcpu->policy->cpus) {
 			struct cpufreq_interactive_cpuinfo *pjcpu =
 				&per_cpu(cpuinfo, cpu);
+			if (!pjcpu->governor_enabled)
+				continue;
 			spin_lock_irqsave(&pjcpu->load_lock, flags);
 			update_load(cpu);
 			spin_unlock_irqrestore(&pjcpu->load_lock, flags);
--
Hi Zhoujie,
On 28 March 2013 18:42, zhoujie wu <zhoujiewu@gmail.com> wrote:
> I found a kernel panic caused by the interactive governor when both a hotplug governor and the linked cpufreq feature are enabled.
> Below are the background and root cause. We are using the latest interactive governor on a dual Cortex-A9 SMP system, with the linked cpufreq feature enabled (cpumask_setall(policy->cpus) for each CPU). With it, the cpufreq governor can consider the workload of both CPUs when making a decision for the SMP cores. We also have a hotplug governor that monitors the system workload to decide when cpu1 can be hot-unplugged.
> 1. The default cpufreq governor during boot is userspace.
> 2. After the system boots, the hotplug governor finds that the system has no workload, and cpu1 is hot-unplugged. The hotplug notifier sets cpu0's policy->cpus to contain only cpu0.
> 3. After that, the cpufreq governor is switched to interactive, and its CPUFREQ_GOV_START only initializes cpu0's info struct (pcpu->policy, governor_enabled, etc.).
> 4. A boost event arrives, such as a touch on the screen. It first causes cpu1 to be plugged in, which means cpu0's policy->cpus is linked to cpu0 and cpu1 again, but the interactive governor's CPUFREQ_GOV_START is not called to initialize cpu1's info struct.

What kernel version are you using? I have pushed tonnes of patches in 3.9 which should fix this issue for you. Just try with that. They should be present in the latest Linaro release too.

> 5. After that, cpu0's frequency changes and the interactive governor is notified. The notifier function (cpufreq_interactive_notifier) calls update_load and accesses cpu1's pcpu->policy->cur, which was never initialized, leading to a kernel panic.

Same.

> There is no such issue on b.L systems, as the CPUs there have no link

b.L? big.LITTLE, right? The same issue should be present on big.LITTLE systems too with hot-[un]plugging. They might not be using a governor to offline/online CPUs at runtime, though.
Current big.LITTLE systems have two clusters (A7 and A15); ->init() is called for the first CPU of each cluster, and they set policy->cpus correctly.
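As a rough illustration of per-cluster policy->cpus, here is a sketch that builds the mask a driver's ->init() might set for a given CPU. The four-CPU topology table and the cluster_mask() helper are assumptions made up for this example; the thread does not say how many CPUs each cluster has.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical topology: cpu0-1 in cluster 0, cpu2-3 in cluster 1. */
static const int cpu_cluster[NR_CPUS] = { 0, 0, 1, 1 };

/* Models what a driver ->init() might put into policy->cpus for `cpu`. */
unsigned int cluster_mask(int cpu)
{
	unsigned int mask = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (int i = 0; i < NR_CPUS; i++)
		if (cpu_cluster[i] == cpu_cluster[cpu])
			mask |= 1u << i;
	return mask;
}
```

CPUs inside one cluster end up linked to each other while the two clusters stay independent, so the linked-CPU hotplug case can still occur within a cluster.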
> relationship. It looks like the interactive governor does not consider SMP much.

b.L is also SMP :)

> I made a draft patch, below, that solves the panic, but it still cannot solve the issue when interactive is enabled after cpu1 is hot-unplugged. Could you please take a look at how to solve this interactive governor issue for SMP?

Probably not required.
Hi,
2013/3/28 Viresh Kumar <viresh.kumar@linaro.org>
> Hi Zhoujie,
>
> On 28 March 2013 18:42, zhoujie wu <zhoujiewu@gmail.com> wrote:
>> I found a kernel panic caused by the interactive governor when both a hotplug governor and the linked cpufreq feature are enabled.
>> Below are the background and root cause. We are using the latest interactive governor on a dual Cortex-A9 SMP system, with the linked cpufreq feature enabled (cpumask_setall(policy->cpus) for each CPU). With it, the cpufreq governor can consider the workload of both CPUs when making a decision for the SMP cores. We also have a hotplug governor that monitors the system workload to decide when cpu1 can be hot-unplugged.
>> 1. The default cpufreq governor during boot is userspace.
>> 2. After the system boots, the hotplug governor finds that the system has no workload, and cpu1 is hot-unplugged. The hotplug notifier sets cpu0's policy->cpus to contain only cpu0.
>> 3. After that, the cpufreq governor is switched to interactive, and its CPUFREQ_GOV_START only initializes cpu0's info struct (pcpu->policy, governor_enabled, etc.).
>> 4. A boost event arrives, such as a touch on the screen. It first causes cpu1 to be plugged in, which means cpu0's policy->cpus is linked to cpu0 and cpu1 again, but the interactive governor's CPUFREQ_GOV_START is not called to initialize cpu1's info struct.
>
> What kernel version are you using? I have pushed tonnes of patches in 3.9 which should fix this issue for you. Just try with that. They should be present in the latest Linaro release too.

I have synced cpufreq_interactive.c from the latest Linaro code branch, integration-android-vexpress, released in February 2013.
The latest commit is "cpufreq: interactive: fix race on governor start/stop".
Is it aligned with your version? Could you please tell me which patch fixes this issue?

>> 5. After that, cpu0's frequency changes and the interactive governor is notified. The notifier function (cpufreq_interactive_notifier) calls update_load and accesses cpu1's pcpu->policy->cur, which was never initialized, leading to a kernel panic.
>
> Same.
>
>> There is no such issue on b.L systems, as the CPUs there have no link
>
> b.L? big.LITTLE, right? The same issue should be present on big.LITTLE systems too with hot-[un]plugging. They might not be using a governor to offline/online CPUs at runtime, though.

You are correct. Any system that enables linked cpufreq plus hotplug will show the same issue.

> Current big.LITTLE systems have two clusters (A7 and A15); ->init() is called for the first CPU of each cluster, and they set policy->cpus correctly.
>
>> relationship. It looks like the interactive governor does not consider SMP much.
>
> b.L is also SMP :)
>
>> I made a draft patch, below, that solves the panic, but it still cannot solve the issue when interactive is enabled after cpu1 is hot-unplugged. Could you please take a look at how to solve this interactive governor issue for SMP?
>
> Probably not required.

Thanks.
On 29 March 2013 20:19, zhoujie wu <zhoujiewu@gmail.com> wrote:
> I have synced cpufreq_interactive.c from the latest Linaro code branch, integration-android-vexpress, released in February 2013.

I never said I have changes in the interactive governor, as I can't push those for 3.9. It isn't mainlined yet :)
The issue was in the cpufreq core rather than in the interactive governor. These are the fixes (at least these, but you had better get the latest code from the latest rc release):
6954ca9 cpufreq: Don't use cpu removed during cpufreq_driver_unregister
f6a7409 cpufreq: Notify governors when cpus are hot-[un]plugged
643ae6e cpufreq: Manage only online cpus
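
To illustrate the idea of notifying the governor on hotplug, here is a user-space sketch in which the core stops the governor, updates the CPU mask, and starts the governor again when a CPU comes online. This is an assumption about the general approach, not a copy of the commits listed above; cpu_online() and all_linked_enabled() are hypothetical helpers, and policy->cpus is modeled as a plain bitmask.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Simplified stand-in for the policy; cpus models policy->cpus. */
struct policy {
	unsigned int cpus;
};

/* Per-CPU flag that only governor start/stop may change. */
static bool governor_enabled[NR_CPUS];

/* Models CPUFREQ_GOV_START for every CPU currently in the mask. */
void gov_start(const struct policy *p)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (p->cpus & (1u << cpu))
			governor_enabled[cpu] = true;
}

/* Models CPUFREQ_GOV_STOP for every CPU currently in the mask. */
void gov_stop(const struct policy *p)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (p->cpus & (1u << cpu))
			governor_enabled[cpu] = false;
}

/*
 * Hypothetical core-side hotplug handler: the governor is stopped, the
 * mask is updated, and the governor is started again, so the newly
 * onlined CPU goes through the same initialization as the others.
 */
void cpu_online(struct policy *p, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	gov_stop(p);
	p->cpus |= 1u << cpu;
	gov_start(p);
}

/* True when every CPU in the mask has been through governor start. */
bool all_linked_enabled(const struct policy *p)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((p->cpus & (1u << cpu)) && !governor_enabled[cpu])
			return false;
	return true;
}
```

With this ordering, a CPU can never appear in the mask without its per-CPU state being initialized, so the governor itself needs no extra guard in its notifier.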
-- viresh
2013/3/29 Viresh Kumar <viresh.kumar@linaro.org>:
> On 29 March 2013 20:19, zhoujie wu <zhoujiewu@gmail.com> wrote:
>> I have synced cpufreq_interactive.c from the latest Linaro code branch, integration-android-vexpress, released in February 2013.
>
> I never said I have changes in the interactive governor, as I can't push those for 3.9. It isn't mainlined yet :)
> The issue was in the cpufreq core rather than in the interactive governor. These are the fixes (at least these, but you had better get the latest code from the latest rc release):

It is clearer to me now. Previously I misunderstood and thought the change was in the governor :)

> 6954ca9 cpufreq: Don't use cpu removed during cpufreq_driver_unregister
> f6a7409 cpufreq: Notify governors when cpus are hot-[un]plugged
> 643ae6e cpufreq: Manage only online cpus

OK, I will check the latest cpufreq changes in this area and backport them to our 3.4 code base.

> -- viresh

Thanks for your help.
-- Zhoujie Wu