Hi Rafael,
You recently did this:
commit 878f6e074e9a7784a6e351512eace4ccb3542eef Author: Rafael J. Wysocki rafael.j.wysocki@intel.com Date: Sun Aug 18 15:35:59 2013 +0200
Revert "cpufreq: Use cpufreq_policy_list for iterating over policies"
Revert commit eb60852 (cpufreq: Use cpufreq_policy_list for iterating over policies), because it breaks system suspend/resume on multiple machines.
It either causes resume to block indefinitely or causes the BUG_ON() in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq attributes.
------x------------x---------------
This patchset gets the reverted patch back along with few supporting patches. Cause of the initial problem you observed was this:
- At suspend all CPUs are removed leaving boot cpu. At this time policies aren't freed and also aren't removed from cpufreq_policy_list. And per-cpu variable cpufreq_cpu_data is marked as NULL. - At resume CPUs other than boot cpu called __cpufreq_add_dev(). The tricky change that was introduced by my patch was: We iterate over list of policies instead of CPUs, where we used to get policy structure associated with CPUs using per-cpu variable. Which used to be NULL for first CPU of a policy that turned up. For the first cpu we don't want to call cpufreq_add_policy_cpu() but want __cpufreq_add_add() to continue.
When we called cpufreq_add_policy_cpu() it tried to stop the governor (which was already stopped) and hence errors leading into unstable state.
This patchset fixes these issues and is tested with suspend-resume over my thinkpad with ubuntu. Apart from minor cleanups it removes policy from cpufreq_policy_list in case of suspend/resume as well and hence we will never call cpufreq_add_policy_cpu() for first cpu of a policy.
-- viresh
Viresh Kumar (5): cpufreq: align closing brace '}' of an if block cpufreq: remove policy from cpufreq_policy_list in system suspend cpufreq: remove unnecessary check in __cpufreq_governor() cpufreq: remove cpufreq_policy_cpu per-cpu variable cpufreq: Use cpufreq_policy_list for iterating over policies
drivers/cpufreq/cpufreq.c | 77 +++++++++++++++-------------------------------- 1 file changed, 24 insertions(+), 53 deletions(-)
Following patch caused this:
commit 3de9bdeb28638e164d1f0eb38dd68e3f5d2ac95c Author: Viresh Kumar viresh.kumar@linaro.org Date: Tue Aug 6 22:53:13 2013 +0530
cpufreq: improve error checking on return values of __cpufreq_governor()
Lets fix it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index c0ef84d..fedc842 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1253,7 +1253,7 @@ static int __cpufreq_remove_dev(struct device *dev, __func__); return ret; } - } + }
if (!frozen) { lock_policy_rwsem_read(cpu);
cpufreq_policy_list is a list of active policies. We do remove policies from this list when all CPUs belonging to that policy are removed. But during suspend we don't really free a policy struct as it will be used again during resume. And so we didn't remove it from cpufreq_policy_list as well..
This is incorrect. We are saying this policy isn't valid anymore and must not be referenced (though we haven't freed it), but it can still be used by code that iterates over cpufreq_policy_list.
Lets remove policy from this list even in case of suspend as well.. Similarly we must add it back whenever the first cpu belonging to that policy turns up.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index fedc842..0302121 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -950,12 +950,6 @@ err_free_policy:
static void cpufreq_policy_free(struct cpufreq_policy *policy) { - unsigned long flags; - - write_lock_irqsave(&cpufreq_driver_lock, flags); - list_del(&policy->policy_list); - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - free_cpumask_var(policy->related_cpus); free_cpumask_var(policy->cpus); kfree(policy); @@ -1069,12 +1063,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, ret = cpufreq_add_dev_interface(policy, dev); if (ret) goto err_out_unregister; - - write_lock_irqsave(&cpufreq_driver_lock, flags); - list_add(&policy->policy_list, &cpufreq_policy_list); - write_unlock_irqrestore(&cpufreq_driver_lock, flags); }
+ write_lock_irqsave(&cpufreq_driver_lock, flags); + list_add(&policy->policy_list, &cpufreq_policy_list); + write_unlock_irqrestore(&cpufreq_driver_lock, flags); + cpufreq_init_policy(policy);
kobject_uevent(&policy->kobj, KOBJ_ADD); @@ -1280,6 +1274,11 @@ static int __cpufreq_remove_dev(struct device *dev, if (cpufreq_driver->exit) cpufreq_driver->exit(policy);
+ /* Remove policy from list of active policies */ + write_lock_irqsave(&cpufreq_driver_lock, flags); + list_del(&policy->policy_list); + write_unlock_irqrestore(&cpufreq_driver_lock, flags); + if (!frozen) cpufreq_policy_free(policy); } else {
We don't need to check if event is CPUFREQ_GOV_POLICY_INIT and put governor module as we are sure event can only be START/STOP here.
And so remove this useless check.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0302121..b03a851 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1719,8 +1719,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, if ((!policy->governor_enabled && (event == CPUFREQ_GOV_STOP)) || (policy->governor_enabled && (event == CPUFREQ_GOV_START))) { mutex_unlock(&cpufreq_governor_lock); - if (event == CPUFREQ_GOV_POLICY_INIT) - module_put(policy->governor->owner); return -EBUSY; }
cpufreq_policy_cpu per-cpu variables are used for storing cpu that manages a cpu or its policy->cpu. We are also storing policy for each cpu in cpufreq_cpu_data and so this information is just redundant.
Better use cpufreq_cpu_data to retrieve policy and get policy->cpu from there. And so this patch removes cpufreq_policy_cpu per-cpu variable.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 45 ++++++++++----------------------------------- 1 file changed, 10 insertions(+), 35 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b03a851..0586bd2 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -64,15 +64,14 @@ static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor); * - Lock should not be held across * __cpufreq_governor(data, CPUFREQ_GOV_STOP); */ -static DEFINE_PER_CPU(int, cpufreq_policy_cpu); static DEFINE_PER_CPU(struct rw_semaphore, cpu_policy_rwsem);
#define lock_policy_rwsem(mode, cpu) \ static int lock_policy_rwsem_##mode(int cpu) \ { \ - int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu); \ - BUG_ON(policy_cpu == -1); \ - down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \ + struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu); \ + BUG_ON(!policy); \ + down_##mode(&per_cpu(cpu_policy_rwsem, policy->cpu)); \ \ return 0; \ } @@ -83,9 +82,9 @@ lock_policy_rwsem(write, cpu); #define unlock_policy_rwsem(mode, cpu) \ static void unlock_policy_rwsem_##mode(int cpu) \ { \ - int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu); \ - BUG_ON(policy_cpu == -1); \ - up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \ + struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu); \ + BUG_ON(!policy); \ + up_##mode(&per_cpu(cpu_policy_rwsem, policy->cpu)); \ }
unlock_policy_rwsem(read, cpu); @@ -887,7 +886,6 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, write_lock_irqsave(&cpufreq_driver_lock, flags);
cpumask_set_cpu(cpu, policy->cpus); - per_cpu(cpufreq_policy_cpu, cpu) = policy->cpu; per_cpu(cpufreq_cpu_data, cpu) = policy; write_unlock_irqrestore(&cpufreq_driver_lock, flags);
@@ -1013,9 +1011,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, policy->governor = CPUFREQ_DEFAULT_GOVERNOR; cpumask_copy(policy->cpus, cpumask_of(cpu));
- /* Initially set CPU itself as the policy_cpu */ - per_cpu(cpufreq_policy_cpu, cpu) = cpu; - init_completion(&policy->kobj_unregister); INIT_WORK(&policy->update, handle_update);
@@ -1053,10 +1048,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, #endif
write_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_cpu(j, policy->cpus) { + for_each_cpu(j, policy->cpus) per_cpu(cpufreq_cpu_data, j) = policy; - per_cpu(cpufreq_policy_cpu, j) = policy->cpu; - } write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (!frozen) { @@ -1080,15 +1073,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
err_out_unregister: write_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_cpu(j, policy->cpus) { + for_each_cpu(j, policy->cpus) per_cpu(cpufreq_cpu_data, j) = NULL; - if (j != cpu) - per_cpu(cpufreq_policy_cpu, j) = -1; - } write_unlock_irqrestore(&cpufreq_driver_lock, flags);
err_set_policy_cpu: - per_cpu(cpufreq_policy_cpu, cpu) = -1; cpufreq_policy_free(policy); nomem_out: up_read(&cpufreq_rwsem); @@ -1112,14 +1101,9 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) { - int j; - policy->last_cpu = policy->cpu; policy->cpu = cpu;
- for_each_cpu(j, policy->cpus) - per_cpu(cpufreq_policy_cpu, j) = cpu; - #ifdef CONFIG_CPU_FREQ_TABLE cpufreq_frequency_table_update_policy_cpu(policy); #endif @@ -1131,7 +1115,6 @@ static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy, unsigned int old_cpu, bool frozen) { struct device *cpu_dev; - unsigned long flags; int ret;
/* first sibling now owns the new sysfs dir */ @@ -1148,11 +1131,6 @@ static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy,
WARN_ON(lock_policy_rwsem_write(old_cpu)); cpumask_set_cpu(old_cpu, policy->cpus); - - write_lock_irqsave(&cpufreq_driver_lock, flags); - per_cpu(cpufreq_cpu_data, old_cpu) = policy; - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - unlock_policy_rwsem_write(old_cpu);
ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, @@ -1186,7 +1164,6 @@ static int __cpufreq_remove_dev(struct device *dev, write_lock_irqsave(&cpufreq_driver_lock, flags);
policy = per_cpu(cpufreq_cpu_data, cpu); - per_cpu(cpufreq_cpu_data, cpu) = NULL;
/* Save the policy somewhere when doing a light-weight tear-down */ if (frozen) @@ -1292,7 +1269,7 @@ static int __cpufreq_remove_dev(struct device *dev, } }
- per_cpu(cpufreq_policy_cpu, cpu) = -1; + per_cpu(cpufreq_cpu_data, cpu) = NULL; return 0; }
@@ -2148,10 +2125,8 @@ static int __init cpufreq_core_init(void) if (cpufreq_disabled()) return -ENODEV;
- for_each_possible_cpu(cpu) { - per_cpu(cpufreq_policy_cpu, cpu) = -1; + for_each_possible_cpu(cpu) init_rwsem(&per_cpu(cpu_policy_rwsem, cpu)); - }
cpufreq_global_kobject = kobject_create(); BUG_ON(!cpufreq_global_kobject);
To iterate over all policies we currently iterate over all CPUs and then get the policy for each of them. Let's use the newly created cpufreq_policy_list for this purpose.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0586bd2..81ceea6 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -961,8 +961,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, struct cpufreq_policy *policy; unsigned long flags; #ifdef CONFIG_HOTPLUG_CPU + struct cpufreq_policy *tpolicy; struct cpufreq_governor *gov; - int sibling; #endif
if (cpu_is_offline(cpu)) @@ -985,11 +985,10 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, #ifdef CONFIG_HOTPLUG_CPU /* Check if this cpu was hot-unplugged earlier and has siblings */ read_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_online_cpu(sibling) { - struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling); - if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) { + list_for_each_entry(tpolicy, &cpufreq_policy_list, policy_list) { + if (cpumask_test_cpu(cpu, tpolicy->related_cpus)) { read_unlock_irqrestore(&cpufreq_driver_lock, flags); - ret = cpufreq_add_policy_cpu(cp, cpu, dev, frozen); + ret = cpufreq_add_policy_cpu(tpolicy, cpu, dev, frozen); up_read(&cpufreq_rwsem); return ret; }
On Tuesday, August 20, 2013 12:08:21 PM Viresh Kumar wrote:
Hi Rafael,
You recently did this:
commit 878f6e074e9a7784a6e351512eace4ccb3542eef Author: Rafael J. Wysocki rafael.j.wysocki@intel.com Date: Sun Aug 18 15:35:59 2013 +0200
Revert "cpufreq: Use cpufreq_policy_list for iterating over policies"
Revert commit eb60852 (cpufreq: Use cpufreq_policy_list for iterating over policies), because it breaks system suspend/resume on multiple machines. It either causes resume to block indefinitely or causes the BUG_ON() in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq attributes. ------x------------x---------------
This patchset gets the reverted patch back along with few supporting patches. Cause of the initial problem you observed was this:
At suspend all CPUs are removed leaving boot cpu. At this time policies aren't freed and also aren't removed from cpufreq_policy_list. And per-cpu variable cpufreq_cpu_data is marked as NULL.
At resume CPUs other than boot cpu called __cpufreq_add_dev(). The tricky change that was introduced by my patch was: We iterate over list of policies instead of CPUs, where we used to get policy structure associated with CPUs using per-cpu variable. Which used to be NULL for first CPU of a policy that turned up. For the first cpu we don't want to call cpufreq_add_policy_cpu() but want __cpufreq_add_add() to continue.
When we called cpufreq_add_policy_cpu() it tried to stop the governor (which was already stopped) and hence errors leading into unstable state.
This patchset fixes these issues and is tested with suspend-resume over my thinkpad with ubuntu. Apart from minor cleanups it removes policy from cpufreq_policy_list in case of suspend/resume as well and hence we will never call cpufreq_add_policy_cpu() for first cpu of a policy.
Well, this looks good, but do we really need it in 3.12? It doesn't look like 3.12 will be missing these changes a lot?
Rafael
On 20 August 2013 18:13, Rafael J. Wysocki rjw@sisk.pl wrote:
Well, this looks good, but do we really need it in 3.12? It doesn't look like 3.12 will be missing these changes a lot?
Hmm.. If its too late for 3.12 we can skip them, but first two are bug fixes really (though nobody is affected as cpufreq_policy_list isn't used yet :) ) and other three are cleanups..
So, if there is any chance that we can queue them up for 3.12, it would be good.. but definitely not a must to have stuff. :)
-- viresh
On Tuesday, August 20, 2013 06:07:14 PM Viresh Kumar wrote:
On 20 August 2013 18:13, Rafael J. Wysocki rjw@sisk.pl wrote:
Well, this looks good, but do we really need it in 3.12? It doesn't look like 3.12 will be missing these changes a lot?
Hmm.. If its too late for 3.12 we can skip them, but first two are bug fixes really (though nobody is affected as cpufreq_policy_list isn't used yet :) ) and other three are cleanups..
So, if there is any chance that we can queue them up for 3.12, it would be good.. but definitely not a must to have stuff. :)
OK, queued up for 3.12. Hopefully, there won't be any fallout from [3/5].
Thanks, Rafael
On 20 August 2013 19:33, Rafael J. Wysocki rjw@sisk.pl wrote:
OK, queued up for 3.12. Hopefully, there won't be any fallout from [3/5].
Hopefully :)
Thanks..
On Tuesday, August 20, 2013 07:24:41 PM Viresh Kumar wrote:
On 20 August 2013 19:33, Rafael J. Wysocki rjw@sisk.pl wrote:
OK, queued up for 3.12. Hopefully, there won't be any fallout from [3/5].
Argh, I meant [4/5], typed a wrong number. Whatever.
linaro-kernel@lists.linaro.org