There are several problems cpufreq stats in the way it handles cpufreq_unregister_driver() and suspend/resume..
- We must not loose data collected so far when suspend/resume happens and so stats directories must not be removed/allocated during these operations, Which is done currently.
- cpufreq_stat has registered notifiers with both cpufreq and hotplug. It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY and removes this directory with a notifier from hotplug core.
In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver), stats directories per cpu aren't removed as CPUs are still online. The only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu for all CPUs except the last of each policy. And pointer to stat information is stored in the entry for last cpu in per-cpu cpufreq_stats_table. But policy structure would be freed inside cpufreq core and so that will result in memory leak inside cpufreq stats (as we are never freeing memory for stats).
Now if we again insert the module cpufreq_register_driver() will be called and we will again allocate stats data and put it on for first cpu of every policy. In case we only have a single cpu per policy we will return with a error from cpufreq_stats_create_table() due to below code:
if (per_cpu(cpufreq_stats_table, cpu)) return -EBUSY;
And so probably cpufreq stats directory would show up anymore (as it was added inside last policies->kobj which doesn't exist anymore). I haven't tested it though. Also the values in stats files wouldn't be refreshed as we are using the earlier stats structure.
- CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for scenarios where we don't really want cpufreq_stat_notifier_policy() to get called. For example whenever we are changing anything related to a policy: min/max/current freq, etc.. cpufreq_set_policy() is called and so cpufreq stats is notified. Where we don't do any useful stuff other than simply returning with -EBUSY from cpufreq_stats_create_table(). And so this isn't the right notifier that cpufreq stats..
Due to all above reasons this patch does following changes: - Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY, which are only called when policy is created/destroyed. They aren't called for suspend/resume paths.. - Use these notifiers in cpufreq_stat_notifier_policy() to create/destory stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume shouldn't be a problem for cpufreq_stats. - Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence, so that we don't free stats structure.
Acked-and-tested-by: Nicolas Pitre nico@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 5 +++++ drivers/cpufreq/cpufreq_stats.c | 24 +++++++++++++++++------- include/linux/cpufreq.h | 2 ++ 3 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 3509ca0..1afbe52 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -943,6 +943,9 @@ static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy) struct kobject *kobj; struct completion *cmp;
+ blocking_notifier_call_chain(&cpufreq_policy_notifier_list, + CPUFREQ_REMOVE_POLICY, policy); + down_read(&policy->rwsem); kobj = &policy->kobj; cmp = &policy->kobj_unregister; @@ -1148,6 +1151,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, ret = cpufreq_add_dev_interface(policy, dev); if (ret) goto err_out_unregister; + blocking_notifier_call_chain(&cpufreq_policy_notifier_list, + CPUFREQ_CREATE_POLICY, policy); }
write_lock_irqsave(&cpufreq_driver_lock, flags); diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 4cf0d28..0f71562 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -277,7 +277,7 @@ static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy) static int cpufreq_stat_notifier_policy(struct notifier_block *nb, unsigned long val, void *data) { - int ret; + int ret = 0; struct cpufreq_policy *policy = data; struct cpufreq_frequency_table *table; unsigned int cpu = policy->cpu; @@ -287,15 +287,21 @@ static int cpufreq_stat_notifier_policy(struct notifier_block *nb, return 0; }
- if (val != CPUFREQ_NOTIFY) - return 0; table = cpufreq_frequency_get_table(cpu); if (!table) return 0; - ret = cpufreq_stats_create_table(policy, table); - if (ret) - return ret; - return 0; + + if (val == CPUFREQ_CREATE_POLICY) + ret = cpufreq_stats_create_table(policy, table); + else if (val == CPUFREQ_REMOVE_POLICY) { + /* This might already be freed by cpu hotplug notifier */ + if (per_cpu(cpufreq_stats_table, cpu)) { + cpufreq_stats_free_sysfs(cpu); + cpufreq_stats_free_table(cpu); + } + } + + return ret; }
static int cpufreq_stat_notifier_trans(struct notifier_block *nb, @@ -340,6 +346,10 @@ static int cpufreq_stat_cpu_callback(struct notifier_block *nfb, { unsigned int cpu = (unsigned long)hcpu;
+ /* Don't free/allocate stats during suspend/resume */ + if (action & CPU_TASKS_FROZEN) + return 0; + switch (action) { case CPU_DOWN_PREPARE: cpufreq_stats_free_sysfs(cpu); diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index aaf800e..bb727eb 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -308,6 +308,8 @@ cpufreq_verify_within_cpu_limits(struct cpufreq_policy *policy) #define CPUFREQ_NOTIFY (2) #define CPUFREQ_START (3) #define CPUFREQ_UPDATE_POLICY_CPU (4) +#define CPUFREQ_CREATE_POLICY (5) +#define CPUFREQ_REMOVE_POLICY (6)
#ifdef CONFIG_CPU_FREQ int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
Either CPUs are hot-unplugged or suspend/resume occurs, cpufreq core will send notifications to cpufreq-stats and stats structure and sysfs entries would be correctly handled..
And so we don't actually need hotcpu notifiers in cpufreq-stats anymore. We were only handling cpu hot-unplug events here and that are already taken care of by POLICY notifiers.
Acked-and-tested-by: Nicolas Pitre nico@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 38 ++------------------------------------ 1 file changed, 2 insertions(+), 36 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 0f71562..8fa5844 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -294,11 +294,8 @@ static int cpufreq_stat_notifier_policy(struct notifier_block *nb, if (val == CPUFREQ_CREATE_POLICY) ret = cpufreq_stats_create_table(policy, table); else if (val == CPUFREQ_REMOVE_POLICY) { - /* This might already be freed by cpu hotplug notifier */ - if (per_cpu(cpufreq_stats_table, cpu)) { - cpufreq_stats_free_sysfs(cpu); - cpufreq_stats_free_table(cpu); - } + cpufreq_stats_free_sysfs(cpu); + cpufreq_stats_free_table(cpu); }
return ret; @@ -340,33 +337,6 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, return 0; }
-static int cpufreq_stat_cpu_callback(struct notifier_block *nfb, - unsigned long action, - void *hcpu) -{ - unsigned int cpu = (unsigned long)hcpu; - - /* Don't free/allocate stats during suspend/resume */ - if (action & CPU_TASKS_FROZEN) - return 0; - - switch (action) { - case CPU_DOWN_PREPARE: - cpufreq_stats_free_sysfs(cpu); - break; - case CPU_DEAD: - cpufreq_stats_free_table(cpu); - break; - } - return NOTIFY_OK; -} - -/* priority=1 so this will get called before cpufreq_remove_dev */ -static struct notifier_block cpufreq_stat_cpu_notifier __refdata = { - .notifier_call = cpufreq_stat_cpu_callback, - .priority = 1, -}; - static struct notifier_block notifier_policy_block = { .notifier_call = cpufreq_stat_notifier_policy }; @@ -386,14 +356,11 @@ static int __init cpufreq_stats_init(void) if (ret) return ret;
- register_hotcpu_notifier(&cpufreq_stat_cpu_notifier); - ret = cpufreq_register_notifier(¬ifier_trans_block, CPUFREQ_TRANSITION_NOTIFIER); if (ret) { cpufreq_unregister_notifier(¬ifier_policy_block, CPUFREQ_POLICY_NOTIFIER); - unregister_hotcpu_notifier(&cpufreq_stat_cpu_notifier); for_each_online_cpu(cpu) cpufreq_stats_free_table(cpu); return ret; @@ -409,7 +376,6 @@ static void __exit cpufreq_stats_exit(void) CPUFREQ_POLICY_NOTIFIER); cpufreq_unregister_notifier(¬ifier_trans_block, CPUFREQ_TRANSITION_NOTIFIER); - unregister_hotcpu_notifier(&cpufreq_stat_cpu_notifier); for_each_online_cpu(cpu) { cpufreq_stats_free_table(cpu); cpufreq_stats_free_sysfs(cpu);
We don't have code paths now, where we need to do these two things separately and so better do them in a single routine. Just as they are allocated in a single routine.
Acked-and-tested-by: Nicolas Pitre nico@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 50 ++++++++++++++++------------------------- 1 file changed, 19 insertions(+), 31 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 8fa5844..9dd5883 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -151,40 +151,32 @@ static int freq_table_get_index(struct cpufreq_stats *stat, unsigned int freq) return -1; }
-/* should be called late in the CPU removal sequence so that the stats - * memory is still available in case someone tries to use it. - */ -static void cpufreq_stats_free_table(unsigned int cpu) +static void __cpufreq_stats_free_table(struct cpufreq_policy *policy) { - struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, cpu); + struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu);
- if (stat) { - pr_debug("%s: Free stat table\n", __func__); - kfree(stat->time_in_state); - kfree(stat); - per_cpu(cpufreq_stats_table, cpu) = NULL; - } + if (!stat) + return; + + pr_debug("%s: Free stat table\n", __func__); + + sysfs_remove_group(&policy->kobj, &stats_attr_group); + kfree(stat->time_in_state); + kfree(stat); + per_cpu(cpufreq_stats_table, policy->cpu) = NULL; }
-/* must be called early in the CPU removal sequence (before - * cpufreq_remove_dev) so that policy is still valid. - */ -static void cpufreq_stats_free_sysfs(unsigned int cpu) +static void cpufreq_stats_free_table(unsigned int cpu) { - struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); + struct cpufreq_policy *policy;
+ policy = cpufreq_cpu_get(cpu); if (!policy) return;
- if (!cpufreq_frequency_get_table(cpu)) - goto put_ref; - - if (!policy_is_shared(policy)) { - pr_debug("%s: Free sysfs stat\n", __func__); - sysfs_remove_group(&policy->kobj, &stats_attr_group); - } + if (cpufreq_frequency_get_table(policy->cpu)) + __cpufreq_stats_free_table(policy);
-put_ref: cpufreq_cpu_put(policy); }
@@ -293,10 +285,8 @@ static int cpufreq_stat_notifier_policy(struct notifier_block *nb,
if (val == CPUFREQ_CREATE_POLICY) ret = cpufreq_stats_create_table(policy, table); - else if (val == CPUFREQ_REMOVE_POLICY) { - cpufreq_stats_free_sysfs(cpu); - cpufreq_stats_free_table(cpu); - } + else if (val == CPUFREQ_REMOVE_POLICY) + __cpufreq_stats_free_table(policy);
return ret; } @@ -376,10 +366,8 @@ static void __exit cpufreq_stats_exit(void) CPUFREQ_POLICY_NOTIFIER); cpufreq_unregister_notifier(¬ifier_trans_block, CPUFREQ_TRANSITION_NOTIFIER); - for_each_online_cpu(cpu) { + for_each_online_cpu(cpu) cpufreq_stats_free_table(cpu); - cpufreq_stats_free_sysfs(cpu); - } }
MODULE_AUTHOR("Zou Nan hai nanhai.zou@intel.com");
When cpufreq_stats is compiled in as a module, cpufreq driver would have already been registered. And so the CPUFREQ_CREATE_POLICY notifiers wouldn't be called for it. Hence no sysfs entries for stats. :(
This patch calls cpufreq_stats_create_table() for each online cpu from cpufreq_stats_init() and so if policy is already created for CPUx then we will register sysfs stats for it.
When its not compiled as module, we will return early as policy wouldn't be found for any of the CPUs.
Acked-and-tested-by: Nicolas Pitre nico@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 9dd5883..5793e14 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -180,7 +180,7 @@ static void cpufreq_stats_free_table(unsigned int cpu) cpufreq_cpu_put(policy); }
-static int cpufreq_stats_create_table(struct cpufreq_policy *policy, +static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, struct cpufreq_frequency_table *table) { unsigned int i, j, count = 0, ret = 0; @@ -253,6 +253,26 @@ error_get_fail: return ret; }
+static void cpufreq_stats_create_table(unsigned int cpu) +{ + struct cpufreq_policy *policy; + struct cpufreq_frequency_table *table; + + /* + * "likely(!policy)" because normally cpufreq_stats will be registered + * before cpufreq driver + */ + policy = cpufreq_cpu_get(cpu); + if (likely(!policy)) + return; + + table = cpufreq_frequency_get_table(policy->cpu); + if (likely(table)) + __cpufreq_stats_create_table(policy, table); + + cpufreq_cpu_put(policy); +} + static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy) { struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, @@ -284,7 +304,7 @@ static int cpufreq_stat_notifier_policy(struct notifier_block *nb, return 0;
if (val == CPUFREQ_CREATE_POLICY) - ret = cpufreq_stats_create_table(policy, table); + ret = __cpufreq_stats_create_table(policy, table); else if (val == CPUFREQ_REMOVE_POLICY) __cpufreq_stats_free_table(policy);
@@ -346,6 +366,9 @@ static int __init cpufreq_stats_init(void) if (ret) return ret;
+ for_each_online_cpu(cpu) + cpufreq_stats_create_table(cpu); + ret = cpufreq_register_notifier(¬ifier_trans_block, CPUFREQ_TRANSITION_NOTIFIER); if (ret) {
On Tuesday, January 07, 2014 07:10:10 AM Viresh Kumar wrote:
There are several problems cpufreq stats in the way it handles cpufreq_unregister_driver() and suspend/resume..
We must not loose data collected so far when suspend/resume happens and so stats directories must not be removed/allocated during these operations, Which is done currently.
cpufreq_stat has registered notifiers with both cpufreq and hotplug. It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY and removes this directory with a notifier from hotplug core.
In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver), stats directories per cpu aren't removed as CPUs are still online. The only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu for all CPUs except the last of each policy. And pointer to stat information is stored in the entry for last cpu in per-cpu cpufreq_stats_table. But policy structure would be freed inside cpufreq core and so that will result in memory leak inside cpufreq stats (as we are never freeing memory for stats).
Now if we again insert the module cpufreq_register_driver() will be called and we will again allocate stats data and put it on for first cpu of every policy. In case we only have a single cpu per policy we will return with a error from cpufreq_stats_create_table() due to below code:
if (per_cpu(cpufreq_stats_table, cpu)) return -EBUSY;
And so probably cpufreq stats directory would show up anymore (as it was added inside last policies->kobj which doesn't exist anymore). I haven't tested it though. Also the values in stats files wouldn't be refreshed as we are using the earlier stats structure.
CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for scenarios where we don't really want cpufreq_stat_notifier_policy() to get called. For example whenever we are changing anything related to a policy: min/max/current freq, etc.. cpufreq_set_policy() is called and so cpufreq stats is notified. Where we don't do any useful stuff other than simply returning with -EBUSY from cpufreq_stats_create_table(). And so this isn't the right notifier that cpufreq stats..
Due to all above reasons this patch does following changes:
- Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY, which are only called when policy is created/destroyed. They aren't called for suspend/resume paths..
- Use these notifiers in cpufreq_stat_notifier_policy() to create/destory stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume shouldn't be a problem for cpufreq_stats.
- Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence, so that we don't free stats structure.
Acked-and-tested-by: Nicolas Pitre nico@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
The entire series applied to bleeding-edge, thanks!
linaro-kernel@lists.linaro.org