Hi Rafael,
The aim of this series is to stop managing cpufreq sysfs directories on CPU hotplugs.
This issue has been raised multiple times earlier, the latest being tried by Saravana. While working on the $Subject thread, I did lots of cleanups, most of which are already pushed by you.
The two remaining ones are added to this series..
Currently on removal of a 'cpu != policy->cpu', we remove its sysfs directories by removing the soft-link. And on removal of policy->cpu, we migrate the sysfs directories to the next cpu. But if policy->cpu was the last CPU, we remove the policy completely and allocate it again as soon as the CPUs come back. This has shortcomings:
- Code Complexity - Slower hotplug - sysfs file permissions are reset after all policy->cpus are offlined - CPUFreq stats history lost after all policy->cpus are offlined - Special management of sysfs stuff during suspend/resume
To make things simple we stop playing with sysfs files unless the driver is getting removed. Also the policy is kept intact to be used later.
First few patches provide a clean base for others *more important* patches.
Rebased-over: your bleeding edge branch as there were dependencies on my earlier patches.
Pushed here:
git://git.linaro.org/people/viresh.kumar/linux.git cpufreq/core/sysfs
@Saravana/Prarit: It would be good if you can provide your feedback/reviews on this. Thanks in advance.
@Saravana: Few patches might look very similar to what you have done, please add your SOB for those :)
-- viresh
Viresh Kumar (18): cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get|put}() cpufreq: Create for_each_policy() cpufreq: Create for_each_governor() cpufreq: Manage fallback policies in a list cpufreq: Manage governor usage history with 'policy->last_governor' cpufreq: Reuse policy list instead of per-cpu variable 'cpufreq_cpu_data' cpufreq: Drop (now) useless check 'cpu > nr_cpu_ids' cpufreq: Add doc style comment about cpufreq_cpu_{get|put}() cpufreq: Mark policy->governor = NULL for fallback policies cpufreq: Don't allow updating inactive-policies from sysfs cpufreq: Track cpu managing sysfs kobjects separately cpufreq: Stop migrating sysfs files on hotplug cpufreq: Keep a single path for adding managed CPUs cpufreq: Remove cpufreq_update_policy() cpufreq: Initialize policy->kobj while allocating policy cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free() cpufreq: Restart governor as soon as possible cpufreq: Merge __cpufreq_add_dev() and cpufreq_add_dev()
drivers/cpufreq/cpufreq.c | 551 +++++++++++++++++++++++----------------------- include/linux/cpufreq.h | 10 +- 2 files changed, 287 insertions(+), 274 deletions(-)
Even if cpufreq is disabled then also the per-cpu variable will return NULL and things will continue working as is. Remove this unnecessary check.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index ca69f42b8e1e..72990ba59fad 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -203,7 +203,7 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) struct cpufreq_policy *policy = NULL; unsigned long flags;
- if (cpufreq_disabled() || (cpu >= nr_cpu_ids)) + if (cpu >= nr_cpu_ids) return NULL;
if (!down_read_trylock(&cpufreq_rwsem)) @@ -230,9 +230,6 @@ EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
void cpufreq_cpu_put(struct cpufreq_policy *policy) { - if (cpufreq_disabled()) - return; - kobject_put(&policy->kobj); up_read(&cpufreq_rwsem); }
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
Even if cpufreq is disabled then also the per-cpu variable will return NULL and things will continue working as is. Remove this unnecessary check.
Commit text reword suggestion:
When cpufreq is disabled, the per-cpu variable would have been set to NULL. Remove this unnecessary check.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index ca69f42b8e1e..72990ba59fad 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -203,7 +203,7 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) struct cpufreq_policy *policy = NULL; unsigned long flags;
- if (cpufreq_disabled() || (cpu >= nr_cpu_ids))
if (cpu >= nr_cpu_ids) return NULL;
if (!down_read_trylock(&cpufreq_rwsem))
@@ -230,9 +230,6 @@ EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
void cpufreq_cpu_put(struct cpufreq_policy *policy) {
- if (cpufreq_disabled())
return;
- kobject_put(&policy->kobj); up_read(&cpufreq_rwsem); }
Acked-by: Saravana Kannan skannan@codeaurora.org
To make code more readable and less error prone, lets create a helper macro for iterating over all active policies.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 72990ba59fad..9dfefb8ece6d 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -31,6 +31,12 @@ #include <linux/tick.h> #include <trace/events/power.h>
+/* Macros to iterate over lists */ +/* Iterate over online CPUs policies */ +static LIST_HEAD(cpufreq_policy_list); +#define for_each_policy(__policy) \ + list_for_each_entry(__policy, &cpufreq_policy_list, policy_list) + /** * The "cpufreq driver" - the arch- or hardware-dependent low * level driver of CPUFreq support, and its spinlock. This lock @@ -41,7 +47,6 @@ static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback); static DEFINE_RWLOCK(cpufreq_driver_lock); DEFINE_MUTEX(cpufreq_governor_lock); -static LIST_HEAD(cpufreq_policy_list);
/* This one keeps track of the previously set governor of a removed CPU */ static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor); @@ -1113,7 +1118,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
/* Check if this cpu was hot-unplugged earlier and has siblings */ read_lock_irqsave(&cpufreq_driver_lock, flags); - list_for_each_entry(policy, &cpufreq_policy_list, policy_list) { + for_each_policy(policy) { if (cpumask_test_cpu(cpu, policy->related_cpus)) { read_unlock_irqrestore(&cpufreq_driver_lock, flags); ret = cpufreq_add_policy_cpu(policy, cpu, dev); @@ -1647,7 +1652,7 @@ void cpufreq_suspend(void)
pr_debug("%s: Suspending Governors\n", __func__);
- list_for_each_entry(policy, &cpufreq_policy_list, policy_list) { + for_each_policy(policy) { if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP)) pr_err("%s: Failed to stop governor for policy: %p\n", __func__, policy); @@ -1681,7 +1686,7 @@ void cpufreq_resume(void)
pr_debug("%s: Resuming Governors\n", __func__);
- list_for_each_entry(policy, &cpufreq_policy_list, policy_list) { + for_each_policy(policy) { if (cpufreq_driver->resume && cpufreq_driver->resume(policy)) pr_err("%s: Failed to resume driver: %p\n", __func__, policy); @@ -2324,7 +2329,7 @@ static int cpufreq_boost_set_sw(int state) struct cpufreq_policy *policy; int ret = -EINVAL;
- list_for_each_entry(policy, &cpufreq_policy_list, policy_list) { + for_each_policy(policy) { freq_table = cpufreq_frequency_get_table(policy->cpu); if (freq_table) { ret = cpufreq_frequency_table_cpuinfo(policy,
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
To make code more readable and less error prone, lets create a helper macro for iterating over all active policies.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 72990ba59fad..9dfefb8ece6d 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -31,6 +31,12 @@ #include <linux/tick.h> #include <trace/events/power.h>
+/* Macros to iterate over lists */ +/* Iterate over online CPUs policies */ +static LIST_HEAD(cpufreq_policy_list); +#define for_each_policy(__policy) \
- list_for_each_entry(__policy, &cpufreq_policy_list, policy_list)
Could you "export" this macro/list so that it's usable by other clients of cpufreq too? If a driver wants to examine all the policies today, it has to do "for_each_cpu" and then try to get the policy for each CPU. This seems nicer.
-Saravana
On 4 February 2015 at 03:52, Saravana Kannan skannan@codeaurora.org wrote:
Could you "export" this macro/list so that it's usable by other clients of cpufreq too? If a driver wants to examine all the policies today, it has to do "for_each_cpu" and then try to get the policy for each CPU. This seems nicer.
I would wait for such requests to come, it wouldn't be difficult to move these to cpufreq.h .. Do you have some users in mind ?
Also, exposing the cpufreq-locks wouldn't be a good idea and accessing these macro's without them might result in crash, in case there are updates at the same time..
So, its a bit tricky I would say.
To make code more readable and less error prone, lets create a helper macro for iterating over all available governors.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 9dfefb8ece6d..cf8ce523074f 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -37,6 +37,11 @@ static LIST_HEAD(cpufreq_policy_list); #define for_each_policy(__policy) \ list_for_each_entry(__policy, &cpufreq_policy_list, policy_list)
+/* Iterate over governors */ +static LIST_HEAD(cpufreq_governor_list); +#define for_each_governor(__governor) \ + list_for_each_entry(__governor, &cpufreq_governor_list, governor_list) + /** * The "cpufreq driver" - the arch- or hardware-dependent low * level driver of CPUFreq support, and its spinlock. This lock @@ -99,7 +104,6 @@ void disable_cpufreq(void) { off = 1; } -static LIST_HEAD(cpufreq_governor_list); static DEFINE_MUTEX(cpufreq_governor_mutex);
bool have_governor_per_policy(void) @@ -434,7 +438,7 @@ static struct cpufreq_governor *find_governor(const char *str_governor) { struct cpufreq_governor *t;
- list_for_each_entry(t, &cpufreq_governor_list, governor_list) + for_each_governor(t) if (!strncasecmp(str_governor, t->name, CPUFREQ_NAME_LEN)) return t;
@@ -636,7 +640,7 @@ static ssize_t show_scaling_available_governors(struct cpufreq_policy *policy, goto out; }
- list_for_each_entry(t, &cpufreq_governor_list, governor_list) { + for_each_governor(t) { if (i >= (ssize_t) ((PAGE_SIZE / sizeof(char)) - (CPUFREQ_NAME_LEN + 2))) goto out;
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
To make code more readable and less error prone, lets create a helper macro for iterating over all available governors.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 9dfefb8ece6d..cf8ce523074f 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -37,6 +37,11 @@ static LIST_HEAD(cpufreq_policy_list); #define for_each_policy(__policy) \ list_for_each_entry(__policy, &cpufreq_policy_list, policy_list)
+/* Iterate over governors */ +static LIST_HEAD(cpufreq_governor_list); +#define for_each_governor(__governor) \
- list_for_each_entry(__governor, &cpufreq_governor_list, governor_list)
- /**
- The "cpufreq driver" - the arch- or hardware-dependent low
- level driver of CPUFreq support, and its spinlock. This lock
@@ -99,7 +104,6 @@ void disable_cpufreq(void) { off = 1; } -static LIST_HEAD(cpufreq_governor_list); static DEFINE_MUTEX(cpufreq_governor_mutex);
bool have_governor_per_policy(void) @@ -434,7 +438,7 @@ static struct cpufreq_governor *find_governor(const char *str_governor) { struct cpufreq_governor *t;
- list_for_each_entry(t, &cpufreq_governor_list, governor_list)
- for_each_governor(t) if (!strncasecmp(str_governor, t->name, CPUFREQ_NAME_LEN)) return t;
@@ -636,7 +640,7 @@ static ssize_t show_scaling_available_governors(struct cpufreq_policy *policy, goto out; }
- list_for_each_entry(t, &cpufreq_governor_list, governor_list) {
- for_each_governor(t) { if (i >= (ssize_t) ((PAGE_SIZE / sizeof(char))
- (CPUFREQ_NAME_LEN + 2)))
goto out;
Acked-by: Saravana Kannan skannan@codeaurora.org
Policies manage a group of CPUs and tracking them on per-cpu basis isn't the best approach for sure.
The obvious loss is the amount of memory consumed for keeping a per-cpu copy of the same pointer. But the bigger problem is managing such a data structure as we need to update it for all policy->cpus.
To make it simple, lets manage fallback CPUs in a list rather than a per-cpu variable.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 51 +++++++++++++++++++++++++++-------------------- include/linux/cpufreq.h | 5 ++++- 2 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index cf8ce523074f..f253cf45f910 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -37,6 +37,11 @@ static LIST_HEAD(cpufreq_policy_list); #define for_each_policy(__policy) \ list_for_each_entry(__policy, &cpufreq_policy_list, policy_list)
+/* Iterate over offline CPUs policies */ +static LIST_HEAD(cpufreq_fallback_list); +#define for_each_fallback_policy(__policy) \ + list_for_each_entry(__policy, &cpufreq_fallback_list, fallback_list) + /* Iterate over governors */ static LIST_HEAD(cpufreq_governor_list); #define for_each_governor(__governor) \ @@ -49,7 +54,6 @@ static LIST_HEAD(cpufreq_governor_list); */ static struct cpufreq_driver *cpufreq_driver; static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); -static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback); static DEFINE_RWLOCK(cpufreq_driver_lock); DEFINE_MUTEX(cpufreq_governor_lock);
@@ -999,13 +1003,16 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy,
static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) { - struct cpufreq_policy *policy; + struct cpufreq_policy *policy = NULL, *temp; unsigned long flags;
read_lock_irqsave(&cpufreq_driver_lock, flags); - - policy = per_cpu(cpufreq_cpu_data_fallback, cpu); - + for_each_fallback_policy(temp) { + if (cpumask_test_cpu(cpu, temp->related_cpus)) { + policy = temp; + break; + } + } read_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (policy) @@ -1029,6 +1036,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void) goto err_free_cpumask;
INIT_LIST_HEAD(&policy->policy_list); + INIT_LIST_HEAD(&policy->fallback_list); init_rwsem(&policy->rwsem); spin_lock_init(&policy->transition_lock); init_waitqueue_head(&policy->transition_wait); @@ -1142,6 +1150,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) policy = cpufreq_policy_alloc(); if (!policy) goto nomem_out; + } else { + /* Don't need fallback list anymore */ + write_lock_irqsave(&cpufreq_driver_lock, flags); + list_del(&policy->fallback_list); + write_unlock_irqrestore(&cpufreq_driver_lock, flags); }
/* @@ -1296,11 +1309,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) if (cpufreq_driver->exit) cpufreq_driver->exit(policy); err_set_policy_cpu: - if (recover_policy) { - /* Do not leave stale fallback data behind. */ - per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL; + if (recover_policy) cpufreq_policy_put_kobj(policy); - } cpufreq_policy_free(policy);
nomem_out: @@ -1333,21 +1343,22 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
pr_debug("%s: unregistering CPU %u\n", __func__, cpu);
- write_lock_irqsave(&cpufreq_driver_lock, flags); - policy = per_cpu(cpufreq_cpu_data, cpu); - - /* Save the policy somewhere when doing a light-weight tear-down */ - if (cpufreq_suspended) - per_cpu(cpufreq_cpu_data_fallback, cpu) = policy; - - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - if (!policy) { pr_debug("%s: No cpu_data found\n", __func__); return -EINVAL; }
+ down_read(&policy->rwsem); + cpus = cpumask_weight(policy->cpus); + up_read(&policy->rwsem); + + /* Save the policy when doing a light-weight tear-down of last cpu */ + write_lock_irqsave(&cpufreq_driver_lock, flags); + if (cpufreq_suspended && cpus == 1) + list_add(&policy->fallback_list, &cpufreq_fallback_list); + write_unlock_irqrestore(&cpufreq_driver_lock, flags); + if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); if (ret) { @@ -1359,10 +1370,6 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, policy->governor->name, CPUFREQ_NAME_LEN); }
- down_read(&policy->rwsem); - cpus = cpumask_weight(policy->cpus); - up_read(&policy->rwsem); - if (cpu != policy->cpu) { sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) { diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 2ee4888c1f47..df6af4cfa26a 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -87,7 +87,10 @@ struct cpufreq_policy { struct cpufreq_real_policy user_policy; struct cpufreq_frequency_table *freq_table;
- struct list_head policy_list; + /* Internal lists */ + struct list_head policy_list; /* policies of online CPUs */ + struct list_head fallback_list; /* policies of offline CPUs */ + struct kobject kobj; struct completion kobj_unregister;
On Tuesday, January 27, 2015 02:06:10 PM Viresh Kumar wrote:
Policies manage a group of CPUs and tracking them on per-cpu basis isn't the best approach for sure.
The obvious loss is the amount of memory consumed for keeping a per-cpu copy of the same pointer. But the bigger problem is managing such a data structure as we need to update it for all policy->cpus.
To make it simple, lets manage fallback CPUs in a list rather than a per-cpu variable.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 51 +++++++++++++++++++++++++++-------------------- include/linux/cpufreq.h | 5 ++++- 2 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index cf8ce523074f..f253cf45f910 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -37,6 +37,11 @@ static LIST_HEAD(cpufreq_policy_list); #define for_each_policy(__policy) \ list_for_each_entry(__policy, &cpufreq_policy_list, policy_list) +/* Iterate over offline CPUs policies */ +static LIST_HEAD(cpufreq_fallback_list); +#define for_each_fallback_policy(__policy) \
- list_for_each_entry(__policy, &cpufreq_fallback_list, fallback_list)
/* Iterate over governors */ static LIST_HEAD(cpufreq_governor_list); #define for_each_governor(__governor) \ @@ -49,7 +54,6 @@ static LIST_HEAD(cpufreq_governor_list); */ static struct cpufreq_driver *cpufreq_driver; static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); -static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback); static DEFINE_RWLOCK(cpufreq_driver_lock); DEFINE_MUTEX(cpufreq_governor_lock); @@ -999,13 +1003,16 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) {
- struct cpufreq_policy *policy;
- struct cpufreq_policy *policy = NULL, *temp; unsigned long flags;
read_lock_irqsave(&cpufreq_driver_lock, flags);
- policy = per_cpu(cpufreq_cpu_data_fallback, cpu);
- for_each_fallback_policy(temp) {
if (cpumask_test_cpu(cpu, temp->related_cpus)) {
policy = temp;
break;
}
This becomes quite inefficient on a system with many CPUs having different policies. My approach would be to somehow attach the fallback policy information to the CPU device object.
- } read_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (policy) @@ -1029,6 +1036,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void) goto err_free_cpumask; INIT_LIST_HEAD(&policy->policy_list);
- INIT_LIST_HEAD(&policy->fallback_list); init_rwsem(&policy->rwsem); spin_lock_init(&policy->transition_lock); init_waitqueue_head(&policy->transition_wait);
@@ -1142,6 +1150,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) policy = cpufreq_policy_alloc(); if (!policy) goto nomem_out;
- } else {
/* Don't need fallback list anymore */
write_lock_irqsave(&cpufreq_driver_lock, flags);
list_del(&policy->fallback_list);
}write_unlock_irqrestore(&cpufreq_driver_lock, flags);
/* @@ -1296,11 +1309,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) if (cpufreq_driver->exit) cpufreq_driver->exit(policy); err_set_policy_cpu:
- if (recover_policy) {
/* Do not leave stale fallback data behind. */
per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL;
- if (recover_policy) cpufreq_policy_put_kobj(policy);
- } cpufreq_policy_free(policy);
nomem_out: @@ -1333,21 +1343,22 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, pr_debug("%s: unregistering CPU %u\n", __func__, cpu);
- write_lock_irqsave(&cpufreq_driver_lock, flags);
- policy = per_cpu(cpufreq_cpu_data, cpu);
- /* Save the policy somewhere when doing a light-weight tear-down */
- if (cpufreq_suspended)
per_cpu(cpufreq_cpu_data_fallback, cpu) = policy;
- write_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (!policy) { pr_debug("%s: No cpu_data found\n", __func__); return -EINVAL; }
- down_read(&policy->rwsem);
- cpus = cpumask_weight(policy->cpus);
- up_read(&policy->rwsem);
- /* Save the policy when doing a light-weight tear-down of last cpu */
- write_lock_irqsave(&cpufreq_driver_lock, flags);
- if (cpufreq_suspended && cpus == 1)
list_add(&policy->fallback_list, &cpufreq_fallback_list);
- write_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); if (ret) {
@@ -1359,10 +1370,6 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, policy->governor->name, CPUFREQ_NAME_LEN); }
- down_read(&policy->rwsem);
- cpus = cpumask_weight(policy->cpus);
- up_read(&policy->rwsem);
- if (cpu != policy->cpu) { sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) {
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 2ee4888c1f47..df6af4cfa26a 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -87,7 +87,10 @@ struct cpufreq_policy { struct cpufreq_real_policy user_policy; struct cpufreq_frequency_table *freq_table;
- struct list_head policy_list;
- /* Internal lists */
- struct list_head policy_list; /* policies of online CPUs */
- struct list_head fallback_list; /* policies of offline CPUs */
- struct kobject kobj; struct completion kobj_unregister;
On 3 February 2015 at 06:11, Rafael J. Wysocki rjw@rjwysocki.net wrote:
This becomes quite inefficient on a system with many CPUs having different policies. My approach would be to somehow attach the fallback policy information to the CPU device object.
It will be same as the per-cpu approach which we are fed up of.
On Tuesday, February 03, 2015 09:40:58 AM Viresh Kumar wrote:
On 3 February 2015 at 06:11, Rafael J. Wysocki rjw@rjwysocki.net wrote:
This becomes quite inefficient on a system with many CPUs having different policies. My approach would be to somehow attach the fallback policy information to the CPU device object.
It will be same as the per-cpu approach which we are fed up of.
No, it won't be the same. The per-CPU memory is special.
The general idea of the existing code is sane in my view. It connects the fallback policy information with the given CPU directly, which generally is a good idea. Storing that information in the per-CPU memory isn't a good idea, however.
On 3 February 2015 at 20:34, Rafael J. Wysocki rjw@rjwysocki.net wrote:
No, it won't be the same. The per-CPU memory is special.
The general idea of the existing code is sane in my view. It connects the fallback policy information with the given CPU directly, which generally is a good idea. Storing that information in the per-CPU memory isn't a good idea, however.
Okay, will see what can be done.
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
Policies manage a group of CPUs and tracking them on per-cpu basis isn't the best approach for sure.
The obvious loss is the amount of memory consumed for keeping a per-cpu copy of the same pointer. But the bigger problem is managing such a data structure as we need to update it for all policy->cpus.
To make it simple, lets manage fallback CPUs in a list rather than a per-cpu variable.
Can you explain why we need a fallback list in the first place? Now that we are not destroying and creating policy objects, I don't see any point in the fallback list.
-Saravana
On 4 February 2015 at 03:58, Saravana Kannan skannan@codeaurora.org wrote:
Can you explain why we need a fallback list in the first place? Now that we are not destroying and creating policy objects, I don't see any point in the fallback list.
Because we wanted to mark the policy inactive. But as I have introduced another field for that now, probably it can be fixed. Will check again on what can be done.
Can you review the other patches so that they are reviewed once before sending V2 here ?
On 02/03/2015 10:20 PM, Viresh Kumar wrote:
On 4 February 2015 at 03:58, Saravana Kannan skannan@codeaurora.org wrote:
Can you explain why we need a fallback list in the first place? Now that we are not destroying and creating policy objects, I don't see any point in the fallback list.
Because we wanted to mark the policy inactive. But as I have introduced another field for that now, probably it can be fixed. Will check again on what can be done.
Thanks. That's why I was asking. Now that you have another flag. Also, you might not even need a flag. You can just check if policy->cpus is empty (btw, I think we should let that go to empty)
Can you review the other patches so that they are reviewed once before sending V2 here ?
Definitely. I want this feature to go in.
-Saravana
On Wednesday, February 04, 2015 02:28:55 PM Saravana Kannan wrote:
On 02/03/2015 10:20 PM, Viresh Kumar wrote:
On 4 February 2015 at 03:58, Saravana Kannan skannan@codeaurora.org wrote:
Can you explain why we need a fallback list in the first place? Now that we are not destroying and creating policy objects, I don't see any point in the fallback list.
Because we wanted to mark the policy inactive. But as I have introduced another field for that now, probably it can be fixed. Will check again on what can be done.
Thanks. That's why I was asking. Now that you have another flag. Also, you might not even need a flag. You can just check if policy->cpus is empty (btw, I think we should let that go to empty)
So the idea would be to avoid clearig cpufreq_cpu_data during offline tear-down (because we know that the CPU is offline anyway) and then start using the same policy pointer during offline?
On 02/04/2015 03:20 PM, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 02:28:55 PM Saravana Kannan wrote:
On 02/03/2015 10:20 PM, Viresh Kumar wrote:
On 4 February 2015 at 03:58, Saravana Kannan skannan@codeaurora.org wrote:
Can you explain why we need a fallback list in the first place? Now that we are not destroying and creating policy objects, I don't see any point in the fallback list.
Because we wanted to mark the policy inactive. But as I have introduced another field for that now, probably it can be fixed. Will check again on what can be done.
Thanks. That's why I was asking. Now that you have another flag. Also, you might not even need a flag. You can just check if policy->cpus is empty (btw, I think we should let that go to empty)
So the idea would be to avoid clearig cpufreq_cpu_data during offline tear-down (because we know that the CPU is offline anyway) and then start using the same policy pointer during offline?
Yeah. We still don't clear the policy->cpus today when the last CPU goes down. But that can be done easily by changing a few "if" conditions and rearranging the hotplug notifier code (I think it's mostly there with this series). Once we clear policy->cpus when all CPUs are offline, we can just use that data to figure out if it's "active" or not.
-Saravana
On Wednesday, February 04, 2015 05:55:02 PM Saravana Kannan wrote:
On 02/04/2015 03:20 PM, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 02:28:55 PM Saravana Kannan wrote:
On 02/03/2015 10:20 PM, Viresh Kumar wrote:
On 4 February 2015 at 03:58, Saravana Kannan skannan@codeaurora.org wrote:
Can you explain why we need a fallback list in the first place? Now that we are not destroying and creating policy objects, I don't see any point in the fallback list.
Because we wanted to mark the policy inactive. But as I have introduced another field for that now, probably it can be fixed. Will check again on what can be done.
Thanks. That's why I was asking. Now that you have another flag. Also, you might not even need a flag. You can just check if policy->cpus is empty (btw, I think we should let that go to empty)
So the idea would be to avoid clearig cpufreq_cpu_data during offline tear-down (because we know that the CPU is offline anyway) and then start using the same policy pointer during offline?
Yeah. We still don't clear the policy->cpus today when the last CPU goes down. But that can be done easily by changing a few "if" conditions and rearranging the hotplug notifier code (I think it's mostly there with this series). Once we clear policy->cpus when all CPUs are offline, we can just use that data to figure out if it's "active" or not.
OK
I'm still concerned about the case when the last policy CPU is physically going away, in which we do the offline as a preliminary step and then will go for full CPU device unregistration.
On 02/05/2015 07:11 AM, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 05:55:02 PM Saravana Kannan wrote:
On 02/04/2015 03:20 PM, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 02:28:55 PM Saravana Kannan wrote:
On 02/03/2015 10:20 PM, Viresh Kumar wrote:
On 4 February 2015 at 03:58, Saravana Kannan skannan@codeaurora.org wrote:
Can you explain why we need a fallback list in the first place? Now that we are not destroying and creating policy objects, I don't see any point in the fallback list.
Because we wanted to mark the policy inactive. But as I have introduced another field for that now, probably it can be fixed. Will check again on what can be done.
Thanks. That's why I was asking. Now that you have another flag. Also, you might not even need a flag. You can just check if policy->cpus is empty (btw, I think we should let that go to empty)
So the idea would be to avoid clearig cpufreq_cpu_data during offline tear-down (because we know that the CPU is offline anyway) and then start using the same policy pointer during offline?
Yeah. We still don't clear the policy->cpus today when the last CPU goes down. But that can be done easily by changing a few "if" conditions and rearranging the hotplug notifier code (I think it's mostly there with this series). Once we clear policy->cpus when all CPUs are offline, we can just use that data to figure out if it's "active" or not.
OK
I'm still concerned about the case when the last policy CPU is physically going away, in which we do the offline as a preliminary step and then will go for full CPU device unregistration.
I made sure that would work in the patch series I sent out a while back. I now need to make sure that Viresh's patch series account for it correctly. I'll make sure to review this series at least once a week.
The way to handle the case you mentioned is by treating the subsys_interface based add/remove ops as physical add/remove and the hotplug add/remove as logical add/remove.
-Saravana
On 5 February 2015 at 23:11, Rafael J. Wysocki rjw@rjwysocki.net wrote:
I'm still concerned about the case when the last policy CPU is physically going away, in which we do the offline as a preliminary step and then will go for full CPU device unregistration.
I need more clarity on how this happens. So, on a physical hotplug out: - We first do cpu offline, i.e. hotplug notifier gets called. - And then during unregistration the sysfs remove will get called?
I haven't taken care of this as I never understood what actually happens in terms of code sequencing ..
Will take care of that in next revision once I get answers to above queries.
-- viresh
On Tuesday, February 17, 2015 01:36:47 PM Viresh Kumar wrote:
On 5 February 2015 at 23:11, Rafael J. Wysocki rjw@rjwysocki.net wrote:
I'm still concerned about the case when the last policy CPU is physically going away, in which we do the offline as a preliminary step and then will go for full CPU device unregistration.
I need more clarity on how this happens. So, on a physical hotplug out:
- We first do cpu offline, i.e. hotplug notifier gets called.
Yes.
- And then during unregistration the sysfs remove will get called?
The CPU device goes away then and that should trigger the remove of all the sysfs directories under it.
On 17 February 2015 at 23:45, Rafael J. Wysocki rjw@rjwysocki.net wrote:
- And then during unregistration the sysfs remove will get called?
I meant subsys-remove here, sorry :(
The CPU device goes away then and that should trigger the remove of all the sysfs directories under it.
How will the cpufreq core know about it? cpufreq_remove_dev() will get called?
Is there a way to emulate it? So that I can test it in my setup..
On 02/17/2015 08:23 PM, Viresh Kumar wrote:
On 17 February 2015 at 23:45, Rafael J. Wysocki rjw@rjwysocki.net wrote:
- And then during unregistration the sysfs remove will get called?
I meant subsys-remove here, sorry :(
The CPU device goes away then and that should trigger the remove of all the sysfs directories under it.
How will the cpufreq core know about it? cpufreq_remove_dev() will get called?
Is there a way to emulate it? So that I can test it in my setup..
During logical hotplug, you only get the hotplug notifiers. During physical hotplug, you get the subsys_remove.
So, we have all the information needed to distinguish. Remove the sysfs entries only in subsys-remove.
-Saravana
On 19 February 2015 at 02:45, Saravana Kannan skannan@codeaurora.org wrote:
During logical hotplug, you only get the hotplug notifiers. During physical hotplug, you get the subsys_remove.
From Rafael's mail it looked like we also get hotplug notifiers..
So it will be hoplug notifier followed by subsys, and that would be a special case then..
So, we have all the information needed to distinguish. Remove the sysfs entries only in subsys-remove.
But hotplug notifiers would have already done most of the work we wanted. And this will be a special case then.
Also, a policy might be shared by multiple CPUs and we need to take that into account as well here..
So, not that straight forward as I can see it..
Also, are you aware of any way to emulate that ?
History of which governor was used last is common to all CPUs within a policy and maintaining it per-cpu isn't the best approach for sure.
Apart from wasting memory, this also increases the complexity of managing this data structure as it has to be updated for all CPUs.
To make that somewhat simpler, lets store this information in a new field 'last_governor' in struct cpufreq_policy and update it on removal of last cpu of a policy.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 24 ++++++++++++------------ include/linux/cpufreq.h | 1 + 2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index f253cf45f910..4ad1e46891b5 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -57,9 +57,6 @@ static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); static DEFINE_RWLOCK(cpufreq_driver_lock); DEFINE_MUTEX(cpufreq_governor_lock);
-/* This one keeps track of the previously set governor of a removed CPU */ -static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor); - /* Flag to suspend/resume CPUFreq governors */ static bool cpufreq_suspended;
@@ -941,7 +938,7 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy) memcpy(&new_policy, policy, sizeof(*policy));
/* Update governor of new_policy to the governor used before hotplug */ - gov = find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu)); + gov = find_governor(policy->last_governor); if (gov) pr_debug("Restoring governor %s for cpu %d\n", policy->governor->name, policy->cpu); @@ -1366,8 +1363,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, return ret; }
- strncpy(per_cpu(cpufreq_cpu_governor, cpu), - policy->governor->name, CPUFREQ_NAME_LEN); + if (cpus == 1) + strncpy(policy->last_governor, policy->governor->name, + CPUFREQ_NAME_LEN); }
if (cpu != policy->cpu) { @@ -2096,7 +2094,8 @@ EXPORT_SYMBOL_GPL(cpufreq_register_governor);
void cpufreq_unregister_governor(struct cpufreq_governor *governor) { - int cpu; + struct cpufreq_policy *policy; + unsigned long flags;
if (!governor) return; @@ -2104,12 +2103,13 @@ void cpufreq_unregister_governor(struct cpufreq_governor *governor) if (cpufreq_disabled()) return;
- for_each_present_cpu(cpu) { - if (cpu_online(cpu)) - continue; - if (!strcmp(per_cpu(cpufreq_cpu_governor, cpu), governor->name)) - strcpy(per_cpu(cpufreq_cpu_governor, cpu), "\0"); + /* clear last_governor for all fallback policies */ + read_lock_irqsave(&cpufreq_driver_lock, flags); + for_each_fallback_policy(policy) { + if (!strcmp(policy->last_governor, governor->name)) + strcpy(policy->last_governor, "\0"); } + read_unlock_irqrestore(&cpufreq_driver_lock, flags);
mutex_lock(&cpufreq_governor_mutex); list_del(&governor->governor_list); diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index df6af4cfa26a..e326cddef6db 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -80,6 +80,7 @@ struct cpufreq_policy { struct cpufreq_governor *governor; /* see below */ void *governor_data; bool governor_enabled; /* governor start/stop flag */ + char last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
struct work_struct update; /* if update_policy() needs to be * called, but you're in IRQ context */
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
History of which governor was used last is common to all CPUs within a policy and maintaining it per-cpu isn't the best approach for sure.
Apart from wasting memory, this also increases the complexity of managing this data structure as it has to be updated for all CPUs.
To make that somewhat simpler, lets store this information in a new field 'last_governor' in struct cpufreq_policy and update it on removal of last cpu of a policy.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 24 ++++++++++++------------ include/linux/cpufreq.h | 1 + 2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index f253cf45f910..4ad1e46891b5 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -57,9 +57,6 @@ static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); static DEFINE_RWLOCK(cpufreq_driver_lock); DEFINE_MUTEX(cpufreq_governor_lock);
-/* This one keeps track of the previously set governor of a removed CPU */ -static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor);
- /* Flag to suspend/resume CPUFreq governors */ static bool cpufreq_suspended;
@@ -941,7 +938,7 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy) memcpy(&new_policy, policy, sizeof(*policy));
/* Update governor of new_policy to the governor used before hotplug */
- gov = find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu));
- gov = find_governor(policy->last_governor);
Just change this to:
gov = policy->governor;
No need to search for the governor again. It should already be valid for policy that's being restored. For new policies, it would be NULL and would get defaulted correctly.
if (gov) pr_debug("Restoring governor %s for cpu %d\n", policy->governor->name, policy->cpu); @@ -1366,8 +1363,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, return ret; }
strncpy(per_cpu(cpufreq_cpu_governor, cpu),
policy->governor->name, CPUFREQ_NAME_LEN);
if (cpus == 1)
strncpy(policy->last_governor, policy->governor->name,
CPUFREQ_NAME_LEN);
}
if (cpu != policy->cpu) {
@@ -2096,7 +2094,8 @@ EXPORT_SYMBOL_GPL(cpufreq_register_governor);
void cpufreq_unregister_governor(struct cpufreq_governor *governor) {
- int cpu;
struct cpufreq_policy *policy;
unsigned long flags;
if (!governor) return;
@@ -2104,12 +2103,13 @@ void cpufreq_unregister_governor(struct cpufreq_governor *governor) if (cpufreq_disabled()) return;
- for_each_present_cpu(cpu) {
if (cpu_online(cpu))
continue;
if (!strcmp(per_cpu(cpufreq_cpu_governor, cpu), governor->name))
strcpy(per_cpu(cpufreq_cpu_governor, cpu), "\0");
/* clear last_governor for all fallback policies */
read_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_fallback_policy(policy) {
if (!strcmp(policy->last_governor, governor->name))
strcpy(policy->last_governor, "\0");
}
read_unlock_irqrestore(&cpufreq_driver_lock, flags);
mutex_lock(&cpufreq_governor_mutex); list_del(&governor->governor_list);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index df6af4cfa26a..e326cddef6db 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -80,6 +80,7 @@ struct cpufreq_policy { struct cpufreq_governor *governor; /* see below */ void *governor_data; bool governor_enabled; /* governor start/stop flag */
char last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
struct work_struct update; /* if update_policy() needs to be * called, but you're in IRQ context */
Kinda Nack for a couple of reasons:
1. For logical hotplugs, we don't really throw away the policy, so the governor pointer should already be the last governor. We can just use the pointer.
2. For physical hotplug of CPUs, the policies are getting freed (I assume and hope so). So, the "last_governor" field is going to be empty.
3. Even if we continue storing the last_governor outside of the policy, for physical hotplug of CPUs where the policy is getting recreated, I'm not sure restoring the governor is the right thing to do anyway. I'll explain various possible configurations below for the physical hotplug case:
a. Single policy for all CPUs: The policy never gets removed since there'll be at least one CPU present. So the save/restore code is never used.
b. Multi-cluster/policy system with per-policy tunables enabled: Restoring the governor is bad/incomplete since the per-policy tunables are lost when the governor gets POLICY_EXIT when the policy is destroyed. IMO, in these cases, it's better to use the default governor since we know that using it at cpufreq init with the default tunables for it is a safe/nice configuration.
c. Multi-cluster/policy system with global tunables: Restoring the governor works for this case, but I think it's still better to use the default governor to be consistent with (b) and IMO it makes more sense since the CPU is getting physically added for the first time.
So, long story short, I think this patch should be changed to: 1. Use policy->governor instead of find_governor(last_governor) 2. Use default governor if policy->governor is NULL 3. Completely delete all references to last_governor.
Thanks, Saravana
On 12 February 2015 at 11:03, Saravana Kannan skannan@codeaurora.org wrote:
gov = find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu));
gov = find_governor(policy->last_governor);
Just change this to:
gov = policy->governor;
No need to search for the governor again. It should already be valid for policy that's being restored. For new policies, it would be NULL and would get defaulted correctly.
No. Apart from the fact that one of my patches is making it NULL intentionally, here are the problems that I see with reusing the pointer: - All CPUs of a policy are down - The last used governor is removed (was built in as a module) - Now if we online the CPUs again, it wouldn't work as the pointer insn't usable. - Or if we insert the governor again, pointers would change and then onlining the CPUs wouldn't work..
- For physical hotplug of CPUs, the policies are getting freed (I assume
and hope so). So, the "last_governor" field is going to be empty.
Its not about it being empty. Its more about when we don't remove them physically..
- Even if we continue storing the last_governor outside of the policy, for
physical hotplug of CPUs where the policy is getting recreated, I'm not sure restoring the governor is the right thing to do anyway. I'll explain various possible configurations below for the physical hotplug case:
We do it today as this is part of the per-cpu stuff now and my patch would fix it then :)
Viresh Kumar wrote:
On 12 February 2015 at 11:03, Saravana Kannan skannan@codeaurora.org wrote:
gov = find_governor(per_cpu(cpufreq_cpu_governor,
policy->cpu));
gov = find_governor(policy->last_governor);
Just change this to:
gov = policy->governor;
No need to search for the governor again. It should already be valid for policy that's being restored. For new policies, it would be NULL and would get defaulted correctly.
No. Apart from the fact that one of my patches is making it NULL intentionally, here are the problems that I see with reusing the pointer:
- All CPUs of a policy are down
- The last used governor is removed (was built in as a module)
- Now if we online the CPUs again, it wouldn't work as the pointer
insn't usable.
- Or if we insert the governor again, pointers would change and then
onlining the CPUs wouldn't work..
I think for logical hotplug we should lock the governor from being removed. Or, if you don't want that, go set the pointer to NULL lazily when/if that happens.
- For physical hotplug of CPUs, the policies are getting freed (I
assume and hope so). So, the "last_governor" field is going to be empty.
Its not about it being empty. Its more about when we don't remove them physically..
- Even if we continue storing the last_governor outside of the policy,
for physical hotplug of CPUs where the policy is getting recreated, I'm not sure restoring the governor is the right thing to do anyway. I'll explain various possible configurations below for the physical hotplug case:
We do it today as this is part of the per-cpu stuff now and my patch would fix it then :)
I'm not sure if you read rest of my email. Unless you added some patches recently, restoring the last_governor is definitely functionally broken as of at least 3.12. Probably 3.14 too (I need to check again). It just restores the governor with the default tunables. So, it's kinda useless in the real world. You might as well set it to default governor because it's just as helpful as using the last governor with different tunables.
-Saravana
On 12 February 2015 at 16:00, skannan@codeaurora.org wrote:
I think for logical hotplug we should lock the governor from being
Nack.
removed. Or, if you don't want that, go set the pointer to NULL lazily when/if that happens.
That would be another band-aid ..
I'm not sure if you read rest of my email. Unless you added some patches
I did read them..
recently, restoring the last_governor is definitely functionally broken as of at least 3.12. Probably 3.14 too (I need to check again). It just restores the governor with the default tunables. So, it's kinda useless in the real world. You might as well set it to default governor because it's just as helpful as using the last governor with different tunables.
I agree that the tunables aren't restored, but this is useful atleast for the case where the governor tunables are shared across policies. And so we don't have to change the tunables on policy-re-init..
This patch wasn't about changing the functionality but about saving some memory and simple code flow..
Managing a per-cpu variable (cpufreq_cpu_data) for keeping track of policy structures is a bit overdone. Apart from wasting some bytes of memory to save these pointers for each cpu, it also makes the code much more complex.
It would be much better if we have a single place which needs updates on addition/removal of a policy.
We already have a list of active-policies and that can be used instead of this per-cpu variable.
Lets do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 124 ++++++++++++++++++++++------------------------ 1 file changed, 58 insertions(+), 66 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4ad1e46891b5..7f947287ba46 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -49,11 +49,9 @@ static LIST_HEAD(cpufreq_governor_list);
/** * The "cpufreq driver" - the arch- or hardware-dependent low - * level driver of CPUFreq support, and its spinlock. This lock - * also protects the cpufreq_cpu_data array. + * level driver of CPUFreq support, and its spinlock. */ static struct cpufreq_driver *cpufreq_driver; -static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); static DEFINE_RWLOCK(cpufreq_driver_lock); DEFINE_MUTEX(cpufreq_governor_lock);
@@ -157,6 +155,54 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy) } EXPORT_SYMBOL_GPL(get_cpu_idle_time);
+/* Only for cpufreq core internal use */ +struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu) +{ + struct cpufreq_policy *policy; + + for_each_policy(policy) + if (cpumask_test_cpu(cpu, policy->cpus)) + return policy; + + return NULL; +} + +struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) +{ + struct cpufreq_policy *policy = NULL; + unsigned long flags; + + if (cpu >= nr_cpu_ids) + return NULL; + + if (!down_read_trylock(&cpufreq_rwsem)) + return NULL; + + /* get the cpufreq driver */ + read_lock_irqsave(&cpufreq_driver_lock, flags); + + if (cpufreq_driver) { + policy = cpufreq_cpu_get_raw(cpu); + if (policy) + kobject_get(&policy->kobj); + } + + read_unlock_irqrestore(&cpufreq_driver_lock, flags); + + if (!policy) + up_read(&cpufreq_rwsem); + + return policy; +} +EXPORT_SYMBOL_GPL(cpufreq_cpu_get); + +void cpufreq_cpu_put(struct cpufreq_policy *policy) +{ + kobject_put(&policy->kobj); + up_read(&cpufreq_rwsem); +} +EXPORT_SYMBOL_GPL(cpufreq_cpu_put); + /* * This is a generic cpufreq init() routine which can be used by cpufreq * drivers of SMP systems. It will do following: @@ -190,7 +236,7 @@ EXPORT_SYMBOL_GPL(cpufreq_generic_init);
unsigned int cpufreq_generic_get(unsigned int cpu) { - struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu); + struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);
if (!policy || IS_ERR(policy->clk)) { pr_err("%s: No %s associated to cpu: %d\n", @@ -202,49 +248,6 @@ unsigned int cpufreq_generic_get(unsigned int cpu) } EXPORT_SYMBOL_GPL(cpufreq_generic_get);
-/* Only for cpufreq core internal use */ -struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu) -{ - return per_cpu(cpufreq_cpu_data, cpu); -} - -struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) -{ - struct cpufreq_policy *policy = NULL; - unsigned long flags; - - if (cpu >= nr_cpu_ids) - return NULL; - - if (!down_read_trylock(&cpufreq_rwsem)) - return NULL; - - /* get the cpufreq driver */ - read_lock_irqsave(&cpufreq_driver_lock, flags); - - if (cpufreq_driver) { - /* get the CPU */ - policy = per_cpu(cpufreq_cpu_data, cpu); - if (policy) - kobject_get(&policy->kobj); - } - - read_unlock_irqrestore(&cpufreq_driver_lock, flags); - - if (!policy) - up_read(&cpufreq_rwsem); - - return policy; -} -EXPORT_SYMBOL_GPL(cpufreq_cpu_get); - -void cpufreq_cpu_put(struct cpufreq_policy *policy) -{ - kobject_put(&policy->kobj); - up_read(&cpufreq_rwsem); -} -EXPORT_SYMBOL_GPL(cpufreq_cpu_put); - /********************************************************************* * EXTERNALLY AFFECTING FREQUENCY CHANGES * *********************************************************************/ @@ -964,7 +967,6 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu, struct device *dev) { int ret = 0; - unsigned long flags;
if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); @@ -975,13 +977,7 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, }
down_write(&policy->rwsem); - - write_lock_irqsave(&cpufreq_driver_lock, flags); - cpumask_set_cpu(cpu, policy->cpus); - per_cpu(cpufreq_cpu_data, cpu) = policy; - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - up_write(&policy->rwsem);
if (has_target()) { @@ -1105,7 +1101,7 @@ static int update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu,
static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) { - unsigned int j, cpu = dev->id; + unsigned int cpu = dev->id; int ret = -ENOMEM; struct cpufreq_policy *policy; unsigned long flags; @@ -1202,8 +1198,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) }
write_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_cpu(j, policy->cpus) - per_cpu(cpufreq_cpu_data, j) = policy; + list_add(&policy->policy_list, &cpufreq_policy_list); write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (cpufreq_driver->get && !cpufreq_driver->setpolicy) { @@ -1265,10 +1260,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) CPUFREQ_CREATE_POLICY, policy); }
- write_lock_irqsave(&cpufreq_driver_lock, flags); - list_add(&policy->policy_list, &cpufreq_policy_list); - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - cpufreq_init_policy(policy);
if (!recover_policy) { @@ -1292,8 +1283,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) err_out_unregister: err_get_freq: write_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_cpu(j, policy->cpus) - per_cpu(cpufreq_cpu_data, j) = NULL; + list_del(&policy->policy_list); write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (!recover_policy) { @@ -1340,7 +1330,10 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
pr_debug("%s: unregistering CPU %u\n", __func__, cpu);
- policy = per_cpu(cpufreq_cpu_data, cpu); + read_lock_irqsave(&cpufreq_driver_lock, flags); + policy = cpufreq_cpu_get_raw(cpu); + read_unlock_irqrestore(&cpufreq_driver_lock, flags); + if (!policy) { pr_debug("%s: No cpu_data found\n", __func__); return -EINVAL; @@ -1404,7 +1397,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev, struct cpufreq_policy *policy;
read_lock_irqsave(&cpufreq_driver_lock, flags); - policy = per_cpu(cpufreq_cpu_data, cpu); + policy = cpufreq_cpu_get_raw(cpu); read_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (!policy) { @@ -1460,7 +1453,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, } }
- per_cpu(cpufreq_cpu_data, cpu) = NULL; return 0; }
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
Managing a per-cpu variable (cpufreq_cpu_data) for keeping track of policy structures is a bit overdone. Apart from wasting some bytes of memory to save these pointers for each cpu, it also makes the code much more complex.
It would be much better if we have a single place which needs updates on addition/removal of a policy.
We already have a list of active-policies and that can be used instead of this per-cpu variable.
Lets do it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 124 ++++++++++++++++++++++------------------------ 1 file changed, 58 insertions(+), 66 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4ad1e46891b5..7f947287ba46 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -49,11 +49,9 @@ static LIST_HEAD(cpufreq_governor_list);
/**
- The "cpufreq driver" - the arch- or hardware-dependent low
- level driver of CPUFreq support, and its spinlock. This lock
- also protects the cpufreq_cpu_data array.
*/ static struct cpufreq_driver *cpufreq_driver;
- level driver of CPUFreq support, and its spinlock.
-static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); static DEFINE_RWLOCK(cpufreq_driver_lock); DEFINE_MUTEX(cpufreq_governor_lock);
@@ -157,6 +155,54 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy) } EXPORT_SYMBOL_GPL(get_cpu_idle_time);
+/* Only for cpufreq core internal use */ +struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu)
Rename this to cpufreq_cpu_get_unsafe or _nolock?
Seems more descriptive. Hmm... you are just moving this function around. Ok, your call.
+{
- struct cpufreq_policy *policy;
- for_each_policy(policy)
if (cpumask_test_cpu(cpu, policy->cpus))
return policy;
- return NULL;
+}
+struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) +{
- struct cpufreq_policy *policy = NULL;
- unsigned long flags;
- if (cpu >= nr_cpu_ids)
return NULL;
- if (!down_read_trylock(&cpufreq_rwsem))
return NULL;
- /* get the cpufreq driver */
- read_lock_irqsave(&cpufreq_driver_lock, flags);
- if (cpufreq_driver) {
policy = cpufreq_cpu_get_raw(cpu);
if (policy)
kobject_get(&policy->kobj);
- }
- read_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (!policy)
up_read(&cpufreq_rwsem);
- return policy;
+} +EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
+void cpufreq_cpu_put(struct cpufreq_policy *policy) +{
- kobject_put(&policy->kobj);
- up_read(&cpufreq_rwsem);
+} +EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
- /*
- This is a generic cpufreq init() routine which can be used by cpufreq
- drivers of SMP systems. It will do following:
@@ -190,7 +236,7 @@ EXPORT_SYMBOL_GPL(cpufreq_generic_init);
unsigned int cpufreq_generic_get(unsigned int cpu) {
- struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);
if (!policy || IS_ERR(policy->clk)) { pr_err("%s: No %s associated to cpu: %d\n",
@@ -202,49 +248,6 @@ unsigned int cpufreq_generic_get(unsigned int cpu) } EXPORT_SYMBOL_GPL(cpufreq_generic_get);
-/* Only for cpufreq core internal use */ -struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu) -{
- return per_cpu(cpufreq_cpu_data, cpu);
-}
-struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) -{
- struct cpufreq_policy *policy = NULL;
- unsigned long flags;
- if (cpu >= nr_cpu_ids)
return NULL;
- if (!down_read_trylock(&cpufreq_rwsem))
return NULL;
- /* get the cpufreq driver */
- read_lock_irqsave(&cpufreq_driver_lock, flags);
- if (cpufreq_driver) {
/* get the CPU */
policy = per_cpu(cpufreq_cpu_data, cpu);
if (policy)
kobject_get(&policy->kobj);
- }
- read_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (!policy)
up_read(&cpufreq_rwsem);
- return policy;
-} -EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
-void cpufreq_cpu_put(struct cpufreq_policy *policy) -{
- kobject_put(&policy->kobj);
- up_read(&cpufreq_rwsem);
-} -EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
- /*********************************************************************
*********************************************************************/
EXTERNALLY AFFECTING FREQUENCY CHANGES *
@@ -964,7 +967,6 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu, struct device *dev) { int ret = 0;
unsigned long flags;
if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
@@ -975,13 +977,7 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, }
down_write(&policy->rwsem);
write_lock_irqsave(&cpufreq_driver_lock, flags);
cpumask_set_cpu(cpu, policy->cpus);
per_cpu(cpufreq_cpu_data, cpu) = policy;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
up_write(&policy->rwsem);
if (has_target()) {
@@ -1105,7 +1101,7 @@ static int update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu,
static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) {
- unsigned int j, cpu = dev->id;
- unsigned int cpu = dev->id; int ret = -ENOMEM; struct cpufreq_policy *policy; unsigned long flags;
@@ -1202,8 +1198,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) }
write_lock_irqsave(&cpufreq_driver_lock, flags);
- for_each_cpu(j, policy->cpus)
per_cpu(cpufreq_cpu_data, j) = policy;
list_add(&policy->policy_list, &cpufreq_policy_list); write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
@@ -1265,10 +1260,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) CPUFREQ_CREATE_POLICY, policy); }
write_lock_irqsave(&cpufreq_driver_lock, flags);
list_add(&policy->policy_list, &cpufreq_policy_list);
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
cpufreq_init_policy(policy);
if (!recover_policy) {
@@ -1292,8 +1283,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) err_out_unregister: err_get_freq: write_lock_irqsave(&cpufreq_driver_lock, flags);
- for_each_cpu(j, policy->cpus)
per_cpu(cpufreq_cpu_data, j) = NULL;
list_del(&policy->policy_list); write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (!recover_policy) {
@@ -1340,7 +1330,10 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
pr_debug("%s: unregistering CPU %u\n", __func__, cpu);
- policy = per_cpu(cpufreq_cpu_data, cpu);
- read_lock_irqsave(&cpufreq_driver_lock, flags);
- policy = cpufreq_cpu_get_raw(cpu);
- read_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (!policy) { pr_debug("%s: No cpu_data found\n", __func__); return -EINVAL;
@@ -1404,7 +1397,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev, struct cpufreq_policy *policy;
read_lock_irqsave(&cpufreq_driver_lock, flags);
- policy = per_cpu(cpufreq_cpu_data, cpu);
policy = cpufreq_cpu_get_raw(cpu); read_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (!policy) {
@@ -1460,7 +1453,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, } }
- per_cpu(cpufreq_cpu_data, cpu) = NULL; return 0; }
For the current version of the patch series, this patch looks ok. But when you update it so that you don't have a separate "fallback policies list", the change you made to __cpufreq_add_dev in this patch might need more review.
Thanks, Saravana
On 12 February 2015 at 11:13, Saravana Kannan skannan@codeaurora.org wrote:
+struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu)
Rename this to cpufreq_cpu_get_unsafe or _nolock?
Its done in a later patch..
Seems more descriptive. Hmm... you are just moving this function around. Ok, your call.
Yeah, as it has to be used earlier.
For the current version of the patch series, this patch looks ok. But when you update it so that you don't have a separate "fallback policies list", the change you made to __cpufreq_add_dev in this patch might need more review.
Things will change, lets see how they look like..
Earlier we used to find the 'policy' belonging to a cpu with the help of a per-cpu variable. And if 'cpu' passed to cpufreq_cpu_get() is bigger than 'nr_cpu_ids', it would have caused unpredictable issues as the per-cpu variable wouldn't have covered that value of 'cpu'. And so we had this check.
We traverse active-policy list to find policy for a cpu now. Even if 'cpu' passed to cpufreq_cpu_get() is an invalid number (i.e. greater than nr_cpu_ids), we will be able to manage it without any unpredictable behavior.
And so this check isn't required anymore. Get rid of it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 7f947287ba46..d9528046f651 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -172,9 +172,6 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) struct cpufreq_policy *policy = NULL; unsigned long flags;
- if (cpu >= nr_cpu_ids) - return NULL; - if (!down_read_trylock(&cpufreq_rwsem)) return NULL;
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
Earlier we used to find the 'policy' belonging to a cpu with the help of a per-cpu variable. And if 'cpu' passed to cpufreq_cpu_get() is bigger than 'nr_cpu_ids', it would have caused unpredictable issues as the per-cpu variable wouldn't have covered that value of 'cpu'. And so we had this check.
We traverse active-policy list to find policy for a cpu now. Even if 'cpu' passed to cpufreq_cpu_get() is an invalid number (i.e. greater than nr_cpu_ids), we will be able to manage it without any unpredictable behavior.
And so this check isn't required anymore. Get rid of it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 7f947287ba46..d9528046f651 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -172,9 +172,6 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) struct cpufreq_policy *policy = NULL; unsigned long flags;
- if (cpu >= nr_cpu_ids)
return NULL;
- if (!down_read_trylock(&cpufreq_rwsem)) return NULL;
You are getting rid of this check because you no longer use the per-cpu variable. So, just squash it with the previous patch that removes the per-cpu variable?
-Saravana
On 12 February 2015 at 11:15, Saravana Kannan skannan@codeaurora.org wrote:
You are getting rid of this check because you no longer use the per-cpu variable. So, just squash it with the previous patch that removes the per-cpu variable?
We can, but its always better to keep such things in separate patches so that we know exactly why we did it.. And this is how Me and Srivatsa were asking you to do it earlier. It helps later on why something has changed.
This clearly states what the code inside these routines is doing and how these must be used.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9528046f651..21c8ef6073d7 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -167,6 +167,23 @@ struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu) return NULL; }
+/** + * cpufreq_cpu_get: returns policy for a cpu and marks it busy. + * + * @cpu: cpu to find policy for. + * + * This returns policy for 'cpu', returns NULL if it doesn't exist. + * It also increments the kobject reference count to mark it busy and so would + * require a corresponding call to cpufreq_cpu_put() to decrement it back. + * If corresponding call cpufreq_cpu_put() isn't made, the policy wouldn't be + * freed as that depends on the kobj count. + * + * It also takes a read-lock of 'cpufreq_rwsem' and doesn't put it back if a + * valid policy is found. This is done to make sure the driver doesn't get + * unregistered while the policy is being used. + * + * Return: A valid policy on success, otherwise NULL on failure. + */ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) { struct cpufreq_policy *policy = NULL; @@ -193,6 +210,16 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) } EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
+/** + * cpufreq_cpu_put: Decrements the usage count of a policy + * + * @policy: policy earlier returned by cpufreq_cpu_get(). + * + * This decrements the kobject reference count incremented earlier by calling + * cpufreq_cpu_get(). + * + * It also drops the read-lock of 'cpufreq_rwsem' taken at cpufreq_cpu_get(). + */ void cpufreq_cpu_put(struct cpufreq_policy *policy) { kobject_put(&policy->kobj);
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
This clearly states what the code inside these routines is doing and how these must be used.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9528046f651..21c8ef6073d7 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -167,6 +167,23 @@ struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu) return NULL; }
+/**
- cpufreq_cpu_get: returns policy for a cpu and marks it busy.
- @cpu: cpu to find policy for.
- This returns policy for 'cpu', returns NULL if it doesn't exist.
- It also increments the kobject reference count to mark it busy and so would
- require a corresponding call to cpufreq_cpu_put() to decrement it back.
- If corresponding call cpufreq_cpu_put() isn't made, the policy wouldn't be
- freed as that depends on the kobj count.
- It also takes a read-lock of 'cpufreq_rwsem' and doesn't put it back if a
- valid policy is found. This is done to make sure the driver doesn't get
- unregistered while the policy is being used.
- Return: A valid policy on success, otherwise NULL on failure.
- */
I'm not sure we need to or should refer to kobject count (maybe just reference count instead?) and cpufreq_rwsem in the documentation. Those are implementation specific details and IMHO, shouldn't be in the documentation. I think it would be better to just explain the impact (which you already do) -- policy will never be freed and it will also prevent the cpufreq driver from getting unregistered.
Just my 2 cents.
struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) { struct cpufreq_policy *policy = NULL; @@ -193,6 +210,16 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) } EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
+/**
- cpufreq_cpu_put: Decrements the usage count of a policy
- @policy: policy earlier returned by cpufreq_cpu_get().
- This decrements the kobject reference count incremented earlier by calling
- cpufreq_cpu_get().
- It also drops the read-lock of 'cpufreq_rwsem' taken at cpufreq_cpu_get().
- */
Similar comment here.
void cpufreq_cpu_put(struct cpufreq_policy *policy) { kobject_put(&policy->kobj);
-Saravana
On 12 February 2015 at 11:19, Saravana Kannan skannan@codeaurora.org wrote:
I'm not sure we need to or should refer to kobject count (maybe just reference count instead?) and cpufreq_rwsem in the documentation. Those are implementation specific details and IMHO, shouldn't be in the documentation. I think it would be better to just explain the impact (which you already do) -- policy will never be freed and it will also prevent the cpufreq driver from getting unregistered.
Just my 2 cents.
I wanted to be damn explicit here so that people understand what's going on. Few days back I had some real hard time to explain this concept to an kernel engineer and many more may follow.
Later commits would change the way policies are managed today. Policies wouldn't be freed on cpu hotplug (currently they aren't freed on suspend), and while the CPU is offline, the sysfs cpufreq files would still be present.
Because we don't mark policy->governor as NULL, it still contains pointer of the last governor it used. And when we read the 'scaling_governor' file, it shows the old value.
To prevent from this, mark policy->governor as NULL after we have issued CPUFREQ_GOV_POLICY_EXIT event for its governor.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 21c8ef6073d7..ed36c09f83cc 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1032,9 +1032,6 @@ static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) } read_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (policy) - policy->governor = NULL; - return policy; }
@@ -1466,6 +1463,8 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
if (!cpufreq_suspended) cpufreq_policy_free(policy); + else + policy->governor = NULL; } else if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_START); if (!ret)
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
Later commits would change the way policies are managed today. Policies wouldn't be freed on cpu hotplug (currently they aren't freed on suspend), and while the CPU is offline, the sysfs cpufreq files would still be present.
Because we don't mark policy->governor as NULL, it still contains pointer of the last governor it used. And when we read the 'scaling_governor' file, it shows the old value.
To prevent from this, mark policy->governor as NULL after we have issued CPUFREQ_GOV_POLICY_EXIT event for its governor.
I don't see anything wrong with scaling_governor showing the last used governor when the CPUs are just logically hotplug removed. In the physical hotplug case, the whole directory would go away and the policy would be freed. So, this is not a concern there.
Also, longer term (after your series goes in), I think we actually should NOT send POLICY_EXIT on just logical hotplug. That way, the governor tunables are not lost across a logical hotplug.
-Saravana
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
drivers/cpufreq/cpufreq.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 21c8ef6073d7..ed36c09f83cc 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1032,9 +1032,6 @@ static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) } read_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (policy)
policy->governor = NULL;
- return policy; }
@@ -1466,6 +1463,8 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
if (!cpufreq_suspended) cpufreq_policy_free(policy);
else
} else if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_START); if (!ret)policy->governor = NULL;
On 12 February 2015 at 11:22, Saravana Kannan skannan@codeaurora.org wrote:
I don't see anything wrong with scaling_governor showing the last used governor when the CPUs are just logically hotplug removed. In the physical
But the governor is *not* used currently by the CPUs and marking it used is completely wrong.
hotplug case, the whole directory would go away and the policy would be freed. So, this is not a concern there.
Its not relevant here.
Also, longer term (after your series goes in), I think we actually should NOT send POLICY_EXIT on just logical hotplug. That way, the governor tunables are not lost across a logical hotplug.
I am not sure about it. For example, after all CPUs of a policy are gone, the governor is free to be get removed. And so the user should be free to do rmmod on it.
Later commits would change the way policies are managed today. Policies wouldn't be freed on cpu hotplug (currently they aren't freed on suspend), and while the CPU is offline, the sysfs cpufreq files would still be present.
User may accidentally try to update the sysfs files in following directory: '/sys/devices/system/cpu/cpuX/cpufreq/'. And that would result in undefined behavior as policy wouldn't be active then.
To disallow such accesses, sense if a policy is active or not while doing such operations. This can be done easily by getting the policy again with cpufreq_cpu_get_raw(), as that would traverse the list of active policies.
Apart from updating the store() routine, we also update __cpufreq_get() which can call cpufreq_out_of_sync(). The later routine tries to update policy->cur and start notifying kernel about it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index ed36c09f83cc..fffa37136b7b 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -826,11 +826,22 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr,
down_write(&policy->rwsem);
+ /* + * Policy might not be active currently, and so we shouldn't try + * updating any values here. An inactive policy is moved to fallback + * list and so cpufreq_cpu_get_raw() should fail. + */ + if (unlikely(!cpufreq_cpu_get_raw(policy->cpu))) { + ret = -EPERM; + goto unlock_policy_rwsem; + } + if (fattr->store) ret = fattr->store(policy, buf, count); else ret = -EIO;
+unlock_policy_rwsem: up_write(&policy->rwsem);
up_read(&cpufreq_rwsem); @@ -1587,6 +1598,14 @@ static unsigned int __cpufreq_get(struct cpufreq_policy *policy)
ret_freq = cpufreq_driver->get(policy->cpu);
+ /* + * Policy might not be active currently, and so we shouldn't try + * updating any values here. An inactive policy is moved to fallback + * list and so cpufreq_cpu_get_raw() should fail. + */ + if (unlikely(!cpufreq_cpu_get_raw(policy->cpu))) + return ret_freq; + if (ret_freq && policy->cur && !(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) { /* verify no discrepancy between actual and
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
Later commits would change the way policies are managed today. Policies wouldn't be freed on cpu hotplug (currently they aren't freed on suspend), and while the CPU is offline, the sysfs cpufreq files would still be present.
User may accidentally try to update the sysfs files in following directory: '/sys/devices/system/cpu/cpuX/cpufreq/'. And that would result in undefined behavior as policy wouldn't be active then.
To disallow such accesses, sense if a policy is active or not while doing such operations. This can be done easily by getting the policy again with cpufreq_cpu_get_raw(), as that would traverse the list of active policies.
Apart from updating the store() routine, we also update __cpufreq_get() which can call cpufreq_out_of_sync(). The later routine tries to update policy->cur and start notifying kernel about it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
I'm okay with the idea of this patch for now. I'll argue against this after this series goes in as it'll need more non-trivial changes before this idea of this patch can be removed.
But not acking it because I expect the code to change after combining active/inactive list.
-Saravana
In order to prepare for the next few commits, that will stop migrating sysfs files on cpu hotplug, this patch starts managing sysfs-cpu separately.
The behavior is still the same as we are still migrating sysfs files on hotplug, later commits would change that.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 11 +++++++---- include/linux/cpufreq.h | 4 +++- 2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index fffa37136b7b..4476471c594e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -921,7 +921,7 @@ static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy) for_each_cpu(j, policy->cpus) { struct device *cpu_dev;
- if (j == policy->cpu) + if (j == policy->kobj_cpu) continue;
pr_debug("Adding link for CPU: %u\n", j); @@ -1126,6 +1126,7 @@ static int update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu,
down_write(&policy->rwsem); policy->cpu = cpu; + policy->kobj_cpu = cpu; up_write(&policy->rwsem);
return 0; @@ -1188,10 +1189,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) * the creation of a brand new one. So we need to perform this update * by invoking update_policy_cpu(). */ - if (recover_policy && cpu != policy->cpu) + if (recover_policy && cpu != policy->cpu) { WARN_ON(update_policy_cpu(policy, cpu, dev)); - else + } else { policy->cpu = cpu; + policy->kobj_cpu = cpu; + }
cpumask_copy(policy->cpus, cpumask_of(cpu));
@@ -1393,7 +1396,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, CPUFREQ_NAME_LEN); }
- if (cpu != policy->cpu) { + if (cpu != policy->kobj_cpu) { sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) { /* Nominate new CPU */ diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index e326cddef6db..9b8937f027fb 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -65,7 +65,9 @@ struct cpufreq_policy {
unsigned int shared_type; /* ACPI: ANY or ALL affected CPUs should set cpufreq */ - unsigned int cpu; /* cpu nr of CPU managing this policy */ + unsigned int cpu; /* cpu managing this policy, must be online */ + unsigned int kobj_cpu; /* cpu managing sysfs files, can be offline */ + struct clk *clk; struct cpufreq_cpuinfo cpuinfo;/* see above */
When we hot-unplug a cpu, we remove its sysfs cpufreq directory and if the outgoing cpu was the owner of policy->kobj earlier then we migrate the sysfs directory to under another online cpu.
There are few disadvantages this brings: - Code Complexity - Slower hotplug/suspend/resume - sysfs file permissions are reset after all policy->cpus are offlined - CPUFreq stats history lost after all policy->cpus are offlined - Special management of sysfs stuff during suspend/resume
To overcome these, this patch modifies the way sysfs directories are managed: - Select sysfs kobjects owner while initializing policy and don't change it during hotplugs. Track it with kobj_cpu created earlier.
- Create symlinks for all related CPUs (can be offline) instead of affected CPUs on policy initialization and remove them only when the policy is freed.
- Free policy structure only on the removal of cpufreq-driver and not during hotplug/suspend/resume, detected by checking 'struct subsys_interface *' (Valid only when called from subsys_interface_unregister() while unregistering driver).
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 176 ++++++++++++++++++++++------------------------ 1 file changed, 85 insertions(+), 91 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4476471c594e..0e33c63b4a0e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -912,25 +912,36 @@ void cpufreq_sysfs_remove_file(const struct attribute *attr) } EXPORT_SYMBOL(cpufreq_sysfs_remove_file);
-/* symlink affected CPUs */ -static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy) +/* Add/remove symlinks for all related CPUs */ +static int cpufreq_add_remove_dev_symlink(struct cpufreq_policy *policy, + bool add) { unsigned int j; int ret = 0;
- for_each_cpu(j, policy->cpus) { + for_each_cpu(j, policy->related_cpus) { struct device *cpu_dev;
if (j == policy->kobj_cpu) continue;
- pr_debug("Adding link for CPU: %u\n", j); + pr_debug("%s: %s symlink for CPU: %u\n", __func__, + add ? "Adding" : "Removing", j); + cpu_dev = get_cpu_device(j); - ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, - "cpufreq"); - if (ret) - break; + if (WARN_ON(!cpu_dev)) + continue; + + if (add) { + ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, + "cpufreq"); + if (ret) + break; + } else { + sysfs_remove_link(&cpu_dev->kobj, "cpufreq"); + } } + return ret; }
@@ -964,7 +975,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy, return ret; }
- return cpufreq_add_dev_symlink(policy); + return cpufreq_add_remove_dev_symlink(policy, true); }
static void cpufreq_init_policy(struct cpufreq_policy *policy) @@ -1026,7 +1037,7 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, } }
- return sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); + return 0; }
static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) @@ -1046,7 +1057,7 @@ static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) return policy; }
-static struct cpufreq_policy *cpufreq_policy_alloc(void) +static struct cpufreq_policy *cpufreq_policy_alloc(int cpu) { struct cpufreq_policy *policy;
@@ -1068,6 +1079,11 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void) init_completion(&policy->kobj_unregister); INIT_WORK(&policy->update, handle_update);
+ policy->cpu = cpu; + + /* Set this once on allocation */ + policy->kobj_cpu = cpu; + return policy;
err_free_cpumask: @@ -1086,10 +1102,11 @@ static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy) blocking_notifier_call_chain(&cpufreq_policy_notifier_list, CPUFREQ_REMOVE_POLICY, policy);
- down_read(&policy->rwsem); + down_write(&policy->rwsem); + cpufreq_add_remove_dev_symlink(policy, false); kobj = &policy->kobj; cmp = &policy->kobj_unregister; - up_read(&policy->rwsem); + up_write(&policy->rwsem); kobject_put(kobj);
/* @@ -1109,27 +1126,14 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) kfree(policy); }
-static int update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu, - struct device *cpu_dev) +static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) { - int ret; - if (WARN_ON(cpu == policy->cpu)) - return 0; - - /* Move kobject to the new policy->cpu */ - ret = kobject_move(&policy->kobj, &cpu_dev->kobj); - if (ret) { - pr_err("%s: Failed to move kobj: %d\n", __func__, ret); - return ret; - } + return;
down_write(&policy->rwsem); policy->cpu = cpu; - policy->kobj_cpu = cpu; up_write(&policy->rwsem); - - return 0; }
static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) @@ -1138,7 +1142,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) int ret = -ENOMEM; struct cpufreq_policy *policy; unsigned long flags; - bool recover_policy = cpufreq_suspended; + bool recover_policy = !sif;
if (cpu_is_offline(cpu)) return 0; @@ -1173,7 +1177,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) policy = recover_policy ? cpufreq_policy_restore(cpu) : NULL; if (!policy) { recover_policy = false; - policy = cpufreq_policy_alloc(); + policy = cpufreq_policy_alloc(cpu); if (!policy) goto nomem_out; } else { @@ -1189,12 +1193,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) * the creation of a brand new one. So we need to perform this update * by invoking update_policy_cpu(). */ - if (recover_policy && cpu != policy->cpu) { - WARN_ON(update_policy_cpu(policy, cpu, dev)); - } else { - policy->cpu = cpu; - policy->kobj_cpu = cpu; - } + if (recover_policy && cpu != policy->cpu) + update_policy_cpu(policy, cpu);
cpumask_copy(policy->cpus, cpumask_of(cpu));
@@ -1378,9 +1378,13 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, cpus = cpumask_weight(policy->cpus); up_read(&policy->rwsem);
- /* Save the policy when doing a light-weight tear-down of last cpu */ + /* + * Preserve the policy when all policy->cpus are down (cpu == 1). + * But don't preserve it if the driver is actually getting removed + * (sif != NULL). + */ write_lock_irqsave(&cpufreq_driver_lock, flags); - if (cpufreq_suspended && cpus == 1) + if (!sif && cpus == 1) list_add(&policy->fallback_list, &cpufreq_fallback_list); write_unlock_irqrestore(&cpufreq_driver_lock, flags);
@@ -1396,29 +1400,14 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, CPUFREQ_NAME_LEN); }
- if (cpu != policy->kobj_cpu) { - sysfs_remove_link(&dev->kobj, "cpufreq"); - } else if (cpus > 1) { - /* Nominate new CPU */ - int new_cpu = cpumask_any_but(policy->cpus, cpu); - struct device *cpu_dev = get_cpu_device(new_cpu); - - sysfs_remove_link(&cpu_dev->kobj, "cpufreq"); - ret = update_policy_cpu(policy, new_cpu, cpu_dev); - if (ret) { - if (sysfs_create_link(&cpu_dev->kobj, &policy->kobj, - "cpufreq")) - pr_err("%s: Failed to restore kobj link to cpu:%d\n", - __func__, cpu_dev->id); - return ret; - } + if (cpu != policy->cpu) + return 0;
- if (!cpufreq_suspended) - pr_debug("%s: policy Kobject moved to cpu: %d from: %d\n", - __func__, new_cpu, cpu); - } else if (cpufreq_driver->stop_cpu) { + if (cpus > 1) + /* Nominate new CPU */ + update_policy_cpu(policy, cpumask_any_but(policy->cpus, cpu)); + else if (cpufreq_driver->stop_cpu) cpufreq_driver->stop_cpu(policy); - }
return 0; } @@ -1447,39 +1436,11 @@ static int __cpufreq_remove_dev_finish(struct device *dev, cpumask_clear_cpu(cpu, policy->cpus); up_write(&policy->rwsem);
- /* If cpu is last user of policy, free policy */ - if (cpus == 1) { - if (has_target()) { - ret = __cpufreq_governor(policy, - CPUFREQ_GOV_POLICY_EXIT); - if (ret) { - pr_err("%s: Failed to exit governor\n", - __func__); - return ret; - } - } - - if (!cpufreq_suspended) - cpufreq_policy_put_kobj(policy); - - /* - * Perform the ->exit() even during light-weight tear-down, - * since this is a core component, and is essential for the - * subsequent light-weight ->init() to succeed. - */ - if (cpufreq_driver->exit) - cpufreq_driver->exit(policy); - - /* Remove policy from list of active policies */ - write_lock_irqsave(&cpufreq_driver_lock, flags); - list_del(&policy->policy_list); - write_unlock_irqrestore(&cpufreq_driver_lock, flags); + /* Not the last cpu of policy, start governor again ? */ + if (cpus > 1) { + if (!has_target()) + return 0;
- if (!cpufreq_suspended) - cpufreq_policy_free(policy); - else - policy->governor = NULL; - } else if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_START); if (!ret) ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); @@ -1488,8 +1449,41 @@ static int __cpufreq_remove_dev_finish(struct device *dev, pr_err("%s: Failed to start governor\n", __func__); return ret; } + + return 0; }
+ /* If cpu is last user of policy, free policy */ + if (has_target()) { + ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); + if (ret) { + pr_err("%s: Failed to exit governor\n", __func__); + return ret; + } + } + + /* Free the policy kobjects only if the driver is getting removed. */ + if (sif) + cpufreq_policy_put_kobj(policy); + + /* + * Perform the ->exit() even during light-weight tear-down, + * since this is a core component, and is essential for the + * subsequent light-weight ->init() to succeed. + */ + if (cpufreq_driver->exit) + cpufreq_driver->exit(policy); + + /* Remove policy from list of active policies */ + write_lock_irqsave(&cpufreq_driver_lock, flags); + list_del(&policy->policy_list); + write_unlock_irqrestore(&cpufreq_driver_lock, flags); + + if (sif) + cpufreq_policy_free(policy); + else + policy->governor = NULL; + return 0; }
There are two cases when we may try to add already managed CPUs: - On boot, the first cpu has marked all policy->cpus managed and so we will find policy for all other policy->cpus later on. - When a managed cpu is hotplugged out and later brought back in.
Currently we manage these two with separate paths and checks. While the first one is detected by testing cpu against 'policy->cpus', the other one is detected by testing cpu against 'policy->related_cpus'.
We can manage both these via a single path and there is no need to do special checking for the first one.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0e33c63b4a0e..8b110a50c22e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1014,6 +1014,10 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, { int ret = 0;
+ /* Is cpu already managed ? */ + if (cpumask_test_cpu(cpu, policy->cpus)) + return 0; + if (has_target()) { ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP); if (ret) { @@ -1149,16 +1153,10 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
pr_debug("adding CPU %u\n", cpu);
- /* check whether a different CPU already registered this - * CPU because it is in the same boat. */ - policy = cpufreq_cpu_get_raw(cpu); - if (unlikely(policy)) - return 0; - if (!down_read_trylock(&cpufreq_rwsem)) return 0;
- /* Check if this cpu was hot-unplugged earlier and has siblings */ + /* Check if this cpu already has a policy to manage it */ read_lock_irqsave(&cpufreq_driver_lock, flags); for_each_policy(policy) { if (cpumask_test_cpu(cpu, policy->related_cpus)) {
cpufreq_update_policy() was kept as a separate routine earlier as it was handling migration of sysfs directories, which isn't done anymore. It is only updating policy->cpu now and can be removed.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 34 ++++++++++++---------------------- 1 file changed, 12 insertions(+), 22 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 8b110a50c22e..fde0a75f9692 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1058,6 +1058,12 @@ static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) } read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ if (policy) { + down_write(&policy->rwsem); + policy->cpu = cpu; + up_write(&policy->rwsem); + } + return policy; }
@@ -1130,16 +1136,6 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) kfree(policy); }
-static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu) -{ - if (WARN_ON(cpu == policy->cpu)) - return; - - down_write(&policy->rwsem); - policy->cpu = cpu; - up_write(&policy->rwsem); -} - static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) { unsigned int cpu = dev->id; @@ -1185,15 +1181,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) write_unlock_irqrestore(&cpufreq_driver_lock, flags); }
- /* - * In the resume path, since we restore a saved policy, the assignment - * to policy->cpu is like an update of the existing policy, rather than - * the creation of a brand new one. So we need to perform this update - * by invoking update_policy_cpu(). - */ - if (recover_policy && cpu != policy->cpu) - update_policy_cpu(policy, cpu); - cpumask_copy(policy->cpus, cpumask_of(cpu));
/* call driver. From then on the cpufreq must be able @@ -1401,11 +1388,14 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, if (cpu != policy->cpu) return 0;
- if (cpus > 1) + if (cpus > 1) { /* Nominate new CPU */ - update_policy_cpu(policy, cpumask_any_but(policy->cpus, cpu)); - else if (cpufreq_driver->stop_cpu) + down_write(&policy->rwsem); + policy->cpu = cpumask_any_but(policy->cpus, cpu); + up_write(&policy->rwsem); + } else if (cpufreq_driver->stop_cpu) { cpufreq_driver->stop_cpu(policy); + }
return 0; }
policy->kobj is required to be initialized once in the lifetime of a policy while its allocated. Currently we are initializing it from __cpufreq_add_dev() and that doesn't look to be the best place for doing so as we have to do this on special cases (like: !recover_policy).
We can initialize it from a more obvious place cpufreq_policy_alloc() and that will make code look cleaner, specially the error handling part.
The error handling part of __cpufreq_add_dev() was doing almost the same thing while recover_policy is true or false. Fix that as well by always calling cpufreq_policy_put_kobj() with an additional parameter to skip some steps.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 45 ++++++++++++++++++++------------------------- 1 file changed, 20 insertions(+), 25 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index fde0a75f9692..4a97d7428d25 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1067,9 +1067,10 @@ static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) return policy; }
-static struct cpufreq_policy *cpufreq_policy_alloc(int cpu) +static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev) { struct cpufreq_policy *policy; + int ret;
policy = kzalloc(sizeof(*policy), GFP_KERNEL); if (!policy) @@ -1081,6 +1082,13 @@ static struct cpufreq_policy *cpufreq_policy_alloc(int cpu) if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL)) goto err_free_cpumask;
+ ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &dev->kobj, + "cpufreq"); + if (ret) { + pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); + goto err_free_rcpumask; + } + INIT_LIST_HEAD(&policy->policy_list); INIT_LIST_HEAD(&policy->fallback_list); init_rwsem(&policy->rwsem); @@ -1089,13 +1097,15 @@ static struct cpufreq_policy *cpufreq_policy_alloc(int cpu) init_completion(&policy->kobj_unregister); INIT_WORK(&policy->update, handle_update);
- policy->cpu = cpu; + policy->cpu = dev->id;
/* Set this once on allocation */ - policy->kobj_cpu = cpu; + policy->kobj_cpu = dev->id;
return policy;
+err_free_rcpumask: + free_cpumask_var(policy->related_cpus); err_free_cpumask: free_cpumask_var(policy->cpus); err_free_policy: @@ -1104,13 +1114,14 @@ static struct cpufreq_policy *cpufreq_policy_alloc(int cpu) return NULL; }
-static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy) +static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy, bool notify) { struct kobject *kobj; struct completion *cmp;
- blocking_notifier_call_chain(&cpufreq_policy_notifier_list, - CPUFREQ_REMOVE_POLICY, policy); + if (notify) + blocking_notifier_call_chain(&cpufreq_policy_notifier_list, + CPUFREQ_REMOVE_POLICY, policy);
down_write(&policy->rwsem); cpufreq_add_remove_dev_symlink(policy, false); @@ -1171,7 +1182,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) policy = recover_policy ? cpufreq_policy_restore(cpu) : NULL; if (!policy) { recover_policy = false; - policy = cpufreq_policy_alloc(cpu); + policy = cpufreq_policy_alloc(dev); if (!policy) goto nomem_out; } else { @@ -1206,15 +1217,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) if (!recover_policy) { policy->user_policy.min = policy->min; policy->user_policy.max = policy->max; - - /* prepare interface data */ - ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, - &dev->kobj, "cpufreq"); - if (ret) { - pr_err("%s: failed to init policy->kobj: %d\n", - __func__, ret); - goto err_init_policy_kobj; - } }
write_lock_irqsave(&cpufreq_driver_lock, flags); @@ -1305,19 +1307,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) write_lock_irqsave(&cpufreq_driver_lock, flags); list_del(&policy->policy_list); write_unlock_irqrestore(&cpufreq_driver_lock, flags); - - if (!recover_policy) { - kobject_put(&policy->kobj); - wait_for_completion(&policy->kobj_unregister); - } -err_init_policy_kobj: up_write(&policy->rwsem);
if (cpufreq_driver->exit) cpufreq_driver->exit(policy); err_set_policy_cpu: - if (recover_policy) - cpufreq_policy_put_kobj(policy); + cpufreq_policy_put_kobj(policy, recover_policy); cpufreq_policy_free(policy);
nomem_out: @@ -1452,7 +1447,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
/* Free the policy kobjects only if the driver is getting removed. */ if (sif) - cpufreq_policy_put_kobj(policy); + cpufreq_policy_put_kobj(policy, true);
/* * Perform the ->exit() even during light-weight tear-down,
Both cpufreq_policy_put_kobj() and cpufreq_policy_free() are always called together, cpufreq_policy_put_kobj() followed by cpufreq_policy_free().
cpufreq_policy_put_kobj() is actually part of freeing policy and so can be called from cpufreq_policy_free().
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4a97d7428d25..0e03ae1bc96a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1140,8 +1140,9 @@ static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy, bool notify) pr_debug("wait complete\n"); }
-static void cpufreq_policy_free(struct cpufreq_policy *policy) +static void cpufreq_policy_free(struct cpufreq_policy *policy, bool notify) { + cpufreq_policy_put_kobj(policy, notify); free_cpumask_var(policy->related_cpus); free_cpumask_var(policy->cpus); kfree(policy); @@ -1312,9 +1313,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) if (cpufreq_driver->exit) cpufreq_driver->exit(policy); err_set_policy_cpu: - cpufreq_policy_put_kobj(policy, recover_policy); - cpufreq_policy_free(policy); - + cpufreq_policy_free(policy, recover_policy); nomem_out: up_read(&cpufreq_rwsem);
@@ -1445,10 +1444,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, } }
- /* Free the policy kobjects only if the driver is getting removed. */ - if (sif) - cpufreq_policy_put_kobj(policy, true); - /* * Perform the ->exit() even during light-weight tear-down, * since this is a core component, and is essential for the @@ -1462,8 +1457,9 @@ static int __cpufreq_remove_dev_finish(struct device *dev, list_del(&policy->policy_list); write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ /* Free the policy only if the driver is getting removed. */ if (sif) - cpufreq_policy_free(policy); + cpufreq_policy_free(policy, true); else policy->governor = NULL;
On cpu hot-unplug, we don't need to wait for POST_DEAD notification to restart the governor if the policy has atleast one online cpu left. We can restart the governor right from the DOWN_PREPARE notification instead.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 59 +++++++++++++++++++---------------------------- 1 file changed, 24 insertions(+), 35 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0e03ae1bc96a..bcc042a6221a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1338,7 +1338,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, struct subsys_interface *sif) { unsigned int cpu = dev->id, cpus; - int ret; + int ret = 0; unsigned long flags; struct cpufreq_policy *policy;
@@ -1379,25 +1379,35 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, CPUFREQ_NAME_LEN); }
- if (cpu != policy->cpu) - return 0; - if (cpus > 1) { - /* Nominate new CPU */ down_write(&policy->rwsem); - policy->cpu = cpumask_any_but(policy->cpus, cpu); + cpumask_clear_cpu(cpu, policy->cpus); + + /* Nominate new CPU */ + if (cpu == policy->cpu) + policy->cpu = cpumask_any(policy->cpus); up_write(&policy->rwsem); + + if (has_target()) { + /* Start governor again */ + ret = __cpufreq_governor(policy, CPUFREQ_GOV_START); + if (!ret) + ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); + + if (ret) + pr_err("%s: Failed to start governor\n", __func__); + } } else if (cpufreq_driver->stop_cpu) { cpufreq_driver->stop_cpu(policy); }
- return 0; + return ret; }
static int __cpufreq_remove_dev_finish(struct device *dev, struct subsys_interface *sif) { - unsigned int cpu = dev->id, cpus; + unsigned int cpu = dev->id; int ret; unsigned long flags; struct cpufreq_policy *policy; @@ -1406,34 +1416,13 @@ static int __cpufreq_remove_dev_finish(struct device *dev, policy = cpufreq_cpu_get_raw(cpu); read_unlock_irqrestore(&cpufreq_driver_lock, flags);
- if (!policy) { - pr_debug("%s: No cpu_data found\n", __func__); - return -EINVAL; - } - - down_write(&policy->rwsem); - cpus = cpumask_weight(policy->cpus); - - if (cpus > 1) - cpumask_clear_cpu(cpu, policy->cpus); - up_write(&policy->rwsem); - - /* Not the last cpu of policy, start governor again ? */ - if (cpus > 1) { - if (!has_target()) - return 0; - - ret = __cpufreq_governor(policy, CPUFREQ_GOV_START); - if (!ret) - ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); - - if (ret) { - pr_err("%s: Failed to start governor\n", __func__); - return ret; - } - + /* + * If this isn't the last cpu of policy, we will fail to get policy here + * as the cpumask in policy->cpus is already cleared. Hence we don't + * need to proceed here anymore + */ + if (!policy) return 0; - }
/* If cpu is last user of policy, free policy */ if (has_target()) {
cpufreq_add_dev() is an unnecessary wrapper over __cpufreq_add_dev(). Merge them.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index bcc042a6221a..7501347cf19c 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1148,7 +1148,16 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy, bool notify) kfree(policy); }
-static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) +/** + * cpufreq_add_dev - add a CPU device + * + * Adds the cpufreq interface for a CPU device. + * + * The Oracle says: try running cpufreq registration/unregistration concurrently + * with with cpu hotplugging and all hell will break loose. Tried to clean this + * mess up, but more thorough testing is needed. - Mathieu + */ +static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) { unsigned int cpu = dev->id; int ret = -ENOMEM; @@ -1320,20 +1329,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) return ret; }
-/** - * cpufreq_add_dev - add a CPU device - * - * Adds the cpufreq interface for a CPU device. - * - * The Oracle says: try running cpufreq registration/unregistration concurrently - * with with cpu hotplugging and all hell will break loose. Tried to clean this - * mess up, but more thorough testing is needed. - Mathieu - */ -static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) -{ - return __cpufreq_add_dev(dev, sif); -} - static int __cpufreq_remove_dev_prepare(struct device *dev, struct subsys_interface *sif) { @@ -2307,7 +2302,7 @@ static int cpufreq_cpu_callback(struct notifier_block *nfb, if (dev) { switch (action & ~CPU_TASKS_FROZEN) { case CPU_ONLINE: - __cpufreq_add_dev(dev, NULL); + cpufreq_add_dev(dev, NULL); break;
case CPU_DOWN_PREPARE: @@ -2319,7 +2314,7 @@ static int cpufreq_cpu_callback(struct notifier_block *nfb, break;
case CPU_DOWN_FAILED: - __cpufreq_add_dev(dev, NULL); + cpufreq_add_dev(dev, NULL); break; } }
On Tuesday, January 27, 2015 02:06:06 PM Viresh Kumar wrote:
Hi Rafael,
The aim of this series is to stop managing cpufreq sysfs directories on CPU hotplugs.
OK, and this is the last one I'm going to consider for 3.20 (in case you have more of them queued up somewhere).
This issue has been raised multiple times earlier, the latest being tried by Saravana. While working on the $Subject thread, I did lots of cleanups, most of which are already pushed by you.
The two remaining ones are added to this series..
Currently on removal of a 'cpu != policy->cpu', we remove its sysfs directories by removing the soft-link. And on removal of policy->cpu, we migrate the sysfs directories to the next cpu. But if policy->cpu was the last CPU, we remove the policy completely and allocate it again as soon as the CPUs come back. This has shortcomings:
- Code Complexity
- Slower hotplug
- sysfs file permissions are reset after all policy->cpus are offlined
- CPUFreq stats history lost after all policy->cpus are offlined
- Special management of sysfs stuff during suspend/resume
To make things simple we stop playing with sysfs files unless the driver is getting removed.
And uses will see CPUs that are not present any more?
On 27 January 2015 at 20:36, Rafael J. Wysocki rjw@rjwysocki.net wrote:
OK, and this is the last one I'm going to consider for 3.20 (in case you have more of them queued up somewhere).
Okay..
To make things simple we stop playing with sysfs files unless the driver is getting removed.
And uses will see CPUs that are not present any more?
s/uses/users ?
Actually I have never seen such a situation on the ARM world and so don't know what exactly happens here. You are probably talking about the NUMA server space, where a cluster of CPUs can be removed physically from the NUMA network of CPUs.
What actually happens in that case? Similar to echo 0 > online ? And do we support it currently?
FWIW, I haven't paid any attention to side of the problem yet, but as soon as I understand it, I can fix it. Any tips on how can I emulate that on platforms where we can't take out the CPUs physically ?
On 01/27/2015 12:36 AM, Viresh Kumar wrote:
@Saravana: Few patches might look very similar to what you have done, please add your SOB for those :)
@Viresh, thanks for taking up the patches and working on it. I wish I could have spent more time to get this in myself.
For the patches that are very similar to the ones I sent out, could you please leave the author as me? I spent quite some time figuring out the different corner cases and coding it up and don't want to completely lose authorship for the work I did. Would appreciate it if you could accommodate that.
I'll definitely review this. One thing that's not done in this patchset which I think should eventually be done (so, we should make sure we don't have any code that makes that harder to do in the future) is to allow setting the cpufreq parameters even when all the CPUs in the policy are offline. It's actually trivial for most of the parameters except a few governor specific tunables.
Thanks, Saravana
On 29 January 2015 at 01:05, Saravana Kannan skannan@codeaurora.org wrote:
@Viresh, thanks for taking up the patches and working on it. I wish I could have spent more time to get this in myself.
For the patches that are very similar to the ones I sent out, could you please leave the author as me? I spent quite some time figuring out the different corner cases and coding it up and don't want to completely lose authorship for the work I did. Would appreciate it if you could accommodate that.
Actually I haven't applied any of your patches as is, but wrote them myself. On some occasions they finally got similar to what you have done. Still I will try to identify the ones which match exactly and mark you the author.
I'll definitely review this. One thing that's not done in this patchset which I think should eventually be done (so, we should make sure we don't have any code that makes that harder to do in the future) is to allow setting the cpufreq parameters even when all the CPUs in the policy are offline. It's actually trivial for most of the parameters except a few governor specific tunables.
I disagree. The cpufreq-policy is inactive at that time and must not be updated at all. I have specifically marked the policy inactive in this case and wouldn't like to make the code unnecessarily complex without much need. If you have a good case why you want it, then we can discuss.
On Tuesday, January 27, 2015 02:06:06 PM Viresh Kumar wrote:
Hi Rafael,
The aim of this series is to stop managing cpufreq sysfs directories on CPU hotplugs.
This issue has been raised multiple times earlier, the latest being tried by Saravana. While working on the $Subject thread, I did lots of cleanups, most of which are already pushed by you.
The two remaining ones are added to this series..
Currently on removal of a 'cpu != policy->cpu', we remove its sysfs directories by removing the soft-link. And on removal of policy->cpu, we migrate the sysfs directories to the next cpu. But if policy->cpu was the last CPU, we remove the policy completely and allocate it again as soon as the CPUs come back. This has shortcomings:
- Code Complexity
- Slower hotplug
- sysfs file permissions are reset after all policy->cpus are offlined
- CPUFreq stats history lost after all policy->cpus are offlined
- Special management of sysfs stuff during suspend/resume
To make things simple we stop playing with sysfs files unless the driver is getting removed. Also the policy is kept intact to be used later.
First few patches provide a clean base for others *more important* patches.
Rebased-over: your bleeding edge branch as there were dependencies on my earlier patches.
Pushed here:
git://git.linaro.org/people/viresh.kumar/linux.git cpufreq/core/sysfs
@Saravana/Prarit: It would be good if you can provide your feedback/reviews on this. Thanks in advance.
@Saravana: Few patches might look very similar to what you have done, please add your SOB for those :)
-- viresh
Viresh Kumar (18): cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get|put}() cpufreq: Create for_each_policy() cpufreq: Create for_each_governor() cpufreq: Manage fallback policies in a list cpufreq: Manage governor usage history with 'policy->last_governor' cpufreq: Reuse policy list instead of per-cpu variable 'cpufreq_cpu_data' cpufreq: Drop (now) useless check 'cpu > nr_cpu_ids' cpufreq: Add doc style comment about cpufreq_cpu_{get|put}() cpufreq: Mark policy->governor = NULL for fallback policies cpufreq: Don't allow updating inactive-policies from sysfs cpufreq: Track cpu managing sysfs kobjects separately cpufreq: Stop migrating sysfs files on hotplug cpufreq: Keep a single path for adding managed CPUs cpufreq: Remove cpufreq_update_policy() cpufreq: Initialize policy->kobj while allocating policy cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free() cpufreq: Restart governor as soon as possible cpufreq: Merge __cpufreq_add_dev() and cpufreq_add_dev()
drivers/cpufreq/cpufreq.c | 551 +++++++++++++++++++++++----------------------- include/linux/cpufreq.h | 10 +- 2 files changed, 287 insertions(+), 274 deletions(-)
The first three patches I have no problems with, so I've queued them up for 3.20.
linaro-kernel@lists.linaro.org