Consider a dual core (0/1) system with two CPUs:
- sharing clock/voltage rails and hence cpufreq-policy
- CPU1 is offline while the cpufreq driver is registered
- cpufreq_add_dev() is called from subsys callback for CPU0 and we create the policy for the group of CPUs and create links for all present CPUs, i.e. CPU1 as well.
- cpufreq_add_dev() is called from subsys callback for CPU1, we find that the cpu is offline and we try to create a sysfs link for CPU1.
This results in double addition of the sysfs link and we will get this:
WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
 r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
[<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
[<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
 r4:d74abbd0 r3:d74c0000
[<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
 r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
[<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
 r3:d6b4dfe7 r2:c0930750
[<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
 r6:d75a8960 r5:c0993280 r4:d00aba20
[<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
 r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
[<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
[<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
 r5:00000001 r4:00000000
[<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
 r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08 r4:c0ae7e20
[<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)
The offline-CPU check in cpufreq_add_dev() exists to ensure that a link gets added for CPUs which weren't physically present earlier, but it does not account for the case where a CPU is merely offline while the driver is being registered.
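The sequence above can be modelled in user space. This is only a minimal sketch, not kernel code: fake_create_link() and the has_link[] array are hypothetical stand-ins for sysfs_create_link() and the sysfs state, and the only behaviour borrowed from the kernel is that creating an already existing name fails with -EEXIST (the ffffffef in r8 of the trace).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical stand-in for the per-CPU "cpufreq" entry in sysfs. */
static bool has_link[NR_CPUS];

/* Mimics sysfs_create_link(): a second creation of the same name fails. */
static int fake_create_link(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (has_link[cpu])
		return -EEXIST;
	has_link[cpu] = true;
	return 0;
}

/* Old flow: policy init on the owner CPU links every other present CPU. */
static int old_policy_init(unsigned int kobj_cpu)
{
	for (unsigned int j = 0; j < NR_CPUS; j++) {
		int ret;

		if (j == kobj_cpu)
			continue;
		ret = fake_create_link(j);
		if (ret)
			return ret;
	}
	return 0;
}

/* Old flow: the subsys callback for an offline CPU creates the link again. */
static int old_add_dev_offline(unsigned int cpu)
{
	return fake_create_link(cpu);
}
```

Calling old_policy_init(0) followed by old_add_dev_offline(1) reproduces the duplicate creation in this model.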
To fix this properly, don't create these links when the policy gets initialized. Rather, wait for the individual subsys callback of each CPU to add/remove these links. This simplifies most of the code, with the exception of cpufreq_remove_dev().
The problem there is that we might remove the CPU which owns policy->kobj in sysfs before the other CPUs are removed. Fix this with the solution we were using until very recently: move the kobject to any other CPU for which remove is yet to be called.
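The bookkeeping described above can be sketched in user space as follows. All names here (struct fake_policy, fake_add_dev(), fake_remove_dev(), first_cpu()) are hypothetical; a plain unsigned long stands in for the cpumask and no sysfs calls are made, so this only illustrates the ownership hand-over, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)
#define MAX_CPUS ((int)(8 * sizeof(unsigned long)))

/* Hypothetical user-space model of the policy bookkeeping. */
struct fake_policy {
	unsigned long symlinks;	/* bit N set: cpuN/cpufreq is a symlink */
	int kobj_cpu;		/* CPU whose directory holds the kobject */
	bool freed;
};

/* Lowest set bit, in the spirit of cpumask_first(); NO_CPU if empty. */
static int first_cpu(unsigned long mask)
{
	for (int cpu = 0; cpu < MAX_CPUS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return NO_CPU;
}

/* Add callback: the first CPU owns the kobject, later CPUs get links. */
static void fake_add_dev(struct fake_policy *p, int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	if (p->kobj_cpu == NO_CPU) {
		p->kobj_cpu = cpu;
		return;
	}
	p->symlinks |= 1UL << cpu;
}

/* Remove callback: drop a link, or migrate ownership, or free. */
static void fake_remove_dev(struct fake_policy *p, int cpu)
{
	unsigned long bit = 1UL << cpu;

	/* A CPU that only had a symlink just drops it. */
	if (p->symlinks & bit) {
		p->symlinks &= ~bit;
		return;
	}

	if (p->symlinks) {
		/* Owner goes away first: hand the kobject to another CPU. */
		int new_cpu = first_cpu(p->symlinks);

		p->symlinks &= ~(1UL << new_cpu);
		p->kobj_cpu = new_cpu;
	} else {
		/* This was the last CPU: the policy can be freed. */
		p->kobj_cpu = NO_CPU;
		p->freed = true;
	}
}
```

Removing CPU0 before CPU1 in this model moves ownership to CPU1 instead of freeing the policy, which is the case the paragraph above is concerned with.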
Tested on dual core exynos board with cpufreq-dt driver. The driver was compiled as module and inserted/removed multiple times on a running kernel.
Fixes: 87549141d516 ("cpufreq: Stop migrating sysfs files on hotplug")
Reported-and-suggested-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V1->V2: Completely changed, please review again :)
@Rafael: I didn't review your solution and gave this one because I thought Russell suggested the right thing. i.e. don't create links in the beginning.
This is based on 4.2-rc3, so your other patch, https://patchwork.kernel.org/patch/6839031/, has to be rebased over it.
I didn't rebase this patch over yours for two reasons:
- Yours wasn't necessarily 4.2 material.
- I already mentioned a problem in that patch.
@Russell: I hope this will look much better than V1 to you. Please give it a try once you get some time.
 drivers/cpufreq/cpufreq.c | 165 ++++++++++++++++++----------------------------
 include/linux/cpufreq.h   |   1 +
 2 files changed, 65 insertions(+), 101 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 26063afb3eba..81c2417e52f4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -966,67 +966,6 @@ void cpufreq_sysfs_remove_file(const struct attribute *attr)
 }
 EXPORT_SYMBOL(cpufreq_sysfs_remove_file);
 
-static int add_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
-{
-	struct device *cpu_dev;
-
-	pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
-
-	if (!policy)
-		return 0;
-
-	cpu_dev = get_cpu_device(cpu);
-	if (WARN_ON(!cpu_dev))
-		return 0;
-
-	return sysfs_create_link(&cpu_dev->kobj, &policy->kobj, "cpufreq");
-}
-
-static void remove_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
-{
-	struct device *cpu_dev;
-
-	pr_debug("%s: Removing symlink for CPU: %u\n", __func__, cpu);
-
-	cpu_dev = get_cpu_device(cpu);
-	if (WARN_ON(!cpu_dev))
-		return;
-
-	sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
-}
-
-/* Add/remove symlinks for all related CPUs */
-static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy)
-{
-	unsigned int j;
-	int ret = 0;
-
-	/* Some related CPUs might not be present (physically hotplugged) */
-	for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
-		if (j == policy->kobj_cpu)
-			continue;
-
-		ret = add_cpu_dev_symlink(policy, j);
-		if (ret)
-			break;
-	}
-
-	return ret;
-}
-
-static void cpufreq_remove_dev_symlink(struct cpufreq_policy *policy)
-{
-	unsigned int j;
-
-	/* Some related CPUs might not be present (physically hotplugged) */
-	for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
-		if (j == policy->kobj_cpu)
-			continue;
-
-		remove_cpu_dev_symlink(policy, j);
-	}
-}
-
 static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
 				     struct device *dev)
 {
@@ -1057,7 +996,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
 			return ret;
 	}
 
-	return cpufreq_add_dev_symlink(policy);
+	return 0;
 }
 
 static void cpufreq_init_policy(struct cpufreq_policy *policy)
@@ -1163,11 +1102,14 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)
 	if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL))
 		goto err_free_cpumask;
 
+	if (!zalloc_cpumask_var(&policy->symlinks, GFP_KERNEL))
+		goto err_free_related_cpumask;
+
 	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &dev->kobj,
 				   "cpufreq");
 	if (ret) {
 		pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
-		goto err_free_rcpumask;
+		goto err_free_symlink_cpumask;
 	}
 
 	INIT_LIST_HEAD(&policy->policy_list);
@@ -1184,7 +1126,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)
 
 	return policy;
 
-err_free_rcpumask:
+err_free_symlink_cpumask:
+	free_cpumask_var(policy->symlinks);
+err_free_related_cpumask:
 	free_cpumask_var(policy->related_cpus);
 err_free_cpumask:
 	free_cpumask_var(policy->cpus);
@@ -1204,7 +1148,6 @@ static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy, bool notify)
 				CPUFREQ_REMOVE_POLICY, policy);
 
 	down_write(&policy->rwsem);
-	cpufreq_remove_dev_symlink(policy);
 	kobj = &policy->kobj;
 	cmp = &policy->kobj_unregister;
 	up_write(&policy->rwsem);
@@ -1234,6 +1177,7 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy, bool notify)
 	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	cpufreq_policy_put_kobj(policy, notify);
+	free_cpumask_var(policy->symlinks);
 	free_cpumask_var(policy->related_cpus);
 	free_cpumask_var(policy->cpus);
 	kfree(policy);
@@ -1252,26 +1196,37 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 {
 	unsigned int j, cpu = dev->id;
 	int ret = -ENOMEM;
-	struct cpufreq_policy *policy;
+	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
 	unsigned long flags;
 	bool recover_policy = !sif;
 
 	pr_debug("adding CPU %u\n", cpu);
 
+	/* sysfs links are only created on subsys callback */
+	if (sif && policy) {
+		pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
+		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
+		if (ret) {
+			dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",
+				__func__, cpu, ret);
+			return ret;
+		}
+
+		/* Track CPUs for which sysfs links are created */
+		cpumask_set_cpu(cpu, policy->symlinks);
+	}
+
 	/*
-	 * Only possible if 'cpu' wasn't physically present earlier and we are
-	 * here from subsys_interface add callback. A hotplug notifier will
-	 * follow and we will handle it like logical CPU hotplug then. For now,
-	 * just create the sysfs link.
+	 * A hotplug notifier will follow and we will take care of rest
+	 * of the initialization then.
 	 */
 	if (cpu_is_offline(cpu))
-		return add_cpu_dev_symlink(per_cpu(cpufreq_cpu_data, cpu), cpu);
+		return 0;
 
 	if (!down_read_trylock(&cpufreq_rwsem))
 		return 0;
 
 	/* Check if this CPU already has a policy to manage it */
-	policy = per_cpu(cpufreq_cpu_data, cpu);
 	if (policy && !policy_is_inactive(policy)) {
 		WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus));
 		ret = cpufreq_add_policy_cpu(policy, cpu, dev);
@@ -1506,10 +1461,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 	if (cpufreq_driver->exit)
 		cpufreq_driver->exit(policy);
 
-	/* Free the policy only if the driver is getting removed. */
-	if (sif)
-		cpufreq_policy_free(policy, true);
-
 	return 0;
 }
 
@@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
 {
 	unsigned int cpu = dev->id;
+	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
 	int ret;
 
-	/*
-	 * Only possible if 'cpu' is getting physically removed now. A hotplug
-	 * notifier should have already been called and we just need to remove
-	 * link or free policy here.
-	 */
-	if (cpu_is_offline(cpu)) {
-		struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
-		struct cpumask mask;
+	if (!policy)
+		return 0;
 
-		if (!policy)
-			return 0;
+	if (cpu_online(cpu)) {
+		ret = __cpufreq_remove_dev_prepare(dev, sif);
+		if (!ret)
+			ret = __cpufreq_remove_dev_finish(dev, sif);
+		if (ret)
+			return ret;
+	}
 
-		cpumask_copy(&mask, policy->related_cpus);
-		cpumask_clear_cpu(cpu, &mask);
+	/* sysfs links are removed only on subsys callback */
+	if (cpumask_test_cpu(cpu, policy->symlinks)) {
+		dev_dbg(dev, "%s: Removing symlink for CPU: %u\n", __func__,
+			cpu);
+		cpumask_clear_cpu(cpu, policy->symlinks);
+		sysfs_remove_link(&dev->kobj, "cpufreq");
+		return 0;
+	}
 
+	if (cpumask_weight(policy->symlinks)) {
 		/*
-		 * Free policy only if all policy->related_cpus are removed
-		 * physically.
+		 * Okay, we still have some CPUs left. Transfer the ownership of
+		 * policy to one of them. Would be better to pass that to
+		 * cpumask_last() as that will be the last CPU to get removed,
+		 * but there is no API to get last cpu of the mask. Lets move it
+		 * to the first cpu in the mask.
 		 */
-		if (cpumask_intersects(&mask, cpu_present_mask)) {
-			remove_cpu_dev_symlink(policy, cpu);
-			return 0;
-		}
+		int new_cpu = cpumask_first(policy->symlinks);
+		struct device *new_dev = get_cpu_device(new_cpu);
 
-		cpufreq_policy_free(policy, true);
-		return 0;
-	}
+		dev_dbg(dev, "%s: Migrating kobj from %d to %d\n", __func__,
+			cpu, new_cpu);
 
-	ret = __cpufreq_remove_dev_prepare(dev, sif);
+		cpumask_clear_cpu(new_cpu, policy->symlinks);
+		sysfs_remove_link(&new_dev->kobj, "cpufreq");
 
-	if (!ret)
-		ret = __cpufreq_remove_dev_finish(dev, sif);
+		policy->kobj_cpu = new_cpu;
+		WARN_ON(kobject_move(&policy->kobj, &new_dev->kobj));
+	} else {
+		/* This is the last CPU to be removed */
+		cpufreq_policy_free(policy, true);
+	}
 
-	return ret;
+	return 0;
 }
 
 static void handle_update(struct work_struct *work)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 29ad97c34fd5..c748d1cd0815 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -62,6 +62,7 @@ struct cpufreq_policy {
 	/* CPUs sharing clock, require sw coordination */
 	cpumask_var_t		cpus;	/* Online CPUs only */
 	cpumask_var_t		related_cpus; /* Online + Offline CPUs */
+	cpumask_var_t		symlinks; /* CPUs for which cpufreq sysfs directory is present */
 
 	unsigned int		shared_type; /* ACPI: ANY or ALL affected CPUs
 						should set cpufreq */
Hi Viresh,
On Wed, Jul 22, 2015 at 2:07 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
[cut]
That looks good to me overall, but please let me rename your new "symlinks" CPU mask to "dependent_cpus".
> V1->V2: Completely changed, please review again :)
>
> @Rafael: I didn't review your solution and gave this one because I thought Russell suggested the right thing. i.e. don't create links in the beginning.

Sure. I prefer this approach too.

> This is based on 4.2-rc3, so your other patch, https://patchwork.kernel.org/patch/6839031/, has to be rebased over it.

OK

> I didn't rebase this patch over yours for two reasons:
> - Yours wasn't necessarily 4.2 material.

Right.

> - I already mentioned a problem in that patch.

I'm not sure if the problem is really there, but after the changes in this patch it doesn't really matter. :-)

Thanks, Rafael
On Wed, Jul 22, 2015 at 05:37:18PM +0530, Viresh Kumar wrote:
[cut]
> @@ -1252,26 +1196,37 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
>  {
>  	unsigned int j, cpu = dev->id;
>  	int ret = -ENOMEM;
> -	struct cpufreq_policy *policy;
> +	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>  	unsigned long flags;
>  	bool recover_policy = !sif;
>
>  	pr_debug("adding CPU %u\n", cpu);
>
> +	/* sysfs links are only created on subsys callback */
> +	if (sif && policy) {
> +		pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);

dev_dbg() ?

> +		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> +		if (ret) {
> +			dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",
> +				__func__, cpu, ret);

I wonder why we print the CPU number - since it's from dev->id, isn't it included in the struct device name printed by dev_err() already?

> +			return ret;
> +		}
> +
> +		/* Track CPUs for which sysfs links are created */
> +		cpumask_set_cpu(cpu, policy->symlinks);
> +	}

I guess this will do for -rc, but it's not particularly nice. Can I suggest splitting the two operations here - the add_dev callback from the subsys interface, and the handling of hotplug online/offline notifications.
You only need to do the above for the subsys interface, and you only need to do the remainder if the CPU was online.
Also, what about the CPU "owning" the policy?
So, this would become:
static int cpufreq_cpu_online(struct device *dev)
{
	pr_debug("bringing CPU%d online\n", dev->id);
	... stuff to do when CPU is online ...
}

static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
	unsigned int cpu = dev->id;
	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
	int ret;

	pr_debug("adding CPU %u\n", cpu);

	if (policy && policy->kobj_cpu != cpu) {
		dev_dbg(dev, "%s: Adding cpufreq symlink\n", __func__);
		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
		if (ret) {
			dev_err(dev, "%s: Failed to create cpufreq symlink (%d)\n",
				__func__, ret);
			return ret;
		}

		/* Track CPUs for which sysfs links are created */
		cpumask_set_cpu(cpu, policy->symlinks);
	}

	/* Now do the remainder if the CPU is already online */
	if (cpu_online(cpu))
		return cpufreq_cpu_online(dev);

	return 0;
}

Next, change the cpufreq_add_dev(dev, NULL) in the hotplug notifier call to cpufreq_cpu_online(dev) instead.
Doing a similar thing for the cpufreq_remove_dev() path would also make sense.

>  	/*
> -	 * Only possible if 'cpu' wasn't physically present earlier and we are
> -	 * here from subsys_interface add callback. A hotplug notifier will
> -	 * follow and we will handle it like logical CPU hotplug then. For now,
> -	 * just create the sysfs link.
> +	 * A hotplug notifier will follow and we will take care of rest
> +	 * of the initialization then.
>  	 */
>  	if (cpu_is_offline(cpu))
> -		return add_cpu_dev_symlink(per_cpu(cpufreq_cpu_data, cpu), cpu);
> +		return 0;
>
>  	if (!down_read_trylock(&cpufreq_rwsem))
>  		return 0;
>
>  	/* Check if this CPU already has a policy to manage it */
> -	policy = per_cpu(cpufreq_cpu_data, cpu);
>  	if (policy && !policy_is_inactive(policy)) {
>  		WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus));
>  		ret = cpufreq_add_policy_cpu(policy, cpu, dev);
> @@ -1506,10 +1461,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>  	if (cpufreq_driver->exit)
>  		cpufreq_driver->exit(policy);
>
> -	/* Free the policy only if the driver is getting removed. */
> -	if (sif)
> -		cpufreq_policy_free(policy, true);
> -
>  	return 0;
>  }
>
> @@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>  static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
>  {
>  	unsigned int cpu = dev->id;
> +	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>  	int ret;
>
> -	/*
> -	 * Only possible if 'cpu' is getting physically removed now. A hotplug
> -	 * notifier should have already been called and we just need to remove
> -	 * link or free policy here.
> -	 */
> -	if (cpu_is_offline(cpu)) {
> -		struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
> -		struct cpumask mask;
> +	if (!policy)
> +		return 0;
>
> -		if (!policy)
> -			return 0;
> +	if (cpu_online(cpu)) {
> +		ret = __cpufreq_remove_dev_prepare(dev, sif);
> +		if (!ret)
> +			ret = __cpufreq_remove_dev_finish(dev, sif);
> +		if (ret)
> +			return ret;

Here, I have to wonder about this. If you look at the code in drivers/base/bus.c, you'll notice that the ->remove_dev return code is not used (personally, I hate interfaces which are created with an int return type for a removal operation, but then ignore the return code. Either have the return code and use it, or don't confuse driver authors by having one.)
What this means is that in the remove path, the device _is_ going away, whether you like it or not. So, if you have an error early in your remove path, returning that error does you no good - you end up leaking memory because subsequent cleanup doesn't get done.
It's better to either ensure that your removal path can't fail, or if it can, to reasonably clean up as much as you can (which here, means continuing to remove the symlink.)
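The difference between the two removal styles can be illustrated outside the kernel. The following is a minimal user-space sketch under stated assumptions: struct fake_dev, fake_stop_governor() and both remove functions are hypothetical names, and the stop_fails flag exists only to force the first step to fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; not a kernel structure. */
struct fake_dev {
	bool governor_running;
	bool link_present;
	bool stop_fails;	/* test knob: make the first step fail */
};

static int fake_stop_governor(struct fake_dev *d)
{
	if (d->stop_fails)
		return -1;
	d->governor_running = false;
	return 0;
}

/* Early-return style: a failure skips the remaining cleanup. */
static int remove_early_return(struct fake_dev *d)
{
	int ret = fake_stop_governor(d);

	if (ret)
		return ret;	/* link_present stays true: leaked */
	d->link_present = false;
	return 0;
}

/* Best-effort style: remember the error but keep cleaning up. */
static int remove_best_effort(struct fake_dev *d)
{
	int ret = fake_stop_governor(d);

	d->link_present = false;
	return ret;
}
```

With stop_fails set, only the best-effort variant leaves link_present cleared, which matters when the caller ignores the return value anyway.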
If you adopt my suggestion above, then cpufreq_remove_dev() becomes something like:
static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
{
	unsigned int cpu = dev->id;
	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);

	if (cpu_online(cpu))
		cpufreq_cpu_offline(dev);

	if (policy) {
		if (cpumask_test_cpu(cpu, policy->symlinks)) {
			dev_dbg(dev, "%s: Removing cpufreq symlink\n", __func__);
			cpumask_clear_cpu(cpu, policy->symlinks);
			sysfs_remove_link(&dev->kobj, "cpufreq");
		}

		if (policy->kobj_cpu == cpu) {
			... migration code and final CPU deletion code ...
		}
	}

	return 0;
}

which IMHO is easier to read and follow, and more symmetrical with cpufreq_add_dev().

Now, I'm left wondering about a few things:
1. whether having a CPU "own" the policy, and having the cpufreq/ directory beneath the cpuN node is a good idea, or whether it would be better to place this in the /sys/devices/system/cpufreq/ subdirectory and always symlink to there. It strikes me that would simplify the code a little.
2. whether using a kref to track the usage of the policy would be better than tracking symlink weight (or in the case of (1) being adopted, whether the symlink cpumask becomes empty.) Note that the symlink weight becoming zero without (1) (in other words, no symlinks) is not the correct condition for freeing the policy - we still have one CPU, that being the CPU for policy->kobj_cpu.
3. what happens when 'policy' is NULL at the point when the first (few) CPUs are added - how do the symlinks get created later if/when policy becomes non-NULL (can it?)
4. what about policy->related_cpus ? What if one of the CPUs being added is not in policy->related_cpus? Should we still go ahead and create the symlink?
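Point 2 in the list above can be sketched as a reference count that is taken once per CPU using the policy, so the owner CPU is counted as well and an empty symlink mask is no longer the freeing condition. This is a user-space illustration with hypothetical names (struct fake_policy_ref, policy_get(), policy_put()); the in-kernel kref API (kref_init(), kref_get(), kref_put() with a release callback) does the counting atomically, which this plain counter does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted policy; one reference per CPU using it. */
struct fake_policy_ref {
	unsigned int refcount;
	bool released;
};

static void policy_get(struct fake_policy_ref *p)
{
	p->refcount++;
}

/* Returns true if this put dropped the last reference. */
static bool policy_put(struct fake_policy_ref *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		p->released = true;	/* stands in for the release callback */
		return true;
	}
	return false;
}
```

In this model the policy is released only when the last CPU, owner or not, drops its reference.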
Hi Russell,
On Wed, Jul 22, 2015 at 3:15 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Wed, Jul 22, 2015 at 05:37:18PM +0530, Viresh Kumar wrote:

[cut]

> > @@ -1252,26 +1196,37 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> >  {
> >  	unsigned int j, cpu = dev->id;
> >  	int ret = -ENOMEM;
> > -	struct cpufreq_policy *policy;
> > +	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
> >  	unsigned long flags;
> >  	bool recover_policy = !sif;
> >
> >  	pr_debug("adding CPU %u\n", cpu);
> >
> > +	/* sysfs links are only created on subsys callback */
> > +	if (sif && policy) {
> > +		pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
>
> dev_dbg() ?

Right.

> > +		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> > +		if (ret) {
> > +			dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",
> > +				__func__, cpu, ret);
>
> I wonder why we print the CPU number - since it's from dev->id, isn't it included in the struct device name printed by dev_err() already?

It is AFAICS.

> > +			return ret;
> > +		}
> > +
> > +		/* Track CPUs for which sysfs links are created */
> > +		cpumask_set_cpu(cpu, policy->symlinks);
> > +	}
>
> I guess this will do for -rc, but it's not particularly nice.

Right.

This is what I'm queuing up for -rc: http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=bl...
I've made a few other minor changes (discussed with Viresh on IRC) to it.
Can I suggest splitting the two operations here - the add_dev callback from the subsys interface, and the handling of hotplug online/offline notifications.
You only need to do the above for the subsys interface, and you only need to do the remainder if the CPU was online.
Also, what about the CPU "owning" the policy?
So, this would become:
static int cpufreq_cpu_online(struct device *dev)
{
	pr_debug("bringing CPU%d online\n", dev->id);
	... stuff to do when CPU is online ...
}

static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
	unsigned int cpu = dev->id;
	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
	int ret;

	pr_debug("adding CPU %u\n", cpu);

	if (policy && policy->kobj_cpu != cpu) {
		dev_dbg(dev, "%s: Adding cpufreq symlink\n", __func__);
		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
		if (ret) {
			dev_err(dev, "%s: Failed to create cpufreq symlink (%d)\n",
				__func__, ret);
			return ret;
		}

		/* Track CPUs for which sysfs links are created */
		cpumask_set_cpu(cpu, policy->symlinks);
	}

	/* Now do the remainder if the CPU is already online */
	if (cpu_online(cpu))
		return cpufreq_cpu_online(dev);

	return 0;
}
Next, change the cpufreq_add_dev(dev, NULL) in the hotplug notifier call to cpufreq_cpu_online(dev) instead.
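To make the proposed split concrete, here is a minimal userspace sketch of it. It is only a model: `struct policy_model`, `model_create_link()`, `model_cpu_online()` and `model_add_dev()` are hypothetical names standing in for the kernel structures and functions, and the bitmasks stand in for cpumasks. It assumes the policy already exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8

/* Userspace stand-in for struct cpufreq_policy; not the kernel structure. */
struct policy_model {
	unsigned int kobj_cpu;   /* CPU whose directory holds the policy kobject */
	uint32_t symlinks;       /* bit N set: cpuN has a "cpufreq" symlink */
	uint32_t online_handled; /* bit N set: online work ran for cpuN */
};

/* Stand-in for sysfs_create_link(): a duplicate name fails with -EEXIST. */
static int model_create_link(struct policy_model *p, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (p->symlinks & (UINT32_C(1) << cpu))
		return -EEXIST;
	p->symlinks |= UINT32_C(1) << cpu;
	return 0;
}

/* Hotplug path: only the work that needs a running CPU, no sysfs links. */
static int model_cpu_online(struct policy_model *p, unsigned int cpu)
{
	assert(p && cpu < NR_CPUS);
	p->online_handled |= UINT32_C(1) << cpu;
	return 0;
}

/* Subsys path: create the link once, then do the online work if needed. */
static int model_add_dev(struct policy_model *p, unsigned int cpu, bool online)
{
	int ret;

	assert(p && cpu < NR_CPUS);
	if (p->kobj_cpu != cpu) {
		ret = model_create_link(p, cpu);
		if (ret)
			return ret;
	}
	if (online)
		return model_cpu_online(p, cpu);
	return 0;
}
```

In this model the hotplug notifier would call `model_cpu_online()` directly, so bringing a CPU online later never reaches the link-creation code a second time.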
These are good suggestions. I actually have a plan to do that cleanup on top of Viresh's patch.
Doing the similar thing for the cpufreq_remove_dev() path would also make sense.
cpufreq_remove_dev() is only called from bus.c already, but it also has to handle the driver removal case.
And I already have a patch to drop the "sif" argument from __cpufreq_remove_dev_prepare/finish() that are called on CPU offline.
/*
* Only possible if 'cpu' wasn't physically present earlier and we are
* here from subsys_interface add callback. A hotplug notifier will
* follow and we will handle it like logical CPU hotplug then. For now,
* just create the sysfs link.
* A hotplug notifier will follow and we will take care of rest
* of the initialization then. */ if (cpu_is_offline(cpu))
return add_cpu_dev_symlink(per_cpu(cpufreq_cpu_data, cpu), cpu);
return 0; if (!down_read_trylock(&cpufreq_rwsem)) return 0; /* Check if this CPU already has a policy to manage it */
policy = per_cpu(cpufreq_cpu_data, cpu); if (policy && !policy_is_inactive(policy)) { WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)); ret = cpufreq_add_policy_cpu(policy, cpu, dev);
@@ -1506,10 +1461,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, if (cpufreq_driver->exit) cpufreq_driver->exit(policy);
/* Free the policy only if the driver is getting removed. */
if (sif)
cpufreq_policy_free(policy, true);
return 0;
}
@@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev, static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif) { unsigned int cpu = dev->id;
struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu); int ret;
/*
* Only possible if 'cpu' is getting physically removed now. A hotplug
* notifier should have already been called and we just need to remove
* link or free policy here.
*/
if (cpu_is_offline(cpu)) {
struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
struct cpumask mask;
if (!policy)
return 0;
if (!policy)
return 0;
if (cpu_online(cpu)) {
ret = __cpufreq_remove_dev_prepare(dev, sif);
if (!ret)
ret = __cpufreq_remove_dev_finish(dev, sif);
if (ret)
return ret;
Here, I have to wonder about this. If you look at the code in drivers/base/bus.c, you'll notice that the ->remove_dev return code is not used (personally, I hate interfaces which are created with an int return type for a removal operation, but then ignore the return code. Either have the return code and use it, or don't confuse driver authors by having one.)
What this means is that in the remove path, the device _is_ going away, whether you like it or not. So, if you have an error early in your remove path, returning that error does you no good - you end up leaking memory because subsequent cleanup doesn't get done.
Right.
It's better to either ensure that your removal path can't fail, or if it can, to reasonably clean up as much as you can (which here, means continuing to remove the symlink.)
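A small userspace sketch of that "clean up as much as you can" pattern follows. All names here (`struct dev_model`, `model_stop_governor()`, `model_remove_dev()`) are hypothetical and only model the idea; they are not cpufreq functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state, standing in for the real cpufreq state. */
struct dev_model {
	bool governor_running;
	bool governor_stop_fails; /* test knob: make the first step fail */
	bool symlink_present;
	bool policy_allocated;
};

/* An early removal step that may fail. */
static int model_stop_governor(struct dev_model *d)
{
	if (d->governor_stop_fails)
		return -EBUSY;
	d->governor_running = false;
	return 0;
}

/*
 * The device is going away regardless, so an early failure is remembered
 * for reporting but does not stop the later cleanup steps.
 */
static int model_remove_dev(struct dev_model *d)
{
	int first_err = 0;
	int ret;

	assert(d);
	ret = model_stop_governor(d);
	if (ret)
		first_err = ret; /* remember the error, keep going */

	d->symlink_present = false;  /* models sysfs_remove_link() */
	d->policy_allocated = false; /* models freeing the policy */
	return first_err;
}
```

The point of the sketch is only the control flow: returning early on the first error would leave the symlink and the policy behind.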
If you adopt my suggestion above, then cpufreq_remove_dev() becomes something like:
static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
{
	unsigned int cpu = dev->id;
	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);

	if (cpu_online(cpu))
		cpufreq_cpu_offline(dev);

	if (policy) {
		if (cpumask_test_cpu(cpu, policy->symlinks)) {
			dev_dbg(dev, "%s: Removing cpufreq symlink\n", __func__);
			cpumask_clear_cpu(cpu, policy->symlinks);
			sysfs_remove_link(&dev->kobj, "cpufreq");
		}

		if (policy->kobj_cpu == cpu) {
			... migration code and final CPU deletion code ...
		}
	}

	return 0;
}
which IMHO is easier to read and follow, and more symmetrical with cpufreq_add_dev().
Agreed (in general).
Now, I'm left wondering about a few things:
- whether having a CPU "own" the policy, and having the cpufreq/ directory beneath the cpuN node is a good idea, or whether it would be better to place this in the /sys/devices/system/cpufreq/ subdirectory and always symlink to there. It strikes me that would simplify the code a little.
That is a good idea IMO. A small complication here is that there may be multiple policies in the system and a kobject is needed for each of them.
- whether using a kref to track the usage of the policy would be better than tracking symlink weight (or in the case of (1) being adopted, whether the symlink cpumask becomes empty.) Note that the symlink weight becoming zero without (1) (in other words, no symlinks) is not the correct condition for freeing the policy - we still have one CPU, that being the CPU for policy->kobj_cpu.
Well, if we do (1), it certainly would be more straightforward to use krefs for that.
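For illustration, this is roughly what reference-counted lifetime tracking would look like, as a userspace model. It mimics the behaviour of the kernel's kref (count starts at one, the object is released when the count drops to zero), but `struct policy_ref_model` and the `model_policy_*()` helpers are hypothetical names and the counter here is not atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a refcounted policy; not the kernel's struct kref. */
struct policy_ref_model {
	unsigned int refcount;
	bool released;
};

/* Like kref_init(): the creator holds the first reference. */
static void model_policy_init(struct policy_ref_model *p)
{
	p->refcount = 1;
	p->released = false;
}

/* One reference per user of the policy, e.g. per symlink. */
static void model_policy_get(struct policy_ref_model *p)
{
	assert(p->refcount > 0);
	p->refcount++;
}

/* Returns true when this call dropped the last reference. */
static bool model_policy_put(struct policy_ref_model *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		p->released = true; /* models freeing the policy */
		return true;
	}
	return false;
}
```

With one reference per symlink plus one for the creator, the policy is released only after the last user is gone, in whatever order the CPUs are removed.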
- what happens when 'policy' is NULL at the point when the first (few) CPUs are added - how do the symlinks get created later if/when policy becomes non-NULL (can it?)
Yes, it can, and we have a design issue here that bothers me a bit. Namely, we need a driver's ->init callback to populate policy->cpus for us, but this is not the only thing it is doing, so the concern is that it may not be able to deal with CPUs that aren't online.
I was thinking about an additional driver callback that would *only* populate a mask of CPUs that should use the same policy as the given one. We'd be able to call that from cpufreq_add_dev() for offline CPUs too and this way the policy object could be created for the first CPU using the policy that is registered instead of being added for the first CPU using that policy that becomes online (which happens today).
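A rough userspace sketch of that idea, to show the shape of it. The callback name `get_related_cpus` is purely hypothetical (no such cpufreq driver callback is implied to exist), as are `struct driver_model`, `struct core_model` and the helpers; bitmasks stand in for cpumasks.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical driver ops: a callback that ONLY reports the related CPUs. */
struct driver_model {
	uint32_t (*get_related_cpus)(unsigned int cpu);
};

/* Example driver: CPUs 0-3 share one clock, CPUs 4-7 share another. */
static uint32_t example_related_cpus(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return cpu < 4 ? UINT32_C(0x0f) : UINT32_C(0xf0);
}

/* Core-side bookkeeping: which policy (by index) manages each CPU. */
struct core_model {
	int policy_of[NR_CPUS]; /* -1: no policy yet */
	int nr_policies;
};

static void model_core_init(struct core_model *core)
{
	unsigned int i;

	for (i = 0; i < NR_CPUS; i++)
		core->policy_of[i] = -1;
	core->nr_policies = 0;
}

/*
 * Called when a CPU is registered, online or not. The policy is created
 * for the first registered CPU of a group, without touching hardware.
 */
static int model_add_dev(struct core_model *core,
			 const struct driver_model *drv, unsigned int cpu)
{
	uint32_t mask;
	unsigned int i;
	int id;

	assert(cpu < NR_CPUS);
	if (core->policy_of[cpu] >= 0)
		return core->policy_of[cpu];

	mask = drv->get_related_cpus(cpu);
	id = core->nr_policies++;
	for (i = 0; i < NR_CPUS; i++)
		if (mask & (UINT32_C(1) << i))
			core->policy_of[i] = id;
	return id;
}
```

Because the callback does nothing except report a mask, it can be invoked for an offline CPU, which is the property the proposal depends on.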
- what about policy->related_cpus ? What if one of the CPUs being added is not in policy->related_cpus?
It will need a new policy.
Should we still go ahead and create the symlink?
There's nothing to link to then. The policy object will be created when that CPU becomes online (as per the above).
Thanks, Rafael
On 22-07-15, 18:42, Rafael J. Wysocki wrote:
- what happens when 'policy' is NULL at the point when the first (few) CPUs are added - how do the symlinks get created later if/when policy becomes non-NULL (can it?)
Yes, it can, and we have a design issue here that bothers me a bit.
I replied to Russell with a NO here as the first CPU should have created the policy. BUT...
Namely, we need a driver's ->init callback to populate policy->cpus for us, but this is not the only thing it is doing, so the concern is that it may not be able to deal with CPUs that aren't online.
... the first few CPUs could have been offline and so we might not have tried to add the policy at all.. Need to fix that for sure.
I was thinking about an additional driver callback that would *only* populate a mask of CPUs that should use the same policy as the given one.
Why so ? Drivers today are required to set policy->cpus with all CPUs that should be managed by that policy. i.e. all online+offline. So, actually ->init() fills policy->cpus with the value of policy->related_cpus.
Yes, I thought earlier to change that by setting policy->related_cpus from drivers, instead of policy->cpus and wasn't sure if I should do that :)
We'd be able to call that from cpufreq_add_dev() for offline CPUs too and this way the policy object could be created for the first CPU using the policy that is registered instead of being added for the first CPU using that policy that becomes online (which happens today).
Creating policy for offline CPUs doesn't look that great to me.
What we can do to fix the problem in hand, is to update a global mask of CPUs (with policy == NULL) which were offline when cpufreq_add_dev() was called for them. And when we create the policy, we can add links for all such CPUs.
By mistake dev_err was replaced by dev_dbg in a recent patch, fix that.
Fixes: 9b07109f06a1 ("cpufreq: Fix double addition of sysfs links")
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index fa718644f1ee..84504ae3fb38 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1173,7 +1173,7 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 		pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
 		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
 		if (ret) {
-			dev_dbg(dev, "%s: Failed to create link (%d)\n",
+			dev_err(dev, "%s: Failed to create link (%d)\n",
 				__func__, ret);
 			return ret;
 		}
If subsys callback ->add_dev() is called for an offline CPU, before its policy is allocated, we will miss adding its sysfs symlink.
Fix this by tracking such CPUs in a separate mask.
Fixes: 9b07109f06a1 ("cpufreq: Fix double addition of sysfs links")
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq.c | 74 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 62 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 84504ae3fb38..d01cad993fa7 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -31,6 +31,12 @@
 #include <linux/tick.h>
 #include <trace/events/power.h>
 
+/*
+ * CPUs that were offline when a request to allocate policy was issued, symlinks
+ * for them should be created once the policy is available for them.
+ */
+cpumask_t linked_cpus_pending;
+
 static LIST_HEAD(cpufreq_policy_list);
 
 static inline bool policy_is_inactive(struct cpufreq_policy *policy)
@@ -938,6 +944,47 @@ void cpufreq_sysfs_remove_file(const struct attribute *attr)
 }
 EXPORT_SYMBOL(cpufreq_sysfs_remove_file);
 
+static int cpufreq_add_symlink(struct cpufreq_policy *policy,
+			       struct device *dev)
+{
+	int ret, cpu = dev->id;
+
+	dev_dbg(dev, "%s: Adding symlink for CPU: %u\n", __func__, cpu);
+	ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
+	if (ret) {
+		dev_err(dev, "%s: Failed to create link (%d)\n", __func__, ret);
+		return ret;
+	}
+
+	/* Track CPUs for which sysfs links are created */
+	cpumask_set_cpu(cpu, policy->linked_cpus);
+	return 0;
+}
+
+/*
+ * Create symlinks for CPUs which are already added via subsys callbacks (and
+ * were offline then), before the policy was created.
+ */
+static int cpufreq_add_pending_symlinks(struct cpufreq_policy *policy)
+{
+	struct cpumask mask;
+	int cpu, ret;
+
+	cpumask_and(&mask, policy->related_cpus, &linked_cpus_pending);
+
+	if (cpumask_empty(&mask))
+		return 0;
+
+	for_each_cpu(cpu, &mask) {
+		ret = cpufreq_add_symlink(policy, get_cpu_device(cpu));
+		if (ret)
+			return ret;
+		cpumask_clear_cpu(cpu, &linked_cpus_pending);
+	}
+
+	return 0;
+}
+
 static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
 				     struct device *dev)
 {
@@ -968,7 +1015,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
 			return ret;
 	}
 
-	return 0;
+	return cpufreq_add_pending_symlinks(policy);
 }
 
 static int cpufreq_init_policy(struct cpufreq_policy *policy)
@@ -1170,24 +1217,21 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 
 	/* sysfs links are only created on subsys callback */
 	if (sif && policy) {
-		pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
-		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
-		if (ret) {
-			dev_err(dev, "%s: Failed to create link (%d)\n",
-				__func__, ret);
+		ret = cpufreq_add_symlink(policy, dev);
+		if (ret)
 			return ret;
-		}
-
-		/* Track CPUs for which sysfs links are created */
-		cpumask_set_cpu(cpu, policy->linked_cpus);
 	}
 
 	/*
 	 * A hotplug notifier will follow and we will take care of rest
 	 * of the initialization then.
 	 */
-	if (cpu_is_offline(cpu))
+	if (cpu_is_offline(cpu)) {
+		/* symlink should be added for this CPU later */
+		if (!policy)
+			cpumask_set_cpu(cpu, &linked_cpus_pending);
 		return 0;
+	}
 
 	/* Check if this CPU already has a policy to manage it */
 	if (policy && !policy_is_inactive(policy)) {
@@ -1440,8 +1484,10 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
 	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
 	int ret;
 
-	if (!policy)
+	if (!policy) {
+		cpumask_clear_cpu(cpu, &linked_cpus_pending);
 		return 0;
+	}
 
 	if (cpu_online(cpu)) {
 		ret = __cpufreq_remove_dev_prepare(dev, sif);
@@ -2533,10 +2579,14 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)
 
 	/* Protect against concurrent cpu hotplug */
 	get_online_cpus();
+
 	subsys_interface_unregister(&cpufreq_interface);
 	if (cpufreq_boost_supported())
 		cpufreq_sysfs_remove_file(&boost.attr);
 
+	if (WARN_ON(!cpumask_empty(&linked_cpus_pending)))
+		cpumask_clear(&linked_cpus_pending);
+
 	unregister_hotcpu_notifier(&cpufreq_cpu_notifier);
 
 	write_lock_irqsave(&cpufreq_driver_lock, flags);
Hi Viresh,
On Thu, Jul 23, 2015 at 10:13 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
If subsys callback ->add_dev() is called for an offline CPU, before its policy is allocated, we will miss adding its sysfs symlink.
Fix this by tracking such CPUs in a separate mask.
Fixes: 9b07109f06a1 ("cpufreq: Fix double addition of sysfs links") Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
No, we need to go back to square one.
No fixes of fixes of fixes etc please.
Let me prepare a patch for -rc that won't introduce *new* problems and we can make major changes as 4.3 material, OK?
Thanks, Rafael
We only need to clear the CPU in the cpumask if it is currently set, so use cpumask_test_and_clear_cpu() instead of a separate test followed by a clear.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
 drivers/cpufreq/cpufreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index d01cad993fa7..b223c9c5296b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1498,10 +1498,9 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
 	}
 
 	/* sysfs links are removed only on subsys callback */
-	if (cpumask_test_cpu(cpu, policy->linked_cpus)) {
+	if (cpumask_test_and_clear_cpu(cpu, policy->linked_cpus)) {
 		dev_dbg(dev, "%s: Removing symlink for CPU: %u\n", __func__, cpu);
-		cpumask_clear_cpu(cpu, policy->linked_cpus);
 		sysfs_remove_link(&dev->kobj, "cpufreq");
 		return 0;
 	}
 
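For readers less familiar with the helper, the following userspace sketch models what a test-and-clear operation does on a single 32-bit mask. `model_test_and_clear_cpu()` is a hypothetical stand-in: unlike the kernel helper, it is not atomic and it works on a plain integer rather than a struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 32

/*
 * Clear 'cpu' in '*mask' and report whether it was set beforehand.
 * Non-atomic userspace model of a test-and-clear bit operation.
 */
static bool model_test_and_clear_cpu(unsigned int cpu, uint32_t *mask)
{
	uint32_t bit;
	bool was_set;

	assert(cpu < NR_CPUS);
	bit = UINT32_C(1) << cpu;
	was_set = (*mask & bit) != 0;
	*mask &= ~bit;
	return was_set;
}
```

The return value lets the caller fold the "was a link ever created for this CPU?" check and the bookkeeping update into one step, which is all the patch above changes.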
Hi Viresh,
On Thu, Jul 23, 2015 at 8:09 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 22-07-15, 18:42, Rafael J. Wysocki wrote:
- what happens when 'policy' is NULL at the point when the first (few) CPUs are added - how do the symlinks get created later if/when policy becomes non-NULL (can it?)
Yes, it can, and we have a design issue here that bothers me a bit.
I replied to Russell with a NO here as the first CPU should have created the policy. BUT...
Namely, we need a driver's ->init callback to populate policy->cpus for us, but this is not the only thing it is doing, so the concern is that it may not be able to deal with CPUs that aren't online.
... the first few CPUs could have been offline and so we might not have tried to add the policy at all.. Need to fix that for sure.
Wait here.
The current Linus' tree doesn't have that problem as far as I can tell.
Say cpufreq_interface->add_dev() is called for an offline CPU (say CPU2). It points to cpufreq_add_dev(), so we see that the CPU is offline and call add_cpu_dev_symlink() for it. But the first argument we pass to that is per_cpu(cpufreq_cpu_data, cpu) and that is NULL, because the policy is not there yet. So we just return 0 (and the CPU has no policy and no link).
Now say cpufreq_interface->add_dev() is called for an online CPU (say CPU3). It goes and creates the policy for it and the driver's ->init() tells us that CPU2 is related to it. So cpufreq_add_dev_interface() creates the link for CPU2 and we're fine.
Now say CPU3 was offline too when cpufreq_interface->add_dev() was called for it. We don't create a policy or a link for it. Now say CPU2 becomes online. cpufreq_cpu_callback() calls cpufreq_add_dev() for it and we land in the previous case.
The *broken* case is when CPU2 is online to start with and it had created the link for CPU3, so when an offline CPU3 is now being added, we try to create the link for it again. That is the case we need to address in -rc without introducing new problems. The $subject patch addresses that issue, but it introduces the above problem. On the other hand, my patch at https://patchwork.kernel.org/patch/6839151/ should take care of this too (unless it is broken in a way I'm not seeing now).
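The orderings walked through above can be replayed with a small userspace model. This is a simplification under stated assumptions: `struct sysfs_model` and the `model_*()` functions are hypothetical, a set bit means "cpuN already has a cpufreq entry" (directory or link alike), and the `skip_if_linked` flag models the fix of checking the tracking mask first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Tiny model of the sysfs state for one policy. */
struct sysfs_model {
	uint32_t links;   /* bit N set: cpuN/cpufreq already exists */
	uint32_t related; /* CPUs the driver reported as related */
	bool have_policy;
};

/* Models sysfs_create_link(): a duplicate name is an error. */
static int model_link(struct sysfs_model *s, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (s->links & (UINT32_C(1) << cpu))
		return -EEXIST;
	s->links |= UINT32_C(1) << cpu;
	return 0;
}

/* add_dev for an online CPU: create the policy and all related entries. */
static int model_add_online(struct sysfs_model *s, uint32_t related)
{
	unsigned int cpu;
	int ret;

	s->have_policy = true;
	s->related = related;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(related & (UINT32_C(1) << cpu)))
			continue;
		ret = model_link(s, cpu);
		if (ret)
			return ret;
	}
	return 0;
}

/* add_dev for an offline CPU: link only, and only if a policy exists. */
static int model_add_offline(struct sysfs_model *s, unsigned int cpu,
			     bool skip_if_linked)
{
	assert(cpu < NR_CPUS);
	if (!s->have_policy)
		return 0;
	if (skip_if_linked && (s->links & (UINT32_C(1) << cpu)))
		return 0;
	return model_link(s, cpu);
}
```

Registering offline CPU2 before online CPU3 is harmless in this model, while registering online CPU2 first and offline CPU3 afterwards returns -EEXIST unless the existing entry is checked for.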
I was thinking about an additional driver callback that would *only* populate a mask of CPUs that should use the same policy as the given one.
Why so ? Drivers today are required to set policy->cpus with all CPUs that should be managed by that policy. i.e. all online+offline. So, actually ->init() fills policy->cpus with the value of policy->related_cpus.
So the problem is that the setting of policy->cpus is not the *only* thing that ->init() is supposed to do. It can go and write to registers etc and is that guaranteed to work with offline CPUs? I honestly don't think so.
Yes, I thought earlier to change that by setting policy->related_cpus from drivers, instead of policy->cpus and wasn't sure if I should do that :)
We'd be able to call that from cpufreq_add_dev() for offline CPUs too and this way the policy object could be created for the first CPU using the policy that is registered instead of being added for the first CPU using that policy that becomes online (which happens today).
Creating policy for offline CPUs doesn't look that great to me.
Then we have a problem that CPUs that are not initially online do not have policies, but if they go online and *then* offline subsequently, the policies will be there. So there are two different "offline" statuses for a CPU as far as cpufreq is concerned, depending on whether or not the CPU has ever been online. That's weird and IMO we should avoid it.
What we can do to fix the problem in hand, is to update a global mask of CPUs (with policy == NULL) which were offline when cpufreq_add_dev() was called for them. And when we create the policy, we can add links for all such CPUs.
For -rc, why don't we have a mask of CPUs that are both "related" and present and create links only for those? Which is what the patch at https://patchwork.kernel.org/patch/6839151/ is doing?
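One possible reading of the "related and present" idea, as a userspace sketch. This is not the code from the patch referenced above; `struct link_model` and the `model_*()` helpers are hypothetical, and bitmasks stand in for cpumasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8

struct link_model {
	uint32_t present;       /* CPUs registered via the subsys interface */
	uint32_t related;       /* CPUs the policy manages */
	uint32_t links;         /* CPUs that currently have a link */
	unsigned int creations; /* number of links created so far */
	bool have_policy;
};

/* Create one link; asserting here proves no CPU is ever linked twice. */
static void model_make_link(struct link_model *m, uint32_t bit)
{
	assert(!(m->links & bit));
	m->links |= bit;
	m->creations++;
}

/* Subsys add callback: called once per CPU, whether online or offline. */
static void model_dev_added(struct link_model *m, unsigned int cpu)
{
	uint32_t bit;

	assert(cpu < NR_CPUS);
	bit = UINT32_C(1) << cpu;
	m->present |= bit;
	if (m->have_policy && (m->related & bit))
		model_make_link(m, bit);
}

/* Policy creation: link only CPUs that are both related and present. */
static void model_policy_created(struct link_model *m, uint32_t related)
{
	uint32_t todo;
	unsigned int cpu;

	m->have_policy = true;
	m->related = related;
	todo = related & m->present & ~m->links;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (todo & (UINT32_C(1) << cpu))
			model_make_link(m, UINT32_C(1) << cpu);
}
```

Whichever event comes first, each related CPU ends up with exactly one link in this model, and no CPU is linked before its device has been registered.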
And then we can target a major rework at the next merge window.
Thanks, Rafael
On Thu, Jul 23, 2015 at 7:22 PM, Rafael J. Wysocki rafael@kernel.org wrote:
Hi Viresh,
On Thu, Jul 23, 2015 at 8:09 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 22-07-15, 18:42, Rafael J. Wysocki wrote:
- what happens when 'policy' is NULL at the point when the first (few) CPUs are added - how do the symlinks get created later if/when policy becomes non-NULL (can it?)
Yes, it can, and we have a design issue here that bothers me a bit.
I replied to Russell with a NO here as the first CPU should have created the policy. BUT...
Namely, we need a driver's ->init callback to populate policy->cpus for us, but this is not the only thing it is doing, so the concern is that it may not be able to deal with CPUs that aren't online.
... the first few CPUs could have been offline and so we might not have tried to add the policy at all.. Need to fix that for sure.
Wait here.
The current Linus' tree doesn't have that problem as far as I can tell.
Say cpufreq_interface->add_dev() is called for an offline CPU (say CPU2). It points to cpufreq_add_dev(), so we see that the CPU is offline and call add_cpu_dev_symlink() for it. But the first argument we pass to that is per_cpu(cpufreq_cpu_data, cpu) and that is NULL, because the policy is not there yet. So we just return 0 (and the CPU has no policy and no link).
Now say cpufreq_interface->add_dev() is called for an online CPU (say CPU3). It goes and creates the policy for it and the driver's ->init() tells us that CPU2 is related to it. So cpufreq_add_dev_interface() creates the link for CPU2 and we're fine.
Now say CPU3 was offline too when cpufreq_interface->add_dev() was called for it. We don't create a policy or a link for it. Now say CPU2 becomes online. cpufreq_cpu_callback() calls cpufreq_add_dev() for it and we land in the previous case.
The *broken* case is when CPU2 is online to start with and it had created the link for CPU3, so when an offline CPU3 is now being added, we try to create the link for it again. That is the case we need to address in -rc without introducing new problems. The $subject patch addresses that issue, but it introduces the above problem. On the other hand, my patch at https://patchwork.kernel.org/patch/6839151/ should take care of this too (unless it is broken in a way I'm not seeing now).
It doesn't address the case when the CPU being removed is the policy owner.
Let me prepare a new version of it and we'll start over from there.
Thanks, Rafael
On 22-07-15, 14:15, Russell King - ARM Linux wrote:
- /* sysfs links are only created on subsys callback */
- if (sif && policy) {
pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
dev_dbg() ?
Hmm, right.
ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
if (ret) {
dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",
Rafael updated this instead with dev_dbg :), I am sending separate patches to fix that now.
__func__, cpu, ret);
I wonder why we print the CPU number - since it's from dev->id, isn't it included in the struct device name printed by dev_err() already?
:(
return ret;
}
/* Track CPUs for which sysfs links are created */
cpumask_set_cpu(cpu, policy->symlinks);
- }
I guess this will do for -rc, but it's not particularly nice. Can I suggest splitting the two operations here - the add_dev callback from the subsys interface, and the handling of hotplug online/offline notifications.
You only need to do the above for the subsys interface, and you only need to do the remainder if the CPU was online.
Also, what about the CPU "owning" the policy?
So, this would become:
static int cpufreq_cpu_online(struct device *dev)
{
	pr_debug("bringing CPU%d online\n", dev->id);
	... stuff to do when CPU is online ...
}

static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
	unsigned int cpu = dev->id;
	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
	int ret;

	pr_debug("adding CPU %u\n", cpu);

	if (policy && policy->kobj_cpu != cpu) {
		dev_dbg(dev, "%s: Adding cpufreq symlink\n", __func__);
		ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
		if (ret) {
			dev_err(dev, "%s: Failed to create cpufreq symlink (%d)\n",
				__func__, ret);
			return ret;
		}

		/* Track CPUs for which sysfs links are created */
		cpumask_set_cpu(cpu, policy->symlinks);
	}

	/* Now do the remainder if the CPU is already online */
	if (cpu_online(cpu))
		return cpufreq_cpu_online(dev);

	return 0;
}
Next, change the cpufreq_add_dev(dev, NULL) in the hotplug notifier call to cpufreq_cpu_online(dev) instead.
Doing the similar thing for the cpufreq_remove_dev() path would also make sense.
Hmmm, looks better of course.
@@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev, static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif) { unsigned int cpu = dev->id;
- struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu); int ret;
- /*
* Only possible if 'cpu' is getting physically removed now. A hotplug
* notifier should have already been called and we just need to remove
* link or free policy here.
*/
- if (cpu_is_offline(cpu)) {
struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
struct cpumask mask;
- if (!policy)
return 0;
if (!policy)
return 0;
- if (cpu_online(cpu)) {
ret = __cpufreq_remove_dev_prepare(dev, sif);
if (!ret)
ret = __cpufreq_remove_dev_finish(dev, sif);
if (ret)
return ret;
Here, I have to wonder about this. If you look at the code in drivers/base/bus.c, you'll notice that the ->remove_dev return code is not used
It's not even using the return value of ->add_dev :). I sent an update for that recently, as it was required for cpufreq drivers. Greg must be applying that for 4.3, I hope :)
(personally, I hate interfaces which are created with an int return type for a removal operation, but then ignore the return code. Either have the return code and use it, or don't confuse driver authors by having one.)
+1
What this means is that in the remove path, the device _is_ going away, whether you like it or not. So, if you have an error early in your remove path, returning that error does you no good - you end up leaking memory because subsequent cleanup doesn't get done.
It's better to either ensure that your removal path can't fail, or if it can, to reasonably clean up as much as you can (which here, means continuing to remove the symlink.)
If you adopt my suggestion above, then cpufreq_remove_dev() becomes something like:
static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
{
	unsigned int cpu = dev->id;
	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);

	if (cpu_online(cpu))
		cpufreq_cpu_offline(dev);

	if (policy) {
		if (cpumask_test_cpu(cpu, policy->symlinks)) {
			dev_dbg(dev, "%s: Removing cpufreq symlink\n", __func__);
			cpumask_clear_cpu(cpu, policy->symlinks);
			sysfs_remove_link(&dev->kobj, "cpufreq");
		}

		if (policy->kobj_cpu == cpu) {
			... migration code and final CPU deletion code ...
		}
	}

	return 0;
}
which IMHO is easier to read and follow, and more symmetrical with cpufreq_add_dev().
Ack.
Now, I'm left wondering about a few things:
- whether having a CPU "own" the policy, and having the cpufreq/ directory beneath the cpuN node is a good idea, or whether it would be better to place this in the /sys/devices/system/cpufreq/ subdirectory and always symlink to there. It strikes me that would simplify the code a little.
Hmm, but there can be multiple policies in a system and that would surely confuse people.
- whether using a kref to track the usage of the policy would be better than tracking symlink weight (or in the case of (1) being adopted, whether the symlink cpumask becomes empty.)
Note that the symlink weight becoming zero without (1) (in other words, no symlinks) is not the correct condition for freeing the policy - we still have one CPU, that being the CPU for policy->kobj_cpu.
But that's the cpu which is getting removed now, so it was really the last cpu and we can free the policy.
- what happens when 'policy' is NULL at the point when the first (few) CPUs are added
The first CPU that comes up has to create the policy.
- how do the symlinks get created later if/when policy becomes
non-NULL (can it?)
It can't.
- what about policy->related_cpus ? What if one of the CPUs being added is not in policy->related_cpus? Should we still go ahead and create the symlink?
Let me explain a bit about how policies are managed. You might already know this, but I got a bit confused by your question.
Consider an octa-core big.LITTLE platform. All big cores share clock/voltage rails, and so do all the LITTLE cores.
The system will have two policies:
- big: This will manage four CPUs (0-3)
  - policy->related_cpus = 0 1 2 3
  - policy->cpus = all online CPUs from 0-3
- LITTLE: This will manage four CPUs (4-7)
  - policy->related_cpus = 4 5 6 7
  - policy->cpus = all online CPUs from 4-7
So if a CPU (say 5) doesn't find a place in big cluster's policy->related_cpus, then it must belong to a different policy.
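The octa-core example can be written down as a small userspace model. `struct policy_model`, `model_update_online()` and `model_find_policy()` are hypothetical names used only for illustration, and bitmasks stand in for the kernel's cpumasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

/* Userspace stand-in for the two masks discussed above. */
struct policy_model {
	const char *name;
	uint32_t related_cpus; /* every CPU the policy manages */
	uint32_t cpus;         /* the online subset of related_cpus */
};

/* policy->cpus is always the online part of policy->related_cpus. */
static void model_update_online(struct policy_model *p, uint32_t online_mask)
{
	p->cpus = p->related_cpus & online_mask;
}

/* A CPU belongs to the one policy whose related_cpus contains it. */
static struct policy_model *model_find_policy(struct policy_model *policies,
					      size_t n, unsigned int cpu)
{
	size_t i;

	assert(cpu < NR_CPUS);
	for (i = 0; i < n; i++)
		if (policies[i].related_cpus & (UINT32_C(1) << cpu))
			return &policies[i];
	return NULL;
}
```

With big covering mask 0x0f and LITTLE covering 0xf0, CPU 5 is not found in the big policy's related_cpus and therefore resolves to the LITTLE policy, whether or not it is online.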
Does that clear your query? Or did I completely miss your concern ?