Hi Russell,
On Mon, Jul 20, 2015 at 12:36 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Mon, Jul 20, 2015 at 03:17:10PM +0530, Viresh Kumar wrote:
Consider a dual core (0/1) system with two CPUs:
sharing clock/voltage rails and hence cpufreq-policy
CPU1 is offline while the cpufreq driver is registered
cpufreq_add_dev() is called from subsys callback for CPU0 and we create the policy for the CPUs and create link for CPU1.
cpufreq_add_dev() is called from subsys callback for CPU1, we find that the cpu is offline and we try to create a sysfs link for CPU1.
This results in double addition of the sysfs link and we get this:
WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c() sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq' Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c) r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000 [<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98) [<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc) r4:d74abbd0 r3:d74c0000 [<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40) r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000 [<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c) r3:d6b4dfe7 r2:c0930750 [<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0) r6:d75a8960 r5:c0993280 r4:d00aba20 [<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c) r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200 [<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c) [<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794) r5:00000001 r4:00000000 [<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0) r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08 r4:c0ae7e20 [<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)
The check for offline-cpu in cpufreq_add_dev() is present to ensure that link gets added for the CPUs, that weren't physically present earlier and we missed the case where a CPU is offline while registering the driver.
Fix this by keeping track of CPUs for which link is already created, and avoiding duplicate sysfs entries.
Why do we try to create the symlink for CPU devices which we haven't "detected" yet (iow, we haven't had cpufreq_add_dev() called for)? Surely we are guaranteed to have cpufreq_add_dev() called for every CPU which exists in sysfs? So why not _only_ create the sysfs symlinks when cpufreq_add_dev() is notified that a CPU subsys interface is present?
That's something I've overlooked.
Yes, we should be doing exactly that.
Sure, if the policy changes, we need to do maintanence on these symlinks, but I see only one path down into cpufreq_add_dev_symlink(), which is:
cpufreq_add_dev() -> cpufreq_add_dev_interface() -> cpufreq_add_dev_symlink()
In other words, only when we see a new CPU interface appears, not when the policy changes. If the set of related CPUs is policy independent, why is this information carried in the cpufreq_policy struct?
It is not policy-dependent, but the way that information is gathered is not exactly straightforward. It generally depends on what the platform firmware tells us about the topology.
If it is policy dependent, then I see no code which handles the effect of a policy change where the policy->related_cpus is different. To me, that sounds like a rather huge design hole.
Things get worse. Reading drivers/base/cpu.c, CPU interface nodes are only ever created - they're created for the set of _possible_ CPUs in the system, not those which are possible and present, and there is no unregister_cpu() API, only a register_cpu() API.
There is unregister_cpu() API too, but it is called from arch_unregister_cpu(). And it calls device_unregister() and all of the appropriate things happen AFAICS. Eventually, cpufreq_remove_dev() is called from that path.
Thanks, Rafael