cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances has missed this. And as a result we get this:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 kobject_get+0x41/0x50()
Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon ...
Call Trace:
 [<ffffffff81661b14>] dump_stack+0x46/0x58
 [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
 [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff812e16d1>] kobject_get+0x41/0x50
 [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
 [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
 [<ffffffff810b8cb2>] ? up+0x32/0x50
 [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
 [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
 [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
 [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
 [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
 [<ffffffff81089566>] ? move_linked_works+0x66/0x90
 [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
 [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
 [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
 [<ffffffff8108c910>] process_one_work+0x160/0x410
 [<ffffffff8108d05b>] worker_thread+0x11b/0x520
 [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
 [<ffffffff81092421>] kthread+0xe1/0x100
 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
 [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
---[ end trace 89e66eb9795efdf7 ]---
And here is the race:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify()
  acpi_processor_ppc_has_changed()
    cpufreq_update_policy()
      cpufreq_cpu_get()
        kobject_get()
Thread B: xenbus_thread()
xenbus_thread()
  msg->u.watch.handle->callback()
    handle_vcpu_hotplug_event()
      vcpu_hotplug()
        cpu_down()
          __cpu_notify(CPU_POST_DEAD..)
            cpufreq_cpu_callback()
              __cpufreq_remove_dev_finish()
                cpufreq_policy_put_kobj()
                  kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under cpufreq_driver_lock, and once it gets a valid policy it expects it to not be freed until cpufreq_cpu_put() is called.
But the race happens when another thread puts the kobject first and updates cpufreq_cpu_data later and that too without these locks. And so the first thread gets a valid policy structure and before it does kobject_get() on it, the second one does kobject_put(). And so this WARN().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that too under locks.
Cc: stable@vger.kernel.org # 3.12+
Reported-by: Ethan Zhao ethan.zhao@oracle.com
Reported-and-tested-by: Santosh Shilimkar santosh.shilimkar@oracle.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
@Santosh: I have changed read locks to write locks here and so you need to test again.
 drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 	unsigned long flags;
 	struct cpufreq_policy *policy;
 
-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
 		pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 
 	return 0;
 }
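For context, a simplified sketch of the reader side that this ordering protects, roughly what cpufreq_cpu_get() looks like in kernels of this era (paraphrased, so details such as the cpufreq_driver check may not match the exact tree):

struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
{
	struct cpufreq_policy *policy = NULL;
	unsigned long flags;

	read_lock_irqsave(&cpufreq_driver_lock, flags);
	if (cpufreq_driver) {
		policy = per_cpu(cpufreq_cpu_data, cpu);
		if (policy)
			/* WARNs if the last reference has already been dropped */
			kobject_get(&policy->kobj);
	}
	read_unlock_irqrestore(&cpufreq_driver_lock, flags);

	return policy;
}

Because both the lookup and the kobject_get() happen under cpufreq_driver_lock, clearing cpufreq_cpu_data under the write lock before dropping the last reference closes the window described above.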
Viresh,
On 2015/1/30 9:13, Viresh Kumar wrote:
cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances has missed this. And as a result we get this:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 kobject_get+0x41/0x50() Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon ... Call Trace: [<ffffffff81661b14>] dump_stack+0x46/0x58 [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0 [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20 [<ffffffff812e16d1>] kobject_get+0x41/0x50 [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0 [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0 [<ffffffff810b8cb2>] ? up+0x32/0x50 [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2 [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252 [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0 [<ffffffff81360967>] ? acpi_has_method+0x25/0x40 [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82 [<ffffffff81089566>] ? move_linked_works+0x66/0x90 [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7 [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22 [<ffffffff8108c910>] process_one_work+0x160/0x410 [<ffffffff8108d05b>] worker_thread+0x11b/0x520 [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380 [<ffffffff81092421>] kthread+0xe1/0x100 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 ---[ end trace 89e66eb9795efdf7 ]---
And here is the race:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get()
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_finish() cpufreq_policy_put_kobj() kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under cpufreq_driver_lock, and once it gets a valid policy it expects it to not be freed until cpufreq_cpu_put() is called.
It is another bug.
But the race happens when another thread puts the kobject first and updates cpufreq_cpu_data later and that too without these locks. And so the first thread gets a valid policy structure and before it does kobject_get() on it, the second one does kobject_put(). And so this WARN().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that too under locks.
Cc: stable@vger.kernel.org # 3.12+ Reported-by: Ethan Zhao ethan.zhao@oracle.com Reported-and-tested-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
@Santosh: I have changed read locks to write locks here and so you need to test again.
 drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 	unsigned long flags;
 	struct cpufreq_policy *policy;
 
-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
 		pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
Yes, there is one bug here that should be fixed, but it seems not enough to avoid the issue completely. How about Thread B running here:
Thread B: xenbus_thread()
xenbus_thread()
  msg->u.watch.handle->callback()
    handle_vcpu_hotplug_event()
      vcpu_hotplug()
        cpu_down()
          __cpu_notify(CPU_POST_DEAD..)
            cpufreq_cpu_callback()
              __cpufreq_remove_dev_prepare()
                update_policy_cpu() {
                  ...
                  down_write(&policy->rwsem);
                  policy->last_cpu = policy->cpu;
                  policy->cpu = cpu;
                  up_write(&policy->rwsem);    ---->
                }
And Thread A runs to the same position as above. Don't ignore my work blindly; that piece of bandage could save your time :>
Thanks, Ethan
 	return 0;
 }
On 30 January 2015 at 07:00, ethan zhao ethan.zhao@oracle.com wrote:
It is another bug.
Hmm, lets see..
Yes, there is one bug here that should be fixed, but it seems not enough to avoid the issue completely.
Did you test it ?
How about Thread B running here:
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_prepare update_policy_cpu(){ ... down_write(&policy->rwsem); policy->last_cpu = policy->cpu; policy->cpu = cpu; up_write(&policy->rwsem); ----> }
And Thread A runs to the same position as above. Don't ignore my work blindly; that piece of bandage could save your time
Oh, please don't misunderstand me. I didn't have any intention of showing any kind of disrespect.

Okay, why do you say that the thread you have shown above has a bug in it? It's just updating policy->cpu, and that shouldn't make anything unstable.

Please explain one more time, in detail, why you think you hit a different bug. Also look at kref.h to see which piece of code hit that WARN(); it will happen only in the case I have shown. Let's see if I missed something :)
-- viresh
Viresh,
On 2015/1/30 10:05, Viresh Kumar wrote:
On 30 January 2015 at 07:00, ethan zhao ethan.zhao@oracle.com wrote:
It is another bug.
Hmm, lets see..
Yes, there is one bug here that should be fixed, but it seems not enough to avoid the issue completely.
Did you test it ?
For a PPC notification and xen-bus thread race, could you tell me how to reproduce it by triggering the PPC notification and xen-bus events manually? Do you really want me to write some code into a test kernel to flood the PPC and xen-bus paths at the same time? If we can analyse the code and understand the issue clearly, we don't have to wait for users to yell out.
Thanks, Ethan
How about Thread B running here:
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_prepare update_policy_cpu(){ ... down_write(&policy->rwsem); policy->last_cpu = policy->cpu; policy->cpu = cpu; up_write(&policy->rwsem); ----> }
And Thread A runs to the same position as above. Don't ignore my work blindly; that piece of bandage could save your time
Oh, please don't misunderstand me. I didn't have any intention of showing any kind of disrespect.

Okay, why do you say that the thread you have shown above has a bug in it? It's just updating policy->cpu, and that shouldn't make anything unstable.

Please explain one more time, in detail, why you think you hit a different bug. Also look at kref.h to see which piece of code hit that WARN(); it will happen only in the case I have shown. Let's see if I missed something :)
-- viresh
On 30 January 2015 at 07:40, ethan zhao ethan.zhao@oracle.com wrote:
For a PPC notification and xen-bus thread race, could you tell me how to reproduce it by triggering the PPC notification and xen-bus events manually? Do you really want me to write some code into a test kernel to flood the PPC and xen-bus paths at the same time? If we can analyse the code and understand the issue clearly, we don't have to wait for users to yell out.
I thought you already have a test where you are hitting the issue you originally reported. At least Santosh did confirm that he is hitting it 3/5 times in his kernel during boot.
My reasoning of why your observation doesn't fit here:
Copying from your earlier mail..
Thread A: Workqueue: kacpi_notify
acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get()
This tries to increment the count and the warning you have mentioned happen because:
WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
i.e. even after incrementing the count, it is < 2. Which I believe will be 1. Which means that we have tried to do kobject_get() on a kobject for which kobject_put() is already done.
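For reference, the check that fires is in kref_get(); here is a sketch of include/linux/kref.h from kernels of this era (annotated, not a verbatim copy):

static inline void kref_get(struct kref *kref)
{
	/*
	 * If the refcount was 0 before the increment, the object is already
	 * being freed by someone else; bumping it back to 1 is too late,
	 * so the result of the increment must be at least 2.
	 */
	WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
}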
Thread B: xenbus_thread()
xenbus_thread()
  msg->u.watch.handle->callback()
    handle_vcpu_hotplug_event()
      vcpu_hotplug()
        cpu_down()
          __cpu_notify(CPU_DOWN_PREPARE..)
            cpufreq_cpu_callback()
              __cpufreq_remove_dev_prepare()
                update_policy_cpu()
                  kobject_move()
Okay, where is the race or kobject_put() here ? We are just moving the kobject and it has nothing to do with the refcount of kobject.
Why do you see its a race ?
On 2015/1/30 10:13, Viresh Kumar wrote:
On 30 January 2015 at 07:40, ethan zhao ethan.zhao@oracle.com wrote:
For a PPC notification and xen-bus thread race, could you tell me how to reproduce it by triggering the PPC notification and xen-bus events manually? Do you really want me to write some code into a test kernel to flood the PPC and xen-bus paths at the same time? If we can analyse the code and understand the issue clearly, we don't have to wait for users to yell out.
I thought you already have a test where you are hitting the issue you originally reported. At least Santosh did confirm that he is hitting it 3/5 times in his kernel during boot.
As far as I know, a PPC notification only happens when power capping is needed, maybe when the server is running hot; if the cooling condition recovers, you couldn't reproduce it either!
My reasoning of why your observation doesn't fit here:
Copying from your earlier mail..
Thread A: Workqueue: kacpi_notify
acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get()
This tries to increment the count and the warning you have mentioned happen because:
WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
i.e. even after incrementing the count, it is < 2. Which I believe will be 1. Which means that we have tried to do kobject_get() on a kobject for which kobject_put() is already done.
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_DOWN_PREPARE..) cpufreq_cpu_callback() __cpufreq_remove_dev_prepare() update_policy_cpu() kobject_move()
Okay, where is the race or kobject_put() here ? We are just moving the kobject and it has nothing to do with the refcount of kobject.
Why do you see its a race ?
I mean that policy->cpu has been changed and that CPU is about to go down, yet Thread A continues to get and update the policy for it blindly. That is what I call a 'race', not the refcount itself.
Thanks, Ethan
On 30 January 2015 at 07:51, ethan zhao ethan.zhao@oracle.com wrote:
My reasoning of why your observation doesn't fit here:
Copying from your earlier mail..
Thread A: Workqueue: kacpi_notify
acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get()
This tries to increment the count and the warning you have mentioned happen because:
WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
i.e. even after incrementing the count, it is < 2. Which I believe will be 1. Which means that we have tried to do kobject_get() on a kobject for which kobject_put() is already done.
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_DOWN_PREPARE..) cpufreq_cpu_callback() __cpufreq_remove_dev_prepare() update_policy_cpu() kobject_move()
Okay, where is the race or kobject_put() here ? We are just moving the kobject and it has nothing to do with the refcount of kobject.
Why do you see its a race ?
I mean that policy->cpu has been changed and that CPU is about to go down, yet Thread A continues to get and update the policy for it blindly. That is what I call a 'race', not the refcount itself.
First of all, the WARN you had in your patch doesn't have anything to do with the so-called race you just defined. It's there for the reason I described earlier.
Second, what if policy->cpu is getting updated? A policy manages a group of CPUs, not a single cpu. And there still are other CPUs online for that policy and so kobject_get() for that policy->kobj is perfectly valid.
On 2015/1/30 11:14, Viresh Kumar wrote:
On 30 January 2015 at 07:51, ethan zhao ethan.zhao@oracle.com wrote:
My reasoning of why your observation doesn't fit here:
Copying from your earlier mail..
Thread A: Workqueue: kacpi_notify
acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get()
This tries to increment the count and the warning you have mentioned happen because:
WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
i.e. even after incrementing the count, it is < 2. Which I believe will be 1. Which means that we have tried to do kobject_get() on a kobject for which kobject_put() is already done.
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_DOWN_PREPARE..) cpufreq_cpu_callback() __cpufreq_remove_dev_prepare() update_policy_cpu() kobject_move()
Okay, where is the race or kobject_put() here ? We are just moving the kobject and it has nothing to do with the refcount of kobject.
Why do you see its a race ?
I mean that policy->cpu has been changed and that CPU is about to go down, yet Thread A continues to get and update the policy for it blindly. That is what I call a 'race', not the refcount itself.
First of all, the WARN you had in your patch doesn't have anything to do with the so-called race you just defined. It's there for the reason I described earlier.
Second, what if policy->cpu is getting updated? A policy manages a group of CPUs, not a single cpu. And there still are other CPUs online for that policy and so kobject_get() for that policy->kobj is perfectly valid.
You mean the policy is shared by all CPUs, so PPC notification about one CPU should update all CPUs' policy, right? Even if the requested CPU is shutting down.
Thanks, Ethan
On 30 January 2015 at 09:16, ethan zhao ethan.zhao@oracle.com wrote:
You mean the policy is shared by all CPUs, so PPC notification about one
A policy may or may not be shared by multiple CPUs. It all depends on your system's configuration. All CPUs which share a clock line share a policy.
You can verify that from /sys/devices/system/cpu/cpu*/cpufreq/related_cpus. This gives the CPUs that share a policy.
CPU should update all CPUs' policy, right? Even if the requested CPU is shutting down.
CPUs sharing a policy have a single policy structure. And so the policy would be updated for all CPUs that share the policy. Even if some CPU goes down, the policy might still be up and running.
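For illustration, here is a trimmed-down view of the fields involved; this is an approximation of struct cpufreq_policy in kernels of this era rather than a verbatim copy:

struct cpufreq_policy {
	cpumask_var_t		cpus;		/* online CPUs currently covered by this policy */
	cpumask_var_t		related_cpus;	/* online + offline CPUs sharing the clock line */
	unsigned int		cpu;		/* CPU managing the policy; may change on hotplug */
	unsigned int		last_cpu;	/* previous managing CPU */
	struct kobject		kobj;		/* one refcounted kobject for the whole group */
	struct rw_semaphore	rwsem;
	/* ... frequency limits, governor state, etc. ... */
};

So the cpufreq_cpu_data pointers of all related CPUs point at one shared object, and its kobj refcount is what cpufreq_cpu_get()/cpufreq_cpu_put() manipulate.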
-- viresh
Viresh,
On 2015/1/30 12:14, Viresh Kumar wrote:
On 30 January 2015 at 09:16, ethan zhao ethan.zhao@oracle.com wrote:
You mean the policy is shared by all CPUs, so PPC notification about one
A policy may or may not be shared by multiple CPUs. It all depends on your system's configuration. All CPUs which share a clock line share a policy.
You can verify that from /sys/devices/system/cpu/cpu*/cpufreq/related_cpus. This gives the CPUs that share a policy.
CPU should update all CPUs' policy, right? Even if the requested CPU is shutting down.
CPUs sharing a policy have a single policy structure. And so the policy would be updated for all CPUs that share the policy. Even if some CPU goes down, the policy might still be up and running.
Let me check the ACPI spec and BIOS against your implementation.
About that patch you suggested, there is another question left pending:
The policy will be freed only when its last CPU is shut down; how does that happen while the system is booting?
The description of the patch would be wrong (the xen_bus call stack) -- did xen_bus shut down every CPU, down to the last one, during booting?
Thanks, Ethan
-- viresh
On 2 February 2015 at 07:24, ethan zhao ethan.zhao@oracle.com wrote:
The policy will be freed only when its last CPU is shut down; how does that happen while the system is booting?
I couldn't understand what you are asking here. Though policy will be allocated for the first CPU of every policy during boot..
The description of the patch would be wrong (the xen_bus call stack) -- did xen_bus shut down every CPU, down to the last one, during booting?
I don't know about that..
On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote:
cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances has missed this.
No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get() that the policy object won't go away from under them (it is used for some other purposes too, but they are unrelated).
What technically happens is an ordering problem. per_cpu(cpufreq_cpu_data, cpu) needs to be cleared before calling kobject_put(&policy->kobj) *and* under the lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj) was executed.
So the lock is needed not just because it protects cpufreq_cpu_data, but because it is supposed to prevent writes to cpufreq_cpu_data from happening between the read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get().
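To make the ordering concrete, here is a sketch of the interleaving the old code allowed, followed by the fixed teardown order (condensed from the patch above; the surrounding function is elided):

/*
 * Old order, with a concurrent cpufreq_cpu_get():
 *
 *   remover                                  reader
 *   -------                                  ------
 *   kobject_put(&policy->kobj)
 *     (last reference dropped)               read_lock(&cpufreq_driver_lock)
 *                                            policy = per_cpu(cpufreq_cpu_data, cpu)  <- still non-NULL
 *                                            kobject_get(&policy->kobj)               <- refcount 0 -> 1, WARN
 *   per_cpu(cpufreq_cpu_data, cpu) = NULL    <- too late, and not under the lock
 */
write_lock_irqsave(&cpufreq_driver_lock, flags);
policy = per_cpu(cpufreq_cpu_data, cpu);
per_cpu(cpufreq_cpu_data, cpu) = NULL;		/* clear first, under the lock the reader takes */
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
/* ... rest of the teardown ... */
kobject_put(&policy->kobj);			/* later, via cpufreq_policy_put_kobj() */

With this order a reader either sees NULL or a policy that still holds its reference.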
And as a result we get this:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 kobject_get+0x41/0x50() Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon ... Call Trace: [<ffffffff81661b14>] dump_stack+0x46/0x58 [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0 [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20 [<ffffffff812e16d1>] kobject_get+0x41/0x50 [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0 [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0 [<ffffffff810b8cb2>] ? up+0x32/0x50 [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2 [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252 [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0 [<ffffffff81360967>] ? acpi_has_method+0x25/0x40 [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82 [<ffffffff81089566>] ? move_linked_works+0x66/0x90 [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7 [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22 [<ffffffff8108c910>] process_one_work+0x160/0x410 [<ffffffff8108d05b>] worker_thread+0x11b/0x520 [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380 [<ffffffff81092421>] kthread+0xe1/0x100 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 ---[ end trace 89e66eb9795efdf7 ]---
And here is the race:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get()
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_finish() cpufreq_policy_put_kobj() kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under cpufreq_driver_lock, and once it gets a valid policy it expects it to not be freed until cpufreq_cpu_put() is called.
Because it does the kobject_get() under the lock too.
But the race happens when another thread puts the kobject first and updates cpufreq_cpu_data later
This is an ordering problem.
and that too without these locks.
And this is racy.
And so the first thread gets a valid policy structure and before it does kobject_get() on it, the second one does kobject_put(). And so this WARN().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that too under locks.
That's almost correct. In addition to the above (albeit maybe unintentionally) the patch also fixes the possible race between two instances of __cpufreq_remove_dev_finish() with the same arguments running in parallel with each other. The proof is left as an exercise to the reader. :-)
Please try to improve the changelog ...
Cc: stable@vger.kernel.org # 3.12+ Reported-by: Ethan Zhao ethan.zhao@oracle.com Reported-and-tested-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
@Santosh: I have changed read locks to write locks here and so you need to test again.
 drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 	unsigned long flags;
 	struct cpufreq_policy *policy;
 
-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
 		pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 
 	return 0;
 }
On 1/30/2015 2:57 PM, Rafael J. Wysocki wrote:
On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote:
cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances has missed this.
No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get() that the policy object won't go away from under them (it is used for some other purposes too, but they are unrelated).
What technically happens is an ordering problem. per_cpu(cpufreq_cpu_data, cpu) needs to be cleared before calling kobject_put(&policy->kobj) *and* under the lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj) was executed.
So the lock is needed not just because it protects cpufreq_cpu_data, but because it is supposed to prevent writes to cpufreq_cpu_data from happening between the read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get().
And as a result we get this:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 kobject_get+0x41/0x50() Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon ... Call Trace: [<ffffffff81661b14>] dump_stack+0x46/0x58 [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0 [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20 [<ffffffff812e16d1>] kobject_get+0x41/0x50 [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0 [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0 [<ffffffff810b8cb2>] ? up+0x32/0x50 [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2 [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252 [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0 [<ffffffff81360967>] ? acpi_has_method+0x25/0x40 [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82 [<ffffffff81089566>] ? move_linked_works+0x66/0x90 [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7 [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22 [<ffffffff8108c910>] process_one_work+0x160/0x410 [<ffffffff8108d05b>] worker_thread+0x11b/0x520 [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380 [<ffffffff81092421>] kthread+0xe1/0x100 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 ---[ end trace 89e66eb9795efdf7 ]---
And here is the race:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get()
Thread B: xenbus_thread()
xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_finish() cpufreq_policy_put_kobj() kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under cpufreq_driver_lock, and once it gets a valid policy it expects it to not be freed until cpufreq_cpu_put() is called.
Because it does the kobject_get() under the lock too.
But the race happens when another thread puts the kobject first and updates cpufreq_cpu_data later
This is an ordering problem.
and that too without these locks.
And this is racy.
And so the first thread gets a valid policy structure and before it does kobject_get() on it, the second one does kobject_put(). And so this WARN().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that too under locks.
That's almost correct. In addition to the above (albeit maybe unintentionally) the patch also fixes the possible race between two instances of __cpufreq_remove_dev_finish() with the same arguments running in parallel with each other. The proof is left as an exercise to the reader. :-)
Please try to improve the changelog ...
Cc: stable@vger.kernel.org # 3.12+ Reported-by: Ethan Zhao ethan.zhao@oracle.com Reported-and-tested-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
@Santosh: I have changed read locks to write locks here and so you need to test again.
Right. Tested this version and confirm that it fixes the reported WARN()
Regards, Santosh
On 31 January 2015 at 04:27, Rafael J. Wysocki rjw@rjwysocki.net wrote:
No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get() that the policy object won't go away from under them (it is used for some other purposes too, but they are unrelated).
Yeah, it's not just about locking. Otherwise the locking only at the bottom of this routine would have fixed it.
That's almost correct. In addition to the above (albeit maybe unintentionally) the patch also fixes the possible race between two instances of __cpufreq_remove_dev_finish() with the same arguments running in parallel with each other. The proof is left as an exercise to the reader. :-)
Two __cpufreq_remove_dev_finish() can't get called in parallel, so it doesn't fix any race there :).
Please try to improve the changelog ...
Yes.