On 2015/1/30 10:13, Viresh Kumar wrote:
On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote:
For a race between a PPC notification and the xenbus thread, could you tell me how to reproduce it by triggering the PPC notification and xenbus events manually? Do you really want me to write code into a test kernel to flood PPC and xenbus events at the same time? If we can analyze the code and understand the issue clearly, we don't have to wait for users to yell.
I thought you already had a test that hits the issue you originally reported. At least Santosh did confirm that he hits it 3 out of 5 times during boot with his kernel.
As far as I know, a PPC notification only happens when power capping is needed, e.g. when the server overheats. Once the cooling condition recovers, you can't reproduce it either!
My reasoning for why your observation doesn't fit here:
Copying from your earlier mail:
Thread A: Workqueue: kacpi_notify

acpi_processor_notify()
  acpi_processor_ppc_has_changed()
    cpufreq_update_policy()
      cpufreq_cpu_get()
        kobject_get()
This tries to increment the count, and the warning you mentioned happens because of:
WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
i.e. even after incrementing, the count is < 2, which I believe means it is 1.
- Which means that we have tried to do kobject_get() on a kobject
  for which kobject_put() has already been done.
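To make the refcount argument concrete, here is a minimal userspace sketch using C11 atomics. It is only a model: struct model_kref, model_kref_get() and model_kref_put() are hypothetical names, not kernel API. model_kref_get() mirrors the quoted check, where the warning fires exactly when the post-increment value is below 2, which for a non-negative count means the count was 0, i.e. the last reference had already been put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct kref. */
struct model_kref {
    atomic_int refcount;
};

/* Models WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2).
 * atomic_fetch_add() returns the old value, so +1 gives the new value,
 * matching atomic_inc_return(). Returns true when the warning would fire. */
static bool model_kref_get(struct model_kref *k)
{
    int newval = atomic_fetch_add(&k->refcount, 1) + 1;
    return newval < 2;
}

/* Drops one reference. Returns true when the count reaches zero,
 * i.e. when the object would be released. */
static bool model_kref_put(struct model_kref *k)
{
    int old = atomic_fetch_sub(&k->refcount, 1);
    assert(old > 0); /* putting an already-released object is a bug */
    return old - 1 == 0;
}
```

Starting from a count of 1, a get/put pair never warns, but a get issued after the final put (count back at 0) makes model_kref_get() return true, which is the get-after-put situation described above.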
Thread B: xenbus_thread()

xenbus_thread()
  msg->u.watch.handle->callback()
    handle_vcpu_hotplug_event()
      vcpu_hotplug()
        cpu_down()
          __cpu_notify(CPU_DOWN_PREPARE..)
            cpufreq_cpu_callback()
              __cpufreq_remove_dev_prepare()
                update_policy_cpu()
                  kobject_move()
Okay, where is the race or the kobject_put() here? We are just moving the kobject, and that has nothing to do with the kobject's refcount.
Why do you see it as a race?
I mean that policy->cpu has been changed and that CPU is about to go down, yet Thread A continues to get and update the policy for it blindly. That is what I call a 'race', not the refcount itself.
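A heavily simplified, deterministic sketch of the ordering I mean, written as a sequential interleaving rather than real threads. Everything here (struct model_policy, struct model_system, thread_a_lookup(), thread_b_cpu_down(), thread_a_update()) is hypothetical and only loosely mirrors cpufreq_update_policy() and update_policy_cpu(); it models the check-then-act problem and says nothing about the refcount.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins, not the kernel's cpufreq structures. */
struct model_policy {
    int cpu; /* CPU that currently "owns" the policy */
};

struct model_system {
    bool cpu_online[MODEL_NR_CPUS];
    struct model_policy policy;
};

/* Thread A, step 1: look up the policy and remember its CPU. */
static int thread_a_lookup(const struct model_system *s)
{
    return s->policy.cpu;
}

/* Thread B: CPU_DOWN_PREPARE hands the policy to another CPU,
 * then the dying CPU goes offline. */
static void thread_b_cpu_down(struct model_system *s, int dying, int new_owner)
{
    assert(dying >= 0 && dying < MODEL_NR_CPUS);
    assert(new_owner >= 0 && new_owner < MODEL_NR_CPUS);
    if (s->policy.cpu == dying)
        s->policy.cpu = new_owner;
    s->cpu_online[dying] = false;
}

/* Thread A, step 2: update using the CPU it looked up earlier.
 * Returns true if that CPU is no longer online (the "blind" update). */
static bool thread_a_update(const struct model_system *s, int looked_up_cpu)
{
    return !s->cpu_online[looked_up_cpu];
}
```

If Thread B runs between A's lookup and A's update, A acts on a CPU that is already going away; if A looks up after B, it sees the new owner and the update is fine.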
Thanks, Ethan