Re: deadlock in HMP

26 Jun 2014


      On 26/06/14 10:37, Alex Shi wrote:
...
Thanks for testing, Naresh!
于 6/26/14, 17:34, Naresh Kamboju 写道:
...
...
...
After applying above patch. Boot tested on vexpress-tc2 and noticed
that this deadlock bug stopped reporting.
Boot logs attached:

deadlock-on-hmp-with-alex-fix-patch-on-tc2.log
deadlock-on-hmp-without-alex-fix-patch-on-tc2.log

Bug updated:
https://bugs.launchpad.net/linaro-stable-kernel/+bug/1330392
Yes, the warning goes away but what you've done is to just ignore 
locking this resource. It's not guaranteed by anything in CPUidle that 
the driver won't be changed while we are doing this. On all our 
platforms, the cpuidle driver is built-in rather than loaded as a module 
so it's extremely unlikely to happen (I think it's actually impossible) 
but I don't think we should write kernel code that works due to 
circumstance - we should be honoring the contract for accessing this 
resource. We didn't have the warning checks working properly in our 
integrated test and I didn't see this before but the root cause is that 
I should not be taking that lock in potentially re-entrant code without 
guarding it.
To properly fix this specific implementation will probably require 
hacking a new API into the cpuidle framework (something like 
try_cpuidle_driver_ref) which will make future backporting a wee bit 
more complicated but not a lot. The patch would have to be added to the 
big.LITTLE MP patch set.
Perhaps we could also add a lock into the timer hack which we can 
trylock before attempting to access the cpuidle driver, this would at 
least stop us deadlocking ourselves.
An alternative might be to re-engineer the solution to provide a 
low-resume-latency request interface which the scheduler can use instead 
of a timer hack to keep it alive - the existing QoS interface isn't 
suitable because it's global and wakes all the CPUs up to get the new 
latency requirement in effect. That would not strictly be a big.LITTLE 
feature, but as its the only client it would likely need to be carried 
around in the same patch set.
There is already a suitable state disable API in cpuidle, but 
determining which states to disable still requires reading the exit 
latency which is only available in this driver information.
In summary, I don't like this fix although I accept that it works and 
it'll continue to work for TC2 and any other platform where the idle 
driver cannot be changed at runtime.
What do you think?
--Chris

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: deadlock in HMP