On Wed, Feb 12, 2025 at 11:59:36AM +0100, Johan Hovold wrote:
On Wed, Feb 12, 2025 at 10:37:35AM +0100, Johan Hovold wrote:
On Tue, Feb 11, 2025 at 10:37:11PM +0530, Mukesh Ojha wrote:
On Mon, Feb 10, 2025 at 10:43:23AM +0100, Johan Hovold wrote:
A Link tag to my report would be good to have as well if this fixes the audio regression.
I see this is somehow matching the logs you have reported, but this deadlock is there from the very first day of pdr_interface driver.
[ 14.565059] PDR: avs/audio get domain list txn wait failed: -110 [ 14.571943] PDR: service lookup for avs/audio failed: -110
Yes, but using the in-kernel pd-mapper has exposed a number of existing bugs since it changes the timing of events enough to make it easier to hit them.
The audio regression is a very real regression for users of Snapdragon based laptops like, for example, the Lenovo Yoga Slim 7x.
If Bjorn has confirmed that this is the same issue (I can try to instrument the code based on your analysis to confirm this too), then I think it would be good to mention this in the commit message and link to the report, for example:
This specifically also fixes an audio regression when using the in-kernel pd-mapper as that makes it easier to hit this race. [1]
Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/ # [1]
or similar.
I can confirm that audio regression with the in-kernel pd-mapper appears to be caused by the race that this patch fixes.
If I insert a short (100-200 ms) sleep before taking the list lock in pdr_locator_new_server() to increase the race window, I see the audio service failing to register on both the X1E CRD and Lenovo ThinkPad X13s:
[ 11.118557] pdr_add_lookup - tms/servreg / msm/adsp/charger_pd [ 11.443632] pdr_locator_new_server [ 11.558122] pdr_locator_new_server - taking list lock [ 11.563939] pdr_locator_new_server - releasing list lock [ 11.582178] pdr_locator_work - taking list lock [ 11.594468] pdr_locator_work - releasing list lock [ 11.992018] pdr_add_lookup - avs/audio / msm/adsp/audio_pd [ 11.992034] pdr_add_lookup - avs/audio / msm/adsp/audio_pd [ 11.992224] pdr_locator_new_server < 100 ms sleep inserted before taking lock in pdr_locator_new_server() > [ 11.997937] pdr_locator_work - taking list lock [ 12.093984] pdr_locator_new_server - taking list lock [ 17.120169] PDR: avs/audio get domain list txn wait failed: -110 [ 17.127066] PDR: service lookup for avs/audio failed: -110 [ 17.132893] pdr_locator_work - releasing list lock [ 17.139885] pdr_locator_new_server - releasing list lock
[ On the X13s, where I have not hit this issue with the in-kernel pd-mapper, I had to make sure to insert the sleep only on the second call, possibly because of interaction with the charger_pd registration which happened closer to the audio registration. ]
Please add a comment and link to the audio regression report as I suggested above, and feel free to include my:
Tested-by: Johan Hovold johan+linaro@kernel.org
Thanks for fixing this!
Thanks for putting extra effort in reproducing the issue and testing.
-Mukesh
Johan