On Wed, 8 Jan 2025 at 14:39, Manivannan Sadhasivam via B4 Relay <devnull+manivannan.sadhasivam.linaro.org@kernel.org> wrote:
From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Currently, in mhi_pci_runtime_resume(), if the resume fails, recovery_work is started asynchronously and success is returned. But this doesn't align with what PM core expects as documented in Documentation/power/runtime_pm.rst:
"Once the subsystem-level resume callback (or the driver resume callback, if invoked directly) has completed successfully, the PM core regards the device as fully operational, which means that the device _must_ be able to complete I/O operations as needed. The runtime PM status of the device is then 'active'."
So the PM core ends up marking the runtime PM status of the device as 'active', even though the device is not able to handle the I/O operations. This same condition more or less applies to system resume as well.
So to avoid this ambiguity, try to recover the device synchronously from mhi_pci_runtime_resume() and return the actual error code in the case of recovery failure.
To do so, move the recovery code to a __mhi_pci_recovery_work() helper and call it from both mhi_pci_recovery_work() and mhi_pci_runtime_resume(). The former still ignores the return value, while the latter passes it to the PM core.
Cc: stable@vger.kernel.org # 5.13
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/mhi/Z2PbEPYpqFfrLSJi@hovoldconsulting.com
Fixes: d3800c1dce24 ("bus: mhi: pci_generic: Add support for runtime PM")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
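For reference, the split described in the commit message can be sketched in plain userspace C roughly as below. This is only an illustration under assumptions: struct fake_mhi_dev and all fake_* functions are hypothetical stand-ins, not the actual pci_generic code, and the failure flags merely simulate a device that does or does not come back.

```c
#include <errno.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct fake_mhi_dev {
	int resume_fails;   /* simulate a failing fast-path resume */
	int recovery_fails; /* simulate a failing full recovery */
	int recovered;      /* set once recovery has succeeded */
};

/*
 * Shared helper (plays the role of __mhi_pci_recovery_work() in the
 * commit message): does the recovery and reports the result.
 */
int fake_recover(struct fake_mhi_dev *dev)
{
	if (dev->recovery_fails)
		return -EIO;

	dev->recovered = 1;
	return 0;
}

/*
 * Asynchronous work handler: there is no caller to report to,
 * so the return value of the helper is deliberately ignored.
 */
void fake_recovery_work(struct fake_mhi_dev *dev)
{
	(void)fake_recover(dev);
}

/*
 * Runtime resume callback: on resume failure, recover synchronously
 * and hand the real result back to the caller (the PM core in the
 * real driver), so the device is only treated as active when it
 * actually is.
 */
int fake_runtime_resume(struct fake_mhi_dev *dev)
{
	if (!dev->resume_fails)
		return 0;

	return fake_recover(dev);
}
```

With this shape, a failed recovery surfaces as a negative error code from the resume path instead of being hidden behind an unconditional success return.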
Note that this will noticeably impact the user experience on system-wide resume (mhi_pci_resume), because MHI devices usually take a while (a few seconds) to cold boot and reach a ready state (or to time out in the worst case). So we may have people complaining about a resume-delay regression on their laptops even if they are not using the MHI device/modem function. Are we OK with that?
Regards,
Loic