On Thu, Oct 10, 2024 at 09:42:46AM +0200, Johan Hovold wrote:
When using the in-kernel pd-mapper on x1e80100, client drivers often fail to communicate with the firmware during boot, which specifically breaks battery and USB-C altmode notifications. This has been observed to happen on almost every second boot (41%) but likely depends on probe order:
    pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
    pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
    ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
    qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
In the same setup audio also fails to probe albeit much more rarely:
    PDR: avs/audio get domain list txn wait failed: -110
    PDR: service lookup for avs/audio failed: -110
Chris Lew has provided an analysis and is working on a fix for the ECANCELED (125) errors, but it is not yet clear whether this will also address the audio regression.
Even if this was first observed on x1e80100 there is currently no reason to believe that these issues are specific to that platform.
Disable the in-kernel pd-mapper for now, and make sure to backport this to stable to prevent users and distros from migrating away from the user-space service.
Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
Cc: stable@vger.kernel.org	# 6.11
Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
It's now been over two months since I reported this regression, and even if we seem to be making some progress on at least some of these issues, I think we need to disable the pd-mapper temporarily until the fixes are in place (e.g. to prevent distros from dropping the user-space service).
This is just a random thought, but I wonder if we could insert a delay somewhere as temporary workaround to make the in-kernel pd-mapper more reliable. I just tried replicating the userspace pd-mapper timing on X1E80100 CRD by:
1. Disabling auto-loading of qcom_pd_mapper (modprobe.blacklist=qcom_pd_mapper)
2. Adding a systemd service that does nothing except running "modprobe qcom_pd_mapper" at the same point in time where the userspace pd-mapper would usually be started.
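A minimal sketch of what such a service could look like. This is not the unit actually used in the test above; the unit file name, the modprobe path, and the install target are assumptions chosen for illustration.

```ini
# /etc/systemd/system/qcom-pd-mapper-late.service (hypothetical name)
[Unit]
Description=Load qcom_pd_mapper late (temporary workaround sketch)

[Service]
Type=oneshot
# Path is an assumption; some distros install modprobe in /sbin instead.
ExecStart=/usr/sbin/modprobe qcom_pd_mapper
RemainAfterExit=yes

[Install]
# Assumed target; pick whatever matches where the userspace
# pd-mapper service is started on the system in question.
WantedBy=multi-user.target
```

The blacklist from step 1 only stops alias-based auto-loading, so the explicit "modprobe qcom_pd_mapper" by module name in the unit still loads the module.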
This seems to work quite well for me, I haven't seen any of the mentioned errors anymore in a couple of boot tests. Of course, a couple of boots proves little, and this would only change the timing rather than fix the underlying issues.
Thanks, Stephan