On Mon, Jul 21, 2025 at 02:49:14PM GMT, Johan Hovold wrote:
On Mon, Jul 21, 2025 at 04:26:41PM +0530, Manivannan Sadhasivam wrote:
On Mon, Jul 21, 2025 at 11:32:44AM GMT, Johan Hovold wrote:
On Mon, Jul 14, 2025 at 11:31:04PM +0530, Manivannan Sadhasivam wrote:
Commit 9f4f3dfad8cf ("PCI: qcom: Enable ASPM for platforms supporting 1.9.0 ops") allowed the Qcom controller driver to enable ASPM for all PCI devices enumerated at the time of the controller driver probe. It proved to be useful for devices already powered on by the bootloader as it allowed devices to enter ASPM without user intervention.
However, it could not enable ASPM for the hotplug capable devices i.e., devices enumerated *after* the controller driver probe. This limitation mostly went unnoticed as the Qcom PCI controllers are not hotplug capable and also the bootloader has been enabling the PCI devices before Linux Kernel boots (mostly on the Qcom compute platforms which users use on a daily basis).
But with the advent of the commit b458ff7e8176 ("PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers"), the pwrctrl driver started to block the PCI device enumeration until it had been probed. Though, the intention of the commit was to avoid race between the pwrctrl driver and PCI client driver, it also meant that the pwrctrl controlled PCI devices may get probed after the controller driver and will no longer have ASPM enabled. So users started noticing high runtime power consumption with WLAN chipsets on Qcom compute platforms like Thinkpad X13s, and Thinkpad T14s, etc...
Note the ASPM regression for ath11k/ath12k only happened in 6.15, so commit b458ff7e8176 ("PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers") in 6.13 does not seem to be the immediate culprit here.
This series was intented to fix the ASPM issue which exist even before the introduction of pwrctrl framework.
But this limitation of the ASPM enable implementation wasn't really an issue before pwrctrl since, as you point out above, these controllers are not hotplug capable.
Yeah, but nothing prevented an user from powering on the endpoint later and doing manual rescan.
But I also agree that the below commits made the issue more visible and caused regression on platforms where WLAN used to work.
Candidates from 6.15 include commits like
957f40d039a9 ("PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()") 2489eeb777af ("PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created")
This is probably related to the reports of these drivers sometimes failing to probe with
ath12k_pci 0004:01:00.0: of_irq_parse_pci: failed with rc=134
after pwrctrl was merged, and which since 6.15 should instead result in the drivers not probing at all (as we've discussed off list).
We discussed about the ASPM issue IIRC. The above mentioned of_irq_parse_pci could also be related to the fact that we are turning off the supplies after pci_dev destruction. For this issue, I guess the patch from Brian could be the fix:
https://lore.kernel.org/linux-pci/20250711174332.1.I623f788178c1e4c5b1a41dbf...
We've also discussed the rc=134 issue, which appears to be due to some pwrctrl race. IIRC, you thought it may be the bluetooth driver powering down the bt/wlan controller before the wlan bit has had a chance to (complete its) probe. Not sure if that was fully confirmed, but I remember you saying that the rc=134 symptom would no longer be visible since 6.15 and instead wlan would never even probe at all if we hit this issue...
Ah yes, this one was *before* the ASPM discussion we had.
The patch you link to above only appears to relate to drivers being manually unbound. I hope we're not also hitting such issues during regular boot?
The patch makes sure that the pwrctrl doesn't power off the endpoint when 'struct pci_dev' gets destroyed. But thinking more, I'm not sure if that's what happenning during the 'of_irq_parse_pci' issue.
I need to dig more at some point.
- Mani