On Wed, Jul 22, 2020 at 7:04 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
On Wed, Jul 22, 2020 at 06:46:06PM -0600, Robert Hancock wrote:
On Wed, Jul 22, 2020 at 11:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
Recently ASPM handling was changed to no longer disable ASPM on all PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge devices don't seem to function properly with ASPM enabled, as they cause the parent PCIe root port to report repeated AER timeout errors. In addition to flooding the kernel log, this also causes the machine to wake up immediately after suspend is initiated.
Hi Robert, thanks a lot for the report of this problem (https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gR... and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).
I'm pretty sure Linux ASPM support is missing some things. This problem might be a hardware problem where a quirk is the right solution, but it could also be that it's a result of a Linux defect that we should fix.
Could you collect the dmesg log and "sudo lspci -vvxxxx" output somewhere (maybe a bugzilla.kernel.org issue)? I want to figure out whether L1 PM substates are enabled on this link, and whether that's configured correctly.
Created a Bugzilla entry and added dmesg and lspci output: https://bugzilla.kernel.org/show_bug.cgi?id=208667
As I noted in that report, I subsequently found this page on ASMedia's site: https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=1... which indicates this ASM1083 device has "No PCIe ASPM support".
How nice. According to your lspci, the device itself claims to support ASPM:
  02:00.0 ... ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge
    LnkCap: ... ASPM L0s L1 ...
but the web page claims otherwise. That would mean the device is defective for claiming something that's not true. Or possibly those capability bits can be set by BIOS.
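For reference, the "ASPM L0s L1" text in LnkCap comes from the ASPM Support field, bits 11:10 of the PCIe Link Capabilities register (the mask Linux calls PCI_EXP_LNKCAP_ASPMS). A minimal sketch of that decoding follows; the function name and the example register value are hypothetical and are not taken from the lspci dump in the Bugzilla entry.

```python
# ASPM Support field of the PCIe Link Capabilities register (bits 11:10).
# The mask value matches PCI_EXP_LNKCAP_ASPMS in Linux's pci_regs.h.
LNKCAP_ASPM_MASK = 0x00000C00
LNKCAP_ASPM_SHIFT = 10


def lnkcap_aspm_support(lnkcap):
    """Return the ASPM states a device advertises in its LnkCap value."""
    field = (lnkcap & LNKCAP_ASPM_MASK) >> LNKCAP_ASPM_SHIFT
    states = []
    if field & 0b01:  # low bit of the field: L0s supported
        states.append("L0s")
    if field & 0b10:  # high bit of the field: L1 supported
        states.append("L1")
    return states


if __name__ == "__main__":
    # Hypothetical register value with both support bits set.
    example_lnkcap = 0x00000C11
    print(lnkcap_aspm_support(example_lnkcap))
```

A device that truly had no ASPM support would be expected to leave both bits clear, which is why the mismatch with the vendor's web page matters.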
It's not clear why this problem isn't occurring on Windows, however: either it is not enabling ASPM, somehow it doesn't cause issues with the PCIe link, or it is causing issues and just doesn't notify the user in any way. I can try to check whether this bridge device ends up with ASPM enabled under Windows 10.
If Windows *does* manage to enable ASPM, that would be interesting. I don't know whether Windows has a similar quirk mechanism. I suppose they must have *some* way to work around defective devices.
As I posted on the Bugzilla report, based on lspci output it appears Windows does have ASPM L0s enabled for this bridge. However, it appears to have the exact same problem: there are correctable PCIe error entries showing up in the Windows system event log against the root port the bridge is connected to. So I am thinking this hardware is just broken with ASPM enabled.
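To illustrate what "ASPM L0s enabled" means at the register level: the enabled states live in the ASPM Control field, bits 1:0 of the PCIe Link Control register (PCI_EXP_LNKCTL_ASPMC in Linux). The sketch below decodes that field and shows that disabling ASPM amounts to clearing those two bits. The helper names and the example value are hypothetical, and this is not the kernel's quirk code.

```python
# ASPM Control field of the PCIe Link Control register (bits 1:0).
# The mask value matches PCI_EXP_LNKCTL_ASPMC in Linux's pci_regs.h.
LNKCTL_ASPM_MASK = 0x0003

ASPM_CONTROL_NAMES = {
    0b00: "Disabled",
    0b01: "L0s Enabled",
    0b10: "L1 Enabled",
    0b11: "L0s L1 Enabled",
}


def lnkctl_aspm_state(lnkctl):
    """Describe which ASPM states are enabled in a LnkCtl value."""
    return ASPM_CONTROL_NAMES[lnkctl & LNKCTL_ASPM_MASK]


def lnkctl_disable_aspm(lnkctl):
    """Return the LnkCtl value with both ASPM enable bits cleared."""
    return lnkctl & ~LNKCTL_ASPM_MASK & 0xFFFF


if __name__ == "__main__":
    # Hypothetical 16-bit LnkCtl value: L0s enabled, one unrelated bit set.
    example_lnkctl = 0x0041
    print(lnkctl_aspm_state(example_lnkctl))
    print(hex(lnkctl_disable_aspm(example_lnkctl)))
```

Other bits in the register are left untouched, so only the ASPM behaviour of the link changes.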