[+cc iwlwifi folks]
Re: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
On Wed, Mar 29, 2023 at 04:17:29PM -0700, Ben Greear wrote:
> On 8/30/22 3:16 PM, Ben Greear wrote: ...
> I notice this patch appears to be in 6.2.6 kernel, and my kernel logs are full of spam and system is unstable. Possibly the unstable part is related to something else, but the log spam is definitely extreme.
>
> These systems are fairly stable on 5.19-ish kernels without the patch in question.
Hmmm, I was going to thank you for the report, but looking closer, I see that you reported this last August [1] and we *should* have pursued it with the iwlwifi folks or figured out what the PCI core is doing wrong, but I totally dropped the ball. Sorry about that.
To make sure we're all on the same page, we're talking about 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") [2], which is present in v6.0 and later [3] but not v5.19.16 [4].
> Here is sample of the spam:
>
> [ 1675.547023] pcieport 0000:03:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [ 1675.556851] pcieport 0000:03:02.0: device [10b5:8619] error status/mask=00100000/00000000
> [ 1675.563904] pcieport 0000:03:02.0: [20] UnsupReq (First)
> [ 1675.569398] pcieport 0000:03:02.0: AER: TLP Header: 34000000 05001f10 00000000 88c888c8
> [ 1675.576296] iwlwifi 0000:05:00.0: AER: can't recover (no error_detected callback)
The TLP header says this is an LTR message from 05:00.0. Apparently the bridge above 05:00.0 is 03:02.0, which logged an Unsupported Request error for the message, probably because 03:02.0 doesn't have LTR enabled.
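For anyone who wants to check that reading of the header, here is a rough sketch of the decode. It assumes the standard PCIe Message TLP layout (Fmt in DW0 bits 31:29, Type in bits 28:24, Requester ID in DW1 bits 31:16, Message Code in DW1 bits 7:0) and that 0x10 is the LTR message code. The function name and the returned dictionary are made up for illustration and are not anything in the kernel.

```python
LTR_MESSAGE_CODE = 0x10  # assumed: Message Code for LTR in the PCIe spec


def decode_msg_tlp_header(header):
    """Decode a few fields from an AER 'TLP Header' string (four hex DWs)."""
    dw0, dw1, _dw2, _dw3 = (int(word, 16) for word in header.split())

    fmt = (dw0 >> 29) & 0x7         # 0b001 = 4 DW header, no data
    tlp_type = (dw0 >> 24) & 0x1f   # 0b10rrr = Message, rrr = routing
    requester_id = (dw1 >> 16) & 0xffff
    message_code = dw1 & 0xff

    bus = requester_id >> 8
    dev = (requester_id >> 3) & 0x1f
    fn = requester_id & 0x7

    return {
        "fmt": fmt,
        "type": tlp_type,
        "is_message": (tlp_type >> 3) == 0b10,
        "routing": tlp_type & 0x7,
        "requester": f"{bus:02x}:{dev:02x}.{fn}",
        "message_code": message_code,
        "is_ltr": message_code == LTR_MESSAGE_CODE,
    }


fields = decode_msg_tlp_header("34000000 05001f10 00000000 88c888c8")
print(fields)
```

For the header in the log above this gives a Message TLP with Requester ID 05:00.0 and Message Code 0x10, which matches the reading that 05:00.0 sent an LTR message.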
Can you collect the output of "sudo lspci -vv"? Does this happen even before loading the iwlwifi driver? I assume there are no hotplug events before this happens?
The PCI core enables LTR during enumeration for every device for which LTR is supported and enabled along the entire path up to a Root Port. If it does that wrong, you might see errors even before loading iwlwifi.
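As a toy model of that rule (this is not the kernel code, and the class, field names, and topology below are all hypothetical), the check amounts to walking from the device up through each upstream bridge and requiring that every one of them both supports LTR and has it enabled:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dev:
    name: str
    ltr_supported: bool
    ltr_enabled: bool = False
    parent: Optional["Dev"] = None  # upstream bridge; None for a Root Port


def may_enable_ltr(dev):
    """True only if dev supports LTR and every bridge above it, up to the
    Root Port, supports LTR and already has it enabled."""
    if not dev.ltr_supported:
        return False
    bridge = dev.parent
    while bridge is not None:
        if not (bridge.ltr_supported and bridge.ltr_enabled):
            return False
        bridge = bridge.parent
    return True


# Hypothetical topology; the real one is unknown without the lspci output.
root = Dev("root port", ltr_supported=True, ltr_enabled=True)
sw_down = Dev("03:02.0", ltr_supported=True, ltr_enabled=False, parent=root)
wifi = Dev("05:00.0", ltr_supported=True, parent=sw_down)

print(may_enable_ltr(wifi))
```

In this made-up topology the bridge 03:02.0 has LTR disabled, so LTR should not be enabled on 05:00.0. If the endpoint sends LTR messages anyway, the bridge would be expected to flag them as Unsupported Requests, which is the pattern in the log.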
I see that iwlwifi *reads* PCI_EXP_DEVCTL2_LTR_EN in iwl_pcie_apm_config(), which should be safe. I don't see any writes, but the iwlwifi experts should know more about this. A couple of paths do the following, which looks like it could be related:
  __iwl_mvm_mac_start
    iwl_mvm_up
      iwl_mvm_config_ltr
        if (trans->ltr_enabled)
          iwl_mvm_send_cmd_pdu(mvm, LTR_CONFIG, ...)
Bjorn
[1] https://lore.kernel.org/all/47b775c5-57fa-5edf-b59e-8a9041ffbee7@candelatech...
[2] https://git.kernel.org/linus/8795e182b02d
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
[4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/driver...