6.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
commit deb2f228388ff3a9d0623e3b59a053e9235c341d upstream.
When platform firmware supplies error information to the OS, e.g., via the ACPI APEI GHES mechanism, it may identify an error source device that doesn't advertise an AER Capability and therefore dev->aer_info, which contains AER stats and ratelimiting data, is NULL.
pci_dev_aer_stats_incr() already checks dev->aer_info for NULL, but aer_ratelimit() did not, leading to NULL pointer dereferences like this one from the URL below:
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 {1}[Hardware Error]: event severity: corrected {1}[Hardware Error]: device_id: 0000:00:00.0 {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2020 {1}[Hardware Error]: aer_cor_status: 0x00001000, aer_cor_mask: 0x00002000 BUG: kernel NULL pointer dereference, address: 0000000000000264 RIP: 0010:___ratelimit+0xc/0x1b0 pci_print_aer+0x141/0x360 aer_recover_work_func+0xb5/0x130
[8086:2020] is an Intel "Sky Lake-E DMI3 Registers" device that claims to be a Root Port but does not advertise an AER Capability.
Add a NULL check in aer_ratelimit() to avoid the NULL pointer dereference. Note that this also prevents ratelimiting these events from GHES.
Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging") Link: https://lore.kernel.org/r/buduna6darbvwfg3aogl5kimyxkggu3n4romnmq6sozut6axeu... Signed-off-by: Breno Leitao leitao@debian.org [bhelgaas: add crash details to commit log] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Kuppuswamy Sathyanarayanan sathyanarayanan.kuppuswamy@linux.intel.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250929-aer_crash_2-v1-1-68ec4f81c356@debian.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/pcie/aer.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(
static int aer_ratelimit(struct pci_dev *dev, unsigned int severity) { + if (!dev->aer_info) + return 1; + switch (severity) { case AER_NONFATAL: return __ratelimit(&dev->aer_info->nonfatal_ratelimit);