Hi all,
This set unifies the AMD MCA interrupt handlers with common MCA code.
The goal is to avoid duplicating functionality like reading and clearing
MCA banks.
Based on feedback, this revision also include changes to the MCA init
flow.
Patches 1-4:
General fixes and cleanups.
Patches 5-10:
Add BSP-only init flow and related changes.
Patches 11-15:
Updates from v1 set.
Patch 16:
Interrupt storm handling rebased on current set.
Thanks,
Yazen
---
Changes in v2:
- Add general cleanup pre-patches.
- Add changes for BSP-only init.
- Add interrupt storm handling for AMD.
- Link to v1: https://lore.kernel.org/r/20240523155641.2805411-1-yazen.ghannam@amd.com
---
Borislav Petkov (1):
x86/mce: Cleanup bank processing on init
Smita Koralahalli (1):
x86/mce: Handle AMD threshold interrupt storms
Yazen Ghannam (14):
x86/mce: Don't remove sysfs if thresholding sysfs init fails
x86/mce/amd: Remove return value for mce_threshold_create_device()
x86/mce/amd: Remove smca_banks_map
x86/mce/amd: Put list_head in threshold_bank
x86/mce: Remove __mcheck_cpu_init_early()
x86/mce: Define BSP-only init
x86/mce: Define BSP-only SMCA init
x86/mce: Do 'UNKNOWN' vendor check early
x86/mce: Separate global and per-CPU quirks
x86/mce: Move machine_check_poll() status checks to helper functions
x86/mce: Unify AMD THR handler with MCA Polling
x86/mce: Unify AMD DFR handler with MCA Polling
x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
x86/mce/amd: Support SMCA Corrected Error Interrupt
arch/x86/include/asm/mce.h | 7 +-
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/mce/amd.c | 391 +++++++++++++-----------------------
arch/x86/kernel/cpu/mce/core.c | 322 ++++++++++++++---------------
arch/x86/kernel/cpu/mce/intel.c | 15 ++
arch/x86/kernel/cpu/mce/internal.h | 8 +
arch/x86/kernel/cpu/mce/threshold.c | 3 +
7 files changed, 332 insertions(+), 415 deletions(-)
---
base-commit: b36de8b904b8ff2095ece7af6b3cfff8c73c2fb1
change-id: 20250210-wip-mca-updates-bed2a67c9c57
Once device_register() failed, we should call put_device() to
decrement reference count for cleanup. Or it could cause memory leak.
device_register() includes device_add(). As comment of device_add()
says, 'if device_add() succeeds, you should call device_del() when you
want to get rid of it. If device_add() has not succeeded, use only
put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 37d6a0a6f470 ("PCI: Add pci_register_host_bridge() interface")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the patch description.
---
drivers/pci/probe.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b6536ed599c3..03dc65cf48f1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1017,8 +1017,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
name = dev_name(&bus->dev);
err = device_register(&bus->dev);
- if (err)
+ if (err) {
+ put_device(&bus->dev);
goto unregister;
+ }
pcibios_add_bus(bus);
--
2.25.1
Once device_register() failed, we should call put_device() to
decrement reference count for cleanup. Or it could cause memory leak.
As comment of device_register() says, 'NOTE: _Never_ directly free
@dev after calling this function, even if it returned an error! Always
use put_device() to give up the reference initialized in this function
instead.'
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v5:
- modified the bug description as suggestions;
Changes in v4:
- deleted the redundant initialization;
Changes in v3:
- modified the patch as suggestions;
Changes in v2:
- modified the patch as suggestions.
---
arch/arm/common/locomo.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/arm/common/locomo.c b/arch/arm/common/locomo.c
index cb6ef449b987..45106066a17f 100644
--- a/arch/arm/common/locomo.c
+++ b/arch/arm/common/locomo.c
@@ -223,10 +223,8 @@ locomo_init_one_child(struct locomo *lchip, struct locomo_dev_info *info)
int ret;
dev = kzalloc(sizeof(struct locomo_dev), GFP_KERNEL);
- if (!dev) {
- ret = -ENOMEM;
- goto out;
- }
+ if (!dev)
+ return -ENOMEM;
/*
* If the parent device has a DMA mask associated with it,
@@ -254,10 +252,9 @@ locomo_init_one_child(struct locomo *lchip, struct locomo_dev_info *info)
NO_IRQ : lchip->irq_base + info->irq[0];
ret = device_register(&dev->dev);
- if (ret) {
- out:
- kfree(dev);
- }
+ if (ret)
+ put_device(&dev->dev);
+
return ret;
}
--
2.25.1
If device_add() fails, do not use device_unregister() for error
handling. device_unregister() consists two functions: device_del() and
put_device(). device_unregister() should only be called after
device_add() succeeded because device_del() undoes what device_add()
does if successful. Change device_unregister() to put_device() call
before returning from the function.
As comment of device_add() says, 'if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the bug description as suggestions.
---
drivers/phy/tegra/xusb.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c
index 79d4814d758d..c89df95aa6ca 100644
--- a/drivers/phy/tegra/xusb.c
+++ b/drivers/phy/tegra/xusb.c
@@ -548,16 +548,16 @@ static int tegra_xusb_port_init(struct tegra_xusb_port *port,
err = dev_set_name(&port->dev, "%s-%u", name, index);
if (err < 0)
- goto unregister;
+ goto put_device;
err = device_add(&port->dev);
if (err < 0)
- goto unregister;
+ goto put_device;
return 0;
-unregister:
- device_unregister(&port->dev);
+put_device:
+ put_device(&port->dev);
return err;
}
--
2.25.1
Fix two issues when cross-building userprogs with clang.
Reproducer, using nolibc to avoid libc requirements for cross building:
$ tail -2 init/Makefile
userprogs-always-y += test-llvm
test-llvm-userccflags += -nostdlib -nolibc -static -isystem usr/ -include $(srctree)/tools/include/nolibc/nolibc.h
$ cat init/test-llvm.c
int main(void)
{
return 0;
}
$ make ARCH=arm64 LLVM=1 allnoconfig headers_install init/
Validate that init/test-llvm builds and has the correct binary format.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (2):
kbuild: userprogs: fix bitsize and target detection on clang
kbuild: userprogs: use lld to link through clang
Makefile | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250213-kbuild-userprog-fixes-4f07b62ae818
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Hi Juergen, hi all,
Radoslav Bodó reported in Debian an issue after updating our kernel
from 6.1.112 to 6.1.115. His report in full is at:
https://bugs.debian.org/1088159
He reports that after switching to 6.1.115 (and present in any of the
later 6.1.y series) booting under xen, the mptsas devices are not
anymore accessible, the boot shows:
mpt3sas version 43.100.00.00 loaded
mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8086116 kB)
mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
mpt3sas_cm0: MSI-X vectors supported: 96
mpt3sas_cm0: 0 40 40
mpt3sas_cm0: High IOPs queues : disabled
mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 447
mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 448
mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 449
mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 450
mpt3sas0-msix4: PCI-MSI-X enabled: IRQ 451
mpt3sas0-msix5: PCI-MSI-X enabled: IRQ 452
mpt3sas0-msix6: PCI-MSI-X enabled: IRQ 453
mpt3sas0-msix7: PCI-MSI-X enabled: IRQ 454
mpt3sas0-msix8: PCI-MSI-X enabled: IRQ 455
mpt3sas0-msix9: PCI-MSI-X enabled: IRQ 456
mpt3sas0-msix10: PCI-MSI-X enabled: IRQ 457
mpt3sas0-msix11: PCI-MSI-X enabled: IRQ 458
mpt3sas0-msix12: PCI-MSI-X enabled: IRQ 459
mpt3sas0-msix13: PCI-MSI-X enabled: IRQ 460
mpt3sas0-msix14: PCI-MSI-X enabled: IRQ 461
mpt3sas0-msix15: PCI-MSI-X enabled: IRQ 462
mpt3sas0-msix16: PCI-MSI-X enabled: IRQ 463
mpt3sas0-msix17: PCI-MSI-X enabled: IRQ 464
mpt3sas0-msix18: PCI-MSI-X enabled: IRQ 465
mpt3sas0-msix19: PCI-MSI-X enabled: IRQ 466
mpt3sas0-msix20: PCI-MSI-X enabled: IRQ 467
mpt3sas0-msix21: PCI-MSI-X enabled: IRQ 468
mpt3sas0-msix22: PCI-MSI-X enabled: IRQ 469
mpt3sas0-msix23: PCI-MSI-X enabled: IRQ 470
mpt3sas0-msix24: PCI-MSI-X enabled: IRQ 471
mpt3sas0-msix25: PCI-MSI-X enabled: IRQ 472
mpt3sas0-msix26: PCI-MSI-X enabled: IRQ 473
mpt3sas0-msix27: PCI-MSI-X enabled: IRQ 474
mpt3sas0-msix28: PCI-MSI-X enabled: IRQ 475
mpt3sas0-msix29: PCI-MSI-X enabled: IRQ 476
mpt3sas0-msix30: PCI-MSI-X enabled: IRQ 477
mpt3sas0-msix31: PCI-MSI-X enabled: IRQ 478
mpt3sas0-msix32: PCI-MSI-X enabled: IRQ 479
mpt3sas0-msix33: PCI-MSI-X enabled: IRQ 480
mpt3sas0-msix34: PCI-MSI-X enabled: IRQ 481
mpt3sas0-msix35: PCI-MSI-X enabled: IRQ 482
mpt3sas0-msix36: PCI-MSI-X enabled: IRQ 483
mpt3sas0-msix37: PCI-MSI-X enabled: IRQ 484
mpt3sas0-msix38: PCI-MSI-X enabled: IRQ 485
mpt3sas0-msix39: PCI-MSI-X enabled: IRQ 486
mpt3sas_cm0: iomem(0x00000000ac400000), mapped(0x00000000d9f45f61), size(65536)
mpt3sas_cm0: ioport(0x0000000000006000), size(256)
mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
mpt3sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
mpt3sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:12348/_scsih_probe()!
We were able to bissect the changes (see https://bugs.debian.org/1088159#64) down to
b1e6e80a1b42 ("xen/swiotlb: add alignment check for dma buffers")
#regzbot introduced: b1e6e80a1b42
#regzbot link: https://bugs.debian.org/1088159
reverting the commit resolves the issue.
Does that ring some bells?
In fact we have two more bugs reported with similar symptoms but not
yet confirmed they are the same, but I'm referencing them here as well
in case we are able to cross-match to root cause:
https://bugs.debian.org/1093371 (megaraid_sas didn't work anymore with
Xen)
and
https://bugs.debian.org/1087807 (Unable to boot: i40e swiotlb buffer
is full)
(but again the these are yet not confirmed to have the same root
cause).
Thanks in advance,
Regards,
Salvatore