Hi all,
This set unifies the AMD MCA interrupt handlers with common MCA code. The goal is to avoid duplicating functionality like reading and clearing MCA banks.
Based on feedback, this revision also includes changes to the MCA init flow.
Patches 1-4: General fixes and cleanups.
Patches 5-10: Add BSP-only init flow and related changes.
Patches 11-15: Updates from v1 set.
Patch 16: Interrupt storm handling rebased on current set.
Thanks,
Yazen

---
Changes in v2:
- Add general cleanup pre-patches.
- Add changes for BSP-only init.
- Add interrupt storm handling for AMD.
- Link to v1: https://lore.kernel.org/r/20240523155641.2805411-1-yazen.ghannam@amd.com
---
Borislav Petkov (1):
      x86/mce: Cleanup bank processing on init

Smita Koralahalli (1):
      x86/mce: Handle AMD threshold interrupt storms

Yazen Ghannam (14):
      x86/mce: Don't remove sysfs if thresholding sysfs init fails
      x86/mce/amd: Remove return value for mce_threshold_create_device()
      x86/mce/amd: Remove smca_banks_map
      x86/mce/amd: Put list_head in threshold_bank
      x86/mce: Remove __mcheck_cpu_init_early()
      x86/mce: Define BSP-only init
      x86/mce: Define BSP-only SMCA init
      x86/mce: Do 'UNKNOWN' vendor check early
      x86/mce: Separate global and per-CPU quirks
      x86/mce: Move machine_check_poll() status checks to helper functions
      x86/mce: Unify AMD THR handler with MCA Polling
      x86/mce: Unify AMD DFR handler with MCA Polling
      x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
      x86/mce/amd: Support SMCA Corrected Error Interrupt

 arch/x86/include/asm/mce.h          |   7 +-
 arch/x86/kernel/cpu/common.c        |   1 +
 arch/x86/kernel/cpu/mce/amd.c       | 391 +++++++++++-----------------------------
 arch/x86/kernel/cpu/mce/core.c      | 322 ++++++++++++++-----------------
 arch/x86/kernel/cpu/mce/intel.c     |  15 ++
 arch/x86/kernel/cpu/mce/internal.h  |   8 +
 arch/x86/kernel/cpu/mce/threshold.c |   3 +
 7 files changed, 332 insertions(+), 415 deletions(-)
---
base-commit: b36de8b904b8ff2095ece7af6b3cfff8c73c2fb1
change-id: 20250210-wip-mca-updates-bed2a67c9c57

Currently, the MCE subsystem sysfs interface will be removed if the thresholding sysfs interface fails to be created. A common failure is due to new MCA bank types that are not recognized and don't have a short name set.
The MCA thresholding feature is optional and should not break the common MCE sysfs interface. Also, new MCA bank types are occasionally introduced, and updates will be needed to recognize them. But likewise, this should not break the common sysfs interface.
Keep the MCE sysfs interface regardless of the status of the thresholding sysfs interface.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---

Notes:
    v1->v2:
    * New in v2.
    * Included stable tag but there's no specific commit for Fixes.

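    For anyone skimming the thread, the behavior change can be sketched in
    standalone userspace C. This is only an illustration under assumptions:
    struct cpu_sysfs_state, core_create(), core_remove(), threshold_create(),
    cpu_online_old() and cpu_online_new() are hypothetical stand-ins and do
    not exist in the kernel. The old flow tears down the common interface
    when the optional one fails; the new flow keeps it.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's sysfs state; not kernel code. */
struct cpu_sysfs_state {
	bool core_present;      /* common MCE sysfs interface */
	bool threshold_present; /* optional thresholding interface */
};

/* Stand-in for creating the common interface; assumed to succeed. */
static void core_create(struct cpu_sysfs_state *s)
{
	s->core_present = true;
}

/* Stand-in for removing the common interface. */
static void core_remove(struct cpu_sysfs_state *s)
{
	s->core_present = false;
}

/*
 * Stand-in for creating the thresholding interface. Fails (returns -1)
 * when the bank type is not recognized, e.g. it has no short name.
 */
static int threshold_create(struct cpu_sysfs_state *s, bool bank_type_known)
{
	if (!bank_type_known)
		return -1;
	s->threshold_present = true;
	return 0;
}

/* Old flow: a failure of the optional feature removes the common interface. */
static int cpu_online_old(struct cpu_sysfs_state *s, bool bank_type_known)
{
	int ret;

	core_create(s);

	ret = threshold_create(s, bank_type_known);
	if (ret) {
		core_remove(s);
		return ret;
	}
	return 0;
}

/* New flow: the optional feature's result no longer affects the common one. */
static int cpu_online_new(struct cpu_sysfs_state *s, bool bank_type_known)
{
	core_create(s);
	(void)threshold_create(s, bank_type_known);
	return 0;
}
```

    With an unrecognized bank type, cpu_online_old() returns an error and
    leaves core_present false, while cpu_online_new() returns 0 with
    core_present still true and only threshold_present missing.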
 arch/x86/kernel/cpu/mce/core.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0dc00c9894c7..d39af20154c7 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2801,15 +2801,9 @@ static int mce_cpu_dead(unsigned int cpu)
 static int mce_cpu_online(unsigned int cpu)
 {
 	struct timer_list *t = this_cpu_ptr(&mce_timer);
-	int ret;
 
 	mce_device_create(cpu);
-
-	ret = mce_threshold_create_device(cpu);
-	if (ret) {
-		mce_device_remove(cpu);
-		return ret;
-	}
+	mce_threshold_create_device(cpu);
 	mce_reenable_cpu();
 	mce_start_timer(t);
 	return 0;

> From: Yazen Ghannam <yazen.ghannam@amd.com>
> Sent: Friday, February 14, 2025 12:46 AM
> To: x86@kernel.org; Luck, Tony <tony.luck@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org;
> Smita.KoralahalliChannabasappa@amd.com; Yazen Ghannam
> <yazen.ghannam@amd.com>; stable@vger.kernel.org
> Subject: [PATCH v2 01/16] x86/mce: Don't remove sysfs if thresholding sysfs
> init fails
>
> Currently, the MCE subsystem sysfs interface will be removed if the thresholding sysfs interface fails to be created. A common failure is due to new MCA bank types that are not recognized and don't have a short name set.
>
> The MCA thresholding feature is optional and should not break the common MCE sysfs interface. Also, new MCA bank types are occasionally introduced, and updates will be needed to recognize them. But likewise, this should not break the common sysfs interface.
>
> Keep the MCE sysfs interface regardless of the status of the thresholding sysfs interface.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: stable@vger.kernel.org

LGTM.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

On Thu, Feb 13, 2025 at 04:45:49PM +0000, Yazen Ghannam wrote:
> Hi all,
>
> This set unifies the AMD MCA interrupt handlers with common MCA code. The goal is to avoid duplicating functionality like reading and clearing MCA banks.
>
> Based on feedback, this revision also includes changes to the MCA init flow.

Apart from the nits I posted against parts 5 & 15, LGTM.

Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>

-Tony