From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Extend the logic of handling CMCI storms to AMD threshold interrupts.
Rely on an approach similar to Intel's CMCI handling to mitigate storms per CPU and per bank. But, unlike CMCI, do not set thresholds to reduce the interrupt rate during a storm. Rather, disable the interrupt on the corresponding CPU and bank. Re-enable the interrupt once enough consecutive polls of the bank show no corrected errors (30, as programmed by Intel).
Turning off the threshold interrupts is the better solution on AMD systems because other error severities are still handled even while the threshold interrupts are disabled.
Also, AMD systems currently allow banks to be managed by both polling and interrupts. So don't modify the polling banks set after a storm ends.
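The lifecycle described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel implementation: every sketch_* name and the bank count are hypothetical, storm detection is left to the caller, and only the 30-poll end condition is taken from the description.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_BANKS	64	/* hypothetical bank count */
#define SKETCH_END_POLLS	30	/* clean polls needed to end a storm */

/* Hypothetical per-bank state; the real tracking is also per CPU. */
struct sketch_bank {
	bool in_storm;			/* storm currently active */
	bool irq_enabled;		/* threshold interrupt enabled */
	bool polled;			/* bank is in the polling set */
	unsigned int clean_polls;	/* consecutive polls without errors */
};

static struct sketch_bank sketch_banks[SKETCH_MAX_BANKS];

/* Start with every bank both polled and interrupt-driven. */
static void sketch_init(void)
{
	for (unsigned int i = 0; i < SKETCH_MAX_BANKS; i++)
		sketch_banks[i] = (struct sketch_bank){
			.irq_enabled = true,
			.polled = true,
		};
}

/* Storm detected by the caller: silence the interrupt, keep polling. */
static void sketch_storm_begin(unsigned int bank)
{
	assert(bank < SKETCH_MAX_BANKS);
	sketch_banks[bank].in_storm = true;
	sketch_banks[bank].irq_enabled = false;
	sketch_banks[bank].clean_polls = 0;
}

/* Record one poll of the bank; end the storm after enough clean polls. */
static void sketch_poll(unsigned int bank, bool corrected_error)
{
	struct sketch_bank *b;

	assert(bank < SKETCH_MAX_BANKS);
	b = &sketch_banks[bank];
	if (!b->in_storm)
		return;

	if (corrected_error) {
		b->clean_polls = 0;	/* storm still active, start over */
		return;
	}

	if (++b->clean_polls >= SKETCH_END_POLLS) {
		b->in_storm = false;
		b->irq_enabled = true;	/* re-enable; polling set untouched */
		b->clean_polls = 0;
	}
}
```

In this model the `polled` flag is never written after initialization, which mirrors the point that the polling banks set stays unmodified when a storm ends.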
[Tony: Small tweak because mce_handle_storm() isn't a pointer now]
[Yazen: Rebase and simplify]
Stable backport notes:
1. Currently, after a machine check interrupt storm is detected, the bank's corresponding bit in the mce_poll_banks per-CPU variable is cleared by cmci_storm_end() once the storm ends. As a result, on AMD's SMCA systems, errors injected or encountered after the storm subsides are not logged, since polling on that bank has been disabled. The polling banks set on AMD systems should not be modified when a storm subsides.
2. This patch is a snippet from the CMCI storm handling patch (link below) that has been accepted into tip for v6.19. Backporting that patch in full would have been preferable, but that is not feasible since it is part of a larger set. As such, this fix is temporary. When the original patch and its set are integrated into stable, this patch should be reverted.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
This is somewhat of a new scenario for me, and I am not really sure about the procedure. Hence, I have not modified the commit message or removed the tags. If required, I will rework both.
Also, while this issue can be encountered on AMD systems using v6.8 and later stable kernels, we would specifically prefer for this fix to be backported to v6.12 since it is LTS.
---
 arch/x86/kernel/cpu/mce/threshold.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index f4a007616468..61eaa1774931 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -85,7 +85,8 @@ void cmci_storm_end(unsigned int bank)
 {
 	struct mca_storm_desc *storm = this_cpu_ptr(&storm_desc);
 
-	__clear_bit(bank, this_cpu_ptr(mce_poll_banks));
+	if (!mce_flags.amd_threshold)
+		__clear_bit(bank, this_cpu_ptr(mce_poll_banks));
 	storm->banks[bank].history = 0;
 	storm->banks[bank].in_storm_mode = false;
 
base-commit: 8b690556d8fe074b4f9835075050fba3fb180e93
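For readers without the kernel tree at hand, the effect of the hunk above on the polling set can be modeled in userspace. This is a rough sketch, not kernel code: model_poll_banks is a hypothetical 64-bit stand-in for the per-CPU mce_poll_banks bitmap, and the amd_threshold parameter stands in for mce_flags.amd_threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU polling bitmap: one bit per bank. */
static uint64_t model_poll_banks;

/* Model of storm end: only non-AMD-threshold systems drop the bank from polling. */
static void model_storm_end(unsigned int bank, bool amd_threshold)
{
	assert(bank < 64);
	if (!amd_threshold)
		model_poll_banks &= ~(UINT64_C(1) << bank);
}

/* Report whether the bank is still part of the polling set. */
static bool model_bank_is_polled(unsigned int bank)
{
	assert(bank < 64);
	return (model_poll_banks >> bank) & 1;
}
```

With amd_threshold set, a bank that was polled before the storm is still polled afterwards, so errors seen after the storm subsides can still be logged by the poller.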
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH] x86/mce: Handle AMD threshold interrupt storms
Link: https://lore.kernel.org/stable/20251120214139.1721338-1-avadhut.naik%40amd.c...
On Thu, Nov 20, 2025 at 09:41:24PM +0000, Avadhut Naik wrote:
> From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
You need to put here
"Commit <sha1> upstream."
> Extend the logic of handling CMCI storms to AMD threshold interrupts.
...
On 11/20/2025 15:53, Borislav Petkov wrote:
> On Thu, Nov 20, 2025 at 09:41:24PM +0000, Avadhut Naik wrote:
>> From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> You need to put here
> "Commit <sha1> upstream."
Will add that.
Also, does this need to have a Fixes tag?
Didn't add one here as the original patch committed to tip didn't have one.
>> Extend the logic of handling CMCI storms to AMD threshold interrupts.
...