From: Breno Leitao <leitao@debian.org>
[ Upstream commit 4734c8b46b901cff2feda8b82abc710b65dc31c1 ]
When a GHES (Generic Hardware Error Source) triggers a panic, add the TAINT_MACHINE_CHECK taint flag to the kernel. This explicitly marks the kernel as tainted due to a machine check event, improving diagnostics and post-mortem analysis. The taint is set with LOCKDEP_STILL_OK to indicate lockdep remains valid.
At large scale deployment, this helps to quickly determine panics that are coming due to hardware failures.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20250702-add_tain-v1-1-9187b10914b9@debian.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of this commit, here is my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real diagnostic issue**: The commit adds the TAINT_MACHINE_CHECK flag when GHES (Generic Hardware Error Source) triggers a panic. This is important for post-mortem analysis at scale, as explicitly stated in the commit message: "At large scale deployment, this helps to quickly determine panics that are coming due to hardware failures."
2. **Small and contained change**: The patch adds only a single line of code (`add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);`, plus a blank line, hence the 2 insertions in the diffstat) in the `__ghes_panic()` function at drivers/acpi/apei/ghes.c:1091. This is well within the stable kernel limit of 100 lines.
3. **Obviously correct**: The change follows established kernel patterns. A grep of the tree shows that other machine check handlers already taint the kernel with `TAINT_MACHINE_CHECK` via `add_taint()`:
   - arch/x86/kernel/cpu/mce/core.c:1640
   - arch/powerpc/kernel/mce.c:332
   - arch/x86/kernel/cpu/mce/p5.c:40
   - arch/x86/kernel/cpu/mce/winchip.c:24
4. **No architectural changes**: This is purely a diagnostic improvement. Apart from setting the taint flag, it does not change the behavior of the GHES panic path.
5. **Minimal risk**: The change uses the LOCKDEP_STILL_OK flag, which records the taint but leaves lock debugging enabled. This is less intrusive than LOCKDEP_NOW_UNRELIABLE, used in some other machine check paths, which additionally turns lockdep off.
6. **Real benefit for users**: For organizations running Linux at scale, being able to quickly identify hardware-related panics through the taint flag provides significant operational value for triaging issues.
The commit meets all the stable kernel criteria: it is small, obviously correct, fixes a real diagnostic limitation that affects users (especially at scale), and has been reviewed by Tony Luck, who is a recognized maintainer in the RAS (Reliability, Availability, and Serviceability) subsystem.
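To make points 2 and 5 more concrete, here is a small userspace model of how `add_taint()` treats its two lockdep modes. It is a sketch, not kernel code: `model_add_taint()`, `model_tainted_mask`, `model_debug_locks` and the `MODEL_*` names are hypothetical stand-ins for `add_taint()`, `tainted_mask`, `debug_locks` and the real `enum lockdep_ok` in kernel/panic.c, and the kernel's `panic_on_taint` handling is deliberately left out.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of add_taint() from kernel/panic.c.
 * Names are illustrative stand-ins; panic_on_taint handling is omitted.
 */
enum model_lockdep_ok {
	MODEL_LOCKDEP_STILL_OK,       /* models LOCKDEP_STILL_OK */
	MODEL_LOCKDEP_NOW_UNRELIABLE, /* models LOCKDEP_NOW_UNRELIABLE */
};

/* Bit number of TAINT_MACHINE_CHECK ('M'), per tainted-kernels.rst. */
#define MODEL_TAINT_MACHINE_CHECK 4u

static unsigned long model_tainted_mask;  /* models tainted_mask */
static bool model_debug_locks = true;     /* models debug_locks */

static void model_add_taint(unsigned int flag, enum model_lockdep_ok lockdep_ok)
{
	/* Only the "now unreliable" mode switches lock debugging off. */
	if (lockdep_ok == MODEL_LOCKDEP_NOW_UNRELIABLE)
		model_debug_locks = false;

	/* In both modes the taint bit itself is recorded. */
	model_tainted_mask |= 1UL << flag;
}
```

In this model, `model_add_taint(MODEL_TAINT_MACHINE_CHECK, MODEL_LOCKDEP_STILL_OK)` sets bit 4 of the mask (value 16) and leaves `model_debug_locks` true, which mirrors the choice the patch makes for the GHES panic path.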
 drivers/acpi/apei/ghes.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 94e3d3fe11ae..91f9267c07ea 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -996,6 +996,8 @@ static void __ghes_panic(struct ghes *ghes,
 
 	__ghes_print_estatus(KERN_EMERG, ghes->generic, estatus);
 
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
+
 	ghes_clear_estatus(ghes, estatus, buf_paddr, fixmap_idx);
 
 	if (!panic_timeout)
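For the triage use case described in the commit message, the new taint shows up as bit 4 ('M') of the kernel's taint mask, as documented in Documentation/admin-guide/tainted-kernels.rst. The sketch below assumes the decimal format that /proc/sys/kernel/tainted uses; `taint_has_machine_check()`, `read_taint_mask()` and `TAINT_MACHINE_CHECK_BIT` are hypothetical names for illustration, not an existing API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical macro: TAINT_MACHINE_CHECK is bit 4 ('M') of the taint mask. */
#define TAINT_MACHINE_CHECK_BIT 4

/* Hypothetical helper: true if the machine check taint bit is set in mask. */
static bool taint_has_machine_check(unsigned long mask)
{
	return (mask >> TAINT_MACHINE_CHECK_BIT) & 1UL;
}

/*
 * Hypothetical helper: read a decimal taint mask from a file such as
 * /proc/sys/kernel/tainted. Returns 0 on success, -1 on failure.
 */
static int read_taint_mask(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

On a live system, `read_taint_mask("/proc/sys/kernel/tainted", &mask)` followed by `taint_has_machine_check(mask)` reports whether a machine check taint has been recorded. For a kernel that has already panicked in `__ghes_panic()`, the same bit would instead be checked in the taint mask captured in a crash dump, or by looking for the `M` flag in the `Tainted:` field of the panic output.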