From: Huacai Chen <chenhuacai@loongson.cn>
[ Upstream commit 863a320dc6fd7c855f47da4bb82a8de2d9102ea2 ]
If the default state of the interrupt controllers in the first kernel doesn't mask any interrupts, the second kernel may receive interrupts (which were previously allocated by the first kernel) immediately after a CPU comes online during its boot process. These interrupts cannot be properly routed, leading to bad IRQ issues.
This patch calls machine_kexec_mask_interrupts() to mask all interrupts during the kexec/kdump process.
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Comprehensive Analysis
### 1. Commit Message and Problem Analysis

The commit addresses a significant reliability issue in the LoongArch architecture's kexec (soft reboot) and kdump (crash dump) mechanisms.

- **Problem:** When transitioning from the current kernel to a new one (either via kexec or panic-induced kdump), the interrupt controllers are not being properly masked. This allows interrupts from the old kernel to fire immediately as the new kernel boots, before it is ready to handle them. This results in "bad IRQ" errors, spurious interrupts, and potentially failed crash dumps.
- **Solution:** The patch introduces calls to `machine_kexec_mask_interrupts()` in the shutdown paths. This function iterates through active interrupts and masks them at the controller level, ensuring a clean, quiescent state for the incoming kernel.
- **Context:** This aligns LoongArch with other architectures (ARM64, RISC-V, PowerPC) where this masking is already standard practice.
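The masking pass described in the **Solution** item can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct mock_irq`, `mock_mask_interrupts()` and `mock_count_live()` are hypothetical names invented for this example, and the EOI, mask, disable order is an assumption about how such a helper typically behaves rather than a copy of the in-tree implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one interrupt line; not a kernel structure. */
struct mock_irq {
	bool has_chip;    /* a controller is attached to this line */
	bool in_progress; /* a handler was running when shutdown began */
	bool masked;      /* line is masked at the controller */
	bool disabled;    /* line is disabled at the controller */
	int eoi_count;    /* end-of-interrupt signals sent in this model */
};

/*
 * Model of the quiescing pass: for every line that has a controller,
 * retire an in-flight interrupt with an EOI, then mask and disable it.
 * Lines without a controller are skipped untouched.
 */
static void mock_mask_interrupts(struct mock_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct mock_irq *irq = &irqs[i];

		if (!irq->has_chip)
			continue;

		if (irq->in_progress) {
			irq->eoi_count++;
			irq->in_progress = false; /* modeling assumption */
		}

		irq->masked = true;

		if (!irq->disabled)
			irq->disabled = true;
	}
}

/* Count lines that could still deliver an interrupt to the next kernel. */
static size_t mock_count_live(const struct mock_irq *irqs, size_t n)
{
	size_t live = 0;

	for (size_t i = 0; i < n; i++) {
		if (irqs[i].has_chip && !irqs[i].masked)
			live++;
	}
	return live;
}
```

In this model, calling `mock_mask_interrupts()` on an array of lines leaves `mock_count_live()` at zero, which corresponds to the quiescent state the incoming kernel expects to find.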
### 2. Code Research and Validation

- **Mechanism:** The fix adds two function calls: one in `machine_kexec()` (standard path) and one in `machine_crash_shutdown()` (crash path).
- **Dependencies & Backporting Complexity:**
  - The function `machine_kexec_mask_interrupts()` is a standard helper. However, it was consolidated into the generic `kernel/irq/kexec.c` in kernel versions around v6.14 (approx. Dec 2024).
  - **For recent stable kernels (e.g., 6.14+):** The patch should apply cleanly as the generic symbol is available.
  - **For older LTS kernels (e.g., 6.1.y, 6.6.y, 6.12.y):** The generic helper likely does not exist. Backporting to these trees will require a slight modification to include a local implementation of `machine_kexec_mask_interrupts()` within `arch/loongarch/kernel/machine_kexec.c`, similar to how ARM64 and RISC-V handled it prior to the consolidation. This is a standard procedure for architecture-specific fixes in stable.
### 3. Stable Kernel Rules Assessment

- **Fixes a Real Bug:** Yes. The lack of interrupt masking causes race conditions and potential boot failures in the second kernel.
- **Important Severity:** High. Kdump is a critical feature for enterprise debugging. If kdump fails due to spurious IRQs, diagnosing the original system crash becomes impossible.
- **Small and Contained:** The logic change is minimal (masking interrupts).
- **No New Features:** This is a fix for existing, broken functionality.
- **Regression Risk:** Low. The system is shutting down; masking interrupts is the correct defensive posture.
### 4. Conclusion

This commit is an essential fix for LoongArch system reliability. It corrects a deviation from standard kernel behavior that jeopardizes crash recovery. While backporting to older long-term stable trees will require handling the missing generic helper function (by adding a local version), the fix itself is obviously correct and required.
**YES**
 arch/loongarch/kernel/machine_kexec.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/kernel/machine_kexec.c b/arch/loongarch/kernel/machine_kexec.c
index f9381800e291c..8ef4e4595d61a 100644
--- a/arch/loongarch/kernel/machine_kexec.c
+++ b/arch/loongarch/kernel/machine_kexec.c
@@ -249,6 +249,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
 #ifdef CONFIG_SMP
 	crash_smp_send_stop();
 #endif
+	machine_kexec_mask_interrupts();
 	cpumask_set_cpu(crashing_cpu, &cpus_in_crash);

 	pr_info("Starting crashdump kernel...\n");
@@ -286,6 +287,7 @@ void machine_kexec(struct kimage *image)

 	/* We do not want to be bothered. */
 	local_irq_disable();
+	machine_kexec_mask_interrupts();

 	pr_notice("EFI boot flag 0x%lx\n", efi_boot);
 	pr_notice("Command line at 0x%lx\n", cmdline_ptr);