Direct HLT instruction execution causes #VEs for TDX VMs which is routed to hypervisor via TDCALL. safe_halt() routines execute HLT in STI-shadow so IRQs need to remain disabled until the TDCALL to ensure that pending IRQs are correctly treated as wake events. So "sti;hlt" sequence needs to be replaced for TDX VMs with "TDCALL; *_irq_enable()" to keep interrupts disabled during TDCALL execution.
Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests") prevented the idle routines from using "sti;hlt". But it missed the paravirt routine which can be reached like this as an example: acpi_safe_halt() => raw_safe_halt() => arch_safe_halt() => irq.safe_halt() => pv_native_safe_halt()
Modify tdx_safe_halt() to implement the sequence "TDCALL; raw_local_irq_enable()" and invoke tdx_halt() from idle routine which just executes TDCALL without toggling interrupt state. Introduce dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt() routines for TDX VMs.
Cc: stable@vger.kernel.org Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests") Signed-off-by: Vishal Annapurve vannapurve@google.com --- arch/x86/Kconfig | 1 + arch/x86/coco/tdx/tdx.c | 22 +++++++++++++++++++++- arch/x86/include/asm/tdx.h | 2 +- arch/x86/kernel/process.c | 2 +- 4 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 87198d957e2f..afcdbc9693dc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -902,6 +902,7 @@ config INTEL_TDX_GUEST depends on X86_64 && CPU_SUP_INTEL depends on X86_X2APIC depends on EFI_STUB + depends on PARAVIRT select ARCH_HAS_CC_PLATFORM select X86_MEM_ENCRYPT select X86_MCE diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 32809a06dab4..7ab427e85bd3 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -14,6 +14,7 @@ #include <asm/ia32.h> #include <asm/insn.h> #include <asm/insn-eval.h> +#include <asm/paravirt_types.h> #include <asm/pgtable.h> #include <asm/set_memory.h> #include <asm/traps.h> @@ -398,7 +399,7 @@ static int handle_halt(struct ve_info *ve) return ve_instr_len(ve); }
-void __cpuidle tdx_safe_halt(void) +void __cpuidle tdx_halt(void) { const bool irq_disabled = false;
@@ -409,6 +410,16 @@ void __cpuidle tdx_safe_halt(void) WARN_ONCE(1, "HLT instruction emulation failed\n"); }
+static void __cpuidle tdx_safe_halt(void) +{ + tdx_halt(); + /* + * "__cpuidle" section doesn't support instrumentation, so stick + * with raw_* variant that avoids tracing hooks. + */ + raw_local_irq_enable(); +} + static int read_msr(struct pt_regs *regs, struct ve_info *ve) { struct tdx_module_args args = { @@ -1109,6 +1120,15 @@ void __init tdx_early_init(void) x86_platform.guest.enc_kexec_begin = tdx_kexec_begin; x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
+ /* + * "sti;hlt" execution in TDX guests will induce a #VE in the STI-shadow + * which will enable interrupts before HLT TDCALL inocation possibly + * resulting in missed wakeup events. Modify all possible HLT + * execution paths to use TDCALL for performance/reliability reasons. + */ + pv_ops.irq.safe_halt = tdx_safe_halt; + pv_ops.irq.halt = tdx_halt; + /* * TDX intercepts the RDMSR to read the X2APIC ID in the parallel * bringup low level code. That raises #VE which cannot be handled diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index b4b16dafd55e..393ee2dfaab1 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -58,7 +58,7 @@ void tdx_get_ve_info(struct ve_info *ve);
bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
-void tdx_safe_halt(void); +void tdx_halt(void);
bool tdx_early_handle_ve(struct pt_regs *regs);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 6da6769d7254..d11956a178df 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -934,7 +934,7 @@ void __init select_idle_routine(void) static_call_update(x86_idle, mwait_idle); } else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) { pr_info("using TDX aware idle routine\n"); - static_call_update(x86_idle, tdx_safe_halt); + static_call_update(x86_idle, tdx_halt); } else { static_call_update(x86_idle, default_idle); }
On 2/20/25 13:16, Vishal Annapurve wrote:
Direct HLT instruction execution causes #VEs for TDX VMs which is routed to hypervisor via TDCALL. safe_halt() routines execute HLT in STI-shadow so IRQs need to remain disabled until the TDCALL to ensure that pending IRQs are correctly treated as wake events.
This isn't quite true. There's only one paravirt safe_halt() and it doesn't do HLT or STI.
I think it's more true to say that "safe" halts are entered with IRQs disabled. They logically do the halt operation and then enable interrupts before returning.
So "sti;hlt" sequence needs to be replaced for TDX VMs with "TDCALL; *_irq_enable()" to keep interrupts disabled during TDCALL execution.
But this isn't new. TDX already tried to avoid "sti;hlt". It just screwed up the implementation.
Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests") prevented the idle routines from using "sti;hlt". But it missed the paravirt routine which can be reached like this as an example: acpi_safe_halt() => raw_safe_halt() => arch_safe_halt() => irq.safe_halt() => pv_native_safe_halt()
This, on the other hand, *is* important.
Modify tdx_safe_halt() to implement the sequence "TDCALL; raw_local_irq_enable()" and invoke tdx_halt() from idle routine which just executes TDCALL without toggling interrupt state. Introduce dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt() routines for TDX VMs.
This changelog glosses over one of the key points: Why *MUST* TDX use paravirt? It further confuses the reasoning by alluding to the idea that "Direct HLT instruction execution ... is routed to hypervisor via TDCALL".
It gives background and a solution, but it's not obvious what the problem is or how the solution _fixes_ the problem.
What must TDX now depend on PARAVIRT?
Why not just route the HLT to a TDXCALL via the #VE code?
On Thu, Feb 20, 2025 at 3:00 PM Dave Hansen dave.hansen@intel.com wrote:
On 2/20/25 13:16, Vishal Annapurve wrote:
Direct HLT instruction execution causes #VEs for TDX VMs which is routed to hypervisor via TDCALL. safe_halt() routines execute HLT in STI-shadow so IRQs need to remain disabled until the TDCALL to ensure that pending IRQs are correctly treated as wake events.
This isn't quite true. There's only one paravirt safe_halt() and it doesn't do HLT or STI.
pv_native_safe_halt() -> native_safe_halt() -> "sti; hlt".
I think it's more true to say that "safe" halts are entered with IRQs disabled. They logically do the halt operation and then enable interrupts before returning.
So "sti;hlt" sequence needs to be replaced for TDX VMs with "TDCALL; *_irq_enable()" to keep interrupts disabled during TDCALL execution.
But this isn't new. TDX already tried to avoid "sti;hlt". It just screwed up the implementation.
Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests") prevented the idle routines from using "sti;hlt". But it missed the paravirt routine which can be reached like this as an example: acpi_safe_halt() => raw_safe_halt() => arch_safe_halt() => irq.safe_halt() => pv_native_safe_halt()
This, on the other hand, *is* important.
Modify tdx_safe_halt() to implement the sequence "TDCALL; raw_local_irq_enable()" and invoke tdx_halt() from idle routine which just executes TDCALL without toggling interrupt state. Introduce dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt() routines for TDX VMs.
This changelog glosses over one of the key points: Why *MUST* TDX use paravirt? It further confuses the reasoning by alluding to the idea that "Direct HLT instruction execution ... is routed to hypervisor via TDCALL".
It gives background and a solution, but it's not obvious what the problem is or how the solution _fixes_ the problem.
What must TDX now depend on PARAVIRT?
Why not just route the HLT to a TDXCALL via the #VE code?
Makes sense, I will update the commit message in the next version to clearly answer these questions and address the above comments.
linux-stable-mirror@lists.linaro.org