The upcoming new Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. When the vCPU is expected to service the pending V_INTR and V_NMI events, the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is actually idle and reduces wasteful VMEXITs.
The idle HLT intercept feature is used for enlightened guests who wish to securely handle the events. When an enlightened guest does a HLT while an interrupt is pending, hypervisor will not have a way to figure out whether the guest needs to be re-entered or not. The Idle HLT intercept feature allows the HLT execution only if there are no pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-next/next (64dbb3a771a1) + [2].
Experiments done: ----------------
kvm_amd.avic is set to '0' for this experiment.
The below numbers represent the average of 10 runs.
Normal guest (L1) The below netperf command was run on the guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60 ---------------------------------------------------------------- |with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)| ---------------------------------------------------------------- | 25645.7136 | 25773.2796 | ----------------------------------------------------------------
Number of transactions/sec with and without idle HLT intercept feature are almost same.
Nested guest (L2) The below netperf command was run on L2 guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60 ---------------------------------------------------------------- |with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)| ---------------------------------------------------------------- | 5655.4468 | 5755.2189 | ----------------------------------------------------------------
Number of transactions/sec with and without idle HLT intercept feature are almost same.
Testing Done: - Tested the functionality for the Idle HLT intercept feature using selftest svm_idle_hlt_test. - Tested SEV and SEV-ES guest for the Idle HLT intercept functionality. - Tested the Idle HLT intercept functionality on nested guest.
v3 -> v4 - Drop the patches to add vcpu_get_stat() into a new series [2]. - Added nested Idle HLT intercept support.
v2 -> v3 - Incorporated Andrew's suggestion to structure vcpu_stat_types in a way that each architecture can share the generic types and also provide its own.
v1 -> v2 - Done changes in svm_idle_hlt_test based on the review comments from Sean. - Added an enum based approach to get binary stats in vcpu_get_stat() which doesn't use string to get stat data based on the comments from Sean. - Added self_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306250
[2]: https://lore.kernel.org/kvm/20241021062226.108657-1-manali.shukla@amd.com/T/...
Manali Shukla (4): x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept KVM: SVM: Add Idle HLT intercept support KVM: nSVM: implement the nested idle halt intercept KVM: selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kvm/governed_features.h | 1 + arch/x86/kvm/svm/nested.c | 7 ++ arch/x86/kvm/svm/svm.c | 15 +++- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/x86_64/processor.h | 1 + .../selftests/kvm/x86_64/svm_idle_hlt_test.c | 89 +++++++++++++++++++ 9 files changed, 115 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
base-commit: c8d430db8eec7d4fd13a6bea27b7086a54eda6da prerequisite-patch-id: ca912571db5c004f77b70843b8dd35517ff1267f prerequisite-patch-id: 164ea3b4346f9e04bc69819278d20f5e1b5df5ed prerequisite-patch-id: 90d870f426ebc2cec43c0dd89b701ee998385455 prerequisite-patch-id: 45812b799c517a4521782a1fdbcda881237e1eda
From: Manali Shukla Manali.Shukla@amd.com
The Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending events (V_INTR and V_NMI) for the vCPU. When the vCPU is expected to service the pending events (V_INTR and V_NMI), the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is idle and reduces wasteful VMEXITs.
In addition to the aforementioned use case, the Idle HLT intercept feature is also used for enlightened guests who aim to securely manage events without the hypervisor’s awareness. If a HLT occurs while a virtual event is pending and the hypervisor is unaware of this pending event (as could be the case with enlightened guests), the absence of the Idle HLT intercept feature could result in a vCPU being suspended indefinitely.
Presence of Idle HLT intercept feature for guests is indicated via CPUID function 0x8000000A_EDX[30].
Signed-off-by: Manali Shukla Manali.Shukla@amd.com Acked-by: Borislav Petkov (AMD) bp@alien8.de --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index dd4682857c12..7461a49e1045 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -382,6 +382,7 @@ #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */ #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */ #define X86_FEATURE_SVME_ADDR_CHK (15*32+28) /* SVME addr check */ +#define X86_FEATURE_IDLE_HLT (15*32+30) /* "" IDLE HLT intercept */
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* "avx512vbmi" AVX512 Vector Bit Manipulation instructions*/
base-commit: c8d430db8eec7d4fd13a6bea27b7086a54eda6da prerequisite-patch-id: ca912571db5c004f77b70843b8dd35517ff1267f prerequisite-patch-id: 164ea3b4346f9e04bc69819278d20f5e1b5df5ed prerequisite-patch-id: 90d870f426ebc2cec43c0dd89b701ee998385455 prerequisite-patch-id: 45812b799c517a4521782a1fdbcda881237e1eda
On Tue, Oct 22, 2024 at 05:48:07AM +0000, Manali Shukla wrote:
From: Manali Shukla Manali.Shukla@amd.com
The Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending events (V_INTR and V_NMI) for the vCPU. When the vCPU is expected to service the pending events (V_INTR and V_NMI), the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is idle and reduces wasteful VMEXITs.
In addition to the aforementioned use case, the Idle HLT intercept feature is also used for enlightened guests who aim to securely manage events without the hypervisor’s awareness. If a HLT occurs while a virtual event is pending and the hypervisor is unaware of this pending event (as could be the case with enlightened guests), the absence of the Idle HLT intercept feature could result in a vCPU being suspended indefinitely.
Presence of Idle HLT intercept feature for guests is indicated via CPUID function 0x8000000A_EDX[30].
Signed-off-by: Manali Shukla Manali.Shukla@amd.com Acked-by: Borislav Petkov (AMD) bp@alien8.de
arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index dd4682857c12..7461a49e1045 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -382,6 +382,7 @@ #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */ #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */ #define X86_FEATURE_SVME_ADDR_CHK (15*32+28) /* SVME addr check */ +#define X86_FEATURE_IDLE_HLT (15*32+30) /* "" IDLE HLT intercept */
Whoever commits this, you can remove the "" in the comment now as they're not needed anymore.
Thx.
On Tue, Oct 22, 2024, Borislav Petkov wrote:
On Tue, Oct 22, 2024 at 05:48:07AM +0000, Manali Shukla wrote:
From: Manali Shukla Manali.Shukla@amd.com
The Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending events (V_INTR and V_NMI) for the vCPU. When the vCPU is expected to service the pending events (V_INTR and V_NMI), the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is idle and reduces wasteful VMEXITs.
In addition to the aforementioned use case, the Idle HLT intercept feature is also used for enlightened guests who aim to securely manage events without the hypervisor’s awareness. If a HLT occurs while a virtual event is pending and the hypervisor is unaware of this pending event (as could be the case with enlightened guests), the absence of the Idle HLT intercept feature could result in a vCPU being suspended indefinitely.
Presence of Idle HLT intercept feature for guests is indicated via CPUID function 0x8000000A_EDX[30].
Signed-off-by: Manali Shukla Manali.Shukla@amd.com Acked-by: Borislav Petkov (AMD) bp@alien8.de
arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index dd4682857c12..7461a49e1045 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -382,6 +382,7 @@ #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */ #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */ #define X86_FEATURE_SVME_ADDR_CHK (15*32+28) /* SVME addr check */ +#define X86_FEATURE_IDLE_HLT (15*32+30) /* "" IDLE HLT intercept */
Whoever commits this, you can remove the "" in the comment now as they're not needed anymore.
Ya, will do.
From: Manali Shukla Manali.Shukla@amd.com
The hypervisor can intercept the HLT instruction by setting the HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if there are pending V_INTR and V_NMI events, as the hypervisor must then initiate a VMRUN to handle them.
If the HLT-Intercept Bit is cleared and the vCPU executes HLT while there are pending V_INTR and V_NMI events, the hypervisor won’t detect them, potentially causing indefinite suspension of the vCPU. This poses a problem for enlightened guests who wish to securely handle the events.
For Secure AVIC scenarios, if a guest does a HLT while an interrupt is pending (in IRR), the hypervisor does not have a way to figure out whether the guest needs to be re-entered, as it cannot read the guest backing page. The Idle HLT intercept feature allows the hypervisor to intercept HLT execution only if there are no pending V_INTR and V_NMI events.
There are two use cases for the Idle HLT intercept feature: - Secure VMs that wish to handle pending events securely without exiting to the hypervisor on HLT (Secure AVIC). - Optimization for all the VMs to avoid a wasteful VMEXIT during HLT when there are pending events.
On discovering the Idle HLT Intercept, the KVM hypervisor, Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB. When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0), offset 0xFh) should be cleared.
Before entering the HLT state, the HLT instruction performs checks in following order: - The HLT intercept check, if set, it unconditionally triggers SVM_EXIT_HLT (0x78). - The Idle HLT intercept check, if set and there are no pending V_INTR or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6).
Details about the Idle HLT intercept feature can be found in AMD APM [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306250
Signed-off-by: Manali Shukla Manali.Shukla@amd.com --- arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 ++ arch/x86/kvm/svm/svm.c | 11 ++++++++--- 3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 2b59b9951c90..992050cb83d0 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -116,6 +116,7 @@ enum { INTERCEPT_INVPCID, INTERCEPT_MCOMMIT, INTERCEPT_TLBSYNC, + INTERCEPT_IDLE_HLT = 166, };
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h index 1814b413fd57..ec1321248dac 100644 --- a/arch/x86/include/uapi/asm/svm.h +++ b/arch/x86/include/uapi/asm/svm.h @@ -95,6 +95,7 @@ #define SVM_EXIT_CR14_WRITE_TRAP 0x09e #define SVM_EXIT_CR15_WRITE_TRAP 0x09f #define SVM_EXIT_INVPCID 0x0a2 +#define SVM_EXIT_IDLE_HLT 0x0a6 #define SVM_EXIT_NPF 0x400 #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 @@ -224,6 +225,7 @@ { SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \ { SVM_EXIT_CR8_WRITE_TRAP, "write_cr8_trap" }, \ { SVM_EXIT_INVPCID, "invpcid" }, \ + { SVM_EXIT_IDLE_HLT, "idle-halt" }, \ { SVM_EXIT_NPF, "npf" }, \ { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 9df3e1e5ae81..e86b79e975d3 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1298,8 +1298,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu) svm_set_intercept(svm, INTERCEPT_MWAIT); }
- if (!kvm_hlt_in_guest(vcpu->kvm)) - svm_set_intercept(svm, INTERCEPT_HLT); + if (!kvm_hlt_in_guest(vcpu->kvm)) { + if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT)) + svm_set_intercept(svm, INTERCEPT_IDLE_HLT); + else + svm_set_intercept(svm, INTERCEPT_HLT); + }
control->iopm_base_pa = iopm_base; control->msrpm_base_pa = __sme_set(__pa(svm->msrpm)); @@ -3353,6 +3357,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { [SVM_EXIT_CR4_WRITE_TRAP] = cr_trap, [SVM_EXIT_CR8_WRITE_TRAP] = cr_trap, [SVM_EXIT_INVPCID] = invpcid_interception, + [SVM_EXIT_IDLE_HLT] = kvm_emulate_halt, [SVM_EXIT_NPF] = npf_interception, [SVM_EXIT_RSM] = rsm_interception, [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception, @@ -3515,7 +3520,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code) return interrupt_window_interception(vcpu); else if (exit_code == SVM_EXIT_INTR) return intr_interception(vcpu); - else if (exit_code == SVM_EXIT_HLT) + else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT) return kvm_emulate_halt(vcpu); else if (exit_code == SVM_EXIT_NPF) return npf_interception(vcpu);
Enable the idle HLT intercept for the L2 guest when it is available on the platform and the L2 guest can utilize the Idle HLT intercept feature. This is achieved by clearing the HLT intercept for the L1 hypervisor in the recalc_interrupts() function. The Idle HLT intercept requires that the HLT intercept is cleared and the Idle HLT intercept is set to properly enable the feature.
The nested idle halt intercept was tested by booting L1,L2 (all Linux) and checking there are only idle-hlt vmexits happened.
Signed-off-by: Manali Shukla manali.shukla@amd.com --- arch/x86/kvm/governed_features.h | 1 + arch/x86/kvm/svm/nested.c | 8 ++++++++ arch/x86/kvm/svm/svm.c | 4 ++++ 3 files changed, 13 insertions(+)
diff --git a/arch/x86/kvm/governed_features.h b/arch/x86/kvm/governed_features.h index ad463b1ed4e4..0154012721cb 100644 --- a/arch/x86/kvm/governed_features.h +++ b/arch/x86/kvm/governed_features.h @@ -17,6 +17,7 @@ KVM_GOVERNED_X86_FEATURE(PFTHRESHOLD) KVM_GOVERNED_X86_FEATURE(VGIF) KVM_GOVERNED_X86_FEATURE(VNMI) KVM_GOVERNED_X86_FEATURE(LAM) +KVM_GOVERNED_X86_FEATURE(IDLE_HLT)
#undef KVM_GOVERNED_X86_FEATURE #undef KVM_GOVERNED_FEATURE diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index d5314cb7dff4..feb241110f1a 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -178,6 +178,14 @@ void recalc_intercepts(struct vcpu_svm *svm) } else { WARN_ON(!(c->virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK)); } + + /* + * Clear the HLT intercept for L2 guest when the Idle HLT intercept feature + * is enabled on the platform and the guest can use the Idle HLT intercept + * feature. + */ + if (guest_can_use(&svm->vcpu, X86_FEATURE_IDLE_HLT)) + vmcb_clr_intercept(c, INTERCEPT_HLT); }
/* diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index e86b79e975d3..38d546788fc6 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4425,6 +4425,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD); kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF); kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI); + kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_IDLE_HLT);
svm_recalc_instruction_intercepts(vcpu, svm);
@@ -5228,6 +5229,9 @@ static __init void svm_set_cpu_caps(void) if (vnmi) kvm_cpu_cap_set(X86_FEATURE_VNMI);
+ if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT)) + kvm_cpu_cap_set(X86_FEATURE_IDLE_HLT); + /* Nested VM can receive #VMEXIT instead of triggering #GP */ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); }
On Tue, Oct 22, 2024, Manali Shukla wrote:
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index d5314cb7dff4..feb241110f1a 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -178,6 +178,14 @@ void recalc_intercepts(struct vcpu_svm *svm) } else { WARN_ON(!(c->virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK)); }
- /*
* Clear the HLT intercept for L2 guest when the Idle HLT intercept feature
* is enabled on the platform and the guest can use the Idle HLT intercept
* feature.
*/
- if (guest_can_use(&svm->vcpu, X86_FEATURE_IDLE_HLT))
vmcb_clr_intercept(c, INTERCEPT_HLT);
This is wrong. KVM needs to honor the intercept of vmcb12. If L1 wants to intercept HLT, then KVM needs to configure vmcb02 to intercept HLT, regradless of whether or not L1 is utilizing INTERCEPT_IDLE_HLT.
Given how KVM currently handles intercepts for nested SVM, I'm pretty sure you can simply do nothing. recalc_intercepts() starts with KVM's intercepts (from vmcb01), and adds in L1's intercepts. So unless there is a special case, the default behavior should Just Work.
for (i = 0; i < MAX_INTERCEPT; i++) c->intercepts[i] = h->intercepts[i];
...
for (i = 0; i < MAX_INTERCEPT; i++) c->intercepts[i] |= g->intercepts[i];
KVM's approach creates all kinds of virtualization holes, e.g. L1 can utilize IDLE_HLT even if the feature isn't advertised to L1. But that's true for quite literally all feature-based intercepts, so for better or worse, I don't think it makes sense to try and change that approach for this feature.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index e86b79e975d3..38d546788fc6 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4425,6 +4425,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD); kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF); kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_IDLE_HLT);
svm_recalc_instruction_intercepts(vcpu, svm); @@ -5228,6 +5229,9 @@ static __init void svm_set_cpu_caps(void) if (vnmi) kvm_cpu_cap_set(X86_FEATURE_VNMI);
if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
kvm_cpu_cap_set(X86_FEATURE_IDLE_HLT);
kvm_cpu_cap_check_and_set() does this for you.
- /* Nested VM can receive #VMEXIT instead of triggering #GP */ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); }
-- 2.34.1
From: Manali Shukla Manali.Shukla@amd.com
Execution of the HLT instruction results in VMEXIT. Hypervisor observes pending V_INTR and V_NMI events just after VMEXIT generated by HLT for the vCPU and causes VM entry to service the pending events. The Idle HLT intercept feature avoids the wasteful VMEXIT during halt if there are pending V_INTR and V_NMI events for the vCPU.
The selftest for Idle HLT intercept instruments above-mentioned scenario.
Signed-off-by: Manali Shukla Manali.Shukla@amd.com --- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/x86_64/processor.h | 1 + .../selftests/kvm/x86_64/svm_idle_hlt_test.c | 89 +++++++++++++++++++ 3 files changed, 91 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 960cf6a77198..25e4bc9473d5 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -94,6 +94,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test TEST_GEN_PROGS_x86_64 += x86_64/smm_test TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test +TEST_GEN_PROGS_x86_64 += x86_64/svm_idle_hlt_test TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_shutdown_test diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h index 8e36de85b68f..41815d6ae500 100644 --- a/tools/testing/selftests/kvm/include/x86_64/processor.h +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h @@ -197,6 +197,7 @@ struct kvm_x86_cpu_feature { #define X86_FEATURE_PAUSEFILTER KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 10) #define X86_FEATURE_PFTHRESHOLD KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 12) #define X86_FEATURE_VGIF KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 16) +#define X86_FEATURE_IDLE_HLT KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 30) #define X86_FEATURE_SEV KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 1) #define X86_FEATURE_SEV_ES KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 3)
diff --git a/tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c b/tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c new file mode 100644 index 000000000000..fe2ea96695e4 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c @@ -0,0 +1,89 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Advanced Micro Devices, Inc. + * + */ +#include <kvm_util.h> +#include <processor.h> +#include <test_util.h> +#include "svm_util.h" +#include "apic.h" + +#define VINTR_VECTOR 0x30 +#define NUM_ITERATIONS 1000 + +static bool irq_received; + +/* + * The guest code instruments the scenario where there is a V_INTR pending + * event available while hlt instruction is executed. The HLT VM Exit doesn't + * occur in above-mentioned scenario if Idle HLT intercept feature is enabled. + */ + +static void guest_code(void) +{ + uint32_t icr_val; + int i; + + xapic_enable(); + + icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | VINTR_VECTOR); + + for (i = 0; i < NUM_ITERATIONS; i++) { + cli(); + xapic_write_reg(APIC_ICR, icr_val); + safe_halt(); + GUEST_ASSERT(READ_ONCE(irq_received)); + WRITE_ONCE(irq_received, false); + } + GUEST_DONE(); +} + +static void guest_vintr_handler(struct ex_regs *regs) +{ + WRITE_ONCE(irq_received, true); + xapic_write_reg(APIC_EOI, 0x00); +} + +int main(int argc, char *argv[]) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + struct ucall uc; + uint64_t halt_exits, vintr_exits; + + /* Check the extension for binary stats */ + TEST_REQUIRE(this_cpu_has(X86_FEATURE_IDLE_HLT)); + TEST_REQUIRE(kvm_has_cap(KVM_CAP_BINARY_STATS_FD)); + + vm = vm_create_with_one_vcpu(&vcpu, guest_code); + + vm_install_exception_handler(vm, VINTR_VECTOR, guest_vintr_handler); + virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA); + + vcpu_run(vcpu); + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO); + + halt_exits = vcpu_get_stat(vcpu, halt_exits); + vintr_exits = vcpu_get_stat(vcpu, irq_window_exits); + + switch (get_ucall(vcpu, &uc)) { + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + /* NOT REACHED */ + case UCALL_DONE: + break; + + default: + TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd); + } + + TEST_ASSERT_EQ(halt_exits, 0); + pr_debug("Guest executed VINTR followed by halts: %d times.\n" + "The guest exited due to halt: %ld times and number\n" + "of vintr exits: %ld.\n", + NUM_ITERATIONS, halt_exits, vintr_exits); + + kvm_vm_free(vm); + return 0; +}
On Tue, Oct 22, 2024, Manali Shukla wrote:
diff --git a/tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c b/tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c new file mode 100644 index 000000000000..fe2ea96695e4 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c @@ -0,0 +1,89 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- Copyright (C) 2024 Advanced Micro Devices, Inc.
- */
+#include <kvm_util.h> +#include <processor.h> +#include <test_util.h> +#include "svm_util.h" +#include "apic.h"
+#define VINTR_VECTOR 0x30
Drop the "V". From the guest's perspective, it's simply the interrupt vector. The "V" suggests there's nested SVM stuff going on, e.g. to virtualize an interrupt for L2 or something.
+#define NUM_ITERATIONS 1000
+static bool irq_received;
+/*
- The guest code instruments the scenario where there is a V_INTR pending
- event available while hlt instruction is executed. The HLT VM Exit doesn't
- occur in above-mentioned scenario if Idle HLT intercept feature is enabled.
- */
So the only thing thing that is idle-HLT specific in this test is that final TEST_ASSERT_EQ(). Rather than make this test depend on idle-HLT, we should tweak it run on all hardware, and then:
if (kvm_cpu_has(X86_FEATURE_IDLE_HLT)) TEST_ASSERT_EQ(halt_exits, 0); else TEST_ASSERT_EQ(halt_exits, NUM_ITERATIONS);
Not sure about the name. Maybe hlt_ipi_test or ipi_hlt_test?
+static void guest_code(void) +{
- uint32_t icr_val;
- int i;
- xapic_enable();
Hmm, I think we should have this test force x2APIC mode. KVM emulates x2APIC in software (if it's not accerlated by APICv), i.e. it's always available. That will allow using this test to do performance testing of KVM's fastpath handling of handle_fastpath_set_x2apic_icr_irqoff().
Of course, KVM only uses the fastpath for non-shorthand IPIs, and any setup that can do self-IPI fully in the fastpath (via virtual interrupt delivery) won't exit in the first place (virtualized by hardware), i.e. there's probably no point in adding self-IPIs to the fastpath.
But maybe in the future I can convince someone to enhance this test to do cross-vCPU IPI testing.
- icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | VINTR_VECTOR);
- for (i = 0; i < NUM_ITERATIONS; i++) {
cli();
xapic_write_reg(APIC_ICR, icr_val);
safe_halt();
GUEST_ASSERT(READ_ONCE(irq_received));
WRITE_ONCE(irq_received, false);
- }
- GUEST_DONE();
+}
+static void guest_vintr_handler(struct ex_regs *regs) +{
- WRITE_ONCE(irq_received, true);
- xapic_write_reg(APIC_EOI, 0x00);
+}
+int main(int argc, char *argv[]) +{
- struct kvm_vm *vm;
- struct kvm_vcpu *vcpu;
- struct ucall uc;
- uint64_t halt_exits, vintr_exits;
- /* Check the extension for binary stats */
Pointless comment, the code below is self-explanatory.
- TEST_REQUIRE(this_cpu_has(X86_FEATURE_IDLE_HLT));
This needs to check *KVM* support. I.e. kvm_cpu_has().
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_BINARY_STATS_FD));
- vm = vm_create_with_one_vcpu(&vcpu, guest_code);
- vm_install_exception_handler(vm, VINTR_VECTOR, guest_vintr_handler);
- virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
- vcpu_run(vcpu);
- TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
- halt_exits = vcpu_get_stat(vcpu, halt_exits);
- vintr_exits = vcpu_get_stat(vcpu, irq_window_exits);
- switch (get_ucall(vcpu, &uc)) {
- case UCALL_ABORT:
REPORT_GUEST_ASSERT(uc);
/* NOT REACHED */
- case UCALL_DONE:
break;
- default:
TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
- }
- TEST_ASSERT_EQ(halt_exits, 0);
- pr_debug("Guest executed VINTR followed by halts: %d times.\n"
"The guest exited due to halt: %ld times and number\n"
"of vintr exits: %ld.\n",
NUM_ITERATIONS, halt_exits, vintr_exits);
halt_exits obviously is '0' at this point, so I don't see any point in printing it out.
As for vintr_exits, I vote to drop it, for now at least. At some point in the future, I would like to expand this test so that it can be used for a rudimentary IPI+HLT perf test. But for now, I think it makes sense to keep it simple, e.g. so that nothing needs to be unwound if improvements are made in the future.
- kvm_vm_free(vm);
- return 0;
+}
2.34.1
On 10/22/2024 11:18 AM, Manali Shukla wrote:
The upcoming new Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. When the vCPU is expected to service the pending V_INTR and V_NMI events, the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is actually idle and reduces wasteful VMEXITs.
The idle HLT intercept feature is used for enlightened guests who wish to securely handle the events. When an enlightened guest does a HLT while an interrupt is pending, hypervisor will not have a way to figure out whether the guest needs to be re-entered or not. The Idle HLT intercept feature allows the HLT execution only if there are no pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-next/next (64dbb3a771a1) + [2].
Experiments done:
kvm_amd.avic is set to '0' for this experiment.
The below numbers represent the average of 10 runs.
Normal guest (L1) The below netperf command was run on the guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60
|with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)|
| 25645.7136 | 25773.2796 |
Number of transactions/sec with and without idle HLT intercept feature are almost same.
Nested guest (L2) The below netperf command was run on L2 guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60
|with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)|
| 5655.4468 | 5755.2189 |
Number of transactions/sec with and without idle HLT intercept feature are almost same.
Testing Done:
- Tested the functionality for the Idle HLT intercept feature using selftest svm_idle_hlt_test.
- Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
- Tested the Idle HLT intercept functionality on nested guest.
v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in a way that each architecture can share the generic types and also provide its own.
v1 -> v2
- Done changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which doesn't use string to get stat data based on the comments from Sean.
- Added self_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306250
Manali Shukla (4): x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept KVM: SVM: Add Idle HLT intercept support KVM: nSVM: implement the nested idle halt intercept KVM: selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kvm/governed_features.h | 1 + arch/x86/kvm/svm/nested.c | 7 ++ arch/x86/kvm/svm/svm.c | 15 +++- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/x86_64/processor.h | 1 + .../selftests/kvm/x86_64/svm_idle_hlt_test.c | 89 +++++++++++++++++++ 9 files changed, 115 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
base-commit: c8d430db8eec7d4fd13a6bea27b7086a54eda6da prerequisite-patch-id: ca912571db5c004f77b70843b8dd35517ff1267f prerequisite-patch-id: 164ea3b4346f9e04bc69819278d20f5e1b5df5ed prerequisite-patch-id: 90d870f426ebc2cec43c0dd89b701ee998385455 prerequisite-patch-id: 45812b799c517a4521782a1fdbcda881237e1eda
A gentle reminder.
-Manali
On 11/28/2024 8:39 PM, Manali Shukla wrote:
On 10/22/2024 11:18 AM, Manali Shukla wrote:
The upcoming new Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. When the vCPU is expected to service the pending V_INTR and V_NMI events, the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is actually idle and reduces wasteful VMEXITs.
The idle HLT intercept feature is used for enlightened guests who wish to securely handle the events. When an enlightened guest does a HLT while an interrupt is pending, hypervisor will not have a way to figure out whether the guest needs to be re-entered or not. The Idle HLT intercept feature allows the HLT execution only if there are no pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-next/next (64dbb3a771a1) + [2].
Experiments done:
kvm_amd.avic is set to '0' for this experiment.
The below numbers represent the average of 10 runs.
Normal guest (L1) The below netperf command was run on the guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60
|with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)|
| 25645.7136 | 25773.2796 |
Number of transactions/sec with and without idle HLT intercept feature are almost same.
Nested guest (L2) The below netperf command was run on L2 guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60
|with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)|
| 5655.4468 | 5755.2189 |
Number of transactions/sec with and without idle HLT intercept feature are almost same.
Testing Done:
- Tested the functionality for the Idle HLT intercept feature using selftest svm_idle_hlt_test.
- Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
- Tested the Idle HLT intercept functionality on nested guest.
v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in a way that each architecture can share the generic types and also provide its own.
v1 -> v2
- Done changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which doesn't use string to get stat data based on the comments from Sean.
- Added self_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306250
Manali Shukla (4): x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept KVM: SVM: Add Idle HLT intercept support KVM: nSVM: implement the nested idle halt intercept KVM: selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kvm/governed_features.h | 1 + arch/x86/kvm/svm/nested.c | 7 ++ arch/x86/kvm/svm/svm.c | 15 +++- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/x86_64/processor.h | 1 + .../selftests/kvm/x86_64/svm_idle_hlt_test.c | 89 +++++++++++++++++++ 9 files changed, 115 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
base-commit: c8d430db8eec7d4fd13a6bea27b7086a54eda6da prerequisite-patch-id: ca912571db5c004f77b70843b8dd35517ff1267f prerequisite-patch-id: 164ea3b4346f9e04bc69819278d20f5e1b5df5ed prerequisite-patch-id: 90d870f426ebc2cec43c0dd89b701ee998385455 prerequisite-patch-id: 45812b799c517a4521782a1fdbcda881237e1eda
A gentle reminder.
-Manali
A Gentle reminder.
-Manali
linux-kselftest-mirror@lists.linaro.org