The upcoming Idle HLT Intercept feature allows HLT execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. When the vCPU is expected to service pending V_INTR and V_NMI events, the Idle HLT intercept does not trigger. The feature allows the hypervisor to determine whether the vCPU is actually idle and reduces wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID function Fn8000_000A_EDX[30].
Documentation for the Idle HLT intercept feature will be available in the next version of the "AMD64 Architecture Programmer's Manual".
Testing Done: Added a selftest to test the Idle HLT intercept functionality. Tested SEV and SEV-ES guests for the Idle HLT intercept functionality.
Manali Shukla (5):
  x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
  KVM: SVM: Add Idle HLT intercept support
  tools: Add KVM exit reason for the Idle HLT
  selftests: Add an interface to read the data of named vcpu stat
  selftests: KVM: SVM: Add Idle HLT intercept test
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/svm.h                    |   1 +
 arch/x86/include/uapi/asm/svm.h               |   2 +
 arch/x86/kvm/svm/svm.c                        |  11 +-
 tools/arch/x86/include/uapi/asm/svm.h         |   2 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |  11 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  41 ++++++
 .../selftests/kvm/x86_64/svm_idlehlt_test.c   | 119 ++++++++++++++++++
 9 files changed, 186 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
base-commit: fdd58834d132046149699b88a27a0db26829f4fb
From: Manali Shukla <Manali.Shukla@amd.com>
The Idle HLT Intercept feature allows HLT execution by a vCPU to be intercepted by the hypervisor only if there are no pending events (V_INTR and V_NMI) for the vCPU. When the vCPU is expected to service pending events (V_INTR and V_NMI), the Idle HLT intercept does not trigger. The feature allows the hypervisor to determine whether the vCPU is idle and reduces wasteful VMEXITs.

Presence of the Idle HLT intercept feature for guests is indicated via CPUID function Fn8000_000A_EDX[30].
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index fdf723b6f6d0..c312b0bfec6a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -379,6 +379,7 @@
 #define X86_FEATURE_V_SPEC_CTRL	(15*32+20) /* Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
+#define X86_FEATURE_IDLE_HLT		(15*32+30) /* "" IDLE HLT intercept */

 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI		(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
From: Manali Shukla <Manali.Shukla@amd.com>
Execution of the HLT instruction by a vCPU can be intercepted by the hypervisor by setting the HLT-Intercept Bit in the VMCB, resulting in a VMEXIT. It is possible that, soon after the VMEXIT, the hypervisor observes pending V_INTR or V_NMI events for the vCPU and performs a VMRUN to service them. In that case, the VMEXIT is wasteful.

The Idle HLT intercept feature allows HLT execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. The Idle HLT intercept is not triggered when the vCPU is expected to have pending events (V_INTR and V_NMI).
The feature allows the hypervisor to determine whether the vCPU is actually idle and reduces wasteful VMEXITs.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
---
 arch/x86/include/asm/svm.h      |  1 +
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 11 ++++++++---
 3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 87a7b917d30e..68ff0855a77b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,7 @@ enum {
 	INTERCEPT_INVPCID,
 	INTERCEPT_MCOMMIT,
 	INTERCEPT_TLBSYNC,
+	INTERCEPT_IDLE_HLT = 166,
 };
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 80e1df482337..9910f86a2cef 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@
 #define SVM_EXIT_CR14_WRITE_TRAP	0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP	0x09f
 #define SVM_EXIT_INVPCID		0x0a2
+#define SVM_EXIT_IDLE_HLT		0x0a6
 #define SVM_EXIT_NPF			0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -223,6 +224,7 @@
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,	"invpcid" }, \
+	{ SVM_EXIT_IDLE_HLT,	"idle-halt" }, \
 	{ SVM_EXIT_NPF,	"npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,	"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,	"avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e90b429c84f1..a08a53508469 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1289,8 +1289,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_set_intercept(svm, INTERCEPT_MWAIT);
 	}

-	if (!kvm_hlt_in_guest(vcpu->kvm))
-		svm_set_intercept(svm, INTERCEPT_HLT);
+	if (!kvm_hlt_in_guest(vcpu->kvm)) {
+		if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
+			svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
+		else
+			svm_set_intercept(svm, INTERCEPT_HLT);
+	}

 	control->iopm_base_pa = __sme_set(iopm_base);
 	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
@@ -3302,6 +3306,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]			= invpcid_interception,
+	[SVM_EXIT_IDLE_HLT]			= kvm_emulate_halt,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]				= rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -3462,7 +3467,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 		return interrupt_window_interception(vcpu);
 	else if (exit_code == SVM_EXIT_INTR)
 		return intr_interception(vcpu);
-	else if (exit_code == SVM_EXIT_HLT)
+	else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT)
 		return kvm_emulate_halt(vcpu);
 	else if (exit_code == SVM_EXIT_NPF)
 		return npf_interception(vcpu);
From: Manali Shukla <Manali.Shukla@amd.com>
The Idle HLT intercept feature allows HLT execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. The Idle HLT intercept is not triggered when the vCPU is expected to service pending events (V_INTR and V_NMI).

The new SVM_EXIT_IDLE_HLT exit code is introduced as part of the Idle HLT intercept feature. Add it to SVM_EXIT_REASONS so that the SVM_EXIT_IDLE_HLT type of VMEXIT is recognized by tools such as perf.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
---
 tools/arch/x86/include/uapi/asm/svm.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/tools/arch/x86/include/uapi/asm/svm.h b/tools/arch/x86/include/uapi/asm/svm.h
index 80e1df482337..5bf1ad15e1ee 100644
--- a/tools/arch/x86/include/uapi/asm/svm.h
+++ b/tools/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@
 #define SVM_EXIT_CR14_WRITE_TRAP	0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP	0x09f
 #define SVM_EXIT_INVPCID		0x0a2
+#define SVM_EXIT_IDLE_HLT		0x0a6
 #define SVM_EXIT_NPF			0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -224,6 +225,7 @@
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,	"invpcid" }, \
 	{ SVM_EXIT_NPF,	"npf" }, \
+	{ SVM_EXIT_IDLE_HLT,	"idle-halt" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,	"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,	"avic_unaccelerated_access" }, \
 	{ SVM_EXIT_VMGEXIT,	"vmgexit" }, \
On Thu, Mar 07, 2024, Manali Shukla wrote:
From: Manali Shukla <Manali.Shukla@amd.com>
The Idle HLT intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by hypervisor only if there are no pending V_INR and V_NMI events for the vCPU. The Idle HLT intercept will not be triggerred when vCPU is expected to service pending events (V_INTR and V_NMI).
The new SVM_EXIT_IDLE_HLT is introduced as part of the Idle HLT intercept feature. Add it to SVM_EXIT_REASONS, so that the SVM_EXIT_IDLE_HLT type of VMEXIT is recognized by tools like perf etc.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
tools/arch/x86/include/uapi/asm/svm.h | 2 ++
Please drop the tools/ uapi headers update. Nothing KVM-related in tools/ actually relies on the headers being copied into tools/, e.g. KVM selftests pulls KVM's headers from the .../usr/include/ directory that's populated by `make headers_install`.
Perf's tooling is what actually "needs" the headers to be copied into tools/; let the tools/perf maintainers deal with the headache of keeping everything up-to-date.
Hi Sean,

Thank you for reviewing my patches.
On 3/7/2024 8:00 PM, Sean Christopherson wrote:
On Thu, Mar 07, 2024, Manali Shukla wrote:
From: Manali Shukla <Manali.Shukla@amd.com>
The Idle HLT intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by hypervisor only if there are no pending V_INR and V_NMI events for the vCPU. The Idle HLT intercept will not be triggerred when vCPU is expected to service pending events (V_INTR and V_NMI).
The new SVM_EXIT_IDLE_HLT is introduced as part of the Idle HLT intercept feature. Add it to SVM_EXIT_REASONS, so that the SVM_EXIT_IDLE_HLT type of VMEXIT is recognized by tools like perf etc.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
tools/arch/x86/include/uapi/asm/svm.h | 2 ++
Please drop the tools/ uapi headers update. Nothing KVM-related in tools/ actually relies on the headers being copied into tools/, e.g. KVM selftests pulls KVM's headers from the .../usr/include/ directory that's populated by `make headers_install`.
Perf's tooling is what actually "needs" the headers to be copied into tools/; let the tools/perf maintainers deal with the headache of keeping everything up-to-date.
Sure, I will drop this patch in V2.
-Manali
From: Manali Shukla <Manali.Shukla@amd.com>
The interface is used to read the data values of a specified vcpu stat from the currently available binary stats interface.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
---
 .../selftests/kvm/include/kvm_util_base.h  | 11 +++++
 tools/testing/selftests/kvm/lib/kvm_util.c | 41 +++++++++++++++++++
 2 files changed, 52 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 9e5afc472c14..294bb42b6940 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -512,6 +512,17 @@ void read_stat_data(int stats_fd, struct kvm_stats_header *header,
 		    struct kvm_stats_desc *desc, uint64_t *data,
 		    size_t max_elements);
+void __vcpu_get_stat(struct kvm_vcpu *vcpu, const char *stat_name, uint64_t *data,
+		     size_t max_elements);
+
+static inline uint64_t vcpu_get_stat(struct kvm_vcpu *vcpu, const char *stat_name)
+{
+	uint64_t data;
+
+	__vcpu_get_stat(vcpu, stat_name, &data, 1);
+	return data;
+}
+
 void __vm_get_stat(struct kvm_vm *vm, const char *stat_name, uint64_t *data,
 		   size_t max_elements);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index e066d584c656..3d3a67ea0c7a 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2166,6 +2166,47 @@ void read_stat_data(int stats_fd, struct kvm_stats_header *header,
 		    desc->name, size, ret);
 }
+/*
+ * Read the data of the named vcpu stat
+ *
+ * Input Args:
+ *   vcpu - the vcpu for which the stat should be read
+ *   stat_name - the name of the stat to read
+ *   max_elements - the maximum number of 8-byte values to read into data
+ *
+ * Output Args:
+ *   data - the buffer into which stat data should be read
+ *
+ * Read the data values of a specified stat from the binary stats interface.
+ */
+void __vcpu_get_stat(struct kvm_vcpu *vcpu, const char *stat_name, uint64_t *data,
+		     size_t max_elements)
+{
+	int vcpu_stats_fd;
+	struct kvm_stats_header header;
+	struct kvm_stats_desc *desc, *t_desc;
+	size_t size_desc;
+	int i;
+
+	vcpu_stats_fd = vcpu_get_stats_fd(vcpu);
+	read_stats_header(vcpu_stats_fd, &header);
+
+	desc = read_stats_descriptors(vcpu_stats_fd, &header);
+	size_desc = get_stats_descriptor_size(&header);
+
+	for (i = 0; i < header.num_desc; ++i) {
+		t_desc = (void *)desc + (i * size_desc);
+
+		if (strcmp(t_desc->name, stat_name))
+			continue;
+
+		read_stat_data(vcpu_stats_fd, &header, t_desc,
+			       data, max_elements);
+
+		break;
+	}
+}
+
 /*
  * Read the data of the named stat
  *
From: Manali Shukla <Manali.Shukla@amd.com>
Execution of the HLT instruction results in a VMEXIT. If the hypervisor observes pending V_INTR or V_NMI events just after the VMEXIT generated by HLT, it performs a VM entry for the vCPU to service the pending events. The Idle HLT intercept feature avoids this wasteful VMEXIT during the halt when there are pending V_INTR and V_NMI events for the vCPU.

The selftest for the Idle HLT intercept exercises the above-mentioned scenario.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/svm_idlehlt_test.c   | 118 ++++++++++++++++++
 2 files changed, 119 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 492e937fab00..7ad03dc4f35e 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -89,6 +89,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test
 TEST_GEN_PROGS_x86_64 += x86_64/smm_test
 TEST_GEN_PROGS_x86_64 += x86_64/state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
+TEST_GEN_PROGS_x86_64 += x86_64/svm_idlehlt_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_shutdown_test
diff --git a/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c b/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
new file mode 100644
index 000000000000..1564511799d4
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * svm_idlehalt_test
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ *
+ * For licencing details see kernel-base/COPYING
+ *
+ * Author:
+ *   Manali Shukla <manali.shukla@amd.com>
+ */
+#include "kvm_util.h"
+#include "svm_util.h"
+#include "processor.h"
+#include "test_util.h"
+#include "apic.h"
+
+#define VINTR_VECTOR	0x30
+#define NUM_ITERATIONS	100000
+
+/*
+ * Incremented in the VINTR handler. Provides evidence to the sender that the
+ * VINR is arrived at the destination.
+ */
+static volatile uint64_t vintr_rcvd;
+
+void verify_apic_base_addr(void)
+{
+	uint64_t msr = rdmsr(MSR_IA32_APICBASE);
+	uint64_t base = GET_APIC_BASE(msr);
+
+	GUEST_ASSERT(base == APIC_DEFAULT_GPA);
+}
+
+/*
+ * The halting guest code instruments the scenario where there is a V_INTR pending
+ * event available while hlt instruction is executed. The HLT VM Exit doesn't
+ * occur in above-mentioned scenario if the Idle HLT intercept feature is enabled.
+ */
+static void halter_guest_code(void)
+{
+	uint32_t icr_val;
+	int i;
+
+	verify_apic_base_addr();
+	xapic_enable();
+
+	icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | VINTR_VECTOR);
+
+	for (i = 0; i < NUM_ITERATIONS; i++) {
+		xapic_write_reg(APIC_ICR, icr_val);
+		asm volatile("sti; hlt; cli");
+	}
+	GUEST_DONE();
+}
+
+static void guest_vintr_handler(struct ex_regs *regs)
+{
+	vintr_rcvd++;
+	xapic_write_reg(APIC_EOI, 0x30);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct ucall uc;
+	uint64_t halt_exits, vintr_exits;
+	uint64_t *pvintr_rcvd;
+
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+
+	/* Check the extension for binary stats */
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_BINARY_STATS_FD));
+
+	vm = vm_create_with_one_vcpu(&vcpu, halter_guest_code);
+
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vcpu);
+	vm_install_exception_handler(vm, VINTR_VECTOR, guest_vintr_handler);
+	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+	halt_exits = vcpu_get_stat(vcpu, "halt_exits");
+	vintr_exits = vcpu_get_stat(vcpu, "irq_window_exits");
+	pvintr_rcvd = (uint64_t *)addr_gva2hva(vm, (uint64_t)&vintr_rcvd);
+
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_ABORT:
+		REPORT_GUEST_ASSERT(uc);
+		/* NOT REACHED */
+	case UCALL_DONE:
+		goto done;
+	default:
+		TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+	}
+
+done:
+	TEST_ASSERT(halt_exits == 0,
+		    "Test Failed:\n"
+		    "Guest executed VINTR followed by halts: %d times\n"
+		    "The guest exited due to halt: %ld times and number\n"
+		    "of vintr exits: %ld and vintr got re-injected: %ld times\n",
+		    NUM_ITERATIONS, halt_exits, vintr_exits, *pvintr_rcvd);
+
+	fprintf(stderr,
+		"Test Successful:\n"
+		"Guest executed VINTR followed by halts: %d times\n"
+		"The guest exited due to halt: %ld times and number\n"
+		"of vintr exits: %ld and vintr got re-injected: %ld times\n",
+		NUM_ITERATIONS, halt_exits, vintr_exits, *pvintr_rcvd);
+
+	kvm_vm_free(vm);
+	return 0;
+}
On Thu, Mar 07, 2024, Manali Shukla wrote:
From: Manali Shukla <Manali.Shukla@amd.com>
The Execution of the HLT instruction results in a VMEXIT. Hypervisor observes pending V_INTR and V_NMI events just after the VMEXIT generated by the HLT for the vCPU and causes VM entry to service the pending events. The Idle HLT intercept feature avoids the wasteful VMEXIT during the halt if there are pending V_INTR and V_NMI events for the vCPU.
The selftest for the Idle HLT intercept instruments the above-mentioned scenario.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/svm_idlehlt_test.c | 118 ++++++++++++++++++ 2 files changed, 119 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 492e937fab00..7ad03dc4f35e 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -89,6 +89,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test TEST_GEN_PROGS_x86_64 += x86_64/smm_test TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test +TEST_GEN_PROGS_x86_64 += x86_64/svm_idlehlt_test
Uber nit, maybe svm_idle_hlt_test? I find "idlehlt" hard to parse.
TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_shutdown_test diff --git a/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c b/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c new file mode 100644 index 000000000000..1564511799d4 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c @@ -0,0 +1,118 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- svm_idlehalt_test
Please omit this, file comments that state the name of the test inevitably become stale (see above).
- Copyright (C) 2024 Advanced Micro Devices, Inc.
- For licencing details see kernel-base/COPYING
This seems gratuitous, doesn't the SPDX stuff take care this?
- Author:
- Manali Shukla manali.shukla@amd.com
- */
+#include "kvm_util.h" +#include "svm_util.h" +#include "processor.h" +#include "test_util.h" +#include "apic.h"
+#define VINTR_VECTOR 0x30 +#define NUM_ITERATIONS 100000
What's the runtime? If it's less than a second, then whatever, but if it's at all longer than that, then I'd prefer to use a lower default and make this user- configurable.
+/*
- Incremented in the VINTR handler. Provides evidence to the sender that the
- VINR is arrived at the destination.
Evidence is useless if there's no detective looking for it. Yeah, it gets printed out in the end, but in reality, no one is going to look at that.
Rather than read this from the host, just make it a non-volatile bool and assert that it set after every
- */
+static volatile uint64_t vintr_rcvd;
+void verify_apic_base_addr(void) +{
- uint64_t msr = rdmsr(MSR_IA32_APICBASE);
- uint64_t base = GET_APIC_BASE(msr);
- GUEST_ASSERT(base == APIC_DEFAULT_GPA);
+}
+/*
- The halting guest code instruments the scenario where there is a V_INTR pending
- event available while hlt instruction is executed. The HLT VM Exit doesn't
- occur in above-mentioned scenario if the Idle HLT intercept feature is enabled.
- */
+static void halter_guest_code(void)
Just "guest_code()". Yeah, it's a weird generic name, but at this point it's so ubiquitous that it's analogous to main(), i.e. readers know that guest_code() is the entry point. And deviating from that suggests that there is a second vCPU running _other_ guest code (otherwise, why differentiate?), which isn't the case.
+{
- uint32_t icr_val;
- int i;
- verify_apic_base_addr();
Why? The test will fail if the APIC is borked, this is just unnecessary noise that distracts readers.
- xapic_enable();
- icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | VINTR_VECTOR);
- for (i = 0; i < NUM_ITERATIONS; i++) {
xapic_write_reg(APIC_ICR, icr_val);
asm volatile("sti; hlt; cli");
Please add safe_halt() and cli() helpers in processor.h. And then do:
cli(); xapic_write_reg(APIC_ICR, icr_val); safe_halt();
to guarantee that interrupts are disabled when the IPI is sent. And as above,
GUEST_ASSERT(READ_ONCE(irq_received)); WRITE_ONCE(irq_received, false);
- }
- GUEST_DONE();
+}
+static void guest_vintr_handler(struct ex_regs *regs) +{
- vintr_rcvd++;
- xapic_write_reg(APIC_EOI, 0x30);
EOI is typically written with '0', not the vector, because the legacy EOI register clears the highest ISR vector, not whatever is specified. IIRC, one of the Intel or AMD specs even says to use '0'.
AMD's Specific EOI register does allow targeting a specific vector, but that's not what's being used here.
+}
+int main(int argc, char *argv[]) +{
- struct kvm_vm *vm;
- struct kvm_vcpu *vcpu;
- struct ucall uc;
- uint64_t halt_exits, vintr_exits;
- uint64_t *pvintr_rcvd;
- TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
No, this test doesn't require SVM, which is KVM advertising *nested* SVM. This test does require idle-hlt support though...
- /* Check the extension for binary stats */
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_BINARY_STATS_FD));
- vm = vm_create_with_one_vcpu(&vcpu, halter_guest_code);
- vm_init_descriptor_tables(vm);
- vcpu_init_descriptor_tables(vcpu);
- vm_install_exception_handler(vm, VINTR_VECTOR, guest_vintr_handler);
- virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
- vcpu_run(vcpu);
- TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
- halt_exits = vcpu_get_stat(vcpu, "halt_exits");
Is there really no way to get binary stats without having to pass in strings?
- vintr_exits = vcpu_get_stat(vcpu, "irq_window_exits");
- pvintr_rcvd = (uint64_t *)addr_gva2hva(vm, (uint64_t)&vintr_rcvd);
- switch (get_ucall(vcpu, &uc)) {
- case UCALL_ABORT:
REPORT_GUEST_ASSERT(uc);
/* NOT REACHED */
Eh, just put a "break;" here and drop the comment.
- case UCALL_DONE:
goto done;
break;
- default:
TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
- }
+done:
- TEST_ASSERT(halt_exits == 0,
So in all honesty, this isn't a very interesting test. It's more of a CPU test than a KVM test. I do think it's worth adding, because why not.
But I'd also like to see a testcase for KVM_X86_DISABLE_EXITS_HLT. It would be a generic test, i.e. not specific to idle-hlt since there is no dependency and the test shouldn't care. I _think_ it would be fairly straightforward: create a VM without an in-kernel APIC (so that HLT exits to userspace), disable HLT interception, send a signal from a different task after some delay, and execute HLT in the guest. Then verify the vCPU exited because of -EINTR and not HLT.
"Test Failed:\n"
"Guest executed VINTR followed by halts: %d times\n"
"The guest exited due to halt: %ld times and number\n"
"of vintr exits: %ld and vintr got re-injected: %ld times\n",
NUM_ITERATIONS, halt_exits, vintr_exits, *pvintr_rcvd);
I appreciate the effort to provide more info, but this is way too noisy. If anything, print gory details in a pr_debug() *before* the assert (see below), and then simply do:
TEST_ASSERT_EQ(halt_exits, 0);
- fprintf(stderr,
"Test Successful:\n"
"Guest executed VINTR followed by halts: %d times\n"
"The guest exited due to halt: %ld times and number\n"
"of vintr exits: %ld and vintr got re-injected: %ld times\n",
NUM_ITERATIONS, halt_exits, vintr_exits, *pvintr_rcvd);
And this should be pr_debug(), because no human is going to look at this except when the test isn't working correctly.
- kvm_vm_free(vm);
- return 0;
+}
2.34.1
On Thu, Mar 07, 2024, Sean Christopherson wrote:
On Thu, Mar 07, 2024, Manali Shukla wrote:
From: Manali Shukla <Manali.Shukla@amd.com>
- xapic_enable();
- icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | VINTR_VECTOR);
- for (i = 0; i < NUM_ITERATIONS; i++) {
xapic_write_reg(APIC_ICR, icr_val);
asm volatile("sti; hlt; cli");
Please add safe_halt() and cli() helpers in processor.h. And then do:
Doh, saw something shiny and forgot to finish my thought. For safe_halt(), copy the thing verbatim from KVM-Unit-Tests, as not everyone is familiar with the sti=>hlt trick.
/*
 * Execute HLT in an STI interrupt shadow to ensure that a pending IRQ that's
 * intended to be a wake event arrives *after* HLT is executed. Modern CPUs,
 * except for a few oddballs that KVM is unlikely to run on, block IRQs for one
 * instruction after STI, *if* RFLAGS.IF=0 before STI. Note, Intel CPUs may
 * block other events beyond regular IRQs, e.g. may block NMIs and SMIs too.
 */
static inline void safe_halt(void)
{
	asm volatile("sti; hlt");
}
Hi Sean,
Thank you for reviewing my patches.
On 3/7/2024 11:52 PM, Sean Christopherson wrote:
On Thu, Mar 07, 2024, Manali Shukla wrote:
From: Manali Shukla <Manali.Shukla@amd.com>
The Execution of the HLT instruction results in a VMEXIT. Hypervisor observes pending V_INTR and V_NMI events just after the VMEXIT generated by the HLT for the vCPU and causes VM entry to service the pending events. The Idle HLT intercept feature avoids the wasteful VMEXIT during the halt if there are pending V_INTR and V_NMI events for the vCPU.
The selftest for the Idle HLT intercept instruments the above-mentioned scenario.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/svm_idlehlt_test.c | 118 ++++++++++++++++++ 2 files changed, 119 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 492e937fab00..7ad03dc4f35e 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -89,6 +89,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test TEST_GEN_PROGS_x86_64 += x86_64/smm_test TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test +TEST_GEN_PROGS_x86_64 += x86_64/svm_idlehlt_test
Uber nit, maybe svm_idle_hlt_test? I find "idlehlt" hard to parse.
Sure I will change it to svm_idle_hlt_test.
TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_shutdown_test diff --git a/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c b/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c new file mode 100644 index 000000000000..1564511799d4 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c @@ -0,0 +1,118 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- svm_idlehalt_test
Please omit this, file comments that state the name of the test inevitably become stale (see above).
Sure. I will remove it.
- Copyright (C) 2024 Advanced Micro Devices, Inc.
- For licencing details see kernel-base/COPYING
This seems gratuitous, doesn't the SPDX stuff take care this?
Agreed, I will remove this.
- Author:
- Manali Shukla manali.shukla@amd.com
- */
+#include "kvm_util.h" +#include "svm_util.h" +#include "processor.h" +#include "test_util.h" +#include "apic.h"
+#define VINTR_VECTOR 0x30 +#define NUM_ITERATIONS 100000
What's the runtime? If it's less than a second, then whatever, but if it's at all longer than that, then I'd prefer to use a lower default and make this user- configurable.
It takes ~34s to run this test.
+/*
- Incremented in the VINTR handler. Provides evidence to the sender that the
- VINR is arrived at the destination.
Evidence is useless if there's no detective looking for it. Yeah, it gets printed out in the end, but in reality, no one is going to look at that.
Rather than read this from the host, just make it a non-volatile bool and assert that it set after every
Sure. I can do that.
- */
+static volatile uint64_t vintr_rcvd;
+void verify_apic_base_addr(void) +{
- uint64_t msr = rdmsr(MSR_IA32_APICBASE);
- uint64_t base = GET_APIC_BASE(msr);
- GUEST_ASSERT(base == APIC_DEFAULT_GPA);
+}
+/*
- The halting guest code instruments the scenario where there is a V_INTR pending
- event available while hlt instruction is executed. The HLT VM Exit doesn't
- occur in above-mentioned scenario if the Idle HLT intercept feature is enabled.
- */
+static void halter_guest_code(void)
Just "guest_code()". Yeah, it's a weird generic name, but at this point it's so ubiquitous that it's analogous to main(), i.e. readers know that guest_code() is the entry point. And deviating from that suggests that there is a second vCPU running _other_ guest code (otherwise, why differentiate?), which isn't the case.
Sure.
+{
- uint32_t icr_val;
- int i;
- verify_apic_base_addr();
Why? The test will fail if the APIC is borked, this is just unnecessary noise that distracts readers.
Sure. I will remove it in V2.
- xapic_enable();
- icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | VINTR_VECTOR);
- for (i = 0; i < NUM_ITERATIONS; i++) {
xapic_write_reg(APIC_ICR, icr_val);
asm volatile("sti; hlt; cli");
Please add safe_halt() and cli() helpers in processor.h. And then do:
cli(); xapic_write_reg(APIC_ICR, icr_val); safe_halt();
to guarantee that interrupts are disabled when the IPI is sent. And as above,
GUEST_ASSERT(READ_ONCE(irq_received)); WRITE_ONCE(irq_received, false);
Sure.
- }
- GUEST_DONE();
+}
+static void guest_vintr_handler(struct ex_regs *regs) +{
- vintr_rcvd++;
- xapic_write_reg(APIC_EOI, 0x30);
EOI is typically written with '0', not the vector, because the legacy EOI register clears the highest ISR vector, not whatever is specified. IIRC, one of the Intel or AMD specs even says to use '0'.
AMD's Specific EOI register does allow targeting a specific vector, but that's not what's being used here.
Sure.
+}
+int main(int argc, char *argv[]) +{
- struct kvm_vm *vm;
- struct kvm_vcpu *vcpu;
- struct ucall uc;
- uint64_t halt_exits, vintr_exits;
- uint64_t *pvintr_rcvd;
- TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
No, this test doesn't require SVM, which is KVM advertising *nested* SVM. This test does require idle-hlt support though...
Sure. I will add it in V2.
- /* Check the extension for binary stats */
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_BINARY_STATS_FD));
- vm = vm_create_with_one_vcpu(&vcpu, halter_guest_code);
- vm_init_descriptor_tables(vm);
- vcpu_init_descriptor_tables(vcpu);
- vm_install_exception_handler(vm, VINTR_VECTOR, guest_vintr_handler);
- virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
- vcpu_run(vcpu);
- TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
- halt_exits = vcpu_get_stat(vcpu, "halt_exits");
Is there really no way to get binary stats without having to pass in strings?
I saw that one of the existing test cases passes strings to get the binary stats of a VM; that is why I used strings to get the binary stats of a vCPU. I will try to find another way to get the binary stats.
- vintr_exits = vcpu_get_stat(vcpu, "irq_window_exits");
- pvintr_rcvd = (uint64_t *)addr_gva2hva(vm, (uint64_t)&vintr_rcvd);
- switch (get_ucall(vcpu, &uc)) {
- case UCALL_ABORT:
REPORT_GUEST_ASSERT(uc);
/* NOT REACHED */
Eh, just put a "break;" here and drop the comment.
Sure.
- case UCALL_DONE:
goto done;
break;
- default:
TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
- }
+done:
- TEST_ASSERT(halt_exits == 0,
So in all honesty, this isn't a very interesting test. It's more of a CPU test than a KVM test. I do think it's worth adding, because why not.
But I'd also like to see a testcase for KVM_X86_DISABLE_EXITS_HLT. It would be a generic test, i.e. not specific to idle-hlt since there is no dependency and the test shouldn't care. I _think_ it would be fairly straightforward: create a VM without an in-kernel APIC (so that HLT exits to userspace), disable HLT interception, send a signal from a different task after some delay, and execute HLT in the guest. Then verify the vCPU exited because of -EINTR and not HLT.
I will create this test.
"Test Failed:\n"
"Guest executed VINTR followed by halts: %d times\n"
"The guest exited due to halt: %ld times and number\n"
"of vintr exits: %ld and vintr got re-injected: %ld times\n",
NUM_ITERATIONS, halt_exits, vintr_exits, *pvintr_rcvd);
I appreciate the effort to provide more info, but this is way too noisy. If anything, print gory details in a pr_debug() *before* the assert (see below), and then simply do:
TEST_ASSERT_EQ(halt_exits, 0);
Sure.
- fprintf(stderr,
"Test Successful:\n"
"Guest executed VINTR followed by halts: %d times\n"
"The guest exited due to halt: %ld times and number\n"
"of vintr exits: %ld and vintr got re-injected: %ld times\n",
NUM_ITERATIONS, halt_exits, vintr_exits, *pvintr_rcvd);
And this should be pr_debug(), because no human is going to look at this except when the test isn't working correctly.
I will change it to pr_debug() in V2.
- kvm_vm_free(vm);
- return 0;
+}
2.34.1
On Thu, Mar 14, 2024, Manali Shukla wrote:
+#define VINTR_VECTOR 0x30 +#define NUM_ITERATIONS 100000
What's the runtime? If it's less than a second, then whatever, but if it's at all longer than that, then I'd prefer to use a lower default and make this user- configurable.
It takes ~34s to run this test.
LOL, yeah, no. That's 33+ seconds of wasted time. From a *KVM* perspective, this is quite binary: either KVM intercepts HLT or it doesn't. Any timing bugs would be purely CPU bugs.
Please adjust this to have the default runtime <1 second. If you feel strongly about the need for a long-running test, feel free to add a command line option to control the number of iterations.
On 3/14/2024 8:36 PM, Sean Christopherson wrote:
On Thu, Mar 14, 2024, Manali Shukla wrote:
+#define VINTR_VECTOR 0x30 +#define NUM_ITERATIONS 100000
What's the runtime? If it's less than a second, then whatever, but if it's at all longer than that, then I'd prefer to use a lower default and make this user- configurable.
It takes ~34s to run this test.
LOL, yeah, no. That's 33+ seconds of wasted time. From a *KVM* perspective, this is quite binary: either KVM intercepts HLT or it doesn't. Any timing bugs would be purely CPU bugs.

Please adjust this to have the default runtime <1 second. If you feel strongly about the need for a long-running test, feel free to add a command line option to control the number of iterations.
Ack.