Changes since RFC v1:
 - add two kselftests (patch 10-11)
 - set virtual MSRs also on APs [Pawan]
 - enable "virtualize IA32_SPEC_CTRL" for L2 to prevent L2 from changing
   some bits of IA32_SPEC_CTRL (patch 4)
 - other misc cleanup and cosmetic changes
RFC v1: https://lore.kernel.org/lkml/20221210160046.2608762-1-chen.zhang@intel.com/
This series introduces "virtualize IA32_SPEC_CTRL" support. Below is an introduction to this new feature and its use cases.
### Virtualize IA32_SPEC_CTRL
"Virtualize IA32_SPEC_CTRL" [1] is a new VMX feature on Intel CPUs. This feature allows VMM to lock some bits of IA32_SPEC_CTRL MSR even when the MSR is pass-thru'd to a guest.
### Use cases of "virtualize IA32_SPEC_CTRL" [2]
Software mitigations like retpoline and the software BHB-clearing sequence depend on the CPU microarchitecture, and a guest cannot know exactly which microarchitecture it is running on. When a guest is migrated between processors of different microarchitectures, software mitigations which work perfectly on the previous microarchitecture may not be effective on the new one. To fix this problem, some hardware mitigations should be used in conjunction with software mitigations. Using virtual IA32_SPEC_CTRL, the VMM can enforce hardware mitigations transparently to guests and avoid those hardware mitigations being unintentionally disabled when the guest changes the IA32_SPEC_CTRL MSR.
### Intention of this series
This series adds to KVM the capability of enforcing hardware mitigations for guests transparently and efficiently (i.e., without intercepting IA32_SPEC_CTRL MSR accesses). The capability can be used to solve the VM migration issue in a pool consisting of processors of different microarchitectures.
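To make "enforcing without intercepting" concrete, the effect can be pictured with the small sketch below. This is purely illustrative; the helper and its parameters are hypothetical and are not the actual VMX interface or KVM code. Bits the VMM locks keep the VMM-chosen value no matter what the guest writes, while unlocked bits still come from the guest:

/*
 * Illustration only: hypothetical helper, not the real VMX interface.
 * Locked bits take the VMM-enforced value; the guest controls the rest.
 */
static inline u64 effective_spec_ctrl(u64 guest_written, u64 locked_mask,
				      u64 vmm_enforced)
{
	return (guest_written & ~locked_mask) | (vmm_enforced & locked_mask);
}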
Specifically, below are two target scenarios of this series:
Scenario 1: If retpoline is used by a VM to mitigate IMBTI in CPL0, the VMM can set RRSBA_DIS_S on parts that enumerate RRSBA. Note that the VM is presented with a microarchitecture that doesn't enumerate RRSBA.
Scenario 2: If a VM uses the software BHB-clearing sequence on transitions into CPL0 to mitigate BHI, the VMM can use "virtualize IA32_SPEC_CTRL" to set BHI_DIS_S on new parts which don't enumerate BHI_NO.
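The mapping in these two scenarios is small enough to spell out. The sketch below is a hypothetical helper for illustration only (not code from this series), using the SPEC_CTRL_* bit definitions from msr-index.h:

/* Hypothetical helper, only to make scenarios 1 and 2 concrete. */
static u64 spec_ctrl_bits_to_enforce(bool guest_uses_retpoline,
				     bool guest_uses_bhb_clear_seq)
{
	u64 bits = 0;

	if (guest_uses_retpoline)	/* scenario 1 */
		bits |= SPEC_CTRL_RRSBA_DIS_S;

	if (guest_uses_bhb_clear_seq)	/* scenario 2 */
		bits |= SPEC_CTRL_BHI_DIS_S;

	return bits;
}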
Intel defines some virtual MSRs [2] for guests to report in-use software mitigations. This allows guests to opt in to the VMM deploying hardware mitigations for them if the guests are either running on, or later migrated to, a system on which the in-use software mitigations are not effective. The virtual MSR interface is also added in this series.
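For reference, guest-side use of this interface could look roughly like the sketch below. The flow loosely mirrors what patches 5-9 do; the function itself is hypothetical, but the MSR indices and bit names are the ones added by this series (see the msr-index.h changes in patch 10):

#include <asm/msr.h>
#include <asm/msr-index.h>

/* Hypothetical sketch; not the exact code from the x86/bugs patch in this series. */
static void report_retpoline_to_vmm(void)
{
	u64 cap, miti_enum, miti_ctrl;

	if (!boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
		return;

	rdmsrl(MSR_IA32_ARCH_CAPABILITIES, cap);
	if (!(cap & ARCH_CAP_VIRTUAL_ENUM))
		return;		/* VMM doesn't expose MSR_VIRTUAL_ENUMERATION */

	rdmsrl(MSR_VIRTUAL_ENUMERATION, miti_enum);
	if (!(miti_enum & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
		return;

	rdmsrl(MSR_VIRTUAL_MITIGATION_ENUM, miti_enum);
	if (!(miti_enum & MITI_ENUM_RETPOLINE_S_SUPPORT))
		return;

	/* Tell the VMM retpoline is in use so it can deploy RRSBA_DIS_S. */
	rdmsrl(MSR_VIRTUAL_MITIGATION_CTRL, miti_ctrl);
	wrmsrl(MSR_VIRTUAL_MITIGATION_CTRL, miti_ctrl | MITI_CTRL_RETPOLINE_S_USED);
}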
### Organization of this series
1. Patch 1-3   Advertise RRSBA_CTRL and BHI_CTRL to guest
2. Patch 4     Add "virtualize IA32_SPEC_CTRL" support
3. Patch 5-9   Allow guests to report in-use software mitigations to KVM so
               that KVM can enable hardware mitigations for guests.
4. Patch 10-11 Add kselftests for virtual MSRs and IA32_SPEC_CTRL
[1]: https://cdrdv2.intel.com/v1/dl/getContent/671368 Ref. #319433-047 Chapter 12
[2]: https://www.intel.com/content/www/us/en/developer/articles/technical/softwar...
Chao Gao (3):
  KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT
  KVM: selftests: Add tests for virtual enumeration/mitigation MSRs
  KVM: selftests: Add tests for IA32_SPEC_CTRL MSR

Pawan Gupta (1):
  x86/bugs: Use Virtual MSRs to request hardware mitigations

Zhang Chen (7):
  x86/msr-index: Add bit definitions for BHI_DIS_S and BHI_NO
  KVM: x86: Advertise CPUID.7.2.EDX and RRSBA_CTRL support
  KVM: x86: Advertise BHI_CTRL support
  KVM: VMX: Add IA32_SPEC_CTRL virtualization support
  KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
  KVM: VMX: Advertise MITIGATION_CTRL support
  KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
 arch/x86/include/asm/msr-index.h              |  33 +++-
 arch/x86/include/asm/vmx.h                    |   5 +
 arch/x86/include/asm/vmxfeatures.h            |   2 +
 arch/x86/kernel/cpu/bugs.c                    |  25 +++
 arch/x86/kvm/cpuid.c                          |  22 ++-
 arch/x86/kvm/reverse_cpuid.h                  |   8 +
 arch/x86/kvm/svm/svm.c                        |   3 +
 arch/x86/kvm/vmx/capabilities.h               |   5 +
 arch/x86/kvm/vmx/nested.c                     |  13 ++
 arch/x86/kvm/vmx/vmcs.h                       |   2 +
 arch/x86/kvm/vmx/vmx.c                        | 112 ++++++++++-
 arch/x86/kvm/vmx/vmx.h                        |  43 ++++-
 arch/x86/kvm/x86.c                            |  19 +-
 tools/arch/x86/include/asm/msr-index.h        |  37 +++-
 tools/testing/selftests/kvm/Makefile          |   2 +
 .../selftests/kvm/include/x86_64/processor.h  |   5 +
 .../selftests/kvm/x86_64/spec_ctrl_msr_test.c | 178 ++++++++++++++++++
 .../kvm/x86_64/virtual_mitigation_msr_test.c  | 175 +++++++++++++++++
 18 files changed, 676 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/spec_ctrl_msr_test.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/virtual_mitigation_msr_test.c
base-commit: 400d2132288edbd6d500f45eab5d85526ca94e46
Three virtual MSRs are added for guests to report the usage of software mitigations. They are enumerated in an architectural way. Try to access the three MSRs to ensure the behavior is as expected. Specifically:
1. below three cases should cause #GP:
   * access to a non-present MSR
   * write to read-only MSRs
   * toggling reserved bit of a writeable MSR
2. rdmsr/wrmsr in other cases should succeed
3. rdmsr should return the value last written
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 tools/arch/x86/include/asm/msr-index.h        |  23 +++
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/virtual_mitigation_msr_test.c  | 175 ++++++++++++++++++
 3 files changed, 199 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/virtual_mitigation_msr_test.c
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index 6079a5fdb40b..55f75e9ebbb7 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -166,6 +166,7 @@
 						 * IA32_XAPIC_DISABLE_STATUS MSR
 						 * supported
 						 */
+#define ARCH_CAP_VIRTUAL_ENUM		BIT_ULL(63) /* MSR_VIRTUAL_ENUMERATION supported */
 
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			BIT(0)	/*
@@ -1103,6 +1104,28 @@
 #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
 #define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
+
+/* Intel virtual MSRs */
+#define MSR_VIRTUAL_ENUMERATION			0x50000000
+#define VIRT_ENUM_MITIGATION_CTRL_SUPPORT	BIT(0)	/*
+							 * Mitigation ctrl via virtual
+							 * MSRs supported
+							 */
+
+#define MSR_VIRTUAL_MITIGATION_ENUM		0x50000001
+#define MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT	BIT(0)	/* VMM supports BHI_DIS_S */
+#define MITI_ENUM_RETPOLINE_S_SUPPORT		BIT(1)	/* VMM supports RRSBA_DIS_S */
+
+#define MSR_VIRTUAL_MITIGATION_CTRL		0x50000002
+#define MITI_CTRL_BHB_CLEAR_SEQ_S_USED		BIT(0)	/*
+							 * Request VMM to deploy
+							 * BHI_DIS_S mitigation
+							 */
+#define MITI_CTRL_RETPOLINE_S_USED		BIT(1)	/*
+							 * Request VMM to deploy
+							 * RRSBA_DIS_S mitigation
+							 */
+
 /* AMD-V MSRs */
 
 #define MSR_VM_CR			0xc0010114
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 84a627c43795..9db9a7e49a54 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -115,6 +115,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/sev_migrate_tests
 TEST_GEN_PROGS_x86_64 += x86_64/amx_test
 TEST_GEN_PROGS_x86_64 += x86_64/max_vcpuid_cap_test
 TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test
+TEST_GEN_PROGS_x86_64 += x86_64/virtual_mitigation_msr_test
 TEST_GEN_PROGS_x86_64 += access_tracking_perf_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
diff --git a/tools/testing/selftests/kvm/x86_64/virtual_mitigation_msr_test.c b/tools/testing/selftests/kvm/x86_64/virtual_mitigation_msr_test.c
new file mode 100644
index 000000000000..4d924a0cf2dd
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/virtual_mitigation_msr_test.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023, Intel, Inc.
+ *
+ * tests for virtual mitigation MSR accesses
+ */
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "processor.h"
+
+static int guest_exception_count;
+static int expected_exception_count;
+
+static void guest_gp_handler(struct ex_regs *regs)
+{
+	/* RDMSR/WRMSR are 2 bytes */
+	regs->rip += 2;
+	++guest_exception_count;
+}
+
+static void write_msr_expect_gp(uint32_t msr, uint64_t val)
+{
+	uint64_t old_val;
+
+	old_val = rdmsr(msr);
+	wrmsr(msr, val);
+	expected_exception_count++;
+	GUEST_ASSERT_2(guest_exception_count == expected_exception_count,
+		       guest_exception_count, expected_exception_count);
+	GUEST_ASSERT_2(rdmsr(msr) == old_val, rdmsr(msr), old_val);
+}
+
+static void write_msr_expect_no_gp(uint32_t msr, uint64_t val)
+{
+	wrmsr(msr, val);
+	GUEST_ASSERT_EQ(guest_exception_count, expected_exception_count);
+	GUEST_ASSERT_EQ(rdmsr(msr), val);
+}
+
+static void read_msr_expect_gp(uint32_t msr)
+{
+	(void)rdmsr(msr);
+	expected_exception_count++;
+	GUEST_ASSERT_2(guest_exception_count == expected_exception_count,
+		       guest_exception_count, expected_exception_count);
+}
+
+static void guest_code_with_virtual_mitigation_ctrl(void)
+{
+	uint64_t val, miti_ctrl = 0;
+	int i;
+
+	val = rdmsr(MSR_VIRTUAL_ENUMERATION);
+	/* MSR_VIRTUAL_ENUMERATION is read-only. #GP is expected on write */
+	write_msr_expect_gp(MSR_VIRTUAL_ENUMERATION, val);
+
+	val = rdmsr(MSR_VIRTUAL_MITIGATION_ENUM);
+	/* MSR_VIRTUAL_MITIGATION_ENUM is read-only. #GP is expected on write */
+	write_msr_expect_gp(MSR_VIRTUAL_MITIGATION_ENUM, val);
+
+	for (i = 0; i < 64; i++) {
+		if (val & BIT_ULL(i)) {
+			miti_ctrl |= BIT_ULL(i);
+			write_msr_expect_no_gp(MSR_VIRTUAL_MITIGATION_CTRL, miti_ctrl);
+		} else {
+			write_msr_expect_gp(MSR_VIRTUAL_MITIGATION_CTRL, miti_ctrl | BIT_ULL(i));
+		}
+	}
+
+	write_msr_expect_no_gp(MSR_VIRTUAL_MITIGATION_CTRL, 0);
+	GUEST_DONE();
+}
+
+static void guest_code_no_virtual_enumeration(void)
+{
+	read_msr_expect_gp(MSR_VIRTUAL_ENUMERATION);
+	read_msr_expect_gp(MSR_VIRTUAL_MITIGATION_ENUM);
+	read_msr_expect_gp(MSR_VIRTUAL_MITIGATION_CTRL);
+	GUEST_DONE();
+}
+
+bool kvm_cpu_has_virtual_mitigation_ctrl(void)
+{
+	const struct kvm_msr_list *feature_list;
+	u64 virt_enum = 0;
+	int i;
+
+	feature_list = kvm_get_feature_msr_index_list();
+	for (i = 0; i < feature_list->nmsrs; i++) {
+		if (feature_list->indices[i] == MSR_VIRTUAL_ENUMERATION)
+			virt_enum = kvm_get_feature_msr(MSR_VIRTUAL_ENUMERATION);
+	}
+
+	return virt_enum & VIRT_ENUM_MITIGATION_CTRL_SUPPORT;
+}
+
+static void enable_virtual_mitigation_ctrl(struct kvm_vcpu *vcpu)
+{
+	vcpu_set_msr(vcpu, MSR_IA32_ARCH_CAPABILITIES, ARCH_CAP_VIRTUAL_ENUM);
+	vcpu_set_msr(vcpu, MSR_VIRTUAL_ENUMERATION, VIRT_ENUM_MITIGATION_CTRL_SUPPORT);
+	vcpu_set_msr(vcpu, MSR_VIRTUAL_MITIGATION_ENUM,
+		     kvm_get_feature_msr(MSR_VIRTUAL_MITIGATION_ENUM));
+}
+
+static void disable_virtual_enumeration(struct kvm_vcpu *vcpu)
+{
+	vcpu_set_msr(vcpu, MSR_IA32_ARCH_CAPABILITIES, 0);
+}
+
+static void test_virtual_mitigation_ctrl(bool enable)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	void *guest_code;
+
+	guest_code = enable ? guest_code_with_virtual_mitigation_ctrl :
+			      guest_code_no_virtual_enumeration;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	run = vcpu->run;
+
+	if (enable)
+		enable_virtual_mitigation_ctrl(vcpu);
+	else
+		disable_virtual_enumeration(vcpu);
+
+	/* Register #GP handler */
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vcpu);
+	vm_install_exception_handler(vm, GP_VECTOR, guest_gp_handler);
+
+	while (1) {
+		vcpu_run(vcpu);
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Unexpected exit reason: %u (%s),\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT_2(uc, "real %ld expected %ld");
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_ARCH_CAPABILITIES));
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GET_MSR_FEATURES));
+	TEST_REQUIRE(kvm_cpu_has_virtual_mitigation_ctrl());
+
+	test_virtual_mitigation_ctrl(true);
+	test_virtual_mitigation_ctrl(false);
+
+	return 0;
+}
On 4/14/2023 2:25 PM, Chao Gao wrote:
> Three virtual MSRs added for guest to report the usage of software
Seems it's better like below. s/Three virtual MSRs added/Add three virtual MSRs ?
> mitigations. They are enumerated in an architectural way. Try to access the three MSRs to ensure the behavior is expected: Specifically,
> - below three cases should cause #GP:
>   - access to a non-present MSR
>   - write to read-only MSRs
>   - toggling reserved bit of a writeable MSR
> - rdmsr/wrmsr in other cases should succeed
> - rdmsr should return the value last written
> Signed-off-by: Chao Gao <chao.gao@intel.com>
Thanks,
Jingqi
Toggle supported bits of IA32_SPEC_CTRL and verify the result. Also verify that the MSR value is preserved across nested transitions.
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 tools/arch/x86/include/asm/msr-index.h        |   6 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |   5 +
 .../selftests/kvm/x86_64/spec_ctrl_msr_test.c | 178 ++++++++++++++++++
 4 files changed, 190 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/spec_ctrl_msr_test.c
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index 55f75e9ebbb7..9ad6c307c0d0 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -48,6 +48,12 @@
 #define SPEC_CTRL_STIBP			BIT(SPEC_CTRL_STIBP_SHIFT)	/* STIBP mask */
 #define SPEC_CTRL_SSBD_SHIFT		2	   /* Speculative Store Bypass Disable bit */
 #define SPEC_CTRL_SSBD			BIT(SPEC_CTRL_SSBD_SHIFT)	/* Speculative Store Bypass Disable */
+#define SPEC_CTRL_IPRED_DIS_U_SHIFT	3	   /* Disable IPRED behavior in user mode */
+#define SPEC_CTRL_IPRED_DIS_U		BIT(SPEC_CTRL_IPRED_DIS_U_SHIFT)
+#define SPEC_CTRL_IPRED_DIS_S_SHIFT	4	   /* Disable IPRED behavior in supervisor mode */
+#define SPEC_CTRL_IPRED_DIS_S		BIT(SPEC_CTRL_IPRED_DIS_S_SHIFT)
+#define SPEC_CTRL_RRSBA_DIS_U_SHIFT	5	   /* Disable RRSBA behavior in user mode */
+#define SPEC_CTRL_RRSBA_DIS_U		BIT(SPEC_CTRL_RRSBA_DIS_U_SHIFT)
 #define SPEC_CTRL_RRSBA_DIS_S_SHIFT	6	   /* Disable RRSBA behavior in supervisor mode */
 #define SPEC_CTRL_RRSBA_DIS_S		BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)
 #define SPEC_CTRL_BHI_DIS_S_SHIFT	10	   /* Disable BHI behavior in supervisor mode */
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 9db9a7e49a54..9f117cf80477 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -116,6 +116,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/amx_test
 TEST_GEN_PROGS_x86_64 += x86_64/max_vcpuid_cap_test
 TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test
 TEST_GEN_PROGS_x86_64 += x86_64/virtual_mitigation_msr_test
+TEST_GEN_PROGS_x86_64 += x86_64/spec_ctrl_msr_test
 TEST_GEN_PROGS_x86_64 += access_tracking_perf_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 90387ddcb2a9..355aba25dfef 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -125,8 +125,13 @@ struct kvm_x86_cpu_feature {
 #define X86_FEATURE_IBT			KVM_X86_CPU_FEATURE(0x7, 0, EDX, 20)
 #define X86_FEATURE_AMX_TILE		KVM_X86_CPU_FEATURE(0x7, 0, EDX, 24)
 #define X86_FEATURE_SPEC_CTRL		KVM_X86_CPU_FEATURE(0x7, 0, EDX, 26)
+#define X86_FEATURE_INTEL_STIBP		KVM_X86_CPU_FEATURE(0x7, 0, EDX, 27)
+#define X86_FEATURE_SPEC_CTRL_SSBD	KVM_X86_CPU_FEATURE(0x7, 0, EDX, 31)
 #define X86_FEATURE_ARCH_CAPABILITIES	KVM_X86_CPU_FEATURE(0x7, 0, EDX, 29)
 #define X86_FEATURE_PKS			KVM_X86_CPU_FEATURE(0x7, 0, ECX, 31)
+#define X86_FEATURE_IPRED_CTRL		KVM_X86_CPU_FEATURE(0x7, 2, EDX, 1)
+#define X86_FEATURE_RRSBA_CTRL		KVM_X86_CPU_FEATURE(0x7, 2, EDX, 2)
+#define X86_FEATURE_BHI_CTRL		KVM_X86_CPU_FEATURE(0x7, 2, EDX, 4)
 #define X86_FEATURE_XTILECFG		KVM_X86_CPU_FEATURE(0xD, 0, EAX, 17)
 #define X86_FEATURE_XTILEDATA		KVM_X86_CPU_FEATURE(0xD, 0, EAX, 18)
 #define X86_FEATURE_XSAVES		KVM_X86_CPU_FEATURE(0xD, 1, EAX, 3)
diff --git a/tools/testing/selftests/kvm/x86_64/spec_ctrl_msr_test.c b/tools/testing/selftests/kvm/x86_64/spec_ctrl_msr_test.c
new file mode 100644
index 000000000000..ced4640ee92e
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/spec_ctrl_msr_test.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023, Intel, Inc.
+ *
+ * tests for IA32_SPEC_CTRL MSR accesses
+ */
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "vmx.h"
+#include "processor.h"
+
+static void set_spec_ctrl(u64 val)
+{
+	/* Set the bit and verify the result */
+	wrmsr(MSR_IA32_SPEC_CTRL, val);
+	GUEST_ASSERT_2(rdmsr(MSR_IA32_SPEC_CTRL) == val, rdmsr(MSR_IA32_SPEC_CTRL), val);
+
+	/* Clear the bit and verify the result */
+	val = 0;
+	wrmsr(MSR_IA32_SPEC_CTRL, val);
+	GUEST_ASSERT_2(rdmsr(MSR_IA32_SPEC_CTRL) == val, rdmsr(MSR_IA32_SPEC_CTRL), val);
+}
+
+static void guest_code(void)
+{
+	set_spec_ctrl(SPEC_CTRL_IBRS);
+
+	if (this_cpu_has(X86_FEATURE_INTEL_STIBP))
+		set_spec_ctrl(SPEC_CTRL_STIBP);
+
+	if (this_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
+		set_spec_ctrl(SPEC_CTRL_SSBD);
+
+	if (this_cpu_has(X86_FEATURE_IPRED_CTRL)) {
+		set_spec_ctrl(SPEC_CTRL_IPRED_DIS_S);
+		set_spec_ctrl(SPEC_CTRL_IPRED_DIS_U);
+	}
+
+	if (this_cpu_has(X86_FEATURE_RRSBA_CTRL)) {
+		set_spec_ctrl(SPEC_CTRL_RRSBA_DIS_S);
+		set_spec_ctrl(SPEC_CTRL_RRSBA_DIS_U);
+	}
+
+	if (this_cpu_has(X86_FEATURE_BHI_CTRL))
+		set_spec_ctrl(SPEC_CTRL_BHI_DIS_S);
+
+	GUEST_DONE();
+}
+
+static void test_spec_ctrl_access(void)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	run = vcpu->run;
+
+	while (1) {
+		vcpu_run(vcpu);
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Unexpected exit reason: %u (%s),\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT_2(uc, "real %ld expected %ld");
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	kvm_vm_free(vm);
+}
+
+static void l2_guest_code(void)
+{
+	GUEST_ASSERT(rdmsr(MSR_IA32_SPEC_CTRL) == SPEC_CTRL_IBRS);
+	wrmsr(MSR_IA32_SPEC_CTRL, 0);
+
+	/* Exit to L1 */
+	__asm__ __volatile__("vmcall");
+}
+
+static void l1_guest_code(struct vmx_pages *vmx_pages)
+{
+#define L2_GUEST_STACK_SIZE 64
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	uint32_t control;
+
+	/*
+	 * Try to disable interception of writes to SPEC_CTRL by writing a
+	 * non-0 value. This test is intended to verify that SPEC_CTRL is
+	 * preserved across nested transitions, particularly when writes to
+	 * the MSR aren't intercepted by the L0 VMM or L1 VMM.
+	 */
+	wrmsr(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
+
+	GUEST_ASSERT(vmx_pages->vmcs_gpa);
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+	GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
+	prepare_vmcs(vmx_pages, l2_guest_code,
+		     &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
+	control |= CPU_BASED_USE_MSR_BITMAPS;
+	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
+
+	GUEST_ASSERT(!vmlaunch());
+
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+	GUEST_ASSERT(rdmsr(MSR_IA32_SPEC_CTRL) == 0);
+
+	GUEST_DONE();
+}
+
+static void test_spec_ctrl_vmx_transition(void)
+{
+	vm_vaddr_t vmx_pages_gva;
+	struct kvm_vcpu *vcpu;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
+
+	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+	vcpu_alloc_vmx(vm, &vmx_pages_gva);
+	vcpu_args_set(vcpu, 1, vmx_pages_gva);
+	run = vcpu->run;
+
+	while (1) {
+		vcpu_run(vcpu);
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Unexpected exit reason: %u (%s),\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+	}
+
+done:
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SPEC_CTRL));
+
+	test_spec_ctrl_access();
+	test_spec_ctrl_vmx_transition();
+
+	return 0;
+}
On 4/14/2023 2:25 PM, Chao Gao wrote:
> ### Use cases of "virtualize IA32_SPEC_CTRL" [2]
>
> Software mitigations like Retpoline and software BHB-clearing sequence depend on CPU microarchitectures. And guest cannot know exactly the underlying microarchitecture. When a guest is migrated between processors of different microarchitectures, software mitigations which work perfectly on previous microachitecture may be not effective on the new one. To fix the problem, some hardware mitigations should be used in conjunction with software mitigations.
So even the hardware mitigations are enabled, the software mitigations are still needed, right?
> Using virtual IA32_SPEC_CTRL, VMM can enforce hardware mitigations transparently to guests and avoid those hardware mitigations being unintentionally disabled when guest changes IA32_SPEC_CTRL MSR.
>
> ### Intention of this series
>
> This series adds the capability of enforcing hardware mitigations for guests transparently and efficiently (i.e., without intecepting IA32_SPEC_CTRL MSR
/s/intecepting/intercepting
> accesses) to kvm. The capability can be used to solve the VM migration issue in a pool consisting of processors of different microarchitectures.
On Fri, Apr 14, 2023 at 05:51:43PM +0800, Binbin Wu wrote:
> On 4/14/2023 2:25 PM, Chao Gao wrote:
>> Software mitigations like Retpoline and software BHB-clearing sequence depend on CPU microarchitectures. And guest cannot know exactly the underlying microarchitecture. When a guest is migrated between processors of different microarchitectures, software mitigations which work perfectly on previous microachitecture may be not effective on the new one. To fix the problem, some hardware mitigations should be used in conjunction with software mitigations.
> So even the hardware mitigations are enabled, the software mitigations are still needed, right?
Retpoline mitigation is not fully effective when RET can take prediction from an alternate predictor. Newer hardware provides a way to disable this behavior (using RRSBA_DIS_S bit in MSR SPEC_CTRL).
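As a minimal illustration of that hardware control (a sketch only, not the actual arch/x86/kernel/cpu/bugs.c code), on parts that enumerate RRSBA_CTRL the kernel can simply set the bit itself:

/* Sketch: disable "RET may take prediction from an alternate predictor" in supervisor mode. */
static void enforce_rrsba_dis_s(void)
{
	u64 spec_ctrl;

	rdmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
	wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl | SPEC_CTRL_RRSBA_DIS_S);
}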
eIBRS is the preferred way to mitigate BTI, but if for some reason a guest has deployed retpoline, the VMM can make it more effective by deploying the relevant hardware control. That is why the above text says:
"... hardware mitigations should be used in conjunction with software mitigations."