Clean up a handful of interrelated warts in the kernel's handling of VMX:
- Enable VMX in IA32_FEATURE_CONTROL during boot instead of on-demand during KVM load to avoid future contention over IA32_FEATURE_CONTROL.
- Rework VMX feature reporting so that it is accurate and up-to-date, now and in the future.
- Consolidate code across CPUs that support VMX.
This series stems from two separate but related issues. The first issue, pointed out by Boris in the SGX enabling series[*], is that the kernel currently doesn't ensure the IA32_FEATURE_CONTROL MSR is configured during boot. The second issue is that the kernel's reporting of VMX features is stale, potentially inaccurate, and difficult to maintain.
v5:
  - Rebase to tip/master, ec7b10f2d023 ("Merge branch 'x86/cleanups'")
  - Fix a missing IA32_FEAT_CTL change in the idle driver (which amusingly reads the MSR to deal with SGX). [kbuild test robot]
  - Tweak the displayed names for VMX flags. [Boris, Paolo, Liran]
  - Add a comment above the raw rdmsr() calls in the VMX feature flag parsing. [Boris]
  - Fix a few changelog typos. [Boris]
  - Use VMX_F() instead of F() for the VMX flag extraction macro. [Boris]
  - Drop 'PROC' from the PRIMARY_CTLS and SECONDARY_CTLS enums to keep line lengths sane.
  - Keep the pr_fmt at the top of feat_ctl.c when the flag populating code is introduced (previously got buried in the middle of the file).
v4:
  - Rebase to tip/master, 8a1b070333f4 ("Merge branch 'WIP.x86/mm'")
  - Rename everything feature control related to IA32_FEAT_CTL. [Boris]
  - Minor coding style tweaks [Boris and Jarkko].
  - Print VMX feature flags in "vmx flags" to avoid polluting "flags", but keep printing the current synthetic VMX in "flags" so as not to break the ABI. [Boris]
  - Don't bother printing an error message in the extremely unlikely event VMX is supported but IA32_FEAT_CTL doesn't exist. [Boris]
  - Beef up a few changelogs and comments. [Boris]
  - Add a comment in the LMCE code for the new WARN. [Jarkko]
  - Check CONFIG_KVM_INTEL instead of CONFIG_KVM when deciding whether or not to enable VMX.
  - Add a patch to introduce X86_FEATURE_MSR_IA32_FEAT_CTL.
  - Dropped Jim's Reviewed-by from a few KVM patches due to the above addition.
v3:
  - Rebase to tip/master, ceceaf1f12ba ("Merge branch 'WIP.x86/cleanups'").
  - Rename the feature control MSR bit defines [Boris].
  - Rewrite the error message displayed when reading feature control MSR faults on a VMX capable CPU to explicitly state that it's likely a hardware or hypervisor issue [Boris].
  - Collect a Reviewed-by for the LMCE change [Boris].
  - Enable VMX in feature control (if it's unlocked) if and only if KVM is enabled [Paolo].
  - Remove a big pile of redundant MSR defines from the KVM selftests that was discovered when renaming the feature control defines.
  - Fix a changelog typo [Boris].
v2:
  - Rebase to latest tip/x86/cpu (1edae1ae6258, "x86/Kconfig: Enforce...)
  - Collect Jim's reviews.
  - Fix a typo in setting of EPT capabilities [TonyWWang-oc].
  - Remove defines for reserved VMX feature flags [Paolo].
  - Print the VMX features under "flags" and maintain all existing names to be backward compatible with the ABI [Paolo].
  - Create aggregate APIC features to report FLEXPRIORITY and APICV, so that the full feature *and* their associated individual features are printed, e.g. to aid in recognizing why an APIC feature isn't being used.
  - Fix a few copy paste errors in changelogs.
v1 cover letter:
== IA32_FEATURE_CONTROL ==
Lack of IA32_FEATURE_CONTROL configuration during boot isn't a functional issue in the current kernel as the majority of platforms set and lock IA32_FEATURE_CONTROL in firmware. And when the MSR is left unlocked, KVM is the only subsystem that writes IA32_FEATURE_CONTROL. That will change if/when SGX support is enabled, as SGX will also want to fully enable itself when IA32_FEATURE_CONTROL is unlocked.
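To make the boot-time policy concrete, here is a minimal userspace sketch of the decision the kernel has to make when it finds the MSR unlocked. It is not the code from this series: feat_ctl_boot_value() and its want_vmx/tboot parameters are hypothetical names used only for illustration, and the sketch works on a plain integer instead of issuing rdmsr/wrmsr. The bit positions are the ones defined in msr-index.h by patch 1, and the "set the enable bits and lock" step mirrors what KVM's hardware_enable() does today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the MSR, as defined in msr-index.h by this series. */
#define FEAT_CTL_LOCKED                  (1ULL << 0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX  (1ULL << 1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX (1ULL << 2)

/*
 * Hypothetical helper, not a kernel function: given the current MSR value,
 * return the value that should end up in the MSR after boot.
 *
 * want_vmx: assumption standing in for "KVM support is built in".
 * tboot:    assumption standing in for tboot_enabled().
 */
static inline uint64_t feat_ctl_boot_value(uint64_t msr, bool want_vmx, bool tboot)
{
	/* Firmware already locked the MSR: writes would fault, leave it alone. */
	if (msr & FEAT_CTL_LOCKED)
		return msr;

	/* Unlocked: enable what is wanted, then lock so nothing else races. */
	msr |= FEAT_CTL_LOCKED;
	if (want_vmx) {
		msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
		if (tboot)
			msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
	}
	return msr;
}
```

Doing this once at boot means a later user such as SGX only ever sees a locked MSR, which is the contention the cover letter is trying to avoid.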
== VMX Feature Reporting ==
VMX features are not enumerated via CPUID, but instead are enumerated through VMX MSRs. As a result, new VMX features are not automatically reported via /proc/cpuinfo.
An attempt was made long ago to report interesting and/or meaningful VMX features by synthesizing select features into a Linux-defined cpufeatures word. Synthetic feature flags worked for the initial purpose, but the existence of the synthetic flags was forgotten almost immediately, e.g. only one new flag (EPT A/D) has been added in the decade since the synthetic VMX features were introduced, while VMX and KVM have gained support for many new features.
Placing the synthetic flags in x86_capability also allows them to be queried via cpu_has() and company, which is misleading as the flags exist purely for reporting via /proc/cpuinfo. KVM, the only in-kernel user of VMX, ignores the flags.
Last but not least, VMX features are reported in /proc/cpuinfo even when VMX is unusable due to lack of enabling in IA32_FEATURE_CONTROL.
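The "unusable due to lack of enabling" condition can be sketched as a pure predicate. The block below is a condensed userspace restatement of the checks in KVM's vmx_disabled_by_bios() as they appear in patch 1, with the warning message dropped. The name vmx_disabled_by_firmware() and the tboot parameter (standing in for tboot_enabled()) are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the MSR, as defined in msr-index.h by this series. */
#define FEAT_CTL_LOCKED                  (1ULL << 0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX  (1ULL << 1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX (1ULL << 2)

/*
 * Hypothetical helper: returns true when firmware has locked the MSR with
 * VMX disabled for the way the kernel was launched.
 */
static inline bool vmx_disabled_by_firmware(uint64_t msr, bool tboot)
{
	/* An unlocked MSR can still be configured, so VMX is not disabled. */
	if (!(msr & FEAT_CTL_LOCKED))
		return false;

	/* Launched with TXT: VMX must be enabled inside SMX. */
	if (tboot)
		return !(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX);

	/* Launched without TXT: VMX must be enabled outside SMX. */
	return !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);
}
```

A reporting path that consulted a predicate like this before printing VMX flags would not advertise features on a system where VMXON is guaranteed to fail.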
== Caveats ==
All of the testing of non-standard flows was done in a VM, as I don't have a system that leaves IA32_FEATURE_CONTROL unlocked, or locks it with VMX disabled.
The Centaur and Zhaoxin changes are somewhat speculative, as I haven't confirmed they actually support IA32_FEATURE_CONTROL, or that they want to gain "official" KVM support. I assume they unofficially support KVM given that both CPUs went through the effort of enumerating VMX features. That in turn would require them to support IA32_FEATURE_CONTROL since KVM will fault and refuse to load if the MSR doesn't exist.
[*] https://lkml.kernel.org/r/20190925085156.GA3891@zn.tnic
Sean Christopherson (19):
  x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
  selftests: kvm: Replace manual MSR defs with common msr-index.h
  tools arch x86: Sync msr-index.h from kernel sources
  x86/intel: Initialize IA32_FEAT_CTL MSR at boot
  x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
  x86/centaur: Use common IA32_FEAT_CTL MSR initialization
  x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
  x86/cpu: Clear VMX feature flag if VMX is not fully enabled
  x86/vmx: Introduce VMX_FEATURES_*
  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
  x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
  x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
  x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
  KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
  KVM: VMX: Use VMX feature flag to query BIOS enabling
  KVM: VMX: Check for full VMX support when verifying CPU compatibility
  KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
  perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
  KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs
 MAINTAINERS | 2 +-
 arch/x86/Kconfig.cpu | 8 +
 arch/x86/boot/mkcpustr.c | 1 +
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/msr-index.h | 14 +-
 arch/x86/include/asm/perf_event.h | 22 +-
 arch/x86/include/asm/processor.h | 4 +
 arch/x86/include/asm/vmx.h | 105 +--
 arch/x86/include/asm/vmxfeatures.h | 86 +++
 arch/x86/kernel/cpu/Makefile | 6 +-
 arch/x86/kernel/cpu/centaur.c | 35 +-
 arch/x86/kernel/cpu/common.c | 3 +
 arch/x86/kernel/cpu/cpu.h | 4 +
 arch/x86/kernel/cpu/feat_ctl.c | 144 ++++
 arch/x86/kernel/cpu/intel.c | 49 +-
 arch/x86/kernel/cpu/mce/intel.c | 15 +-
 arch/x86/kernel/cpu/mkcapflags.sh | 15 +-
 arch/x86/kernel/cpu/proc.c | 15 +
 arch/x86/kernel/cpu/zhaoxin.c | 35 +-
 arch/x86/kvm/Kconfig | 10 +-
 arch/x86/kvm/vmx/nested.c | 4 +-
 arch/x86/kvm/vmx/vmx.c | 67 +-
 arch/x86/kvm/vmx/vmx.h | 2 +-
 arch/x86/kvm/x86.c | 2 +-
 drivers/idle/intel_idle.c | 2 +-
 tools/arch/x86/include/asm/msr-index.h | 14 +-
 tools/power/x86/turbostat/turbostat.c | 4 +-
 tools/testing/selftests/kvm/Makefile | 4 +-
 .../selftests/kvm/include/x86_64/processor.h | 726 +-----------------
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 8 +-
 30 files changed, 420 insertions(+), 987 deletions(-)
 create mode 100644 arch/x86/include/asm/vmxfeatures.h
 create mode 100644 arch/x86/kernel/cpu/feat_ctl.c
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL are quite a mouthful, especially the VMX bits which must differentiate between enabling VMX inside and outside SMX (TXT) operation. Rename the MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to make them a little friendlier on the eyes.
Arguably the MSR itself should keep the full IA32_FEATURE_CONTROL name to match Intel's SDM, but a future patch will add a dedicated Kconfig, file and functions for the MSR. Using the full name for those assets is rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its nomenclature is consistent throughout the kernel.
Opportunistically fix a few other annoyances with the defines:
- Relocate the bit defines so that they immediately follow the MSR define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
- Add whitespace around the block of feature control defines to make it clear they're all related.
- Use BIT() instead of manually encoding the bit shift.
- Use "VMX" instead of "VMXON" to match the SDM.
- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to be consistent with the kernel's verbiage used for all other feature control bits. Note, the SDM refers to the LMCE bit as LMCE_ON, likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore the (literal) one-off usage of _ON, the SDM is simply "wrong".
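As a quick sanity check that the rename and the switch to BIT() are purely cosmetic, the following userspace sketch compares old and new encodings and restates the "locked and LMCE enabled" test from lmce_supported(). The BIT() definition here is a local stand-in for the kernel macro, and lmce_enabled_and_locked() is a hypothetical name used only in this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro (assumption for this sketch). */
#define BIT(nr) (1UL << (nr))

/* New names, as introduced by this patch. */
#define FEAT_CTL_LOCKED       BIT(0)
#define FEAT_CTL_LMCE_ENABLED BIT(20)

/* Old names with manually encoded shifts, kept here only for comparison. */
#define FEATURE_CONTROL_LOCKED (1<<0)
#define FEATURE_CONTROL_LMCE   (1<<20)

/*
 * Hypothetical helper mirroring the check in lmce_supported(): firmware must
 * have both enabled LMCE and locked the MSR.
 */
static inline bool lmce_enabled_and_locked(uint64_t feat_ctl)
{
	const uint64_t mask = FEAT_CTL_LOCKED | FEAT_CTL_LMCE_ENABLED;

	return (feat_ctl & mask) == mask;
}
```

Both spellings produce the same bit values, so callers converted by this patch behave identically.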
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/msr-index.h | 14 +++++-----
 arch/x86/kernel/cpu/mce/intel.c | 10 +++----
 arch/x86/kvm/vmx/nested.c | 4 +--
 arch/x86/kvm/vmx/vmx.c | 46 ++++++++++++++++----------------
 arch/x86/kvm/vmx/vmx.h | 2 +-
 arch/x86/kvm/x86.c | 2 +-
 drivers/idle/intel_idle.c | 2 +-
 7 files changed, 41 insertions(+), 39 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 084e98da04a7..ebe1685e92dd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -558,7 +558,14 @@
 #define MSR_IA32_EBL_CR_POWERON 0x0000002a
 #define MSR_EBC_FREQUENCY_ID 0x0000002c
 #define MSR_SMI_COUNT 0x00000034
-#define MSR_IA32_FEATURE_CONTROL 0x0000003a
+
+/* Referred to as IA32_FEATURE_CONTROL in Intel's SDM. */
+#define MSR_IA32_FEAT_CTL 0x0000003a
+#define FEAT_CTL_LOCKED BIT(0)
+#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX BIT(1)
+#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX BIT(2)
+#define FEAT_CTL_LMCE_ENABLED BIT(20)
+
 #define MSR_IA32_TSC_ADJUST 0x0000003b
 #define MSR_IA32_BNDCFGS 0x00000d90

@@ -566,11 +573,6 @@

 #define MSR_IA32_XSS 0x00000da0

-#define FEATURE_CONTROL_LOCKED (1<<0)
-#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
-#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2)
-#define FEATURE_CONTROL_LMCE (1<<20)
-
 #define MSR_IA32_APICBASE 0x0000001b
 #define MSR_IA32_APICBASE_BSP (1<<8)
 #define MSR_IA32_APICBASE_ENABLE (1<<11)
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index e270d0770134..c238518b84a2 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -115,12 +115,12 @@ static bool lmce_supported(void)

 	/*
 	 * BIOS should indicate support for LMCE by setting bit 20 in
-	 * IA32_FEATURE_CONTROL without which touching MCG_EXT_CTL will
-	 * generate a #GP fault.
+	 * IA32_FEAT_CTL without which touching MCG_EXT_CTL will generate a #GP
+	 * fault.
 	 */
-	rdmsrl(MSR_IA32_FEATURE_CONTROL, tmp);
-	if ((tmp & (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE)) ==
-		   (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE))
+	rdmsrl(MSR_IA32_FEAT_CTL, tmp);
+	if ((tmp & (FEAT_CTL_LOCKED | FEAT_CTL_LMCE_ENABLED)) ==
+		   (FEAT_CTL_LOCKED | FEAT_CTL_LMCE_ENABLED))
 		return true;

 	return false;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4aea7d304beb..6879966b7648 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4588,8 +4588,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 	gpa_t vmptr;
 	uint32_t revision;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED
-		| FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+	const u64 VMXON_NEEDED_FEATURES = FEAT_CTL_LOCKED
+		| FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;

 	/*
 	 * The Intel VMX Instruction Reference lists a bunch of bits that are
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3394c839dea..91b2517e3b62 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1839,11 +1839,11 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_EXT_CTL:
 		if (!msr_info->host_initiated &&
 		    !(vmx->msr_ia32_feature_control &
-		      FEATURE_CONTROL_LMCE))
+		      FEAT_CTL_LMCE_ENABLED))
 			return 1;
 		msr_info->data = vcpu->arch.mcg_ext_ctl;
 		break;
-	case MSR_IA32_FEATURE_CONTROL:
+	case MSR_IA32_FEAT_CTL:
 		msr_info->data = vmx->msr_ia32_feature_control;
 		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
@@ -2074,15 +2074,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_EXT_CTL:
 		if ((!msr_info->host_initiated &&
 		     !(to_vmx(vcpu)->msr_ia32_feature_control &
-		       FEATURE_CONTROL_LMCE)) ||
+		       FEAT_CTL_LMCE_ENABLED)) ||
 		    (data & ~MCG_EXT_CTL_LMCE_EN))
 			return 1;
 		vcpu->arch.mcg_ext_ctl = data;
 		break;
-	case MSR_IA32_FEATURE_CONTROL:
+	case MSR_IA32_FEAT_CTL:
 		if (!vmx_feature_control_msr_valid(vcpu, data) ||
 		    (to_vmx(vcpu)->msr_ia32_feature_control &
-		     FEATURE_CONTROL_LOCKED && !msr_info->host_initiated))
+		     FEAT_CTL_LOCKED && !msr_info->host_initiated))
 			return 1;
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
@@ -2206,22 +2206,22 @@ static __init int vmx_disabled_by_bios(void)
 {
 	u64 msr;

-	rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
-	if (msr & FEATURE_CONTROL_LOCKED) {
+	rdmsrl(MSR_IA32_FEAT_CTL, msr);
+	if (msr & FEAT_CTL_LOCKED) {
 		/* launched w/ TXT and VMX disabled */
-		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX)
+		if (!(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)
 			&& tboot_enabled())
 			return 1;
 		/* launched w/o TXT and VMX only enabled w/ TXT */
-		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
-			&& (msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX)
+		if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX)
+			&& (msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)
 			&& !tboot_enabled()) {
 			printk(KERN_WARNING "kvm: disable TXT in the BIOS or "
 				"activate TXT before enabling KVM\n");
 			return 1;
 		}
 		/* launched w/o TXT and VMX disabled */
-		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
+		if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX)
 			&& !tboot_enabled())
 			return 1;
 	}
@@ -2269,16 +2269,16 @@ static int hardware_enable(void)
 	 */
 	crash_enable_local_vmclear(cpu);

-	rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
+	rdmsrl(MSR_IA32_FEAT_CTL, old);

-	test_bits = FEATURE_CONTROL_LOCKED;
-	test_bits |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+	test_bits = FEAT_CTL_LOCKED;
+	test_bits |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
 	if (tboot_enabled())
-		test_bits |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;
+		test_bits |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;

 	if ((old & test_bits) != test_bits) {
 		/* enable and lock */
-		wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
+		wrmsrl(MSR_IA32_FEAT_CTL, old | test_bits);
 	}
 	kvm_cpu_vmxon(phys_addr);
 	if (enable_ept)
@@ -6801,7 +6801,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;

-	vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;
+	vmx->msr_ia32_feature_control_valid_bits = FEAT_CTL_LOCKED;

 	/*
 	 * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR
@@ -7099,12 +7099,12 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)

 	if (nested_vmx_allowed(vcpu))
 		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
-			FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX |
-			FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+			FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
+			FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
 	else
 		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
-			~(FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX |
-			  FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX);
+			~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
+			  FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);

 	if (nested_vmx_allowed(vcpu)) {
 		nested_vmx_cr_fixed1_bits_update(vcpu);
@@ -7523,10 +7523,10 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.mcg_cap & MCG_LMCE_P)
 		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
-			FEATURE_CONTROL_LMCE;
+			FEAT_CTL_LMCE_ENABLED;
 	else
 		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
-			~FEATURE_CONTROL_LMCE;
+			~FEAT_CTL_LMCE_ENABLED;
 }

 static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a4f7f737c5d4..7f42cf3dcd70 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -289,7 +289,7 @@ struct vcpu_vmx {

 	/*
 	 * Only bits masked by msr_ia32_feature_control_valid_bits can be set in
-	 * msr_ia32_feature_control. FEATURE_CONTROL_LOCKED is always included
+	 * msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
 	 * in msr_ia32_feature_control_valid_bits.
 	 */
 	u64 msr_ia32_feature_control;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf917139de6b..740d3ee42455 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1142,7 +1142,7 @@ static const u32 msrs_to_save_all[] = {
 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
-	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
 	MSR_IA32_SPEC_CTRL,
 	MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
 	MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 75fd2a7b0842..5a44cd8142e9 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1284,7 +1284,7 @@ static void sklh_idle_state_table_update(void)
 	/* if SGX is present */
 	if (ebx & (1 << 2)) {

-		rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
+		rdmsrl(MSR_IA32_FEAT_CTL, msr);

 		/* if SGX is enabled */
 		if (msr & (1 << 18))
The kernel's version of msr-index.h was pulled wholesale into tools by commit 444e2ff34df8 ("tools arch x86: Grab a copy of the file containing the MSR numbers"), so use the common msr-index.h instead of manually redefining everything in a KVM-only header.
Note, a few MSR related definitions remain in processor.h because they are not covered by msr-index.h, including the awesomely named APIC_BASE_MSR, which refers to the starting index of the x2APIC MSRs, not the actual MSR_IA32_APICBASE, which *is* defined by msr-index.h.
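To illustrate why the two similarly named defines are unrelated, here is a small userspace sketch. It relies on the architectural x2APIC rule that an APIC register at MMIO offset X is accessed through MSR 0x800 + (X >> 4). The helper name x2apic_msr_index() is hypothetical, while the three constants are the values that appear in the headers touched by this patch.

```c
#include <stdint.h>

#define MSR_IA32_APICBASE 0x0000001b /* the real APIC base MSR, from msr-index.h */
#define APIC_BASE_MSR     0x800      /* first x2APIC MSR index, stays in processor.h */
#define APIC_ICR          0x300      /* MMIO offset of the interrupt command register */

/*
 * Hypothetical helper: translate an xAPIC MMIO register offset into the
 * index of the corresponding x2APIC MSR.
 */
static inline uint32_t x2apic_msr_index(uint32_t apic_reg)
{
	return APIC_BASE_MSR + (apic_reg >> 4);
}
```

With these values the ICR maps to MSR index 0x830, which has nothing to do with MSR_IA32_APICBASE at 0x1b.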
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 tools/testing/selftests/kvm/Makefile | 4 +-
 .../selftests/kvm/include/x86_64/processor.h | 726 +-----------------
 2 files changed, 6 insertions(+), 724 deletions(-)
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 3138a916574a..608fa835c764 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -45,9 +45,11 @@ LIBKVM += $(LIBKVM_$(UNAME_M))
 INSTALL_HDR_PATH = $(top_srcdir)/usr
 LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/
 LINUX_TOOL_INCLUDE = $(top_srcdir)/tools/include
+LINUX_TOOL_ARCH_INCLUDE = $(top_srcdir)/tools/arch/x86/include
 CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
 	-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-	-I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(UNAME_M) -I..
+	-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
+	-I$(<D) -Iinclude/$(UNAME_M) -I..

 no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \
         $(CC) -Werror -no-pie -x c - -o "$$TMP", -no-pie)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 635ee6c33ad2..aa6451b3f740 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -11,6 +11,8 @@
 #include <assert.h>
 #include <stdint.h>

+#include <asm/msr-index.h>
+
 #define X86_EFLAGS_FIXED (1u << 1)

 #define X86_CR4_VME (1ul << 0)
@@ -348,444 +350,6 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
 #define X86_CR0_CD (1UL<<30) /* Cache Disable */
 #define X86_CR0_PG (1UL<<31) /* Paging */
-/*
- * CPU model specific register (MSR) numbers.
- */
-
-/* x86-64 specific MSRs */
-#define MSR_EFER 0xc0000080 /* extended feature register */
-#define MSR_STAR 0xc0000081 /* legacy mode SYSCALL target */
-#define MSR_LSTAR 0xc0000082 /* long mode SYSCALL target */
-#define MSR_CSTAR 0xc0000083 /* compat mode SYSCALL target */
-#define MSR_SYSCALL_MASK 0xc0000084 /* EFLAGS mask for syscall */
-#define MSR_FS_BASE 0xc0000100 /* 64bit FS base */
-#define MSR_GS_BASE 0xc0000101 /* 64bit GS base */
-#define MSR_KERNEL_GS_BASE 0xc0000102 /* SwapGS GS shadow */
-#define MSR_TSC_AUX 0xc0000103 /* Auxiliary TSC */
-
-/* EFER bits: */
-#define EFER_SCE (1<<0) /* SYSCALL/SYSRET */
-#define EFER_LME (1<<8) /* Long mode enable */
-#define EFER_LMA (1<<10) /* Long mode active (read-only) */
-#define EFER_NX (1<<11) /* No execute enable */
-#define EFER_SVME (1<<12) /* Enable virtualization */
-#define EFER_LMSLE (1<<13) /* Long Mode Segment Limit Enable */
-#define EFER_FFXSR (1<<14) /* Enable Fast FXSAVE/FXRSTOR */
-
-/* Intel MSRs. Some also available on other CPUs */
-
-#define MSR_PPIN_CTL 0x0000004e
-#define MSR_PPIN 0x0000004f
-
-#define MSR_IA32_PERFCTR0 0x000000c1
-#define MSR_IA32_PERFCTR1 0x000000c2
-#define MSR_FSB_FREQ 0x000000cd
-#define MSR_PLATFORM_INFO 0x000000ce
-#define MSR_PLATFORM_INFO_CPUID_FAULT_BIT 31
-#define MSR_PLATFORM_INFO_CPUID_FAULT BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
-
-#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
-#define NHM_C3_AUTO_DEMOTE (1UL << 25)
-#define NHM_C1_AUTO_DEMOTE (1UL << 26)
-#define ATM_LNC_C6_AUTO_DEMOTE (1UL << 25)
-#define SNB_C1_AUTO_UNDEMOTE (1UL << 27)
-#define SNB_C3_AUTO_UNDEMOTE (1UL << 28)
-
-#define MSR_MTRRcap 0x000000fe
-#define MSR_IA32_BBL_CR_CTL 0x00000119
-#define MSR_IA32_BBL_CR_CTL3 0x0000011e
-
-#define MSR_IA32_SYSENTER_CS 0x00000174
-#define MSR_IA32_SYSENTER_ESP 0x00000175
-#define MSR_IA32_SYSENTER_EIP 0x00000176
-
-#define MSR_IA32_MCG_CAP 0x00000179
-#define MSR_IA32_MCG_STATUS 0x0000017a
-#define MSR_IA32_MCG_CTL 0x0000017b
-#define MSR_IA32_MCG_EXT_CTL 0x000004d0
-
-#define MSR_OFFCORE_RSP_0 0x000001a6
-#define MSR_OFFCORE_RSP_1 0x000001a7
-#define MSR_TURBO_RATIO_LIMIT 0x000001ad
-#define MSR_TURBO_RATIO_LIMIT1 0x000001ae
-#define MSR_TURBO_RATIO_LIMIT2 0x000001af
-
-#define MSR_LBR_SELECT 0x000001c8
-#define MSR_LBR_TOS 0x000001c9
-#define MSR_LBR_NHM_FROM 0x00000680
-#define MSR_LBR_NHM_TO 0x000006c0
-#define MSR_LBR_CORE_FROM 0x00000040
-#define MSR_LBR_CORE_TO 0x00000060
-
-#define MSR_LBR_INFO_0 0x00000dc0 /* ... 0xddf for _31 */
-#define LBR_INFO_MISPRED BIT_ULL(63)
-#define LBR_INFO_IN_TX BIT_ULL(62)
-#define LBR_INFO_ABORT BIT_ULL(61)
-#define LBR_INFO_CYCLES 0xffff
-
-#define MSR_IA32_PEBS_ENABLE 0x000003f1
-#define MSR_IA32_DS_AREA 0x00000600
-#define MSR_IA32_PERF_CAPABILITIES 0x00000345
-#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6
-
-#define MSR_IA32_RTIT_CTL 0x00000570
-#define MSR_IA32_RTIT_STATUS 0x00000571
-#define MSR_IA32_RTIT_ADDR0_A 0x00000580
-#define MSR_IA32_RTIT_ADDR0_B 0x00000581
-#define MSR_IA32_RTIT_ADDR1_A 0x00000582
-#define MSR_IA32_RTIT_ADDR1_B 0x00000583
-#define MSR_IA32_RTIT_ADDR2_A 0x00000584
-#define MSR_IA32_RTIT_ADDR2_B 0x00000585
-#define MSR_IA32_RTIT_ADDR3_A 0x00000586
-#define MSR_IA32_RTIT_ADDR3_B 0x00000587
-#define MSR_IA32_RTIT_CR3_MATCH 0x00000572
-#define MSR_IA32_RTIT_OUTPUT_BASE 0x00000560
-#define MSR_IA32_RTIT_OUTPUT_MASK 0x00000561
-
-#define MSR_MTRRfix64K_00000 0x00000250
-#define MSR_MTRRfix16K_80000 0x00000258
-#define MSR_MTRRfix16K_A0000 0x00000259
-#define MSR_MTRRfix4K_C0000 0x00000268
-#define MSR_MTRRfix4K_C8000 0x00000269
-#define MSR_MTRRfix4K_D0000 0x0000026a
-#define MSR_MTRRfix4K_D8000 0x0000026b
-#define MSR_MTRRfix4K_E0000 0x0000026c
-#define MSR_MTRRfix4K_E8000 0x0000026d
-#define MSR_MTRRfix4K_F0000 0x0000026e
-#define MSR_MTRRfix4K_F8000 0x0000026f
-#define MSR_MTRRdefType 0x000002ff
-
-#define MSR_IA32_CR_PAT 0x00000277
-
-#define MSR_IA32_DEBUGCTLMSR 0x000001d9
-#define MSR_IA32_LASTBRANCHFROMIP 0x000001db
-#define MSR_IA32_LASTBRANCHTOIP 0x000001dc
-#define MSR_IA32_LASTINTFROMIP 0x000001dd
-#define MSR_IA32_LASTINTTOIP 0x000001de
-
-/* DEBUGCTLMSR bits (others vary by model): */
-#define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */
-#define DEBUGCTLMSR_BTF_SHIFT 1
-#define DEBUGCTLMSR_BTF (1UL << 1) /* single-step on branches */
-#define DEBUGCTLMSR_TR (1UL << 6)
-#define DEBUGCTLMSR_BTS (1UL << 7)
-#define DEBUGCTLMSR_BTINT (1UL << 8)
-#define DEBUGCTLMSR_BTS_OFF_OS (1UL << 9)
-#define DEBUGCTLMSR_BTS_OFF_USR (1UL << 10)
-#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1UL << 11)
-#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
-#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
-
-#define MSR_PEBS_FRONTEND 0x000003f7
-
-#define MSR_IA32_POWER_CTL 0x000001fc
-
-#define MSR_IA32_MC0_CTL 0x00000400
-#define MSR_IA32_MC0_STATUS 0x00000401
-#define MSR_IA32_MC0_ADDR 0x00000402
-#define MSR_IA32_MC0_MISC 0x00000403
-
-/* C-state Residency Counters */
-#define MSR_PKG_C3_RESIDENCY 0x000003f8
-#define MSR_PKG_C6_RESIDENCY 0x000003f9
-#define MSR_ATOM_PKG_C6_RESIDENCY 0x000003fa
-#define MSR_PKG_C7_RESIDENCY 0x000003fa
-#define MSR_CORE_C3_RESIDENCY 0x000003fc
-#define MSR_CORE_C6_RESIDENCY 0x000003fd
-#define MSR_CORE_C7_RESIDENCY 0x000003fe
-#define MSR_KNL_CORE_C6_RESIDENCY 0x000003ff
-#define MSR_PKG_C2_RESIDENCY 0x0000060d
-#define MSR_PKG_C8_RESIDENCY 0x00000630
-#define MSR_PKG_C9_RESIDENCY 0x00000631
-#define MSR_PKG_C10_RESIDENCY 0x00000632
-
-/* Interrupt Response Limit */
-#define MSR_PKGC3_IRTL 0x0000060a
-#define MSR_PKGC6_IRTL 0x0000060b
-#define MSR_PKGC7_IRTL 0x0000060c
-#define MSR_PKGC8_IRTL 0x00000633
-#define MSR_PKGC9_IRTL 0x00000634
-#define MSR_PKGC10_IRTL 0x00000635
-
-/* Run Time Average Power Limiting (RAPL) Interface */
-
-#define MSR_RAPL_POWER_UNIT 0x00000606
-
-#define MSR_PKG_POWER_LIMIT 0x00000610
-#define MSR_PKG_ENERGY_STATUS 0x00000611
-#define MSR_PKG_PERF_STATUS 0x00000613
-#define MSR_PKG_POWER_INFO 0x00000614
-
-#define MSR_DRAM_POWER_LIMIT 0x00000618
-#define MSR_DRAM_ENERGY_STATUS 0x00000619
-#define MSR_DRAM_PERF_STATUS 0x0000061b
-#define MSR_DRAM_POWER_INFO 0x0000061c
-
-#define MSR_PP0_POWER_LIMIT 0x00000638
-#define MSR_PP0_ENERGY_STATUS 0x00000639
-#define MSR_PP0_POLICY 0x0000063a
-#define MSR_PP0_PERF_STATUS 0x0000063b
-
-#define MSR_PP1_POWER_LIMIT 0x00000640
-#define MSR_PP1_ENERGY_STATUS 0x00000641
-#define MSR_PP1_POLICY 0x00000642
-
-/* Config TDP MSRs */
-#define MSR_CONFIG_TDP_NOMINAL 0x00000648
-#define MSR_CONFIG_TDP_LEVEL_1 0x00000649
-#define MSR_CONFIG_TDP_LEVEL_2 0x0000064A
-#define MSR_CONFIG_TDP_CONTROL 0x0000064B
-#define MSR_TURBO_ACTIVATION_RATIO 0x0000064C
-
-#define MSR_PLATFORM_ENERGY_STATUS 0x0000064D
-
-#define MSR_PKG_WEIGHTED_CORE_C0_RES 0x00000658
-#define MSR_PKG_ANY_CORE_C0_RES 0x00000659
-#define MSR_PKG_ANY_GFXE_C0_RES 0x0000065A
-#define MSR_PKG_BOTH_CORE_GFXE_C0_RES 0x0000065B
-
-#define MSR_CORE_C1_RES 0x00000660
-#define MSR_MODULE_C6_RES_MS 0x00000664
-
-#define MSR_CC6_DEMOTION_POLICY_CONFIG 0x00000668
-#define MSR_MC6_DEMOTION_POLICY_CONFIG 0x00000669
-
-#define MSR_ATOM_CORE_RATIOS 0x0000066a
-#define MSR_ATOM_CORE_VIDS 0x0000066b
-#define MSR_ATOM_CORE_TURBO_RATIOS 0x0000066c
-#define MSR_ATOM_CORE_TURBO_VIDS 0x0000066d
-
-
-#define MSR_CORE_PERF_LIMIT_REASONS 0x00000690
-#define MSR_GFX_PERF_LIMIT_REASONS 0x000006B0
-#define MSR_RING_PERF_LIMIT_REASONS 0x000006B1
-
-/* Hardware P state interface */
-#define MSR_PPERF 0x0000064e
-#define MSR_PERF_LIMIT_REASONS 0x0000064f
-#define MSR_PM_ENABLE 0x00000770
-#define MSR_HWP_CAPABILITIES 0x00000771
-#define MSR_HWP_REQUEST_PKG 0x00000772
-#define MSR_HWP_INTERRUPT 0x00000773
-#define MSR_HWP_REQUEST 0x00000774
-#define MSR_HWP_STATUS 0x00000777
-
-/* CPUID.6.EAX */
-#define HWP_BASE_BIT (1<<7)
-#define HWP_NOTIFICATIONS_BIT (1<<8)
-#define HWP_ACTIVITY_WINDOW_BIT (1<<9)
-#define HWP_ENERGY_PERF_PREFERENCE_BIT (1<<10)
-#define HWP_PACKAGE_LEVEL_REQUEST_BIT (1<<11)
-
-/* IA32_HWP_CAPABILITIES */
-#define HWP_HIGHEST_PERF(x) (((x) >> 0) & 0xff)
-#define HWP_GUARANTEED_PERF(x) (((x) >> 8) & 0xff)
-#define HWP_MOSTEFFICIENT_PERF(x) (((x) >> 16) & 0xff)
-#define HWP_LOWEST_PERF(x) (((x) >> 24) & 0xff)
-
-/* IA32_HWP_REQUEST */
-#define HWP_MIN_PERF(x) (x & 0xff)
-#define HWP_MAX_PERF(x) ((x & 0xff) << 8)
-#define HWP_DESIRED_PERF(x) ((x & 0xff) << 16)
-#define HWP_ENERGY_PERF_PREFERENCE(x) (((unsigned long long) x & 0xff) << 24)
-#define HWP_EPP_PERFORMANCE 0x00
-#define HWP_EPP_BALANCE_PERFORMANCE 0x80
-#define HWP_EPP_BALANCE_POWERSAVE 0xC0
-#define HWP_EPP_POWERSAVE 0xFF
-#define HWP_ACTIVITY_WINDOW(x) ((unsigned long long)(x & 0xff3) << 32)
-#define HWP_PACKAGE_CONTROL(x) ((unsigned long long)(x & 0x1) << 42)
-
-/* IA32_HWP_STATUS */
-#define HWP_GUARANTEED_CHANGE(x) (x & 0x1)
-#define HWP_EXCURSION_TO_MINIMUM(x) (x & 0x4)
-
-/* IA32_HWP_INTERRUPT */
-#define HWP_CHANGE_TO_GUARANTEED_INT(x) (x & 0x1)
-#define HWP_EXCURSION_TO_MINIMUM_INT(x) (x & 0x2)
-
-#define MSR_AMD64_MC0_MASK 0xc0010044
-
-#define MSR_IA32_MCx_CTL(x) (MSR_IA32_MC0_CTL + 4*(x))
-#define MSR_IA32_MCx_STATUS(x) (MSR_IA32_MC0_STATUS + 4*(x))
-#define MSR_IA32_MCx_ADDR(x) (MSR_IA32_MC0_ADDR + 4*(x))
-#define MSR_IA32_MCx_MISC(x) (MSR_IA32_MC0_MISC + 4*(x))
-
-#define MSR_AMD64_MCx_MASK(x) (MSR_AMD64_MC0_MASK + (x))
-
-/* These are consecutive and not in the normal 4er MCE bank block */
-#define MSR_IA32_MC0_CTL2 0x00000280
-#define MSR_IA32_MCx_CTL2(x) (MSR_IA32_MC0_CTL2 + (x))
-
-#define MSR_P6_PERFCTR0 0x000000c1
-#define MSR_P6_PERFCTR1 0x000000c2
-#define MSR_P6_EVNTSEL0 0x00000186
-#define MSR_P6_EVNTSEL1 0x00000187
-
-#define MSR_KNC_PERFCTR0 0x00000020
-#define MSR_KNC_PERFCTR1 0x00000021
-#define MSR_KNC_EVNTSEL0 0x00000028
-#define MSR_KNC_EVNTSEL1 0x00000029
-
-/* Alternative perfctr range with full access. */
-#define MSR_IA32_PMC0 0x000004c1
-
-/* AMD64 MSRs. Not complete. See the architecture manual for a more
-   complete list. */
-
-#define MSR_AMD64_PATCH_LEVEL 0x0000008b
-#define MSR_AMD64_TSC_RATIO 0xc0000104
-#define MSR_AMD64_NB_CFG 0xc001001f
-#define MSR_AMD64_PATCH_LOADER 0xc0010020
-#define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
-#define MSR_AMD64_OSVW_STATUS 0xc0010141
-#define MSR_AMD64_LS_CFG 0xc0011020
-#define MSR_AMD64_DC_CFG 0xc0011022
-#define MSR_AMD64_BU_CFG2 0xc001102a
-#define MSR_AMD64_IBSFETCHCTL 0xc0011030
-#define MSR_AMD64_IBSFETCHLINAD 0xc0011031
-#define MSR_AMD64_IBSFETCHPHYSAD 0xc0011032
-#define MSR_AMD64_IBSFETCH_REG_COUNT 3
-#define MSR_AMD64_IBSFETCH_REG_MASK ((1UL<<MSR_AMD64_IBSFETCH_REG_COUNT)-1)
-#define MSR_AMD64_IBSOPCTL 0xc0011033
-#define MSR_AMD64_IBSOPRIP 0xc0011034
-#define MSR_AMD64_IBSOPDATA 0xc0011035
-#define MSR_AMD64_IBSOPDATA2 0xc0011036
-#define MSR_AMD64_IBSOPDATA3 0xc0011037
-#define MSR_AMD64_IBSDCLINAD 0xc0011038
-#define MSR_AMD64_IBSDCPHYSAD 0xc0011039
-#define MSR_AMD64_IBSOP_REG_COUNT 7
-#define MSR_AMD64_IBSOP_REG_MASK ((1UL<<MSR_AMD64_IBSOP_REG_COUNT)-1)
-#define MSR_AMD64_IBSCTL 0xc001103a
-#define MSR_AMD64_IBSBRTARGET 0xc001103b
-#define MSR_AMD64_IBSOPDATA4 0xc001103d
-#define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */
-#define MSR_AMD64_SEV 0xc0010131
-#define MSR_AMD64_SEV_ENABLED_BIT 0
-#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
-
-/* Fam 17h MSRs */
-#define MSR_F17H_IRPERF 0xc00000e9
-
-/* Fam 16h MSRs */
-#define MSR_F16H_L2I_PERF_CTL 0xc0010230
-#define MSR_F16H_L2I_PERF_CTR 0xc0010231
-#define MSR_F16H_DR1_ADDR_MASK 0xc0011019
-#define MSR_F16H_DR2_ADDR_MASK 0xc001101a
-#define MSR_F16H_DR3_ADDR_MASK 0xc001101b
-#define MSR_F16H_DR0_ADDR_MASK 0xc0011027
-
-/* Fam 15h MSRs */
-#define MSR_F15H_PERF_CTL 0xc0010200
-#define MSR_F15H_PERF_CTR 0xc0010201
-#define MSR_F15H_NB_PERF_CTL 0xc0010240
-#define MSR_F15H_NB_PERF_CTR 0xc0010241
-#define MSR_F15H_PTSC 0xc0010280
-#define MSR_F15H_IC_CFG 0xc0011021
-
-/* Fam 10h MSRs */
-#define MSR_FAM10H_MMIO_CONF_BASE 0xc0010058
-#define FAM10H_MMIO_CONF_ENABLE (1<<0)
-#define FAM10H_MMIO_CONF_BUSRANGE_MASK 0xf
-#define FAM10H_MMIO_CONF_BUSRANGE_SHIFT 2
-#define FAM10H_MMIO_CONF_BASE_MASK 0xfffffffULL
-#define FAM10H_MMIO_CONF_BASE_SHIFT 20
-#define MSR_FAM10H_NODE_ID 0xc001100c
-#define MSR_F10H_DECFG 0xc0011029
-#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
-#define MSR_F10H_DECFG_LFENCE_SERIALIZE BIT_ULL(MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT)
-
-/* K8 MSRs */
-#define MSR_K8_TOP_MEM1 0xc001001a
-#define MSR_K8_TOP_MEM2 0xc001001d
-#define MSR_K8_SYSCFG 0xc0010010
-#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT 23
-#define MSR_K8_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
-#define MSR_K8_INT_PENDING_MSG 0xc0010055
-/* C1E active bits in int pending message */
-#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
-#define MSR_K8_TSEG_ADDR 0xc0010112
-#define MSR_K8_TSEG_MASK 0xc0010113
-#define K8_MTRRFIXRANGE_DRAM_ENABLE 0x00040000 /* MtrrFixDramEn bit */
-#define K8_MTRRFIXRANGE_DRAM_MODIFY 0x00080000 /* MtrrFixDramModEn bit */
-#define K8_MTRR_RDMEM_WRMEM_MASK 0x18181818 /* Mask: RdMem|WrMem */
-
-/* K7 MSRs */
-#define MSR_K7_EVNTSEL0 0xc0010000
-#define MSR_K7_PERFCTR0 0xc0010004
-#define MSR_K7_EVNTSEL1 0xc0010001
-#define MSR_K7_PERFCTR1 0xc0010005
-#define MSR_K7_EVNTSEL2 0xc0010002
-#define MSR_K7_PERFCTR2 0xc0010006
-#define MSR_K7_EVNTSEL3 0xc0010003
-#define MSR_K7_PERFCTR3 0xc0010007
-#define MSR_K7_CLK_CTL 0xc001001b
-#define MSR_K7_HWCR 0xc0010015
-#define MSR_K7_HWCR_SMMLOCK_BIT 0
-#define MSR_K7_HWCR_SMMLOCK BIT_ULL(MSR_K7_HWCR_SMMLOCK_BIT)
-#define MSR_K7_FID_VID_CTL 0xc0010041
-#define MSR_K7_FID_VID_STATUS 0xc0010042
-
-/* K6 MSRs */
-#define MSR_K6_WHCR 0xc0000082
-#define MSR_K6_UWCCR 0xc0000085
-#define MSR_K6_EPMR 0xc0000086
-#define MSR_K6_PSOR 0xc0000087
-#define MSR_K6_PFIR 0xc0000088
-
-/* Centaur-Hauls/IDT defined MSRs. */
-#define MSR_IDT_FCR1 0x00000107
-#define MSR_IDT_FCR2 0x00000108
-#define MSR_IDT_FCR3 0x00000109
-#define MSR_IDT_FCR4 0x0000010a
-
-#define MSR_IDT_MCR0 0x00000110
-#define MSR_IDT_MCR1 0x00000111
-#define MSR_IDT_MCR2 0x00000112
-#define MSR_IDT_MCR3 0x00000113
-#define MSR_IDT_MCR4 0x00000114
-#define MSR_IDT_MCR5 0x00000115
-#define MSR_IDT_MCR6 0x00000116
-#define MSR_IDT_MCR7 0x00000117
-#define MSR_IDT_MCR_CTRL 0x00000120
-
-/* VIA Cyrix defined MSRs*/
-#define MSR_VIA_FCR 0x00001107
-#define MSR_VIA_LONGHAUL 0x0000110a
-#define MSR_VIA_RNG 0x0000110b
-#define MSR_VIA_BCR2 0x00001147
-
-/* Transmeta defined MSRs */
-#define MSR_TMTA_LONGRUN_CTRL 0x80868010
-#define MSR_TMTA_LONGRUN_FLAGS 0x80868011
-#define MSR_TMTA_LRTI_READOUT 0x80868018
-#define MSR_TMTA_LRTI_VOLT_MHZ 0x8086801a
-
-/* Intel defined MSRs. */
-#define MSR_IA32_P5_MC_ADDR 0x00000000
-#define MSR_IA32_P5_MC_TYPE 0x00000001
-#define MSR_IA32_TSC 0x00000010
-#define MSR_IA32_PLATFORM_ID 0x00000017
-#define MSR_IA32_EBL_CR_POWERON 0x0000002a
-#define MSR_EBC_FREQUENCY_ID 0x0000002c
-#define MSR_SMI_COUNT 0x00000034
-#define MSR_IA32_FEATURE_CONTROL 0x0000003a
-#define MSR_IA32_TSC_ADJUST 0x0000003b
-#define MSR_IA32_BNDCFGS 0x00000d90
-
-#define MSR_IA32_BNDCFGS_RSVD 0x00000ffc
-
-#define MSR_IA32_XSS 0x00000da0
-
-#define FEATURE_CONTROL_LOCKED (1<<0)
-#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
-#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2)
-#define FEATURE_CONTROL_LMCE (1<<20)
-
-#define MSR_IA32_APICBASE 0x0000001b
-#define MSR_IA32_APICBASE_BSP (1<<8)
-#define MSR_IA32_APICBASE_ENABLE (1<<11)
-#define MSR_IA32_APICBASE_BASE (0xfffff<<12)
-
 #define APIC_BASE_MSR 0x800
 #define X2APIC_ENABLE (1UL << 10)
 #define APIC_ICR 0x300
@@ -813,291 +377,7 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
 #define APIC_VECTOR_MASK 0x000FF
 #define APIC_ICR2 0x310
-#define MSR_IA32_TSCDEADLINE 0x000006e0 - -#define MSR_IA32_UCODE_WRITE 0x00000079 -#define MSR_IA32_UCODE_REV 0x0000008b - -#define MSR_IA32_SMM_MONITOR_CTL 0x0000009b -#define MSR_IA32_SMBASE 0x0000009e - -#define MSR_IA32_PERF_STATUS 0x00000198 -#define MSR_IA32_PERF_CTL 0x00000199 -#define INTEL_PERF_CTL_MASK 0xffff -#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 -#define MSR_AMD_PERF_STATUS 0xc0010063 -#define MSR_AMD_PERF_CTL 0xc0010062 - -#define MSR_IA32_MPERF 0x000000e7 -#define MSR_IA32_APERF 0x000000e8 - -#define MSR_IA32_THERM_CONTROL 0x0000019a -#define MSR_IA32_THERM_INTERRUPT 0x0000019b - -#define THERM_INT_HIGH_ENABLE (1 << 0) -#define THERM_INT_LOW_ENABLE (1 << 1) -#define THERM_INT_PLN_ENABLE (1 << 24) - -#define MSR_IA32_THERM_STATUS 0x0000019c - -#define THERM_STATUS_PROCHOT (1 << 0) -#define THERM_STATUS_POWER_LIMIT (1 << 10) - -#define MSR_THERM2_CTL 0x0000019d - -#define MSR_THERM2_CTL_TM_SELECT (1ULL << 16) - -#define MSR_IA32_MISC_ENABLE 0x000001a0 - -#define MSR_IA32_TEMPERATURE_TARGET 0x000001a2 - -#define MSR_MISC_FEATURE_CONTROL 0x000001a4 -#define MSR_MISC_PWR_MGMT 0x000001aa - -#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0 -#define ENERGY_PERF_BIAS_PERFORMANCE 0 -#define ENERGY_PERF_BIAS_BALANCE_PERFORMANCE 4 -#define ENERGY_PERF_BIAS_NORMAL 6 -#define ENERGY_PERF_BIAS_BALANCE_POWERSAVE 8 -#define ENERGY_PERF_BIAS_POWERSAVE 15 - -#define MSR_IA32_PACKAGE_THERM_STATUS 0x000001b1 - -#define PACKAGE_THERM_STATUS_PROCHOT (1 << 0) -#define PACKAGE_THERM_STATUS_POWER_LIMIT (1 << 10) - -#define MSR_IA32_PACKAGE_THERM_INTERRUPT 0x000001b2 - -#define PACKAGE_THERM_INT_HIGH_ENABLE (1 << 0) -#define PACKAGE_THERM_INT_LOW_ENABLE (1 << 1) -#define PACKAGE_THERM_INT_PLN_ENABLE (1 << 24) - -/* Thermal Thresholds Support */ -#define THERM_INT_THRESHOLD0_ENABLE (1 << 15) -#define THERM_SHIFT_THRESHOLD0 8 -#define THERM_MASK_THRESHOLD0 (0x7f << THERM_SHIFT_THRESHOLD0) -#define THERM_INT_THRESHOLD1_ENABLE (1 << 23) -#define THERM_SHIFT_THRESHOLD1 16 
-#define THERM_MASK_THRESHOLD1 (0x7f << THERM_SHIFT_THRESHOLD1) -#define THERM_STATUS_THRESHOLD0 (1 << 6) -#define THERM_LOG_THRESHOLD0 (1 << 7) -#define THERM_STATUS_THRESHOLD1 (1 << 8) -#define THERM_LOG_THRESHOLD1 (1 << 9) - -/* MISC_ENABLE bits: architectural */ -#define MSR_IA32_MISC_ENABLE_FAST_STRING_BIT 0 -#define MSR_IA32_MISC_ENABLE_FAST_STRING (1ULL << MSR_IA32_MISC_ENABLE_FAST_STRING_BIT) -#define MSR_IA32_MISC_ENABLE_TCC_BIT 1 -#define MSR_IA32_MISC_ENABLE_TCC (1ULL << MSR_IA32_MISC_ENABLE_TCC_BIT) -#define MSR_IA32_MISC_ENABLE_EMON_BIT 7 -#define MSR_IA32_MISC_ENABLE_EMON (1ULL << MSR_IA32_MISC_ENABLE_EMON_BIT) -#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL_BIT 11 -#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL (1ULL << MSR_IA32_MISC_ENABLE_BTS_UNAVAIL_BIT) -#define MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL_BIT 12 -#define MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL (1ULL << MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL_BIT) -#define MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP_BIT 16 -#define MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP (1ULL << MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP_BIT) -#define MSR_IA32_MISC_ENABLE_MWAIT_BIT 18 -#define MSR_IA32_MISC_ENABLE_MWAIT (1ULL << MSR_IA32_MISC_ENABLE_MWAIT_BIT) -#define MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT 22 -#define MSR_IA32_MISC_ENABLE_LIMIT_CPUID (1ULL << MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) -#define MSR_IA32_MISC_ENABLE_XTPR_DISABLE_BIT 23 -#define MSR_IA32_MISC_ENABLE_XTPR_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_XTPR_DISABLE_BIT) -#define MSR_IA32_MISC_ENABLE_XD_DISABLE_BIT 34 -#define MSR_IA32_MISC_ENABLE_XD_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_XD_DISABLE_BIT) - -/* MISC_ENABLE bits: model-specific, meaning may vary from core to core */ -#define MSR_IA32_MISC_ENABLE_X87_COMPAT_BIT 2 -#define MSR_IA32_MISC_ENABLE_X87_COMPAT (1ULL << MSR_IA32_MISC_ENABLE_X87_COMPAT_BIT) -#define MSR_IA32_MISC_ENABLE_TM1_BIT 3 -#define MSR_IA32_MISC_ENABLE_TM1 (1ULL << MSR_IA32_MISC_ENABLE_TM1_BIT) -#define MSR_IA32_MISC_ENABLE_SPLIT_LOCK_DISABLE_BIT 4 -#define 
MSR_IA32_MISC_ENABLE_SPLIT_LOCK_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_SPLIT_LOCK_DISABLE_BIT) -#define MSR_IA32_MISC_ENABLE_L3CACHE_DISABLE_BIT 6 -#define MSR_IA32_MISC_ENABLE_L3CACHE_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_L3CACHE_DISABLE_BIT) -#define MSR_IA32_MISC_ENABLE_SUPPRESS_LOCK_BIT 8 -#define MSR_IA32_MISC_ENABLE_SUPPRESS_LOCK (1ULL << MSR_IA32_MISC_ENABLE_SUPPRESS_LOCK_BIT) -#define MSR_IA32_MISC_ENABLE_PREFETCH_DISABLE_BIT 9 -#define MSR_IA32_MISC_ENABLE_PREFETCH_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_PREFETCH_DISABLE_BIT) -#define MSR_IA32_MISC_ENABLE_FERR_BIT 10 -#define MSR_IA32_MISC_ENABLE_FERR (1ULL << MSR_IA32_MISC_ENABLE_FERR_BIT) -#define MSR_IA32_MISC_ENABLE_FERR_MULTIPLEX_BIT 10 -#define MSR_IA32_MISC_ENABLE_FERR_MULTIPLEX (1ULL << MSR_IA32_MISC_ENABLE_FERR_MULTIPLEX_BIT) -#define MSR_IA32_MISC_ENABLE_TM2_BIT 13 -#define MSR_IA32_MISC_ENABLE_TM2 (1ULL << MSR_IA32_MISC_ENABLE_TM2_BIT) -#define MSR_IA32_MISC_ENABLE_ADJ_PREF_DISABLE_BIT 19 -#define MSR_IA32_MISC_ENABLE_ADJ_PREF_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_ADJ_PREF_DISABLE_BIT) -#define MSR_IA32_MISC_ENABLE_SPEEDSTEP_LOCK_BIT 20 -#define MSR_IA32_MISC_ENABLE_SPEEDSTEP_LOCK (1ULL << MSR_IA32_MISC_ENABLE_SPEEDSTEP_LOCK_BIT) -#define MSR_IA32_MISC_ENABLE_L1D_CONTEXT_BIT 24 -#define MSR_IA32_MISC_ENABLE_L1D_CONTEXT (1ULL << MSR_IA32_MISC_ENABLE_L1D_CONTEXT_BIT) -#define MSR_IA32_MISC_ENABLE_DCU_PREF_DISABLE_BIT 37 -#define MSR_IA32_MISC_ENABLE_DCU_PREF_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_DCU_PREF_DISABLE_BIT) -#define MSR_IA32_MISC_ENABLE_TURBO_DISABLE_BIT 38 -#define MSR_IA32_MISC_ENABLE_TURBO_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_TURBO_DISABLE_BIT) -#define MSR_IA32_MISC_ENABLE_IP_PREF_DISABLE_BIT 39 -#define MSR_IA32_MISC_ENABLE_IP_PREF_DISABLE (1ULL << MSR_IA32_MISC_ENABLE_IP_PREF_DISABLE_BIT) - -/* MISC_FEATURES_ENABLES non-architectural features */ -#define MSR_MISC_FEATURES_ENABLES 0x00000140 - -#define MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT 0 -#define 
MSR_MISC_FEATURES_ENABLES_CPUID_FAULT BIT_ULL(MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT) -#define MSR_MISC_FEATURES_ENABLES_RING3MWAIT_BIT 1 - -#define MSR_IA32_TSC_DEADLINE 0x000006E0 - -/* P4/Xeon+ specific */ -#define MSR_IA32_MCG_EAX 0x00000180 -#define MSR_IA32_MCG_EBX 0x00000181 -#define MSR_IA32_MCG_ECX 0x00000182 -#define MSR_IA32_MCG_EDX 0x00000183 -#define MSR_IA32_MCG_ESI 0x00000184 -#define MSR_IA32_MCG_EDI 0x00000185 -#define MSR_IA32_MCG_EBP 0x00000186 -#define MSR_IA32_MCG_ESP 0x00000187 -#define MSR_IA32_MCG_EFLAGS 0x00000188 -#define MSR_IA32_MCG_EIP 0x00000189 -#define MSR_IA32_MCG_RESERVED 0x0000018a - -/* Pentium IV performance counter MSRs */ -#define MSR_P4_BPU_PERFCTR0 0x00000300 -#define MSR_P4_BPU_PERFCTR1 0x00000301 -#define MSR_P4_BPU_PERFCTR2 0x00000302 -#define MSR_P4_BPU_PERFCTR3 0x00000303 -#define MSR_P4_MS_PERFCTR0 0x00000304 -#define MSR_P4_MS_PERFCTR1 0x00000305 -#define MSR_P4_MS_PERFCTR2 0x00000306 -#define MSR_P4_MS_PERFCTR3 0x00000307 -#define MSR_P4_FLAME_PERFCTR0 0x00000308 -#define MSR_P4_FLAME_PERFCTR1 0x00000309 -#define MSR_P4_FLAME_PERFCTR2 0x0000030a -#define MSR_P4_FLAME_PERFCTR3 0x0000030b -#define MSR_P4_IQ_PERFCTR0 0x0000030c -#define MSR_P4_IQ_PERFCTR1 0x0000030d -#define MSR_P4_IQ_PERFCTR2 0x0000030e -#define MSR_P4_IQ_PERFCTR3 0x0000030f -#define MSR_P4_IQ_PERFCTR4 0x00000310 -#define MSR_P4_IQ_PERFCTR5 0x00000311 -#define MSR_P4_BPU_CCCR0 0x00000360 -#define MSR_P4_BPU_CCCR1 0x00000361 -#define MSR_P4_BPU_CCCR2 0x00000362 -#define MSR_P4_BPU_CCCR3 0x00000363 -#define MSR_P4_MS_CCCR0 0x00000364 -#define MSR_P4_MS_CCCR1 0x00000365 -#define MSR_P4_MS_CCCR2 0x00000366 -#define MSR_P4_MS_CCCR3 0x00000367 -#define MSR_P4_FLAME_CCCR0 0x00000368 -#define MSR_P4_FLAME_CCCR1 0x00000369 -#define MSR_P4_FLAME_CCCR2 0x0000036a -#define MSR_P4_FLAME_CCCR3 0x0000036b -#define MSR_P4_IQ_CCCR0 0x0000036c -#define MSR_P4_IQ_CCCR1 0x0000036d -#define MSR_P4_IQ_CCCR2 0x0000036e -#define MSR_P4_IQ_CCCR3 0x0000036f -#define 
MSR_P4_IQ_CCCR4 0x00000370 -#define MSR_P4_IQ_CCCR5 0x00000371 -#define MSR_P4_ALF_ESCR0 0x000003ca -#define MSR_P4_ALF_ESCR1 0x000003cb -#define MSR_P4_BPU_ESCR0 0x000003b2 -#define MSR_P4_BPU_ESCR1 0x000003b3 -#define MSR_P4_BSU_ESCR0 0x000003a0 -#define MSR_P4_BSU_ESCR1 0x000003a1 -#define MSR_P4_CRU_ESCR0 0x000003b8 -#define MSR_P4_CRU_ESCR1 0x000003b9 -#define MSR_P4_CRU_ESCR2 0x000003cc -#define MSR_P4_CRU_ESCR3 0x000003cd -#define MSR_P4_CRU_ESCR4 0x000003e0 -#define MSR_P4_CRU_ESCR5 0x000003e1 -#define MSR_P4_DAC_ESCR0 0x000003a8 -#define MSR_P4_DAC_ESCR1 0x000003a9 -#define MSR_P4_FIRM_ESCR0 0x000003a4 -#define MSR_P4_FIRM_ESCR1 0x000003a5 -#define MSR_P4_FLAME_ESCR0 0x000003a6 -#define MSR_P4_FLAME_ESCR1 0x000003a7 -#define MSR_P4_FSB_ESCR0 0x000003a2 -#define MSR_P4_FSB_ESCR1 0x000003a3 -#define MSR_P4_IQ_ESCR0 0x000003ba -#define MSR_P4_IQ_ESCR1 0x000003bb -#define MSR_P4_IS_ESCR0 0x000003b4 -#define MSR_P4_IS_ESCR1 0x000003b5 -#define MSR_P4_ITLB_ESCR0 0x000003b6 -#define MSR_P4_ITLB_ESCR1 0x000003b7 -#define MSR_P4_IX_ESCR0 0x000003c8 -#define MSR_P4_IX_ESCR1 0x000003c9 -#define MSR_P4_MOB_ESCR0 0x000003aa -#define MSR_P4_MOB_ESCR1 0x000003ab -#define MSR_P4_MS_ESCR0 0x000003c0 -#define MSR_P4_MS_ESCR1 0x000003c1 -#define MSR_P4_PMH_ESCR0 0x000003ac -#define MSR_P4_PMH_ESCR1 0x000003ad -#define MSR_P4_RAT_ESCR0 0x000003bc -#define MSR_P4_RAT_ESCR1 0x000003bd -#define MSR_P4_SAAT_ESCR0 0x000003ae -#define MSR_P4_SAAT_ESCR1 0x000003af -#define MSR_P4_SSU_ESCR0 0x000003be -#define MSR_P4_SSU_ESCR1 0x000003bf /* guess: not in manual */ - -#define MSR_P4_TBPU_ESCR0 0x000003c2 -#define MSR_P4_TBPU_ESCR1 0x000003c3 -#define MSR_P4_TC_ESCR0 0x000003c4 -#define MSR_P4_TC_ESCR1 0x000003c5 -#define MSR_P4_U2L_ESCR0 0x000003b0 -#define MSR_P4_U2L_ESCR1 0x000003b1 - -#define MSR_P4_PEBS_MATRIX_VERT 0x000003f2 - -/* Intel Core-based CPU performance counters */ -#define MSR_CORE_PERF_FIXED_CTR0 0x00000309 -#define MSR_CORE_PERF_FIXED_CTR1 0x0000030a -#define 
MSR_CORE_PERF_FIXED_CTR2 0x0000030b -#define MSR_CORE_PERF_FIXED_CTR_CTRL 0x0000038d -#define MSR_CORE_PERF_GLOBAL_STATUS 0x0000038e -#define MSR_CORE_PERF_GLOBAL_CTRL 0x0000038f -#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x00000390 - -/* Geode defined MSRs */ -#define MSR_GEODE_BUSCONT_CONF0 0x00001900 - -/* Intel VT MSRs */ -#define MSR_IA32_VMX_BASIC 0x00000480 -#define MSR_IA32_VMX_PINBASED_CTLS 0x00000481 -#define MSR_IA32_VMX_PROCBASED_CTLS 0x00000482 -#define MSR_IA32_VMX_EXIT_CTLS 0x00000483 -#define MSR_IA32_VMX_ENTRY_CTLS 0x00000484 -#define MSR_IA32_VMX_MISC 0x00000485 -#define MSR_IA32_VMX_CR0_FIXED0 0x00000486 -#define MSR_IA32_VMX_CR0_FIXED1 0x00000487 -#define MSR_IA32_VMX_CR4_FIXED0 0x00000488 -#define MSR_IA32_VMX_CR4_FIXED1 0x00000489 -#define MSR_IA32_VMX_VMCS_ENUM 0x0000048a -#define MSR_IA32_VMX_PROCBASED_CTLS2 0x0000048b -#define MSR_IA32_VMX_EPT_VPID_CAP 0x0000048c -#define MSR_IA32_VMX_TRUE_PINBASED_CTLS 0x0000048d -#define MSR_IA32_VMX_TRUE_PROCBASED_CTLS 0x0000048e -#define MSR_IA32_VMX_TRUE_EXIT_CTLS 0x0000048f -#define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x00000490 -#define MSR_IA32_VMX_VMFUNC 0x00000491 - -/* VMX_BASIC bits and bitmasks */ -#define VMX_BASIC_VMCS_SIZE_SHIFT 32 -#define VMX_BASIC_TRUE_CTLS (1ULL << 55) -#define VMX_BASIC_64 0x0001000000000000LLU -#define VMX_BASIC_MEM_TYPE_SHIFT 50 -#define VMX_BASIC_MEM_TYPE_MASK 0x003c000000000000LLU -#define VMX_BASIC_MEM_TYPE_WB 6LLU -#define VMX_BASIC_INOUT 0x0040000000000000LLU - /* VMX_EPT_VPID_CAP bits */ -#define VMX_EPT_VPID_CAP_AD_BITS (1ULL << 21) - -/* MSR_IA32_VMX_MISC bits */ -#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29) -#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F -/* AMD-V MSRs */ - -#define MSR_VM_CR 0xc0010114 -#define MSR_VM_IGNNE 0xc0010115 -#define MSR_VM_HSAVE_PA 0xc0010117 +#define VMX_EPT_VPID_CAP_AD_BITS (1ULL << 21)
#endif /* SELFTEST_KVM_PROCESSOR_H */
Sync msr-index.h to pull in recent renames of the IA32_FEATURE_CONTROL MSR definitions. Update KVM's VMX selftest and turbostat accordingly. Keep the full name in turbostat's output to avoid breaking someone's workflow, e.g. if a script is looking for the full name.
While using the renamed defines is by no means necessary, do the sync now to avoid leaving a landmine that will get stepped on the next time msr-index.h needs to be refreshed for some other reason.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 tools/arch/x86/include/asm/msr-index.h       | 14 ++++++++------
 tools/power/x86/turbostat/turbostat.c        |  4 ++--
 tools/testing/selftests/kvm/lib/x86_64/vmx.c |  8 ++++----
 3 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h index 084e98da04a7..ebe1685e92dd 100644 --- a/tools/arch/x86/include/asm/msr-index.h +++ b/tools/arch/x86/include/asm/msr-index.h @@ -558,7 +558,14 @@ #define MSR_IA32_EBL_CR_POWERON 0x0000002a #define MSR_EBC_FREQUENCY_ID 0x0000002c #define MSR_SMI_COUNT 0x00000034 -#define MSR_IA32_FEATURE_CONTROL 0x0000003a + +/* Referred to as IA32_FEATURE_CONTROL in Intel's SDM. */ +#define MSR_IA32_FEAT_CTL 0x0000003a +#define FEAT_CTL_LOCKED BIT(0) +#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX BIT(1) +#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX BIT(2) +#define FEAT_CTL_LMCE_ENABLED BIT(20) + #define MSR_IA32_TSC_ADJUST 0x0000003b #define MSR_IA32_BNDCFGS 0x00000d90
@@ -566,11 +573,6 @@
#define MSR_IA32_XSS 0x00000da0
-#define FEATURE_CONTROL_LOCKED (1<<0) -#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1) -#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2) -#define FEATURE_CONTROL_LMCE (1<<20) - #define MSR_IA32_APICBASE 0x0000001b #define MSR_IA32_APICBASE_BSP (1<<8) #define MSR_IA32_APICBASE_ENABLE (1<<11) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 5d0fddda842c..31c1ca0bb3ee 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -4499,10 +4499,10 @@ void decode_feature_control_msr(void) { unsigned long long msr;
- if (!get_msr(base_cpu, MSR_IA32_FEATURE_CONTROL, &msr)) + if (!get_msr(base_cpu, MSR_IA32_FEAT_CTL, &msr)) fprintf(outf, "cpu%d: MSR_IA32_FEATURE_CONTROL: 0x%08llx (%sLocked %s)\n", base_cpu, msr, - msr & FEATURE_CONTROL_LOCKED ? "" : "UN-", + msr & FEAT_CTL_LOCKED ? "" : "UN-", msr & (1 << 18) ? "SGX" : ""); }
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c index f6ec97b7eaef..85064baf5e97 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c +++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c @@ -157,11 +157,11 @@ bool prepare_for_vmx_operation(struct vmx_pages *vmx) * Bit 2: Enables VMXON outside of SMX operation. If clear, VMXON * outside of SMX causes a #GP. */ - required = FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX; - required |= FEATURE_CONTROL_LOCKED; - feature_control = rdmsr(MSR_IA32_FEATURE_CONTROL); + required = FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX; + required |= FEAT_CTL_LOCKED; + feature_control = rdmsr(MSR_IA32_FEAT_CTL); if ((feature_control & required) != required) - wrmsr(MSR_IA32_FEATURE_CONTROL, feature_control | required); + wrmsr(MSR_IA32_FEAT_CTL, feature_control | required);
/* Enter VMX root operation. */ *(uint32_t *)(vmx->vmxon) = vmcs_revision();
Opportunistically initialize IA32_FEAT_CTL to enable VMX when the MSR is left unlocked by BIOS. Configuring feature control at boot time paves the way for similar enabling of other features, e.g. Software Guard Extensions (SGX).
Temporarily leave equivalent KVM code in place in order to avoid introducing a regression on Centaur and Zhaoxin CPUs, e.g. removing KVM's code would leave the MSR unlocked on those CPUs and would break existing functionality if people are loading kvm_intel on Centaur and/or Zhaoxin. Defer enablement of the boot-time configuration on Centaur and Zhaoxin to future patches to aid bisection.
Note, Local Machine Check Exceptions (LMCE) are also supported by the kernel and enabled via feature control, but the kernel currently uses LMCE if and only if the feature is explicitly enabled by BIOS. Keep the current behavior to avoid introducing bugs; future patches can opt in to opportunistic enabling if it's deemed desirable to do so.
Always lock IA32_FEAT_CTL if it exists, even if the CPU doesn't support VMX, so that other existing and future kernel code that queries the MSR can assume it's locked.
Start from a clean slate when constructing the value to write to IA32_FEAT_CTL, i.e. ignore whatever value BIOS left in the MSR so as not to enable random features or fault on the WRMSR.
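The rules described above can be modeled in user space as a pure function. This is only a sketch for illustration: feat_ctl_after_init() and its boolean parameters are hypothetical stand-ins for the MSR read, cpu_has(c, X86_FEATURE_VMX), IS_ENABLED(CONFIG_KVM_INTEL) and tboot_enabled(), and the bit positions are the ones defined for MSR_IA32_FEAT_CTL in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of MSR_IA32_FEAT_CTL as defined in this series. */
#define FEAT_CTL_LOCKED                  (1ULL << 0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX  (1ULL << 1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX (1ULL << 2)
#define FEAT_CTL_LMCE_ENABLED            (1ULL << 20)

/*
 * Hypothetical user-space model: return the value the MSR is expected to
 * hold once boot-time initialization has run, given the value BIOS left.
 */
static uint64_t feat_ctl_after_init(uint64_t bios_val, bool cpu_has_vmx,
				    bool kvm_intel, bool tboot)
{
	uint64_t val = bios_val;

	if (!(bios_val & FEAT_CTL_LOCKED)) {
		/* Clean slate: discard whatever BIOS left in the MSR. */
		val = FEAT_CTL_LOCKED;

		/* Enable VMX only if the kernel may do VMXON at some point. */
		if (cpu_has_vmx && kvm_intel) {
			val |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
			if (tboot)
				val |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
		}
	}

	/* Locked by BIOS or locked by the kernel, but always locked. */
	assert(val & FEAT_CTL_LOCKED);
	return val;
}
```

In this model an unlocked MSR with stray bits set collapses to just the lock bit when VMX isn't wanted, and a BIOS-locked value is returned untouched, LMCE bit included.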
Suggested-by: Borislav Petkov <bp@suse.de>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu           |  4 ++++
 arch/x86/kernel/cpu/Makefile   |  1 +
 arch/x86/kernel/cpu/cpu.h      |  4 ++++
 arch/x86/kernel/cpu/feat_ctl.c | 37 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel.c    |  2 ++
 5 files changed, 48 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/feat_ctl.c
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu index af9c967782f6..98be76fa2978 100644 --- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -387,6 +387,10 @@ config X86_DEBUGCTLMSR def_bool y depends on !(MK6 || MWINCHIPC6 || MWINCHIP3D || MCYRIXIII || M586MMX || M586TSC || M586 || M486SX || M486) && !UML
+config IA32_FEAT_CTL + def_bool y + depends on CPU_SUP_INTEL + menuconfig PROCESSOR_SELECT bool "Supported processor vendors" if EXPERT ---help--- diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index a704fb9ee98e..4173b0de7a2f 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -32,6 +32,7 @@ obj-y += umwait.o obj-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
+obj-$(CONFIG_IA32_FEAT_CTL) += feat_ctl.o ifdef CONFIG_CPU_SUP_INTEL obj-y += intel.o intel_pconfig.o tsx.o obj-$(CONFIG_PM) += intel_epb.o diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 38ab6e115eac..37fdefd14f28 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -80,4 +80,8 @@ extern void x86_spec_ctrl_setup_ap(void);
extern u64 x86_read_arch_cap_msr(void);
+#ifdef CONFIG_IA32_FEAT_CTL +void init_ia32_feat_ctl(struct cpuinfo_x86 *c); +#endif + #endif /* ARCH_X86_CPU_H */ diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c new file mode 100644 index 000000000000..c4f8f767853b --- /dev/null +++ b/arch/x86/kernel/cpu/feat_ctl.c @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/tboot.h> + +#include <asm/cpufeature.h> +#include <asm/msr-index.h> +#include <asm/processor.h> + +void init_ia32_feat_ctl(struct cpuinfo_x86 *c) +{ + u64 msr; + + if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) + return; + + if (msr & FEAT_CTL_LOCKED) + return; + + /* + * Ignore whatever value BIOS left in the MSR to avoid enabling random + * features or faulting on the WRMSR. + */ + msr = FEAT_CTL_LOCKED; + + /* + * Enable VMX if and only if the kernel may do VMXON at some point, + * i.e. KVM is enabled, to avoid unnecessarily adding an attack vector + * for the kernel, e.g. using VMX to hide malicious code. + */ + if (cpu_has(c, X86_FEATURE_VMX) && IS_ENABLED(CONFIG_KVM_INTEL)) { + msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX; + + if (tboot_enabled()) + msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX; + } + + wrmsrl(MSR_IA32_FEAT_CTL, msr); +} diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 4a900804a023..9129c170ea11 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -755,6 +755,8 @@ static void init_intel(struct cpuinfo_x86 *c) /* Work around errata */ srat_detect_node(c);
+ init_ia32_feat_ctl(c); + if (cpu_has(c, X86_FEATURE_VMX)) detect_vmx_virtcap(c);
WARN if the IA32_FEAT_CTL MSR is somehow left unlocked now that CPU initialization unconditionally locks the MSR.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/mce/intel.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c index c238518b84a2..5627b1091b85 100644 --- a/arch/x86/kernel/cpu/mce/intel.c +++ b/arch/x86/kernel/cpu/mce/intel.c @@ -116,14 +116,15 @@ static bool lmce_supported(void) /* * BIOS should indicate support for LMCE by setting bit 20 in * IA32_FEAT_CTL without which touching MCG_EXT_CTL will generate a #GP - * fault. + * fault. The MSR must also be locked for LMCE_ENABLED to take effect. + * WARN if the MSR isn't locked as init_ia32_feat_ctl() unconditionally + * locks the MSR in the event that it wasn't already locked by BIOS. */ rdmsrl(MSR_IA32_FEAT_CTL, tmp); - if ((tmp & (FEAT_CTL_LOCKED | FEAT_CTL_LMCE_ENABLED)) == - (FEAT_CTL_LOCKED | FEAT_CTL_LMCE_ENABLED)) - return true; + if (WARN_ON_ONCE(!(tmp & FEAT_CTL_LOCKED))) + return false;
- return false; + return tmp & FEAT_CTL_LMCE_ENABLED; }
bool mce_intel_cmci_poll(void)
Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Centaur CPU.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu          | 2 +-
 arch/x86/kernel/cpu/centaur.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu index 98be76fa2978..cba5b64266df 100644 --- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -389,7 +389,7 @@ config X86_DEBUGCTLMSR
config IA32_FEAT_CTL def_bool y - depends on CPU_SUP_INTEL + depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR
menuconfig PROCESSOR_SELECT bool "Supported processor vendors" if EXPERT diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c index 14433ff5b828..084f6040b4df 100644 --- a/arch/x86/kernel/cpu/centaur.c +++ b/arch/x86/kernel/cpu/centaur.c @@ -250,6 +250,8 @@ static void init_centaur(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC); #endif
+ init_ia32_feat_ctl(c); + if (cpu_has(c, X86_FEATURE_VMX)) centaur_detect_vmx_virtcap(c); }
Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Zhaoxin CPU.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu          | 2 +-
 arch/x86/kernel/cpu/zhaoxin.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu index cba5b64266df..526425fcaedc 100644 --- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -389,7 +389,7 @@ config X86_DEBUGCTLMSR
config IA32_FEAT_CTL def_bool y - depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR + depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || CPU_SUP_ZHAOXIN
menuconfig PROCESSOR_SELECT bool "Supported processor vendors" if EXPERT diff --git a/arch/x86/kernel/cpu/zhaoxin.c b/arch/x86/kernel/cpu/zhaoxin.c index 8e6f2f4b4afe..630a1450ea70 100644 --- a/arch/x86/kernel/cpu/zhaoxin.c +++ b/arch/x86/kernel/cpu/zhaoxin.c @@ -141,6 +141,8 @@ static void init_zhaoxin(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC); #endif
+ init_ia32_feat_ctl(c); + if (cpu_has(c, X86_FEATURE_VMX)) zhaoxin_detect_vmx_virtcap(c); }
Now that IA32_FEAT_CTL is always configured and locked for CPUs that are known to support VMX[*], clear the VMX capability flag if the MSR is unsupported or BIOS disabled VMX, i.e. locked IA32_FEAT_CTL and didn't set the appropriate VMX enable bit.
[*] Because init_ia32_feat_ctl() is called from the vendors' ->c_init(), it's still possible for IA32_FEAT_CTL to be left unlocked when VMX is supported by the CPU. This is not fatal, and will be addressed in a future patch.
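The "BIOS disabled VMX" condition can be sketched as a small predicate. This is a user-space illustration only: vmx_usable() is a hypothetical name, the tboot parameter stands in for tboot_enabled(), and the sketch assumes the MSR is already locked when the check runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of MSR_IA32_FEAT_CTL as defined in this series. */
#define FEAT_CTL_LOCKED                  (1ULL << 0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX  (1ULL << 1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX (1ULL << 2)

/*
 * Hypothetical helper: decide whether the VMX capability flag should be
 * kept, given the final (locked) MSR value. Which enable bit matters
 * depends on whether the kernel was launched via tboot, i.e. runs in SMX.
 */
static bool vmx_usable(uint64_t feat_ctl, bool tboot)
{
	/* Assumption of this sketch: the MSR has been locked by now. */
	assert(feat_ctl & FEAT_CTL_LOCKED);

	if (tboot)
		return (feat_ctl & FEAT_CTL_VMX_ENABLED_INSIDE_SMX) != 0;

	return (feat_ctl & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) != 0;
}
```

When the predicate is false, the patch clears X86_FEATURE_VMX and prints a one-time error naming the mode ("inside" or "outside" TXT) in which VMX was found disabled.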
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/feat_ctl.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c index c4f8f767853b..a46c9e46f937 100644 --- a/arch/x86/kernel/cpu/feat_ctl.c +++ b/arch/x86/kernel/cpu/feat_ctl.c @@ -5,15 +5,21 @@ #include <asm/msr-index.h> #include <asm/processor.h>
+#undef pr_fmt +#define pr_fmt(fmt) "x86/cpu: " fmt + void init_ia32_feat_ctl(struct cpuinfo_x86 *c) { + bool tboot = tboot_enabled(); u64 msr;
- if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) + if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) { + clear_cpu_cap(c, X86_FEATURE_VMX); return; + }
if (msr & FEAT_CTL_LOCKED) - return; + goto update_caps;
/* * Ignore whatever value BIOS left in the MSR to avoid enabling random @@ -29,9 +35,20 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_VMX) && IS_ENABLED(CONFIG_KVM_INTEL)) { msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
- if (tboot_enabled()) + if (tboot) msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX; }
wrmsrl(MSR_IA32_FEAT_CTL, msr); + +update_caps: + if (!cpu_has(c, X86_FEATURE_VMX)) + return; + + if ( (tboot && !(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)) || + (!tboot && !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))) { + pr_err_once("VMX (%s TXT) disabled by BIOS\n", + tboot ? "inside" : "outside"); + clear_cpu_cap(c, X86_FEATURE_VMX); + } }
Add a VMX specific variant of X86_FEATURE_* flags, which will eventually supplant the synthetic VMX flags defined in cpufeatures word 8. Use the Intel-defined layouts for the major VMX execution controls so that their word entries can be directly populated from their respective MSRs, and so that the VMX_FEATURE_* flags can be used to define the existing bit definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE flag when adding support for a new hardware feature.
The majority of Intel's (and compatible CPUs') VMX capabilities are enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't naturally provide any insight into the virtualization capabilities of VMX enabled CPUs. Commit e38e05a85828d ("x86: extended "flags" to show virtualization HW feature in /proc/cpuinfo") attempted to address the issue by synthesizing select VMX features into a Linux-defined word in cpufeatures.
Lack of reporting of VMX capabilities via /proc/cpuinfo is problematic because there is no sane way for a user to query the capabilities of their platform, e.g. when trying to find a platform to test a feature or debug an issue that has a hardware dependency. Lack of reporting is especially problematic when the user isn't familiar with VMX, e.g. the format of the MSRs is non-standard, existence of some MSRs is reported by bits in other MSRs, several "features" from KVM's point of view are enumerated as 3+ distinct features by hardware, etc...
The synthetic cpufeatures approach has several flaws:
- The set of synthesized VMX flags has become extremely stale with respect to the full set of VMX features, e.g. only one new flag (EPT A/D) has been added in the decade since the introduction of the synthetic VMX features. Failure to keep the VMX flags up to date is likely due to the lack of a mechanism that forces developers to consider whether or not a new feature is worth reporting.
- The synthetic flags may be misinterpreted as affecting kernel behavior, when in fact KVM, the kernel's sole consumer of VMX, completely ignores the synthetic flags.
- New CPU vendors that support VMX have duplicated the hideous code that propagates VMX features from MSRs to cpufeatures. Bringing the synthetic VMX flags up to date would exacerbate the copy+paste trainwreck.
Define separate VMX_FEATURE flags to set the stage for enumerating VMX capabilities outside of the cpu_has() framework, and for adding functional usage of VMX_FEATURE_* to help ensure the features reported via /proc/cpuinfo are up to date with respect to kernel recognition of VMX capabilities.
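To illustrate the word*32+bit encoding and why using the Intel-defined layouts lets a word be filled straight from an MSR, here is a rough user-space sketch. The two VMX_FEATURE_* values are copied from this patch; NR_VMX_WORDS, vmx_feature_word(), vmx_feature_bit(), vmx_allowed1() and vmx_has() are hypothetical names used only here. The sketch relies on the VMX capability MSRs for the execution controls reporting their allowed-1 settings in bits 63:32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two flags taken from vmxfeatures.h in this patch. */
#define VMX_FEATURE_VIRTUAL_NMIS	(0*32 + 5)
#define VMX_FEATURE_TSC_OFFSETTING	(1*32 + 3)

/* Hypothetical: number of capability words tracked by this sketch. */
#define NR_VMX_WORDS 2

/* A flag encodes its word index in the upper part, its bit in the low 5 bits. */
static inline unsigned int vmx_feature_word(unsigned int f) { return f / 32; }
static inline unsigned int vmx_feature_bit(unsigned int f)  { return f % 32; }

/* The allowed-1 settings of a control live in the high half of its MSR. */
static inline uint32_t vmx_allowed1(uint64_t cap_msr)
{
	return (uint32_t)(cap_msr >> 32);
}

/* Hypothetical query helper over an array of capability words. */
static bool vmx_has(const uint32_t *words, unsigned int f)
{
	assert(vmx_feature_word(f) < NR_VMX_WORDS);
	return (words[vmx_feature_word(f)] >> vmx_feature_bit(f)) & 1;
}
```

Because the bit positions within a word match the hardware layout, a word such as the primary processor-based controls needs no per-flag translation: the high half of the MSR is the word.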
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 MAINTAINERS                        |  2 +-
 arch/x86/include/asm/processor.h   |  1 +
 arch/x86/include/asm/vmxfeatures.h | 81 ++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/vmxfeatures.h
diff --git a/MAINTAINERS b/MAINTAINERS index 51b84c987d2a..985a7a0cc1e7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9141,7 +9141,7 @@ F: arch/x86/include/uapi/asm/svm.h F: arch/x86/include/asm/kvm* F: arch/x86/include/asm/pvclock-abi.h F: arch/x86/include/asm/svm.h -F: arch/x86/include/asm/vmx.h +F: arch/x86/include/asm/vmx*.h F: arch/x86/kernel/kvm.c F: arch/x86/kernel/kvmclock.c
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7c071f86a058..b49b88bae92f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -25,6 +25,7 @@ struct vm86; #include <asm/special_insns.h> #include <asm/fpu/types.h> #include <asm/unwind_hints.h> +#include <asm/vmxfeatures.h>
#include <linux/personality.h> #include <linux/cache.h> diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h new file mode 100644 index 000000000000..4c743ba40ff1 --- /dev/null +++ b/arch/x86/include/asm/vmxfeatures.h @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_VMXFEATURES_H +#define _ASM_X86_VMXFEATURES_H + +/* + * Note: If the comment begins with a quoted string, that string is used + * in /proc/cpuinfo instead of the macro name. If the string is "", + * this feature bit is not displayed in /proc/cpuinfo at all. + */ + +/* Pin-Based VM-Execution Controls, EPT/VPID, APIC and VM-Functions, word 0 */ +#define VMX_FEATURE_INTR_EXITING ( 0*32+ 0) /* "" VM-Exit on vectored interrupts */ +#define VMX_FEATURE_NMI_EXITING ( 0*32+ 3) /* "" VM-Exit on NMIs */ +#define VMX_FEATURE_VIRTUAL_NMIS ( 0*32+ 5) /* "vnmi" NMI virtualization */ +#define VMX_FEATURE_PREEMPTION_TIMER ( 0*32+ 6) /* VMX Preemption Timer */ +#define VMX_FEATURE_POSTED_INTR ( 0*32+ 7) /* Posted Interrupts */ + +/* EPT/VPID features, scattered to bits 16-23 */ +#define VMX_FEATURE_INVVPID ( 0*32+ 16) /* INVVPID is supported */ +#define VMX_FEATURE_EPT_EXECUTE_ONLY ( 0*32+ 17) /* "ept_x_only" EPT entries can be execute only */ +#define VMX_FEATURE_EPT_AD ( 0*32+ 18) /* EPT Accessed/Dirty bits */ +#define VMX_FEATURE_EPT_1GB ( 0*32+ 19) /* 1GB EPT pages */ + +/* Aggregated APIC features 24-27 */ +#define VMX_FEATURE_FLEXPRIORITY ( 0*32+ 24) /* TPR shadow + virt APIC */ +#define VMX_FEATURE_APICV ( 0*32+ 25) /* TPR shadow + APIC reg virt + virt intr delivery + posted interrupts */ + +/* VM-Functions, shifted to bits 28-31 */ +#define VMX_FEATURE_EPTP_SWITCHING ( 0*32+ 28) /* EPTP switching (in guest) */ + +/* Primary Processor-Based VM-Execution Controls, word 1 */ +#define VMX_FEATURE_VIRTUAL_INTR_PENDING ( 1*32+ 2) /* "" VM-Exit if INTRs are unblocked in guest */ +#define VMX_FEATURE_TSC_OFFSETTING ( 1*32+ 3) /* "tsc_offset" Offset hardware TSC 
when read in guest */ +#define VMX_FEATURE_HLT_EXITING ( 1*32+ 7) /* "" VM-Exit on HLT */ +#define VMX_FEATURE_INVLPG_EXITING ( 1*32+ 9) /* "" VM-Exit on INVLPG */ +#define VMX_FEATURE_MWAIT_EXITING ( 1*32+ 10) /* "" VM-Exit on MWAIT */ +#define VMX_FEATURE_RDPMC_EXITING ( 1*32+ 11) /* "" VM-Exit on RDPMC */ +#define VMX_FEATURE_RDTSC_EXITING ( 1*32+ 12) /* "" VM-Exit on RDTSC */ +#define VMX_FEATURE_CR3_LOAD_EXITING ( 1*32+ 15) /* "" VM-Exit on writes to CR3 */ +#define VMX_FEATURE_CR3_STORE_EXITING ( 1*32+ 16) /* "" VM-Exit on reads from CR3 */ +#define VMX_FEATURE_CR8_LOAD_EXITING ( 1*32+ 19) /* "" VM-Exit on writes to CR8 */ +#define VMX_FEATURE_CR8_STORE_EXITING ( 1*32+ 20) /* "" VM-Exit on reads from CR8 */ +#define VMX_FEATURE_VIRTUAL_TPR ( 1*32+ 21) /* "vtpr" TPR virtualization, a.k.a. TPR shadow */ +#define VMX_FEATURE_VIRTUAL_NMI_PENDING ( 1*32+ 22) /* "" VM-Exit if NMIs are unblocked in guest */ +#define VMX_FEATURE_MOV_DR_EXITING ( 1*32+ 23) /* "" VM-Exit on accesses to debug registers */ +#define VMX_FEATURE_UNCOND_IO_EXITING ( 1*32+ 24) /* "" VM-Exit on *all* IN{S} and OUT{S}*/ +#define VMX_FEATURE_USE_IO_BITMAPS ( 1*32+ 25) /* "" VM-Exit based on I/O port */ +#define VMX_FEATURE_MONITOR_TRAP_FLAG ( 1*32+ 27) /* "mtf" VMX single-step VM-Exits */ +#define VMX_FEATURE_USE_MSR_BITMAPS ( 1*32+ 28) /* "" VM-Exit based on MSR index */ +#define VMX_FEATURE_MONITOR_EXITING ( 1*32+ 29) /* "" VM-Exit on MONITOR (MWAIT's accomplice) */ +#define VMX_FEATURE_PAUSE_EXITING ( 1*32+ 30) /* "" VM-Exit on PAUSE (unconditionally) */ +#define VMX_FEATURE_SEC_CONTROLS ( 1*32+ 31) /* "" Enable Secondary VM-Execution Controls */ + +/* Secondary Processor-Based VM-Execution Controls, word 2 */ +#define VMX_FEATURE_VIRT_APIC_ACCESSES ( 2*32+ 0) /* "vapic" Virtualize memory mapped APIC accesses */ +#define VMX_FEATURE_EPT ( 2*32+ 1) /* Extended Page Tables, a.k.a. 
Two-Dimensional Paging */ +#define VMX_FEATURE_DESC_EXITING ( 2*32+ 2) /* "" VM-Exit on {S,L}*DT instructions */ +#define VMX_FEATURE_RDTSCP ( 2*32+ 3) /* "" Enable RDTSCP in guest */ +#define VMX_FEATURE_VIRTUAL_X2APIC ( 2*32+ 4) /* "" Virtualize X2APIC for the guest */ +#define VMX_FEATURE_VPID ( 2*32+ 5) /* Virtual Processor ID (TLB ASID modifier) */ +#define VMX_FEATURE_WBINVD_EXITING ( 2*32+ 6) /* "" VM-Exit on WBINVD */ +#define VMX_FEATURE_UNRESTRICTED_GUEST ( 2*32+ 7) /* Allow Big Real Mode and other "invalid" states */ +#define VMX_FEATURE_APIC_REGISTER_VIRT ( 2*32+ 8) /* "vapic_reg" Hardware emulation of reads to the virtual-APIC */ +#define VMX_FEATURE_VIRT_INTR_DELIVERY ( 2*32+ 9) /* "vid" Evaluation and delivery of pending virtual interrupts */ +#define VMX_FEATURE_PAUSE_LOOP_EXITING ( 2*32+ 10) /* "ple" Conditionally VM-Exit on PAUSE at CPL0 */ +#define VMX_FEATURE_RDRAND_EXITING ( 2*32+ 11) /* "" VM-Exit on RDRAND*/ +#define VMX_FEATURE_INVPCID ( 2*32+ 12) /* "" Enable INVPCID in guest */ +#define VMX_FEATURE_VMFUNC ( 2*32+ 13) /* "" Enable VM-Functions (leaf dependent) */ +#define VMX_FEATURE_SHADOW_VMCS ( 2*32+ 14) /* VMREAD/VMWRITE in guest can access shadow VMCS */ +#define VMX_FEATURE_ENCLS_EXITING ( 2*32+ 15) /* "" VM-Exit on ENCLS (leaf dependent) */ +#define VMX_FEATURE_RDSEED_EXITING ( 2*32+ 16) /* "" VM-Exit on RDSEED */ +#define VMX_FEATURE_PAGE_MOD_LOGGING ( 2*32+ 17) /* "pml" Log dirty pages into buffer */ +#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* "" Conditionally reflect EPT violations as #VE exceptions */ +#define VMX_FEATURE_PT_CONCEAL_VMX ( 2*32+ 19) /* "" Suppress VMX indicators in Processor Trace */ +#define VMX_FEATURE_XSAVES ( 2*32+ 20) /* "" Enable XSAVES and XRSTORS in guest */ +#define VMX_FEATURE_MODE_BASED_EPT_EXEC ( 2*32+ 22) /* "ept_mode_based_exec" Enable separate EPT EXEC bits for supervisor vs. 
user */ +#define VMX_FEATURE_PT_USE_GPA ( 2*32+ 24) /* "" Processor Trace logs GPAs */ +#define VMX_FEATURE_TSC_SCALING ( 2*32+ 25) /* Scale hardware TSC when read in guest */ +#define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */ + +#endif /* _ASM_X86_VMXFEATURES_H */
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill the capabilities during IA32_FEAT_CTL MSR initialization.
Make the VMX capabilities dependent on IA32_FEAT_CTL and X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't possibly support VMX, or when /proc/cpuinfo is not available.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu               |  4 ++
 arch/x86/include/asm/processor.h   |  3 ++
 arch/x86/include/asm/vmxfeatures.h |  5 ++
 arch/x86/kernel/cpu/common.c       |  3 ++
 arch/x86/kernel/cpu/feat_ctl.c     | 74 ++++++++++++++++++++++++++++++
 5 files changed, 89 insertions(+)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 526425fcaedc..bc3a497c029c 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -391,6 +391,10 @@ config IA32_FEAT_CTL
 	def_bool y
 	depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || CPU_SUP_ZHAOXIN
 
+config X86_VMX_FEATURE_NAMES
+	def_bool y
+	depends on IA32_FEAT_CTL && X86_FEATURE_NAMES
+
 menuconfig PROCESSOR_SELECT
 	bool "Supported processor vendors" if EXPERT
 	---help---
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b49b88bae92f..6fb4870ed759 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -85,6 +85,9 @@ struct cpuinfo_x86 {
 #ifdef CONFIG_X86_64
 	/* Number of 4K pages in DTLB/ITLB combined(in pages): */
 	int			x86_tlbsize;
+#endif
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+	__u32			vmx_capability[NVMXINTS];
 #endif
 	__u8			x86_virt_bits;
 	__u8			x86_phys_bits;
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index 4c743ba40ff1..0d04d8bf15a5 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -2,6 +2,11 @@
 #ifndef _ASM_X86_VMXFEATURES_H
 #define _ASM_X86_VMXFEATURES_H
 
+/*
+ * Defines VMX CPU feature bits
+ */
+#define NVMXINTS			3 /* N 32-bit words worth of info */
+
 /*
  * Note: If the comment begins with a quoted string, that string is used
  * in /proc/cpuinfo instead of the macro name.  If the string is "",
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9d6a35a4586e..df1eacd26443 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1449,6 +1449,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 #endif
 	c->x86_cache_alignment = c->x86_clflush_size;
 	memset(&c->x86_capability, 0, sizeof(c->x86_capability));
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+	memset(&c->vmx_capability, 0, sizeof(c->vmx_capability));
+#endif
 
 	generic_identify(c);
 
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index a46c9e46f937..cbd8bfe9b87b 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -4,10 +4,80 @@
 #include <asm/cpufeature.h>
 #include <asm/msr-index.h>
 #include <asm/processor.h>
+#include <asm/vmx.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"x86/cpu: " fmt
 
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+enum vmx_feature_leafs {
+	MISC_FEATURES = 0,
+	PRIMARY_CTLS,
+	SECONDARY_CTLS,
+	NR_VMX_FEATURE_WORDS,
+};
+
+#define VMX_F(x) BIT(VMX_FEATURE_##x & 0x1f)
+
+static void init_vmx_capabilities(struct cpuinfo_x86 *c)
+{
+	u32 supported, funcs, ept, vpid, ign;
+
+	BUILD_BUG_ON(NVMXINTS != NR_VMX_FEATURE_WORDS);
+
+	/*
+	 * The high bits contain the allowed-1 settings, i.e. features that can
+	 * be turned on.  The low bits contain the allowed-0 settings, i.e.
+	 * features that can be turned off.  Ignore the allowed-0 settings,
+	 * if a feature can be turned on then it's supported.
+	 *
+	 * Use raw rdmsr() for primary processor controls and pin controls MSRs
+	 * as they exist on any CPU that supports VMX, i.e. we want the WARN if
+	 * the RDMSR faults.
+	 */
+	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, ign, supported);
+	c->vmx_capability[PRIMARY_CTLS] = supported;
+
+	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);
+	c->vmx_capability[SECONDARY_CTLS] = supported;
+
+	rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
+	rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs);
+
+	/*
+	 * Except for EPT+VPID, which enumerates support for both in a single
+	 * MSR, low for EPT, high for VPID.
+	 */
+	rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP, &ept, &vpid);
+
+	/* Pin, EPT, VPID and VM-Func are merged into a single word. */
+	WARN_ON_ONCE(supported >> 16);
+	WARN_ON_ONCE(funcs >> 4);
+	c->vmx_capability[MISC_FEATURES] = (supported & 0xffff) |
+					   ((vpid & 0x1) << 16) |
+					   ((funcs & 0xf) << 28);
+
+	/* EPT bits are full on scattered and must be manually handled. */
+	if (ept & VMX_EPT_EXECUTE_ONLY_BIT)
+		c->vmx_capability[MISC_FEATURES] |= VMX_F(EPT_EXECUTE_ONLY);
+	if (ept & VMX_EPT_AD_BIT)
+		c->vmx_capability[MISC_FEATURES] |= VMX_F(EPT_AD);
+	if (ept & VMX_EPT_1GB_PAGE_BIT)
+		c->vmx_capability[MISC_FEATURES] |= VMX_F(EPT_1GB);
+
+	/* Synthetic APIC features that are aggregates of multiple features. */
+	if ((c->vmx_capability[PRIMARY_CTLS] & VMX_F(VIRTUAL_TPR)) &&
+	    (c->vmx_capability[SECONDARY_CTLS] & VMX_F(VIRT_APIC_ACCESSES)))
+		c->vmx_capability[MISC_FEATURES] |= VMX_F(FLEXPRIORITY);
+
+	if ((c->vmx_capability[PRIMARY_CTLS] & VMX_F(VIRTUAL_TPR)) &&
+	    (c->vmx_capability[SECONDARY_CTLS] & VMX_F(APIC_REGISTER_VIRT)) &&
+	    (c->vmx_capability[SECONDARY_CTLS] & VMX_F(VIRT_INTR_DELIVERY)) &&
+	    (c->vmx_capability[MISC_FEATURES] & VMX_F(POSTED_INTR)))
+		c->vmx_capability[MISC_FEATURES] |= VMX_F(APICV);
+}
+#endif /* CONFIG_X86_VMX_FEATURE_NAMES */
+
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 {
 	bool tboot = tboot_enabled();
@@ -50,5 +120,9 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 		pr_err_once("VMX (%s TXT) disabled by BIOS\n",
 			    tboot ? "inside" : "outside");
 		clear_cpu_cap(c, X86_FEATURE_VMX);
+	} else {
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+		init_vmx_capabilities(c);
+#endif
 	}
 }
Add support for generating VMX feature names in capflags.c and use the resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't print VMX flags if no bits are set in word 0, which holds Pin Controls. Pin Control's INTR and NMI exiting are fundamental pillars of VMX; if they are not supported then either the CPU is broken, it does not actually support VMX, or the kernel wasn't built with support for the target CPU.
Print the features in a dedicated "vmx flags" line to avoid polluting the common "flags" and to avoid having to prefix all flags with "vmx_", which results in horrendously long names.
Keep synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI for those flags. This means that "flags" and "vmx flags" will have duplicate entries for tpr_shadow (virtual_tpr), vnmi, ept, flexpriority, vpid and ept_ad, but caps the pollution of "flags" at those six VMX features. The vendor specific code that populates the synthetic flags will be consolidated in a future patch to further minimize the lasting damage.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/boot/mkcpustr.c          |  1 +
 arch/x86/kernel/cpu/Makefile      |  5 +++--
 arch/x86/kernel/cpu/mkcapflags.sh | 15 +++++++++++----
 arch/x86/kernel/cpu/proc.c        | 15 +++++++++++++++
 4 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/mkcpustr.c b/arch/x86/boot/mkcpustr.c
index 9caa10e82217..da0ccc5de538 100644
--- a/arch/x86/boot/mkcpustr.c
+++ b/arch/x86/boot/mkcpustr.c
@@ -15,6 +15,7 @@
 #include "../include/asm/required-features.h"
 #include "../include/asm/disabled-features.h"
 #include "../include/asm/cpufeatures.h"
+#include "../include/asm/vmxfeatures.h"
 #include "../kernel/cpu/capflags.c"
 
 int main(void)
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4173b0de7a2f..dba6a83bc349 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -57,11 +57,12 @@ obj-$(CONFIG_ACRN_GUEST)		+= acrn.o
 
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
-      cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $< $@
+      cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^
 
 cpufeature = $(src)/../../include/asm/cpufeatures.h
+vmxfeature = $(src)/../../include/asm/vmxfeatures.h
 
-$(obj)/capflags.c: $(cpufeature) $(src)/mkcapflags.sh FORCE
+$(obj)/capflags.c: $(cpufeature) $(vmxfeature) $(src)/mkcapflags.sh FORCE
 	$(call if_changed,mkcapflags)
 endif
 targets += capflags.c
diff --git a/arch/x86/kernel/cpu/mkcapflags.sh b/arch/x86/kernel/cpu/mkcapflags.sh
index aed45b8895d5..1db560ed2ca3 100644
--- a/arch/x86/kernel/cpu/mkcapflags.sh
+++ b/arch/x86/kernel/cpu/mkcapflags.sh
@@ -6,8 +6,7 @@
 
 set -e
 
-IN=$1
-OUT=$2
+OUT=$1
 
 dump_array()
 {
@@ -15,6 +14,7 @@ dump_array()
 	SIZE=$2
 	PFX=$3
 	POSTFIX=$4
+	IN=$5
 
 	PFX_SZ=$(echo $PFX | wc -c)
 	TABS="$(printf '\t\t\t\t\t')"
@@ -57,11 +57,18 @@ trap 'rm "$OUT"' EXIT
 	echo "#endif"
 	echo ""
 
-	dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" ""
+	dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" "" $2
 	echo ""
 
-	dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32"
+	dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32" $2
+	echo ""
 
+	echo "#ifdef CONFIG_X86_VMX_FEATURE_NAMES"
+	echo "#ifndef _ASM_X86_VMXFEATURES_H"
+	echo "#include <asm/vmxfeatures.h>"
+	echo "#endif"
+	dump_array "x86_vmx_flags" "NVMXINTS*32" "VMX_FEATURE_" "" $3
+	echo "#endif /* CONFIG_X86_VMX_FEATURE_NAMES */"
 ) > $OUT
 
 trap - EXIT
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index cb2e49810d68..4eec8889b0ff 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -7,6 +7,10 @@
 
 #include "cpu.h"
 
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+extern const char * const x86_vmx_flags[NVMXINTS*32];
+#endif
+
 /*
  *	Get CPU information for use by the procfs.
  */
@@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 		if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
 			seq_printf(m, " %s", x86_cap_flags[i]);
 
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+	if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) {
+		seq_puts(m, "\nvmx flags\t:");
+		for (i = 0; i < 32*NVMXINTS; i++) {
+			if (test_bit(i, (unsigned long *)c->vmx_capability) &&
+			    x86_vmx_flags[i] != NULL)
+				seq_printf(m, " %s", x86_vmx_flags[i]);
+		}
+	}
+#endif
+
 	seq_puts(m, "\nbugs\t\t:");
 	for (i = 0; i < 32*NBUGINTS; i++) {
 		unsigned int bug_bit = 32*NCAPINTS + i;
Set the synthetic VMX cpufeatures, which need to be kept to preserve /proc/cpuinfo's ABI, in the common IA32_FEAT_CTL initialization code. Remove the vendor code that manually sets the synthetic flags.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/centaur.c  | 35 ------------------------
 arch/x86/kernel/cpu/feat_ctl.c | 14 ++++++++++
 arch/x86/kernel/cpu/intel.c    | 49 ----------------------------------
 arch/x86/kernel/cpu/zhaoxin.c  | 35 ------------------------
 4 files changed, 14 insertions(+), 119 deletions(-)

diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 084f6040b4df..02d99feb333e 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -18,13 +18,6 @@
 #define RNG_ENABLED	(1 << 3)
 #define RNG_ENABLE	(1 << 6)	/* MSR_VIA_RNG */
 
-#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW	0x00200000
-#define X86_VMX_FEATURE_PROC_CTLS_VNMI		0x00400000
-#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS	0x80000000
-#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC	0x00000001
-#define X86_VMX_FEATURE_PROC_CTLS2_EPT		0x00000002
-#define X86_VMX_FEATURE_PROC_CTLS2_VPID		0x00000020
-
 static void init_c3(struct cpuinfo_x86 *c)
 {
 	u32  lo, hi;
@@ -119,31 +112,6 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
 	}
 }
 
-static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c)
-{
-	u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
-
-	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
-	msr_ctl = vmx_msr_high | vmx_msr_low;
-
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
-		set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
-		set_cpu_cap(c, X86_FEATURE_VNMI);
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
-		rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
-		      vmx_msr_low, vmx_msr_high);
-		msr_ctl2 = vmx_msr_high | vmx_msr_low;
-		if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
-		    (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
-			set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
-		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
-			set_cpu_cap(c, X86_FEATURE_EPT);
-		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
-			set_cpu_cap(c, X86_FEATURE_VPID);
-	}
-}
-
 static void init_centaur(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_32
@@ -251,9 +219,6 @@ static void init_centaur(struct cpuinfo_x86 *c)
 #endif
 
 	init_ia32_feat_ctl(c);
-
-	if (cpu_has(c, X86_FEATURE_VMX))
-		centaur_detect_vmx_virtcap(c);
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index cbd8bfe9b87b..fcbb35533cef 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -75,6 +75,20 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
 	    (c->vmx_capability[SECONDARY_CTLS] & VMX_F(VIRT_INTR_DELIVERY)) &&
 	    (c->vmx_capability[MISC_FEATURES] & VMX_F(POSTED_INTR)))
 		c->vmx_capability[MISC_FEATURES] |= VMX_F(APICV);
+
+	/* Set the synthetic cpufeatures to preserve /proc/cpuinfo's ABI. */
+	if (c->vmx_capability[PRIMARY_CTLS] & VMX_F(VIRTUAL_TPR))
+		set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
+	if (c->vmx_capability[MISC_FEATURES] & VMX_F(FLEXPRIORITY))
+		set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
+	if (c->vmx_capability[MISC_FEATURES] & VMX_F(VIRTUAL_NMIS))
+		set_cpu_cap(c, X86_FEATURE_VNMI);
+	if (c->vmx_capability[SECONDARY_CTLS] & VMX_F(EPT))
+		set_cpu_cap(c, X86_FEATURE_EPT);
+	if (c->vmx_capability[MISC_FEATURES] & VMX_F(EPT_AD))
+		set_cpu_cap(c, X86_FEATURE_EPT_AD);
+	if (c->vmx_capability[SECONDARY_CTLS] & VMX_F(VPID))
+		set_cpu_cap(c, X86_FEATURE_VPID);
 }
 #endif /* CONFIG_X86_VMX_FEATURE_NAMES */
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 9129c170ea11..57473e2c0869 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -494,52 +494,6 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
 #endif
 }
 
-static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
-{
-	/* Intel VMX MSR indicated features */
-#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW	0x00200000
-#define X86_VMX_FEATURE_PROC_CTLS_VNMI		0x00400000
-#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS	0x80000000
-#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC	0x00000001
-#define X86_VMX_FEATURE_PROC_CTLS2_EPT		0x00000002
-#define X86_VMX_FEATURE_PROC_CTLS2_VPID		0x00000020
-#define x86_VMX_FEATURE_EPT_CAP_AD		0x00200000
-
-	u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
-	u32 msr_vpid_cap, msr_ept_cap;
-
-	clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
-	clear_cpu_cap(c, X86_FEATURE_VNMI);
-	clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
-	clear_cpu_cap(c, X86_FEATURE_EPT);
-	clear_cpu_cap(c, X86_FEATURE_VPID);
-	clear_cpu_cap(c, X86_FEATURE_EPT_AD);
-
-	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
-	msr_ctl = vmx_msr_high | vmx_msr_low;
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
-		set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
-		set_cpu_cap(c, X86_FEATURE_VNMI);
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
-		rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
-		      vmx_msr_low, vmx_msr_high);
-		msr_ctl2 = vmx_msr_high | vmx_msr_low;
-		if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
-		    (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
-			set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
-		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) {
-			set_cpu_cap(c, X86_FEATURE_EPT);
-			rdmsr(MSR_IA32_VMX_EPT_VPID_CAP,
-			      msr_ept_cap, msr_vpid_cap);
-			if (msr_ept_cap & x86_VMX_FEATURE_EPT_CAP_AD)
-				set_cpu_cap(c, X86_FEATURE_EPT_AD);
-		}
-		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
-			set_cpu_cap(c, X86_FEATURE_VPID);
-	}
-}
-
 #define MSR_IA32_TME_ACTIVATE		0x982
 
 /* Helpers to access TME_ACTIVATE MSR */
@@ -757,9 +711,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 
 	init_ia32_feat_ctl(c);
 
-	if (cpu_has(c, X86_FEATURE_VMX))
-		detect_vmx_virtcap(c);
-
 	if (cpu_has(c, X86_FEATURE_TME))
 		detect_tme(c);
 
diff --git a/arch/x86/kernel/cpu/zhaoxin.c b/arch/x86/kernel/cpu/zhaoxin.c
index 630a1450ea70..6b2d3b0a63e6 100644
--- a/arch/x86/kernel/cpu/zhaoxin.c
+++ b/arch/x86/kernel/cpu/zhaoxin.c
@@ -16,13 +16,6 @@
 #define RNG_ENABLED	(1 << 3)
 #define RNG_ENABLE	(1 << 8)	/* MSR_ZHAOXIN_RNG */
 
-#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW	0x00200000
-#define X86_VMX_FEATURE_PROC_CTLS_VNMI		0x00400000
-#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS	0x80000000
-#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC	0x00000001
-#define X86_VMX_FEATURE_PROC_CTLS2_EPT		0x00000002
-#define X86_VMX_FEATURE_PROC_CTLS2_VPID		0x00000020
-
 static void init_zhaoxin_cap(struct cpuinfo_x86 *c)
 {
 	u32  lo, hi;
@@ -89,31 +82,6 @@ static void early_init_zhaoxin(struct cpuinfo_x86 *c)
 
 }
 
-static void zhaoxin_detect_vmx_virtcap(struct cpuinfo_x86 *c)
-{
-	u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
-
-	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
-	msr_ctl = vmx_msr_high | vmx_msr_low;
-
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
-		set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
-		set_cpu_cap(c, X86_FEATURE_VNMI);
-	if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
-		rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
-		      vmx_msr_low, vmx_msr_high);
-		msr_ctl2 = vmx_msr_high | vmx_msr_low;
-		if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
-		    (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
-			set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
-		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
-			set_cpu_cap(c, X86_FEATURE_EPT);
-		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
-			set_cpu_cap(c, X86_FEATURE_VPID);
-	}
-}
-
 static void init_zhaoxin(struct cpuinfo_x86 *c)
 {
 	early_init_zhaoxin(c);
@@ -142,9 +110,6 @@ static void init_zhaoxin(struct cpuinfo_x86 *c)
 #endif
 
 	init_ia32_feat_ctl(c);
-
-	if (cpu_has(c, X86_FEATURE_VMX))
-		zhaoxin_detect_vmx_virtcap(c);
 }
 
 #ifdef CONFIG_X86_32
Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether IA32_FEAT_CTL has been initialized. This will allow KVM, and any future subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures to query platform support, e.g. allows a future patch to remove KVM's manual IA32_FEAT_CTL MSR checks.
Various features (on platforms that support IA32_FEAT_CTL) are dependent on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The MSR is always configured during boot, but only if the CPU vendor is recognized by the kernel. Because CPUID doesn't incorporate the current IA32_FEAT_CTL value in its reporting of relevant features, it's possible for a feature to be reported as supported in cpufeatures but not truly enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize the CPU.
As a result, without the flag, KVM would see VMX as supported even if IA32_FEAT_CTL hasn't been initialized, and so would need to manually read the MSR and check the various enabling bits to avoid taking an unexpected #GP on VMXON.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/feat_ctl.c     | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..67d21b25ff78 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_MSR_IA32_FEAT_CTL	( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index fcbb35533cef..24a4fdc1ab51 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -126,6 +126,8 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_IA32_FEAT_CTL, msr);
 
 update_caps:
+	set_cpu_cap(c, X86_FEATURE_MSR_IA32_FEAT_CTL);
+
 	if (!cpu_has(c, X86_FEATURE_VMX))
 		return;
 
Hi Sean,
> Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether IA32_FEAT_CTL has been initialized. This will allow KVM, and any future subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures to query platform support, e.g. allows a future patch to remove KVM's manual IA32_FEAT_CTL MSR checks.
>
> Various features (on platforms that support IA32_FEAT_CTL) are dependent on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The MSR is always configured during boot, but only if the CPU vendor is recognized by the kernel. Because CPUID doesn't incorporate the current IA32_FEAT_CTL value in its reporting of relevant features, it's possible for a feature to be reported as supported in cpufeatures but not truly enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize the CPU.
>
> As a result, without the flag, KVM would see VMX as supported even if IA32_FEAT_CTL hasn't been initialized, and so would need to manually read the MSR and check the various enabling bits to avoid taking an unexpected #GP on VMXON.
I recently ran into a general protection fault that I believe is the fault of this patch:
[   32.189584] general protection fault, maybe for address 0xffffb567801bcf58: 0000 [#1] SMP PTI
[   32.198103] CPU: 1 PID: 2600 Comm: rngd Not tainted 5.6.0-rc2-jk+ #2
[   32.204454] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[   32.214887] RIP: 0010:hardware_enable+0x100/0x1a0 [kvm_intel]
[   32.220628] Code: 00 00 48 39 f8 74 0f 65 48 89 3d 43 a2 cb 3c e8 66 d3 cc c5 66 90 48 89 df 57 9d 0f 1f 44 00 00 bf 01 00 00 00 e8 90 3d ca c5 <f3> 0f c7 34 24 31 c0 80 3d 59 8d 03 00 00 75 36 48 8b 5c 24 10 65
[   32.239373] RSP: 0000:ffffb567801bcf58 EFLAGS: 00010002
[   32.244598] RAX: 0000000000300000 RBX: 0000000000000086 RCX: ffff8f2650440000
[   32.251730] RDX: 0000000000300000 RSI: 0000000000000000 RDI: ffff8f2650457020
[   32.258862] RBP: 0000000000000007 R08: 000000077ea5d531 R09: 0000000000000000
[   32.265986] R10: 000001432bf20982 R11: 0000000000000000 R12: ffffd55b80467110
[   32.273118] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   32.280243] FS:  00007facfe66f700(0000) GS:ffff8f2650440000(0000) knlGS:0000000000000000
[   32.288329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.294077] CR2: 00007facf0003000 CR3: 0000000b7d402006 CR4: 00000000007626e0
[   32.301210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   32.308342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   32.315474] PKRU: 55555554
[   32.318186] Call Trace:
[   32.320642]  <IRQ>
[   32.322689]  kvm_arch_hardware_enable+0x84/0x240 [kvm]
[   32.327836]  hardware_enable_nolock+0x31/0x60 [kvm]
[   32.332717]  flush_smp_call_function_queue+0x4d/0xe0
[   32.337683]  smp_call_function_interrupt+0x3a/0xd0
[   32.342471]  call_function_interrupt+0xf/0x20
[   32.346830]  </IRQ>
[   32.348935] RIP: 0033:0x7facffd4c753
[   32.352514] Code: e8 48 c7 45 e0 00 00 00 00 eb 5f 48 8b 45 c8 48 8b 50 38 48 8b 45 c8 8b 40 40 89 c0 48 01 d0 48 89 45 f0 48 8b 45 f0 0f b6 00 <83> c0 01 89 c2 48 8b 45 f0 88 10 48 8b 45 c8 8b 50 40 48 8b 45 c8
[   32.371263] RSP: 002b:00007facfe66ebf0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff03
[   32.378826] RAX: 00000000000000ee RBX: 0000000000004097 RCX: 0000000000000000
[   32.385961] RDX: 0000562781dbadf0 RSI: 0000000000000000 RDI: 00007ffd7edf9080
[   32.393092] RBP: 00007facfe66ec30 R08: 00007ffd7edf9080 R09: 000000000000cd4a
[   32.400226] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   32.407358] R13: 00007facf0000b20 R14: 0000562781dba2e8 R15: 00007facfe66ed10
[   32.414493] Modules linked in: ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rfkill ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib vfat fat ib_umad rpcrdma sunrpc intel_rapl_msr intel_rapl_common rdma_ucm ib_iser rdma_cm isst_if_common iw_cm ib_cm libiscsi skx_edac scsi_transport_iscsi nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate i40iw qat_c62x iTCO_wdt ipmi_ssif iTCO_vendor_support ib_uverbs mei_me intel_qat intel_uncore ib_core joydev intel_rapl_perf pcspkr ipmi_si authenc ioatdma mei i2c_i801 lpc_ich dca ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec drm ice i40e crc32c_intel wmi fuse
[   32.498314] ---[ end trace bfeeeba337a01208 ]---
I noticed that a slightly older commit from before this does not fail. Additionally, the system reports the following during boot:
kvm: disabled by bios
I looked into the vmx_disabled_by_bios and noticed that it checks for both X86_FEATURE_MSR_IA32_FEAT_CTL and X86_FEATURE_VMX.
Compared to the older code before commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to query BIOS enabling") it's not clear to me how exactly this could fail to match up.
I suspect something is wrong and the features are enabled even though the BIOS has it disabled, leading to later failure because of this.
Thanks, Jake
On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote:
Hi Sean,
Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether IA32_FEAT_CTL has been initialized. This will allow KVM, and any future subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures to query platform support, e.g. allows a future patch to remove KVM's manual IA32_FEAT_CTL MSR checks.
Various features (on platforms that support IA32_FEAT_CTL) are dependent on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The MSR is always configured during boot, but only if the CPU vendor is recognized by the kernel. Because CPUID doesn't incorporate the current IA32_FEAT_CTL value in its reporting of relevant features, it's possible for a feature to be reported as supported in cpufeatures but not truly enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize the CPU.
As a result, without the flag, KVM would see VMX as supported even if IA32_FEAT_CTL hasn't been initialized, and so would need to manually read the MSR and check the various enabling bits to avoid taking an unexpected #GP on VMXON.
I suspect something is wrong and the features are enabled even though the BIOS has it disabled, leading to later failure because of this.
Hrm. On the failing kernel, what are the values of MSR 0x3a for all CPUs, i.e. what's the output of 'sudo rdmsr -a 0x3a'?
On 2/25/2020 2:12 PM, Sean Christopherson wrote:
Hrm. On the failing kernel, what are the values of MSR 0x3a for all CPUs, i.e. what's the output of 'sudo rdmsr -a 0x3a'?
On the old (fedora 30) kernel, every cpu reports as '1'.
I can't easily test the failing kernel because it crashes during boot.
Thanks, Jake
On Tue, Feb 25, 2020 at 02:52:32PM -0800, Jacob Keller wrote:
On the old (fedora 30) kernel, every cpu reports as '1'.
I can't easily test the failing kernel because it crashes during boot.
No need, your BIOS is likely locking the MSR, I doubt the value is any different when running the new kernel.
Does reverting commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to query BIOS enabling") resolve the issue?
Is the failing kernel an (unmodified) upstream kernel? A stable kernel? Or something else? Assuming it's an unmodified upstream kernel, can you send your .config? I've tried all the obvious Kconfig combinations but haven't been able to reproduce the problem. Staring at the code hasn't yielded any revelations either.
On 2/25/2020 3:29 PM, Sean Christopherson wrote:
Does reverting commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to query BIOS enabling") resolve the issue?
Is the failing kernel an (unmodified) upstream kernel? A stable kernel? Or something else? Assuming it's an unmodified upstream kernel, can you send your .config? I've tried all the obvious Kconfig combinations but haven't been able to reproduce the problem. Staring at the code hasn't yielded any revelations either.
I reverted the suggested commit and added some prints:
[ 26.056398] X86_FEATURE_MSR_IA32_FEAT_CTL is enabled
[ 26.062426] X86_FEATURE_VMX is enabled
[ 26.066923] kvm: disabled by bios
So the old code flow is finding KVM to be disabled, but both features are set...
The code that sets this is run first:
Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set
Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset
Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset
Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios
Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: VMX (outside TXT) disabled by BIOS
Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX
But somehow... it is still set later...
So there's something weird going on. Maybe "boot_cpu_has" in the vmx_disabled_by_bios is wrong? Hmm.
On 2/25/2020 3:54 PM, Jacob Keller wrote:
But somehow... it is still set later...
So there's something weird going on. Maybe "boot_cpu_has" in the vmx_disabled_by_bios is wrong? Hmm.
I added even more pr_warns, giving me the following diff after reverting the suggested commit:
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index 0268185bef94..a86619acab80 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -97,13 +97,27 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 	bool tboot = tboot_enabled();
 	u64 msr;
 
+	pr_warn("before X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n",
+		cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled");
+	pr_warn("before X86_FEATURE_VMX is %s\n",
+		cpu_has(c, X86_FEATURE_VMX) ? "enabled" : "disabled");
+
 	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
 		clear_cpu_cap(c, X86_FEATURE_VMX);
 		return;
 	}
 
-	if (msr & FEAT_CTL_LOCKED)
+	pr_warn("FEAT_CTL_LOCKED is %s\n",
+		msr & FEAT_CTL_LOCKED ? "set" : "unset");
+	pr_warn("FEAT_CTL_VMX_ENABLED_INSIDE_SMX is %s\n",
+		msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX ? "set" : "unset");
+	pr_warn("FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is %s\n",
+		msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX ? "set" : "unset");
+
+	if (msr & FEAT_CTL_LOCKED) {
+		pr_warn("MSR locked by bios\n");
 		goto update_caps;
+	}
 
 	/*
 	 * Ignore whatever value BIOS left in the MSR to avoid enabling random
@@ -136,10 +150,16 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
 		if (IS_ENABLED(CONFIG_KVM_INTEL))
 			pr_err_once("VMX (%s TXT) disabled by BIOS\n",
 				    tboot ? "inside" : "outside");
+		pr_warn("disabling X86_FEATURE_VMX\n");
 		clear_cpu_cap(c, X86_FEATURE_VMX);
 	} else {
 #ifdef CONFIG_X86_VMX_FEATURE_NAMES
 		init_vmx_capabilities(c);
 #endif
 	}
+
+	pr_warn("after X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n",
+		cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled");
+	pr_warn("after X86_FEATURE_VMX is %s\n",
+		cpu_has(c, X86_FEATURE_VMX) ? "enabled" : "disabled");
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a2e18e60c2db..550f8d556251 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2222,6 +2222,16 @@ static __init int vmx_disabled_by_bios(void)
 {
 	u64 msr;
 
+	pr_warn("boot X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n",
+		boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled");
+	pr_warn("boot X86_FEATURE_VMX is %s\n",
+		boot_cpu_has(X86_FEATURE_VMX) ? "enabled" : "disabled");
+
+	pr_warn("this_cpu X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n",
+		this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled");
+	pr_warn("this_cpu X86_FEATURE_VMX is %s\n",
+		this_cpu_has(X86_FEATURE_VMX) ? "enabled" : "disabled");
+
 	rdmsrl(MSR_IA32_FEAT_CTL, msr);
 
 	if (unlikely(!(msr & FEAT_CTL_LOCKED)))
With this, I see the following output for each CPU, starting with boot CPU:
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_MSR_IA32_FEAT_CTL is disabled
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_VMX is enabled
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: VMX (outside TXT) disabled by BIOS
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_MSR_IA32_FEAT_CTL is enabled
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_VMX is disabled
And for each of the SMP CPUs:
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_MSR_IA32_FEAT_CTL is disabled
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_VMX is enabled
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_MSR_IA32_FEAT_CTL is enabled
Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_VMX is disabled
But when we finally go to check kvm:
Feb 25 16:36:06 jbrandeb-saw1 kernel: boot X86_FEATURE_MSR_IA32_FEAT_CTL is enabled
Feb 25 16:36:06 jbrandeb-saw1 kernel: boot X86_FEATURE_VMX is enabled
Feb 25 16:36:06 jbrandeb-saw1 kernel: this_cpu X86_FEATURE_MSR_IA32_FEAT_CTL is enabled
Feb 25 16:36:06 jbrandeb-saw1 kernel: this_cpu X86_FEATURE_VMX is enabled
I tried checking both boot and this_cpu, just in case.
Somehow the things are being restored/re-enabled. I can't figure out where this even happens. At a glance it's not even obvious to me where the original features get set, and nothing seems to obviously set these flags....
Thanks, Jake
On Tue, Feb 25, 2020 at 03:54:34PM -0800, Jacob Keller wrote:
But somehow... it is still set later...
So there's something weird going on. Maybe "boot_cpu_has" in the vmx_disabled_by_bios is wrong? Hmm.
Hmm, perhaps a bug somewhere else is overwriting the cpufeatures bit for X86_FEATURE_VMX. Let me see if I can reproduce from net-next.
On 2/25/2020 4:42 PM, Sean Christopherson wrote:
So there's something weird going on. Maybe "boot_cpu_has" in the vmx_disabled_by_bios is wrong? Hmm.
Hmm, perhaps a bug somewhere else is overwriting the cpufeatures bit for X86_FEATURE_VMX. Let me see if I can reproduce from net-next.
If you have any further suggestions for debugging, I'm happy to help try to figure this out. To my eyes, it looks like somehow bits get reset... It's definitely not clear to me how this happens.
There is the get_cpu_caps call.. but that seems to correctly call apply_forced_caps at the end.
That's all I have time for today.
Thanks, Jake
On 2/25/2020 4:58 PM, Jacob Keller wrote:
There is the get_cpu_caps call.. but that seems to correctly call apply_forced_caps at the end.
Hi,
I kept digging into this, and I added a further print to the get_cpu_cap function.
It looks like get_cpu_cap is being called again *after* init_ia32_feat_ctl...
Digging further, I discovered this appears to be the call in setup_pku, which would only be enabled for systems which have X86_FEATURE_PKU enabled and supported. It's quite likely that test systems may not have had this feature, hence why it went undetected till now.
Because of the extra get_cpu_cap call, the capabilities are reset. Since we never use setup_clear_cpu_cap or pass NULL to clear_cpu_cap, the code that sets the global cpu_caps_cleared bits is not run.
It's not clear to me what the best fix for this is.
Perhaps init_ia32_feat_ctl should be something run during early_identify_cpu, since it's really checking global status (rdmsr), and not per-CPU status. And then it could directly operate to call setup_clear_cpu_cap, which would properly clear the bit globally, ensuring that apply_forced_caps kicks in?
Or this needs to somehow be run *after* setup_pku? But that doesn't feel very robust.
Thanks, Jake
On Wed, Feb 26, 2020 at 12:41:09PM -0800, Jacob Keller wrote:
Hi,
I kept digging into this, and I added a further print to the get_cpu_cap function.
It looks like get_cpu_cap is being called again *after* init_ia32_feat_ctl...
Digging further, I discovered this appears to be the call in setup_pku, which would only be enabled for systems which have X86_FEATURE_PKU enabled and supported. It's quite likely that test systems may not have had this feature, hence why it went undetected till now.
Ya, probably not a whole lot of folks with Icelake silicon and VMX disabled in BIOS. I'll see if I can reproduce on my ICX system, that would make testing a fix a little easier.
Because of the extra get_cpu_cap call, the capabilities are reset. Since we never use setup_clear_cpu_cap or pass NULL to clear_cpu_cap, the code that sets the global cpu_caps_cleared bits is not run.
It's not clear to me what the best fix for this is.
Perhaps init_ia32_feat_ctl should be something run during early_identify_cpu, since it's really checking global status (rdmsr), and not per-CPU status. And then it could directly operate to call setup_clear_cpu_cap, which would properly clear the bit globally, ensuring that apply_forced_caps kicks in?
Or this needs to somehow be run *after* setup_pku? But that doesn't feel very robust.
Bummer. Using clear_cpu_cap() instead of setup_clear_cpu_cap() was me being fancy and trying to allow KVM to identify the case where VMX is available and configured on some CPUs but not all. I'll work on a fix.
On 2/26/2020 12:57 PM, Sean Christopherson wrote:
Bummer. Using clear_cpu_cap() instead of setup_clear_cpu_cap() was me being fancy and trying to allow KVM to identify the case where VMX is available and configured on some CPUs but not all. I'll work on a fix.
Hmm. Right. For that to work, you'd need to make this disabling happen significantly later, and/or fix setup_pku to somehow honor this properly.
But it looks like rdmsr is global and not tied to a given CPU anyways?
Thanks, Jake
On Wed, Feb 26, 2020 at 01:03:01PM -0800, Jacob Keller wrote:
On 2/26/2020 12:57 PM, Sean Christopherson wrote:
Bummer. Using clear_cpu_cap() instead of setup_clear_cpu_cap() was me being fancy and trying to allow KVM to identify the case where VMX is available and configured on some CPUs but not all. I'll work on a fix.
Hmm. Right. For that to work, you'd need to make this disabling happen significantly later, and/or fix setup_pku to somehow honor this properly.
Arguably, setup_pku() should be a little less heavy handed in updating cpufeatures for X86_FEATURE_OSPKE, but init_ia32_feat_ctl() should also be more robust.
I've reproduced the bug, should have a fix ready by EOD.
But it looks like rdmsr is global and not tied to a given CPU anyways?
For better or worse, the MSR is thread scoped.
On 2/26/2020 1:25 PM, Sean Christopherson wrote:
Arguably, setup_pku() should be a little less heavy handed in updating cpufeatures for X86_FEATURE_OSPKE, but init_ia32_feat_ctl() should also be more robust.
Right.
But it looks like rdmsr is global and not tied to a given CPU anyways?
For better or worse, the MSR is thread scoped.
Ahh. Definitely not obvious at a glance.
I've reproduced the bug, should have a fix ready by EOD.
Nice, glad to hear it.
Thanks, Jake
On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote:
I recently ran into a general protection fault that I believe is the fault of this patch:
[ 32.189584] general protection fault, maybe for address 0xffffb567801bcf58: 0000 [#1] SMP PTI
[ 32.198103] CPU: 1 PID: 2600 Comm: rngd Not tainted 5.6.0-rc2-jk+ #2
[ 32.204454] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 32.214887] RIP: 0010:hardware_enable+0x100/0x1a0 [kvm_intel]
[ 32.220628] Code: <f3> 0f c7 34 24 31 c0 80 3d 59 8d 03 00 00 75 36 48 8b 5c 24 10 65
[ 32.239373] RSP: 0000:ffffb567801bcf58 EFLAGS: 00010002
[ 32.244598] RAX: 0000000000300000 RBX: 0000000000000086 RCX: ffff8f2650440000
[ 32.251730] RDX: 0000000000300000 RSI: 0000000000000000 RDI: ffff8f2650457020
[ 32.258862] RBP: 0000000000000007 R08: 000000077ea5d531 R09: 0000000000000000
[ 32.265986] R10: 000001432bf20982 R11: 0000000000000000 R12: ffffd55b80467110
[ 32.273118] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 32.280243] FS: 00007facfe66f700(0000) GS:ffff8f2650440000(0000) knlGS:0000000000000000
[ 32.288329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 32.294077] CR2: 00007facf0003000 CR3: 0000000b7d402006 CR4: 00000000007626e0
[ 32.301210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 32.308342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 32.315474] PKRU: 55555554
[ 32.318186] Call Trace:
[ 32.320642] <IRQ>
[ 32.322689] kvm_arch_hardware_enable+0x84/0x240 [kvm]
[ 32.327836] hardware_enable_nolock+0x31/0x60 [kvm]
[ 32.332717] flush_smp_call_function_queue+0x4d/0xe0
[ 32.337683] smp_call_function_interrupt+0x3a/0xd0
[ 32.342471] call_function_interrupt+0xf/0x20
[ 32.346830] </IRQ>
[ 32.498314] ---[ end trace bfeeeba337a01208 ]---
I noticed that a slightly older commit from before this does not fail. Additionally, the system reports the following during boot:
kvm: disabled by bios
One other thing that's been bothering me: you mention in a later email that this bug results in a crash during boot. The low timestamps also suggest the system is fairly early in its bringup.
But KVM only does VMXON when it actually creates a VM[*]. During boot I would expect the bug to result in KVM being incorrectly loaded/enabled, but that alone wouldn't trigger a crash.
I assume/hope your system is automatically running some form of virt process at boot? Not that there's anything wrong with that, it's just surprising and I want to make sure there's not something really funky going on.
[*] KVM also does VMXON when hotplugging a CPU, but only if KVM has active VMs, and the IPI callback above indicates this isn't the hotplug case.
Subject: Re: [PATCH v5 13/19] x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote:
I noticed that a slightly older commit from before this does not fail. Additionally, the system reports the following during boot:
kvm: disabled by bios
One other thing that's been bothering me: you mention in a later email that this bug results in a crash during boot. The low timestamps also suggest the system is fairly early in its bringup.
But KVM only does VMXON when it actually creates a VM[*]. During boot I would expect the bug to result in KVM being incorrectly loaded/enabled, but that alone wouldn't trigger a crash.
I assume/hope your system is automatically running some form of virt process at boot? Not that there's anything wrong with that, it's just surprising and I want to make sure there's not something really funky going on.
I could be wrong, but it appears that during boot (before you can log in) libvirt may create a VM simply to call some QEMU monitor APIs to get some info. The VM is then destroyed once libvirt has that info, of course.
Thanks, -Kai
[*] KVM also does VMXON when hotplugging a CPU, but only if KVM has active VMs, and the IPI callback above indicates this isn't the hotplug case.
On 2/26/2020 6:12 PM, Sean Christopherson wrote:
On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote:
One other thing that's been bothering me: you mention in a later email that this bug results in a crash during boot. The low timestamps also suggest the system is fairly early in its bringup.
But KVM only does VMXON when it actually creates a VM[*]. During boot I would expect the bug to result in KVM being incorrectly loaded/enabled, but that alone wouldn't trigger a crash.
It crashes during hardware enable, specifically in the kvm_cpu_vmxon during the hardware_enable() function.
It doesn't crash until near the end of bootup, and it didn't crash when I kept the system in single-user boot mode.
I assume/hope your system is automatically running some form of virt process at boot? Not that there's anything wrong with that, it's just surprising and I want to make sure there's not something really funky going on.
The system has libvirtd enabled. My guess is that libvirtd starts up and enables hardware. I don't see any actual virtual machines enabled, but I think you're right that this is why it crashes.
Thanks, Jake
Remove KVM's code to initialize IA32_FEAT_CTL MSR when KVM is loaded now that the MSR is initialized during boot on all CPUs that support VMX, i.e. on all CPUs that can possibly load kvm_intel.
Note, don't WARN if IA32_FEAT_CTL is unlocked, even though the MSR is unconditionally locked by init_ia32_feat_ctl(). KVM isn't tied directly to a CPU vendor detection, whereas init_ia32_feat_ctl() is invoked if and only if the CPU vendor is recognized and known to support VMX. As a result, vmx_disabled_by_bios() may be reached without going through init_ia32_feat_ctl() and thus without locking IA32_FEAT_CTL. This quirk will be eliminated in a future patch.
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 48 +++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 91b2517e3b62..a026334ffc58 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2207,24 +2207,26 @@ static __init int vmx_disabled_by_bios(void)
 	u64 msr;

 	rdmsrl(MSR_IA32_FEAT_CTL, msr);
-	if (msr & FEAT_CTL_LOCKED) {
-		/* launched w/ TXT and VMX disabled */
-		if (!(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)
-			&& tboot_enabled())
-			return 1;
-		/* launched w/o TXT and VMX only enabled w/ TXT */
-		if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX)
-			&& (msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)
-			&& !tboot_enabled()) {
-			printk(KERN_WARNING "kvm: disable TXT in the BIOS or "
-				"activate TXT before enabling KVM\n");
-			return 1;
-		}
-		/* launched w/o TXT and VMX disabled */
-		if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX)
-			&& !tboot_enabled())
-			return 1;
+
+	if (unlikely(!(msr & FEAT_CTL_LOCKED)))
+		return 1;
+
+	/* launched w/ TXT and VMX disabled */
+	if (!(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX) &&
+	    tboot_enabled())
+		return 1;
+	/* launched w/o TXT and VMX only enabled w/ TXT */
+	if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) &&
+	    (msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX) &&
+	    !tboot_enabled()) {
+		pr_warn("kvm: disable TXT in the BIOS or "
+			"activate TXT before enabling KVM\n");
+		return 1;
 	}
+	/* launched w/o TXT and VMX disabled */
+	if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) &&
+	    !tboot_enabled())
+		return 1;

 	return 0;
 }
@@ -2241,7 +2243,6 @@ static int hardware_enable(void)
 {
 	int cpu = raw_smp_processor_id();
 	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
-	u64 old, test_bits;

 	if (cr4_read_shadow() & X86_CR4_VMXE)
 		return -EBUSY;
@@ -2269,17 +2270,6 @@ static int hardware_enable(void)
 	 */
 	crash_enable_local_vmclear(cpu);

-	rdmsrl(MSR_IA32_FEAT_CTL, old);
-
-	test_bits = FEAT_CTL_LOCKED;
-	test_bits |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
-	if (tboot_enabled())
-		test_bits |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
-
-	if ((old & test_bits) != test_bits) {
-		/* enable and lock */
-		wrmsrl(MSR_IA32_FEAT_CTL, old | test_bits);
-	}
 	kvm_cpu_vmxon(phys_addr);
 	if (enable_ept)
 		ept_sync_global();
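For anyone following the reordered checks, the decision logic of vmx_disabled_by_bios() after this patch can be modeled in userspace as a pure function of the MSR value and the tboot state. This is only a sketch under stated assumptions: vmx_disabled_by_msr() and its tboot parameter are hypothetical stand-ins for the rdmsrl() and tboot_enabled() calls, the pr_warn() case is folded into the plain "VMX disabled outside SMX" case, and the bit positions are the architectural IA32_FEATURE_CONTROL ones (lock = bit 0, VMX inside SMX = bit 1, VMX outside SMX = bit 2).

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEAT_CTL bits consulted by the check. */
#define FEAT_CTL_LOCKED				(1ULL << 0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX		(1ULL << 1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX	(1ULL << 2)

/*
 * Hypothetical userspace model of the post-patch logic: returns 1 if VMX
 * must be treated as disabled, 0 otherwise.  'msr' replaces the rdmsrl()
 * result and 'tboot' replaces tboot_enabled().
 */
static int vmx_disabled_by_msr(uint64_t msr, bool tboot)
{
	/* KVM no longer writes the MSR, so unlocked means unusable. */
	if (!(msr & FEAT_CTL_LOCKED))
		return 1;

	/* launched w/ TXT and VMX disabled inside SMX */
	if (!(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX) && tboot)
		return 1;

	/* launched w/o TXT and VMX disabled outside SMX */
	if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) && !tboot)
		return 1;

	return 0;
}
```

With this model, a locked MSR with only the outside-SMX bit set (0x5) is usable without tboot and unusable with it, while any unlocked value is rejected.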
Replace KVM's manual checks on IA32_FEAT_CTL with a query on the boot CPU's MSR_IA32_FEAT_CTL and VMX feature flags. The MSR_IA32_FEAT_CTL flag indicates that IA32_FEAT_CTL has been configured and that dependent features are accurately reflected in cpufeatures, e.g. the VMX flag is now cleared during boot if VMX isn't fully enabled via IA32_FEAT_CTL, including the case where the MSR isn't supported.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 27 ++-------------------------
 1 file changed, 2 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a026334ffc58..06e0e5262704 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2204,31 +2204,8 @@ static __init int cpu_has_kvm_support(void)

 static __init int vmx_disabled_by_bios(void)
 {
-	u64 msr;
-
-	rdmsrl(MSR_IA32_FEAT_CTL, msr);
-
-	if (unlikely(!(msr & FEAT_CTL_LOCKED)))
-		return 1;
-
-	/* launched w/ TXT and VMX disabled */
-	if (!(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX) &&
-	    tboot_enabled())
-		return 1;
-	/* launched w/o TXT and VMX only enabled w/ TXT */
-	if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) &&
-	    (msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX) &&
-	    !tboot_enabled()) {
-		pr_warn("kvm: disable TXT in the BIOS or "
-			"activate TXT before enabling KVM\n");
-		return 1;
-	}
-	/* launched w/o TXT and VMX disabled */
-	if (!(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) &&
-	    !tboot_enabled())
-		return 1;
-
-	return 0;
+	return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
+	       !boot_cpu_has(X86_FEATURE_VMX);
 }

 static void kvm_cpu_vmxon(u64 addr)
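To illustrate why two flag queries are enough, here is a rough userspace model of the split between boot-time setup and the KVM-side query. Everything in it is an assumption made for illustration: struct cpu_flags, init_feat_ctl_model() and vmx_disabled_by_bios_model() are hypothetical names, and the boot-time rules are only what the changelog above states (VMX is cleared when the MSR is missing or when VMX isn't fully enabled in it), not the actual init_ia32_feat_ctl() code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two cpufeature bits KVM queries. */
struct cpu_flags {
	bool msr_ia32_feat_ctl;	/* models X86_FEATURE_MSR_IA32_FEAT_CTL */
	bool vmx;		/* models X86_FEATURE_VMX */
};

/*
 * Hypothetical model of boot-time setup: the VMX flag survives only if the
 * MSR exists and VMX is fully enabled in it.
 */
static struct cpu_flags init_feat_ctl_model(bool cpuid_vmx, bool msr_present,
					    bool vmx_enabled_in_msr)
{
	struct cpu_flags f = { .msr_ia32_feat_ctl = false, .vmx = cpuid_vmx };

	if (!msr_present) {
		/* MSR not supported: VMX cannot be enabled at all. */
		f.vmx = false;
		return f;
	}

	f.msr_ia32_feat_ctl = true;
	if (!vmx_enabled_in_msr)
		f.vmx = false;

	return f;
}

/* Mirrors the shape of the new vmx_disabled_by_bios(): a pure flag query. */
static int vmx_disabled_by_bios_model(const struct cpu_flags *boot_cpu)
{
	return !boot_cpu->msr_ia32_feat_ctl || !boot_cpu->vmx;
}
```

The point of the design is that all MSR inspection happens once, during boot, and every later consumer only reads the resulting flags.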
Explicitly check the current CPU's IA32_FEAT_CTL and VMX feature flags when verifying compatibility across physical CPUs. This effectively adds a check on IA32_FEAT_CTL to ensure that VMX is fully enabled on all CPUs.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 06e0e5262704..cdb4bf50ee14 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6838,6 +6838,12 @@ static int __init vmx_check_processor_compat(void)
 	struct vmcs_config vmcs_conf;
 	struct vmx_capability vmx_cap;

+	if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
+	    !this_cpu_has(X86_FEATURE_VMX)) {
+		pr_err("kvm: VMX is disabled on CPU %d\n", smp_processor_id());
+		return -EIO;
+	}
+
 	if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
 		return -EIO;
 	if (nested)
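As a sketch of what the per-CPU check buys, the model below runs the same two-flag test over an array of CPUs and fails on the first one where VMX is not fully enabled. It is an illustration only: struct cpu_flags and check_all_cpus_model() are hypothetical, and the real code performs the check on each CPU via this_cpu_has() rather than walking an array.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU copy of the two feature bits. */
struct cpu_flags {
	bool msr_ia32_feat_ctl;	/* models X86_FEATURE_MSR_IA32_FEAT_CTL */
	bool vmx;		/* models X86_FEATURE_VMX */
};

/*
 * Hypothetical model of the compatibility pass: returns 0 if every CPU has
 * both flags set, otherwise -EIO with *bad_cpu set to the first offender.
 */
static int check_all_cpus_model(const struct cpu_flags *cpus, size_t nr_cpus,
				size_t *bad_cpu)
{
	size_t cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (!cpus[cpu].msr_ia32_feat_ctl || !cpus[cpu].vmx) {
			*bad_cpu = cpu;
			return -EIO;
		}
	}

	return 0;
}
```

A boot-CPU-only check would miss a system where firmware configured IA32_FEAT_CTL differently on a secondary CPU; checking every CPU catches that case before VMXON is attempted there.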
Define the VMCS execution control flags (consumed by KVM) using their associated VMX_FEATURE_* to provide a strong hint that new VMX features are expected to be added to VMX_FEATURE and considered for reporting via /proc/cpuinfo.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/vmx.h | 105 +++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 1835767aa335..9fbba31be825 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -15,67 +15,70 @@
 #include <linux/bitops.h>
 #include <linux/types.h>
 #include <uapi/asm/vmx.h>
+#include <asm/vmxfeatures.h>
+
+#define VMCS_CONTROL_BIT(x)	BIT(VMX_FEATURE_##x & 0x1f)

 /*
  * Definitions of Primary Processor-Based VM-Execution Controls.
  */
-#define CPU_BASED_VIRTUAL_INTR_PENDING	0x00000004
-#define CPU_BASED_USE_TSC_OFFSETING	0x00000008
-#define CPU_BASED_HLT_EXITING	0x00000080
-#define CPU_BASED_INVLPG_EXITING	0x00000200
-#define CPU_BASED_MWAIT_EXITING	0x00000400
-#define CPU_BASED_RDPMC_EXITING	0x00000800
-#define CPU_BASED_RDTSC_EXITING	0x00001000
-#define CPU_BASED_CR3_LOAD_EXITING	0x00008000
-#define CPU_BASED_CR3_STORE_EXITING	0x00010000
-#define CPU_BASED_CR8_LOAD_EXITING	0x00080000
-#define CPU_BASED_CR8_STORE_EXITING	0x00100000
-#define CPU_BASED_TPR_SHADOW	0x00200000
-#define CPU_BASED_VIRTUAL_NMI_PENDING	0x00400000
-#define CPU_BASED_MOV_DR_EXITING	0x00800000
-#define CPU_BASED_UNCOND_IO_EXITING	0x01000000
-#define CPU_BASED_USE_IO_BITMAPS	0x02000000
-#define CPU_BASED_MONITOR_TRAP_FLAG	0x08000000
-#define CPU_BASED_USE_MSR_BITMAPS	0x10000000
-#define CPU_BASED_MONITOR_EXITING	0x20000000
-#define CPU_BASED_PAUSE_EXITING	0x40000000
-#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS	0x80000000
+#define CPU_BASED_VIRTUAL_INTR_PENDING	VMCS_CONTROL_BIT(VIRTUAL_INTR_PENDING)
+#define CPU_BASED_USE_TSC_OFFSETING	VMCS_CONTROL_BIT(TSC_OFFSETTING)
+#define CPU_BASED_HLT_EXITING	VMCS_CONTROL_BIT(HLT_EXITING)
+#define CPU_BASED_INVLPG_EXITING	VMCS_CONTROL_BIT(INVLPG_EXITING)
+#define CPU_BASED_MWAIT_EXITING	VMCS_CONTROL_BIT(MWAIT_EXITING)
+#define CPU_BASED_RDPMC_EXITING	VMCS_CONTROL_BIT(RDPMC_EXITING)
+#define CPU_BASED_RDTSC_EXITING	VMCS_CONTROL_BIT(RDTSC_EXITING)
+#define CPU_BASED_CR3_LOAD_EXITING	VMCS_CONTROL_BIT(CR3_LOAD_EXITING)
+#define CPU_BASED_CR3_STORE_EXITING	VMCS_CONTROL_BIT(CR3_STORE_EXITING)
+#define CPU_BASED_CR8_LOAD_EXITING	VMCS_CONTROL_BIT(CR8_LOAD_EXITING)
+#define CPU_BASED_CR8_STORE_EXITING	VMCS_CONTROL_BIT(CR8_STORE_EXITING)
+#define CPU_BASED_TPR_SHADOW	VMCS_CONTROL_BIT(VIRTUAL_TPR)
+#define CPU_BASED_VIRTUAL_NMI_PENDING	VMCS_CONTROL_BIT(VIRTUAL_NMI_PENDING)
+#define CPU_BASED_MOV_DR_EXITING	VMCS_CONTROL_BIT(MOV_DR_EXITING)
+#define CPU_BASED_UNCOND_IO_EXITING	VMCS_CONTROL_BIT(UNCOND_IO_EXITING)
+#define CPU_BASED_USE_IO_BITMAPS	VMCS_CONTROL_BIT(USE_IO_BITMAPS)
+#define CPU_BASED_MONITOR_TRAP_FLAG	VMCS_CONTROL_BIT(MONITOR_TRAP_FLAG)
+#define CPU_BASED_USE_MSR_BITMAPS	VMCS_CONTROL_BIT(USE_MSR_BITMAPS)
+#define CPU_BASED_MONITOR_EXITING	VMCS_CONTROL_BIT(MONITOR_EXITING)
+#define CPU_BASED_PAUSE_EXITING	VMCS_CONTROL_BIT(PAUSE_EXITING)
+#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS	VMCS_CONTROL_BIT(SEC_CONTROLS)

 #define CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR	0x0401e172

 /*
  * Definitions of Secondary Processor-Based VM-Execution Controls.
  */
-#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES	0x00000001
-#define SECONDARY_EXEC_ENABLE_EPT	0x00000002
-#define SECONDARY_EXEC_DESC	0x00000004
-#define SECONDARY_EXEC_RDTSCP	0x00000008
-#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE	0x00000010
-#define SECONDARY_EXEC_ENABLE_VPID	0x00000020
-#define SECONDARY_EXEC_WBINVD_EXITING	0x00000040
-#define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
-#define SECONDARY_EXEC_APIC_REGISTER_VIRT	0x00000100
-#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY	0x00000200
-#define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
-#define SECONDARY_EXEC_RDRAND_EXITING	0x00000800
-#define SECONDARY_EXEC_ENABLE_INVPCID	0x00001000
-#define SECONDARY_EXEC_ENABLE_VMFUNC	0x00002000
-#define SECONDARY_EXEC_SHADOW_VMCS	0x00004000
-#define SECONDARY_EXEC_ENCLS_EXITING	0x00008000
-#define SECONDARY_EXEC_RDSEED_EXITING	0x00010000
-#define SECONDARY_EXEC_ENABLE_PML	0x00020000
-#define SECONDARY_EXEC_PT_CONCEAL_VMX	0x00080000
-#define SECONDARY_EXEC_XSAVES	0x00100000
-#define SECONDARY_EXEC_PT_USE_GPA	0x01000000
-#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC	0x00400000
-#define SECONDARY_EXEC_TSC_SCALING	0x02000000
+#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES	VMCS_CONTROL_BIT(VIRT_APIC_ACCESSES)
+#define SECONDARY_EXEC_ENABLE_EPT	VMCS_CONTROL_BIT(EPT)
+#define SECONDARY_EXEC_DESC	VMCS_CONTROL_BIT(DESC_EXITING)
+#define SECONDARY_EXEC_RDTSCP	VMCS_CONTROL_BIT(RDTSCP)
+#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE	VMCS_CONTROL_BIT(VIRTUAL_X2APIC)
+#define SECONDARY_EXEC_ENABLE_VPID	VMCS_CONTROL_BIT(VPID)
+#define SECONDARY_EXEC_WBINVD_EXITING	VMCS_CONTROL_BIT(WBINVD_EXITING)
+#define SECONDARY_EXEC_UNRESTRICTED_GUEST	VMCS_CONTROL_BIT(UNRESTRICTED_GUEST)
+#define SECONDARY_EXEC_APIC_REGISTER_VIRT	VMCS_CONTROL_BIT(APIC_REGISTER_VIRT)
+#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY	VMCS_CONTROL_BIT(VIRT_INTR_DELIVERY)
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING	VMCS_CONTROL_BIT(PAUSE_LOOP_EXITING)
+#define SECONDARY_EXEC_RDRAND_EXITING	VMCS_CONTROL_BIT(RDRAND_EXITING)
+#define SECONDARY_EXEC_ENABLE_INVPCID	VMCS_CONTROL_BIT(INVPCID)
+#define SECONDARY_EXEC_ENABLE_VMFUNC	VMCS_CONTROL_BIT(VMFUNC)
+#define SECONDARY_EXEC_SHADOW_VMCS	VMCS_CONTROL_BIT(SHADOW_VMCS)
+#define SECONDARY_EXEC_ENCLS_EXITING	VMCS_CONTROL_BIT(ENCLS_EXITING)
+#define SECONDARY_EXEC_RDSEED_EXITING	VMCS_CONTROL_BIT(RDSEED_EXITING)
+#define SECONDARY_EXEC_ENABLE_PML	VMCS_CONTROL_BIT(PAGE_MOD_LOGGING)
+#define SECONDARY_EXEC_PT_CONCEAL_VMX	VMCS_CONTROL_BIT(PT_CONCEAL_VMX)
+#define SECONDARY_EXEC_XSAVES	VMCS_CONTROL_BIT(XSAVES)
+#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC	VMCS_CONTROL_BIT(MODE_BASED_EPT_EXEC)
+#define SECONDARY_EXEC_PT_USE_GPA	VMCS_CONTROL_BIT(PT_USE_GPA)
+#define SECONDARY_EXEC_TSC_SCALING	VMCS_CONTROL_BIT(TSC_SCALING)
 #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE	0x04000000

-#define PIN_BASED_EXT_INTR_MASK	0x00000001
-#define PIN_BASED_NMI_EXITING	0x00000008
-#define PIN_BASED_VIRTUAL_NMIS	0x00000020
-#define PIN_BASED_VMX_PREEMPTION_TIMER	0x00000040
-#define PIN_BASED_POSTED_INTR	0x00000080
+#define PIN_BASED_EXT_INTR_MASK	VMCS_CONTROL_BIT(INTR_EXITING)
+#define PIN_BASED_NMI_EXITING	VMCS_CONTROL_BIT(NMI_EXITING)
+#define PIN_BASED_VIRTUAL_NMIS	VMCS_CONTROL_BIT(VIRTUAL_NMIS)
+#define PIN_BASED_VMX_PREEMPTION_TIMER	VMCS_CONTROL_BIT(PREEMPTION_TIMER)
+#define PIN_BASED_POSTED_INTR	VMCS_CONTROL_BIT(POSTED_INTR)

 #define PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR	0x00000016

@@ -114,7 +117,9 @@
 #define VMX_MISC_MSR_LIST_MULTIPLIER	512

 /* VMFUNC functions */
-#define VMX_VMFUNC_EPTP_SWITCHING	0x00000001
+#define VMFUNC_CONTROL_BIT(x)	BIT((VMX_FEATURE_##x & 0x1f) - 28)
+
+#define VMX_VMFUNC_EPTP_SWITCHING	VMFUNC_CONTROL_BIT(EPTP_SWITCHING)
 #define VMFUNC_EPTP_ENTRIES	512

 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
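The "no functional change" claim can be checked in isolation with a small userspace sketch of the two macros. The VMX_FEATURE_* values below are assumptions for illustration: they use a word * 32 + bit encoding with made-up word indices, and only the low five bits matter here. The bit positions themselves are taken from the hex constants the patch removes (e.g. 0x00000008 is bit 3), and bit 28 for EPTP_SWITCHING follows from the "- 28" in VMFUNC_CONTROL_BIT() yielding the old value 0x00000001.

```c
/* Userspace stand-in for the kernel's BIT() helper. */
#define BIT(n)	(1UL << (n))

/*
 * Assumed encoding: (word * 32 + bit).  The word indices are illustrative;
 * the bit numbers match the hex values removed by the patch.
 */
#define VMX_FEATURE_TSC_OFFSETTING	(1 * 32 + 3)	/* was 0x00000008 */
#define VMX_FEATURE_SEC_CONTROLS	(1 * 32 + 31)	/* was 0x80000000 */
#define VMX_FEATURE_EPT			(2 * 32 + 1)	/* was 0x00000002 */
#define VMX_FEATURE_EPTP_SWITCHING	(0 * 32 + 28)	/* was 0x00000001 */

/* Same definitions as in the patch. */
#define VMCS_CONTROL_BIT(x)	BIT(VMX_FEATURE_##x & 0x1f)
#define VMFUNC_CONTROL_BIT(x)	BIT((VMX_FEATURE_##x & 0x1f) - 28)

#define CPU_BASED_USE_TSC_OFFSETING		VMCS_CONTROL_BIT(TSC_OFFSETTING)
#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS	VMCS_CONTROL_BIT(SEC_CONTROLS)
#define SECONDARY_EXEC_ENABLE_EPT		VMCS_CONTROL_BIT(EPT)
#define VMX_VMFUNC_EPTP_SWITCHING		VMFUNC_CONTROL_BIT(EPTP_SWITCHING)
```

Masking with 0x1f drops the word index, so the same feature number serves both as a cpuinfo flag position and as the source of the VMCS control bit.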
On Fri, Dec 20, 2019 at 08:45:11PM -0800, Sean Christopherson wrote:
Define the VMCS execution control flags (consumed by KVM) using their associated VMX_FEATURE_* to provide a strong hint that new VMX features are expected to be added to VMX_FEATURE and considered for reporting via /proc/cpuinfo.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/vmx.h | 105 +++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 1835767aa335..9fbba31be825 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -15,67 +15,70 @@
 #include <linux/bitops.h>
 #include <linux/types.h>
 #include <uapi/asm/vmx.h>
+#include <asm/vmxfeatures.h>
+
+#define VMCS_CONTROL_BIT(x)	BIT(VMX_FEATURE_##x & 0x1f)

 /*
  * Definitions of Primary Processor-Based VM-Execution Controls.
  */
-#define CPU_BASED_VIRTUAL_INTR_PENDING	0x00000004
-#define CPU_BASED_USE_TSC_OFFSETING	0x00000008
-#define CPU_BASED_HLT_EXITING	0x00000080
-#define CPU_BASED_INVLPG_EXITING	0x00000200
-#define CPU_BASED_MWAIT_EXITING	0x00000400
-#define CPU_BASED_RDPMC_EXITING	0x00000800
-#define CPU_BASED_RDTSC_EXITING	0x00001000
-#define CPU_BASED_CR3_LOAD_EXITING	0x00008000
-#define CPU_BASED_CR3_STORE_EXITING	0x00010000
-#define CPU_BASED_CR8_LOAD_EXITING	0x00080000
-#define CPU_BASED_CR8_STORE_EXITING	0x00100000
-#define CPU_BASED_TPR_SHADOW	0x00200000
-#define CPU_BASED_VIRTUAL_NMI_PENDING	0x00400000
-#define CPU_BASED_MOV_DR_EXITING	0x00800000
-#define CPU_BASED_UNCOND_IO_EXITING	0x01000000
-#define CPU_BASED_USE_IO_BITMAPS	0x02000000
-#define CPU_BASED_MONITOR_TRAP_FLAG	0x08000000
-#define CPU_BASED_USE_MSR_BITMAPS	0x10000000
-#define CPU_BASED_MONITOR_EXITING	0x20000000
-#define CPU_BASED_PAUSE_EXITING	0x40000000
-#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS	0x80000000
+#define CPU_BASED_VIRTUAL_INTR_PENDING	VMCS_CONTROL_BIT(VIRTUAL_INTR_PENDING)
+#define CPU_BASED_USE_TSC_OFFSETING	VMCS_CONTROL_BIT(TSC_OFFSETTING)
checkpatch correctly complains here:
WARNING: 'OFFSETING' may be misspelled - perhaps 'OFFSETTING'?
#80: FILE: arch/x86/include/asm/vmx.h:26:
+#define CPU_BASED_USE_TSC_OFFSETING VMCS_CONTROL_BIT(TSC_OFFSETTING)
and VMX_FEATURE_TSC_OFFSETTING is correct.
Should I fix it up to CPU_BASED_USE_TSC_OFFSETTING while applying?
On Mon, Jan 13, 2020 at 07:32:28PM +0100, Borislav Petkov wrote:
checkpatch correctly complains here:
WARNING: 'OFFSETING' may be misspelled - perhaps 'OFFSETTING'?
#80: FILE: arch/x86/include/asm/vmx.h:26:
+#define CPU_BASED_USE_TSC_OFFSETING VMCS_CONTROL_BIT(TSC_OFFSETTING)
and VMX_FEATURE_TSC_OFFSETTING is correct.
Should I fix it up to CPU_BASED_USE_TSC_OFFSETTING while applying?
No, the CPU_BASED_* defines are used by existing KVM code, i.e. you'd also have to touch a bunch of KVM code.
$ git grep CPU_BASED_USE_TSC_OFFSETING
arch/x86/include/asm/vmx.h:#define CPU_BASED_USE_TSC_OFFSETING 0x00000008
arch/x86/kvm/vmx/nested.c: if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
arch/x86/kvm/vmx/nested.c: if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
arch/x86/kvm/vmx/nested.c: if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
arch/x86/kvm/vmx/nested.c: CPU_BASED_VIRTUAL_NMI_PENDING | CPU_BASED_USE_TSC_OFFSETING |
arch/x86/kvm/vmx/vmx.c: (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING))
arch/x86/kvm/vmx/vmx.c: (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING))
arch/x86/kvm/vmx/vmx.c: CPU_BASED_USE_TSC_OFFSETING |
tools/testing/selftests/kvm/include/x86_64/vmx.h:#define CPU_BASED_USE_TSC_OFFSETING 0x00000008
tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c: control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETING;
On Mon, Jan 13, 2020 at 10:37:05AM -0800, Sean Christopherson wrote:
No, the CPU_BASED_* defines are used by existing KVM code, i.e. you'd also have to touch a bunch of KVM code.
Doesn't bother me, I could do it in a patch ontop. But your call.
On Mon, Jan 13, 2020 at 07:38:23PM +0100, Borislav Petkov wrote:
Doesn't bother me, I could do it in a patch ontop. But your call.
No objection here.
On Mon, Jan 13, 2020 at 10:42:17AM -0800, Sean Christopherson wrote:
No objection here.
Something like this:
---
From: Borislav Petkov <bp@suse.de>

... so that "offsetting" is spelled the same as the respective VMX feature bit VMX_FEATURE_TSC_OFFSETTING.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/vmx.h                               | 2 +-
 arch/x86/kvm/vmx/nested.c                                | 8 ++++----
 arch/x86/kvm/vmx/vmx.c                                   | 6 +++---
 tools/testing/selftests/kvm/include/x86_64/vmx.h         | 2 +-
 tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 9fbba31be825..6df8b3b94483 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -23,7 +23,7 @@
  * Definitions of Primary Processor-Based VM-Execution Controls.
  */
 #define CPU_BASED_VIRTUAL_INTR_PENDING	VMCS_CONTROL_BIT(VIRTUAL_INTR_PENDING)
-#define CPU_BASED_USE_TSC_OFFSETING	VMCS_CONTROL_BIT(TSC_OFFSETTING)
+#define CPU_BASED_USE_TSC_OFFSETTING	VMCS_CONTROL_BIT(TSC_OFFSETTING)
 #define CPU_BASED_HLT_EXITING	VMCS_CONTROL_BIT(HLT_EXITING)
 #define CPU_BASED_INVLPG_EXITING	VMCS_CONTROL_BIT(INVLPG_EXITING)
 #define CPU_BASED_MWAIT_EXITING	VMCS_CONTROL_BIT(MWAIT_EXITING)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6879966b7648..d466666b1de9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3230,7 +3230,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	}

 	enter_guest_mode(vcpu);
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
+	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
 		vcpu->arch.tsc_offset += vmcs12->tsc_offset;

 	if (prepare_vmcs02(vcpu, vmcs12, &exit_qual))
@@ -3294,7 +3294,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	 * 26.7 "VM-entry failures during or after loading guest state".
 	 */
 vmentry_fail_vmexit_guest_mode:
-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
+	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
 		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
 	leave_guest_mode(vcpu);

@@ -4209,7 +4209,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	if (nested_cpu_has_preemption_timer(vmcs12))
 		hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);

-	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
+	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
 		vcpu->arch.tsc_offset -= vmcs12->tsc_offset;

 	if (likely(!vmx->fail)) {
@@ -6016,7 +6016,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
 		CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
 	msrs->procbased_ctls_high &=
 		CPU_BASED_VIRTUAL_INTR_PENDING |
-		CPU_BASED_VIRTUAL_NMI_PENDING | CPU_BASED_USE_TSC_OFFSETING |
+		CPU_BASED_VIRTUAL_NMI_PENDING | CPU_BASED_USE_TSC_OFFSETTING |
 		CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
 		CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
 		CPU_BASED_CR3_STORE_EXITING |
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cdb4bf50ee14..e543232a28b2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1716,7 +1716,7 @@ static u64 vmx_read_l1_tsc_offset(struct kvm_vcpu *vcpu)
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

 	if (is_guest_mode(vcpu) &&
-	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING))
+	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
 		return vcpu->arch.tsc_offset - vmcs12->tsc_offset;

 	return vcpu->arch.tsc_offset;
@@ -1734,7 +1734,7 @@ static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	 * to the newly set TSC to get L2's TSC.
 	 */
 	if (is_guest_mode(vcpu) &&
-	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING))
+	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
 		g_tsc_offset = vmcs12->tsc_offset;

 	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
@@ -2322,7 +2322,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      CPU_BASED_CR3_STORE_EXITING |
 	      CPU_BASED_UNCOND_IO_EXITING |
 	      CPU_BASED_MOV_DR_EXITING |
-	      CPU_BASED_USE_TSC_OFFSETING |
+	      CPU_BASED_USE_TSC_OFFSETTING |
 	      CPU_BASED_MWAIT_EXITING |
 	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index f52e0ba84fed..969a0f0c2ec0 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -19,7 +19,7 @@
  * Definitions of Primary Processor-Based VM-Execution Controls.
  */
 #define CPU_BASED_VIRTUAL_INTR_PENDING	0x00000004
-#define CPU_BASED_USE_TSC_OFFSETING	0x00000008
+#define CPU_BASED_USE_TSC_OFFSETTING	0x00000008
 #define CPU_BASED_HLT_EXITING	0x00000080
 #define CPU_BASED_INVLPG_EXITING	0x00000200
 #define CPU_BASED_MWAIT_EXITING	0x00000400
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
index 5590fd2bcf87..69e482a95c47 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
@@ -98,7 +98,7 @@ static void l1_guest_code(struct vmx_pages *vmx_pages)
 	prepare_vmcs(vmx_pages, l2_guest_code,
 		     &l2_guest_stack[L2_GUEST_STACK_SIZE]);
 	control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
-	control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETING;
+	control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
 	vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
 	vmwrite(TSC_OFFSET, TSC_OFFSET_VALUE);
On Mon, Jan 13, 2020 at 07:52:16PM +0100, Borislav Petkov wrote:
Something like this:
From: Borislav Petkov <bp@suse.de>
... so that "offsetting" is spelled the same as the respective VMX feature bit VMX_FEATURE_TSC_OFFSETTING.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-and-tested-by: Sean Christopherson <sean.j.christopherson@intel.com>
On Mon, Jan 13, 2020 at 12:16:10PM -0800, Sean Christopherson wrote:
Reviewed-and-tested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Yah, so I tried to do a test-merge with linux-next to see what surprises should I be prepared for and there's the first one:
5e3d394fdd9e ("KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING")
which is already in Paolo's tree. Dropping it on my side.
On Tue, Jan 14, 2020 at 10:31:38AM +0100, Borislav Petkov wrote:
Yah, so I tried to do a test-merge with linux-next to see what surprises should I be prepared for and there's the first one:
5e3d394fdd9e ("KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING")
which is already in Paolo's tree. Dropping it on my side.
Doh, now that you point it out, I remember that patch going by. Sorry I didn't recall it earlier.
Thanks for your help!
Provide stubs for perf_guest_get_msrs() and intel_pt_handle_vmx() when building without support for Intel CPUs, i.e. CPU_SUP_INTEL=n. Lack of stubs is not currently a problem as the only user, KVM_INTEL, takes a dependency on CPU_SUP_INTEL=y. Provide the stubs for all CPUs so that KVM_INTEL can be built for any CPU with compatible hardware support, e.g. Centaur and Zhaoxin CPUs.
Note, the existing stub for perf_guest_get_msrs() is essentially dead code as KVM selects CONFIG_PERF_EVENTS, i.e. the only user guarantees the full implementation is built.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/perf_event.h | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ee26e9215f18..29964b0e1075 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -322,17 +322,10 @@ struct perf_guest_switch_msr {
 	u64 host, guest;
 };

-extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
 extern int x86_perf_rdpmc_index(struct perf_event *event);
 #else
-static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
-{
-	*nr = 0;
-	return NULL;
-}
-
 static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
 	memset(cap, 0, sizeof(*cap));
@@ -342,8 +335,23 @@ static inline void perf_events_lapic_init(void)	{ }
 static inline void perf_check_microcode(void) { }
 #endif

+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
+extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+#else
+static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
+{
+	*nr = 0;
+	return NULL;
+}
+#endif
+
 #ifdef CONFIG_CPU_SUP_INTEL
 extern void intel_pt_handle_vmx(int on);
+#else
+static inline void intel_pt_handle_vmx(int on)
+{
+
+}
 #endif

 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
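The stub pattern can be seen in miniature in the userspace sketch below: when the config symbol is absent, the inline stub reports zero entries, so a caller's loop simply does nothing and needs no #ifdef of its own. HYPOTHETICAL_HAVE_INTEL_PERF is a made-up stand-in for the CONFIG_PERF_EVENTS && CONFIG_CPU_SUP_INTEL condition, and count_switched_msrs() is a hypothetical caller, not KVM code.

```c
#include <stddef.h>

struct perf_guest_switch_msr {
	unsigned int msr;
	unsigned long long host, guest;
};

#ifdef HYPOTHETICAL_HAVE_INTEL_PERF
/* Full implementation would be provided elsewhere. */
extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
#else
/* Stub: report an empty list so callers need no special casing. */
static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
{
	*nr = 0;
	return NULL;
}
#endif

/* Hypothetical caller: counts MSRs whose host and guest values differ. */
static int count_switched_msrs(void)
{
	int i, nr, count = 0;
	struct perf_guest_switch_msr *msrs = perf_guest_get_msrs(&nr);

	/* With the stub, nr is 0 and msrs is never dereferenced. */
	for (i = 0; i < nr; i++)
		if (msrs[i].host != msrs[i].guest)
			count++;

	return count;
}
```

Setting *nr = 0 in the stub is what makes returning NULL safe: a caller that bounds its loop by nr never touches the pointer.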
Change the dependency for KVM_INTEL, i.e. KVM w/ VMX, from Intel CPUs to any CPU that supports the IA32_FEAT_CTL MSR and thus VMX functionality. This effectively allows building KVM_INTEL for Centaur and Zhaoxin CPUs.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/Kconfig | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 840e12583b85..991019d5eee1 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -60,13 +60,11 @@ config KVM
 	  If unsure, say N.

 config KVM_INTEL
-	tristate "KVM for Intel processors support"
-	depends on KVM
-	# for perf_guest_get_msrs():
-	depends on CPU_SUP_INTEL
+	tristate "KVM for Intel (and compatible) processors support"
+	depends on KVM && IA32_FEAT_CTL
 	---help---
-	  Provides support for KVM on Intel processors equipped with the VT
-	  extensions.
+	  Provides support for KVM on processors equipped with Intel's VT
+	  extensions, a.k.a. Virtual Machine Extensions (VMX).

 	  To compile this as a module, choose M here: the module
 	  will be called kvm-intel.