svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on nested transitions where LBRV is used. It checks whether LBRV enablement needs to be changed in the current VMCB, and if it does, it also recalculates the intercepts for the LBR MSRs.
However, there are cases where intercepts need to be updated even when LBRV enablement doesn't. Example scenario:

- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, so L1's reads of the LBR MSRs return the host's values.
Fix it by always recalculating intercepts in svm_update_lbrv().
Fixes: 1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d25c56b30b4e2..26ab75ecf1c67 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -806,25 +806,29 @@ void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
 	vmcb_mark_dirty(to_vmcb, VMCB_LBR);
 }
 
-void svm_enable_lbrv(struct kvm_vcpu *vcpu)
+static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
-	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
 	if (is_guest_mode(vcpu))
 		svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
 }
 
-static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
+void svm_enable_lbrv(struct kvm_vcpu *vcpu)
+{
+	__svm_enable_lbrv(vcpu);
+	svm_recalc_lbr_msr_intercepts(vcpu);
+}
+
+static void __svm_disable_lbrv(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
 
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/*
 	 * Move the LBR msrs back to the vmcb01 to avoid copying them
@@ -853,13 +857,18 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
 		      (is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
 		      (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
 
-	if (enable_lbrv == current_enable_lbrv)
-		return;
+	if (enable_lbrv && !current_enable_lbrv)
+		__svm_enable_lbrv(vcpu);
+	else if (!enable_lbrv && current_enable_lbrv)
+		__svm_disable_lbrv(vcpu);
 
-	if (enable_lbrv)
-		svm_enable_lbrv(vcpu);
-	else
-		svm_disable_lbrv(vcpu);
+	/*
+	 * During nested transitions, it is possible that the current VMCB has
+	 * LBR_CTL set, but the previous LBR_CTL had it cleared (or vice versa).
+	 * In this case, even though LBR_CTL does not need an update, intercepts
+	 * do, so always recalculate the intercepts here.
+	 */
+	svm_recalc_lbr_msr_intercepts(vcpu);
 }
 
 void disable_nmi_singlestep(struct vcpu_svm *svm)
On Sat, Nov 08, 2025 at 12:45:20AM +0000, Yosry Ahmed wrote:
svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on nested transitions where LBRV is used. It checks whether LBRV enablement needs to be changed in the current VMCB, and if it does, it also recalculates the intercepts for the LBR MSRs.
However, there are cases where intercepts need to be updated even when LBRV enablement doesn't. Example scenario:
- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, so L1's reads of the LBR MSRs return the host's values.
Fix it by always recalculating intercepts in svm_update_lbrv().
This actually breaks hyperv_svm_test, because svm_update_lbrv() is called on every nested transition, calling svm_recalc_lbr_msr_intercepts() -> svm_set_intercept_for_msr() and setting svm->nested.force_msr_bitmap_recalc to true.
This breaks the hyperv optimization in nested_svm_vmrun_msrpm() AFAICT.
I think there are two ways to fix this:
- Add another bool to svm->nested to track LBR intercepts, and only call svm_set_intercept_for_msr() if the intercepts need to be updated.
- Update svm_set_intercept_for_msr() itself to do nothing if the intercepts do not need to be changed, which is more clutter but applies to other callers as well so could shave cycles elsewhere (see below).
Sean, Paolo, any preferences?
Here's what updating svm_set_intercept_for_msr() looks like:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2fbb0b88c6a3e..b1afafc8b37c0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -664,24 +664,36 @@ void svm_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type, bool se
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	void *msrpm = svm->msrpm;
+	bool recalc = false;
+	bool already_set;
 
 	/* Don't disable interception for MSRs userspace wants to handle. */
 	if (type & MSR_TYPE_R) {
-		if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
+		already_set = svm_test_msr_bitmap_read(msrpm, msr);
+		if (!set && already_set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
 			svm_clear_msr_bitmap_read(msrpm, msr);
-		else
+			recalc = true;
+		} else if (set && !already_set) {
 			svm_set_msr_bitmap_read(msrpm, msr);
+			recalc = true;
+		}
 	}
 
 	if (type & MSR_TYPE_W) {
-		if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+		already_set = svm_test_msr_bitmap_write(msrpm, msr);
+		if (!set && already_set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
 			svm_clear_msr_bitmap_write(msrpm, msr);
-		else
+			recalc = true;
+		} else if (set && !already_set) {
 			svm_set_msr_bitmap_write(msrpm, msr);
+			recalc = true;
+		}
 	}
 
-	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
-	svm->nested.force_msr_bitmap_recalc = true;
+	if (recalc) {
+		svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
+		svm->nested.force_msr_bitmap_recalc = true;
+	}
 }
 
 void *svm_alloc_permissions_map(unsigned long size, gfp_t gfp_mask)
Fixes: 1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d25c56b30b4e2..26ab75ecf1c67 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -806,25 +806,29 @@ void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
 	vmcb_mark_dirty(to_vmcb, VMCB_LBR);
 }
 
-void svm_enable_lbrv(struct kvm_vcpu *vcpu)
+static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
-	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
 	if (is_guest_mode(vcpu))
 		svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
 }
 
-static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
+void svm_enable_lbrv(struct kvm_vcpu *vcpu)
+{
+	__svm_enable_lbrv(vcpu);
+	svm_recalc_lbr_msr_intercepts(vcpu);
+}
+
+static void __svm_disable_lbrv(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
 
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/*
 	 * Move the LBR msrs back to the vmcb01 to avoid copying them
@@ -853,13 +857,18 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
 		      (is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
 		      (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
 
-	if (enable_lbrv == current_enable_lbrv)
-		return;
+	if (enable_lbrv && !current_enable_lbrv)
+		__svm_enable_lbrv(vcpu);
+	else if (!enable_lbrv && current_enable_lbrv)
+		__svm_disable_lbrv(vcpu);
 
-	if (enable_lbrv)
-		svm_enable_lbrv(vcpu);
-	else
-		svm_disable_lbrv(vcpu);
+	/*
+	 * During nested transitions, it is possible that the current VMCB has
+	 * LBR_CTL set, but the previous LBR_CTL had it cleared (or vice versa).
+	 * In this case, even though LBR_CTL does not need an update, intercepts
+	 * do, so always recalculate the intercepts here.
+	 */
+	svm_recalc_lbr_msr_intercepts(vcpu);
 }
 
 void disable_nmi_singlestep(struct vcpu_svm *svm)
-- 
2.51.2.1041.gc1ab5b90ca-goog
On Tue, Nov 11, 2025 at 03:11:37AM +0000, Yosry Ahmed wrote:
On Sat, Nov 08, 2025 at 12:45:20AM +0000, Yosry Ahmed wrote:
svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on nested transitions where LBRV is used. It checks whether LBRV enablement needs to be changed in the current VMCB, and if it does, it also recalculates the intercepts for the LBR MSRs.
However, there are cases where intercepts need to be updated even when LBRV enablement doesn't. Example scenario:
- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, so L1's reads of the LBR MSRs return the host's values.
Fix it by always recalculating intercepts in svm_update_lbrv().
This actually breaks hyperv_svm_test, because svm_update_lbrv() is called on every nested transition, calling svm_recalc_lbr_msr_intercepts() -> svm_set_intercept_for_msr() and setting svm->nested.force_msr_bitmap_recalc to true.
This breaks the hyperv optimization in nested_svm_vmrun_msrpm() AFAICT.
I think there are two ways to fix this:
- Add another bool to svm->nested to track LBR intercepts, and only call svm_set_intercept_for_msr() if the intercepts need to be updated.
- Update svm_set_intercept_for_msr() itself to do nothing if the intercepts do not need to be changed, which is more clutter but applies to other callers as well so could shave cycles elsewhere (see below).
Sean, Paolo, any preferences?
Here's what updating svm_set_intercept_for_msr() looks like:
and that diff breaks userspace_msr_exit_test :)
Here's an actually tested diff:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2fbb0b88c6a3e..88717429ba9d5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -664,24 +664,38 @@ void svm_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type, bool se
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	void *msrpm = svm->msrpm;
+	bool recalc = false;
+	bool already_set;
 
 	/* Don't disable interception for MSRs userspace wants to handle. */
 	if (type & MSR_TYPE_R) {
-		if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
+		set = set || !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ);
+		already_set = svm_test_msr_bitmap_read(msrpm, msr);
+
+		if (!set && already_set)
 			svm_clear_msr_bitmap_read(msrpm, msr);
-		else
+		else if (set && !already_set)
 			svm_set_msr_bitmap_read(msrpm, msr);
+
+		recalc |= (set != already_set);
 	}
 
 	if (type & MSR_TYPE_W) {
-		if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+		set = set || !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE);
+		already_set = svm_test_msr_bitmap_write(msrpm, msr);
+
+		if (!set && already_set)
 			svm_clear_msr_bitmap_write(msrpm, msr);
-		else
+		else if (set && !already_set)
 			svm_set_msr_bitmap_write(msrpm, msr);
+
+		recalc |= (set != already_set);
 	}
 
-	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
-	svm->nested.force_msr_bitmap_recalc = true;
+	if (recalc) {
+		svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
+		svm->nested.force_msr_bitmap_recalc = true;
+	}
 }
 
 void *svm_alloc_permissions_map(unsigned long size, gfp_t gfp_mask)
---
For the record, I don't want to just use svm_test_msr_bitmap_*() in svm_update_lbrv() because there is no direct equivalent in older kernels as far as I can tell, so a backport would be completely inapplicable.
On Tue, Nov 11, 2025, Yosry Ahmed wrote:
On Tue, Nov 11, 2025 at 03:11:37AM +0000, Yosry Ahmed wrote:
On Sat, Nov 08, 2025 at 12:45:20AM +0000, Yosry Ahmed wrote:
svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on nested transitions where LBRV is used. It checks whether LBRV enablement needs to be changed in the current VMCB, and if it does, it also recalculates the intercepts for the LBR MSRs.
However, there are cases where intercepts need to be updated even when LBRV enablement doesn't. Example scenario:
- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, so L1's reads of the LBR MSRs return the host's values.
Fix it by always recalculating intercepts in svm_update_lbrv().
This actually breaks hyperv_svm_test, because svm_update_lbrv() is called on every nested transition, calling svm_recalc_lbr_msr_intercepts() -> svm_set_intercept_for_msr() and setting svm->nested.force_msr_bitmap_recalc to true.
This breaks the hyperv optimization in nested_svm_vmrun_msrpm() AFAICT.
I think there are two ways to fix this:
- Add another bool to svm->nested to track LBR intercepts, and only call svm_set_intercept_for_msr() if the intercepts need to be updated.
- Update svm_set_intercept_for_msr() itself to do nothing if the intercepts do not need to be changed, which is more clutter but applies to other callers as well so could shave cycles elsewhere (see below).
Sean, Paolo, any preferences?
Here's what updating svm_set_intercept_for_msr() looks like:
I am *very* strongly opposed to modifying svm_set_intercept_for_msr() to deal with whatever mess LBRs has created. Whatever the problem is (I haven't read through all of this yet), it needs to be fixed in the LBR code, not in svm_set_intercept_for_msr().
Recalculating MSR intercepts is supposed to be done only as needed; I don't want to encourage lazy code that works by optimizing paths that should be rare.