On Tue, Nov 11, 2025 at 03:11:37AM +0000, Yosry Ahmed wrote:
On Sat, Nov 08, 2025 at 12:45:20AM +0000, Yosry Ahmed wrote:
svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on nested transitions where LBRV is used. It checks whether LBRV enablement needs to be changed in the current VMCB, and if it does, it also recalculate intercepts to LBR MSRs.
However, there are cases where intercepts need to be updated even when LBRV enablement doesn't. Example scenario:
- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, L1 reads to LBR MSRs read the host MSRs.
Fix it by always recalculating intercepts in svm_update_lbrv().
This actually breaks hyperv_svm_test, because svm_update_lbrv() is called on every nested transition, calling svm_recalc_lbr_msr_intercepts() -> svm_set_intercept_for_msr() and setting svm->nested.force_msr_bitmap_recalc to true.
This breaks the hyperv optimization in nested_svm_vmrun_msrpm() AFAICT.
I think there are two ways to fix this:
Add another bool to svm->nested to track LBR intercepts, and only call svm_set_intercept_for_msr() if the intercepts need to be updated.
Update svm_set_intercept_for_msr() itself to do nothing if the intercepts do not need to be changed, which is more clutter but applies to other callers as well so could shave cycles elsewhere (see below).
Sean, Paolo, any preferences?
Here's what updating svm_set_intercept_for_msr() looks like:
and that diff breaks userspace_msr_exit_test :)
Here's an actually tested diff:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 2fbb0b88c6a3e..88717429ba9d5 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -664,24 +664,38 @@ void svm_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type, bool se { struct vcpu_svm *svm = to_svm(vcpu); void *msrpm = svm->msrpm; + bool recalc = false; + bool already_set;
/* Don't disable interception for MSRs userspace wants to handle. */ if (type & MSR_TYPE_R) { - if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) + set = set || !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ); + already_set = svm_test_msr_bitmap_read(msrpm, msr); + + if (!set && already_set) svm_clear_msr_bitmap_read(msrpm, msr); - else + else if (set && !already_set) svm_set_msr_bitmap_read(msrpm, msr); + + recalc |= (set != already_set); }
if (type & MSR_TYPE_W) { - if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) + set = set || !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE); + already_set = svm_test_msr_bitmap_write(msrpm, msr); + + if (!set && already_set) svm_clear_msr_bitmap_write(msrpm, msr); - else + else if (set && !already_set) svm_set_msr_bitmap_write(msrpm, msr); + + recalc |= (set != already_set); }
- svm_hv_vmcb_dirty_nested_enlightenments(vcpu); - svm->nested.force_msr_bitmap_recalc = true; + if (recalc) { + svm_hv_vmcb_dirty_nested_enlightenments(vcpu); + svm->nested.force_msr_bitmap_recalc = true; + } }
void *svm_alloc_permissions_map(unsigned long size, gfp_t gfp_mask)
---
For the record, I don't want to just use svm_test_msr_bitmap_*() in svm_update_lbrv() because there is no direct equivalent in older kernels as far as I can tell, so a backport would be completely inapplicable.