From: Sean Christopherson <seanjc@google.com>
[ Upstream commit 17bcd714426386fda741a4bccd96a2870179344b ]
Free vCPUs before freeing any VM state, as both SVM and VMX may access VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs to be kicked out of nested guest mode.
Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was called") partially fixed the issue, but for unknown reasons only moved the MMU unloading before VM destruction. Complete the change, and free all vCPU state prior to destroying VM state, as nVMX accesses even more state than nSVM.
In addition to the AVIC, KVM can hit a use-after-free on MSR filters:
  kvm_msr_allowed+0x4c/0xd0
  __kvm_set_msr+0x12d/0x1e0
  kvm_set_msr+0x19/0x40
  load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
  nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
  nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
  vmx_free_vcpu+0x54/0xc0 [kvm_intel]
  kvm_arch_vcpu_destroy+0x28/0xf0
  kvm_vcpu_destroy+0x12/0x50
  kvm_arch_destroy_vm+0x12c/0x1c0
  kvm_put_kvm+0x263/0x3c0
  kvm_vm_release+0x21/0x30
and an upcoming fix to process injectable interrupts on nested VM-Exit will access the PIC:
  BUG: kernel NULL pointer dereference, address: 0000000000000090
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re
  RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm]
  Call Trace:
   <TASK>
   kvm_cpu_has_injectable_intr+0xe/0x60 [kvm]
   nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel]
   nested_vmx_free_vcpu+0x40/0x50 [kvm_intel]
   vmx_vcpu_free+0x2d/0x80 [kvm_intel]
   kvm_arch_vcpu_destroy+0x2d/0x130 [kvm]
   kvm_destroy_vcpus+0x8a/0x100 [kvm]
   kvm_arch_destroy_vm+0xa7/0x1d0 [kvm]
   kvm_destroy_vm+0x172/0x300 [kvm]
   kvm_vcpu_release+0x31/0x50 [kvm]
Inarguably, both nSVM and nVMX need to be fixed, but punt on those cleanups for the moment. Conceptually, vCPUs should be freed before VM state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs are created, so it stands to reason that they must be freed _after_ vCPUs are destroyed.
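The ordering argument above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every toy_* name below is hypothetical and is not a KVM type or function. The toy vCPU destructor reads VM-wide state when the vCPU is still "in" a nested guest, which stands in for the forced nested VM-Exit, so the VM-wide state has to outlive the vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not real KVM structures. */
struct toy_vcpu {
	bool in_nested_guest;	/* models a vCPU that is "in" L2 */
};

struct toy_vm {
	int *msr_filter;	/* models VM-wide state such as the MSR filter */
	struct toy_vcpu *vcpu;
	int vm_state_reads;	/* VM-state accesses made during vCPU teardown */
};

/* Zeroing allocator that aborts on failure to keep the sketch short. */
static void *toy_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (!p)
		abort();
	return p;
}

static struct toy_vm *toy_vm_create(bool in_nested_guest)
{
	struct toy_vm *vm = toy_alloc(sizeof(*vm));

	/* VM-wide state is allocated before the vCPU is created. */
	vm->msr_filter = toy_alloc(sizeof(*vm->msr_filter));
	vm->vcpu = toy_alloc(sizeof(*vm->vcpu));
	vm->vcpu->in_nested_guest = in_nested_guest;
	return vm;
}

/* Freeing a vCPU that is in L2 forces a nested exit, which reads VM state. */
static void toy_vcpu_destroy(struct toy_vm *vm)
{
	if (vm->vcpu->in_nested_guest) {
		int filter;

		/* With the old ordering this pointer would already be freed. */
		assert(vm->msr_filter != NULL);
		filter = *vm->msr_filter;
		(void)filter;
		vm->vm_state_reads++;
		vm->vcpu->in_nested_guest = false;
	}
	free(vm->vcpu);
	vm->vcpu = NULL;
}

static void toy_vm_state_destroy(struct toy_vm *vm)
{
	free(vm->msr_filter);
	vm->msr_filter = NULL;
}

/*
 * Tear down in the order argued for above: vCPUs first, VM state second.
 * Returns how many times vCPU teardown touched VM state.
 */
static int toy_vm_destroy(struct toy_vm *vm)
{
	int reads;

	toy_vcpu_destroy(vm);
	toy_vm_state_destroy(vm);
	reads = vm->vm_state_reads;
	free(vm);
	return reads;
}
```

Calling toy_vm_destroy(toy_vm_create(true)) makes exactly one read of VM state, and that read happens while the state is still allocated. Swapping the two calls inside toy_vm_destroy() would trip the assertion, which is the toy analogue of the use-after-free and NULL dereference shown in the traces.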
Reported-by: Aaron Lewis <aaronlewis@google.com>
Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com
Cc: Jim Mattson <jmattson@google.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f378d479fea3f..7f91b11e6f0ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12888,11 +12888,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 		mutex_unlock(&kvm->slots_lock);
 	}
 	kvm_unload_vcpu_mmus(kvm);
+	kvm_destroy_vcpus(kvm);
 	kvm_x86_call(vm_destroy)(kvm);
 	kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
 	kvm_pic_destroy(kvm);
 	kvm_ioapic_destroy(kvm);
-	kvm_destroy_vcpus(kvm);
 	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
 	kvm_mmu_uninit_vm(kvm);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 17bcd714426386fda741a4bccd96a2870179344b
WARNING: Author mismatch between patch and upstream commit:
Backport author: Kevin Cheng <chengkev@google.com>
Commit author: Sean Christopherson <seanjc@google.com>
Status in newer kernel trees:
6.15.y | Present (exact SHA1)
Note: The patch differs from the upstream commit:
---
1: 17bcd7144263 ! 1: c2bfc381d88f KVM: x86: Free vCPUs before freeing VM state
    @@ Metadata
      ## Commit message ##
         KVM: x86: Free vCPUs before freeing VM state

    +    [ Upstream commit 17bcd714426386fda741a4bccd96a2870179344b ]
    +
         Free vCPUs before freeing any VM state, as both SVM and VMX may access VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs to be kicked out of nested guest mode.

    @@ Commit message
         Cc: Kai Huang <kai.huang@intel.com>
         Cc: Isaku Yamahata <isaku.yamahata@intel.com>
         Signed-off-by: Sean Christopherson <seanjc@google.com>
    -    Message-ID: <20250224235542.2562848-2-seanjc@google.com>
    -    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    +    Signed-off-by: Kevin Cheng <chengkev@google.com>

      ## arch/x86/kvm/x86.c ##
     @@ arch/x86/kvm/x86.c: void kvm_arch_destroy_vm(struct kvm *kvm)
---
Results of testing on various branches:
| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| origin/linux-6.12.y       | Success     | Success    |
linux-stable-mirror@lists.linaro.org