Currently, the situation when a guest accesses MMIO during event vectoring is handled differently on VMX and SVM: on VMX, KVM returns an internal error, while on SVM, KVM goes into an infinite loop trying to deliver the event again and again.

This patch series eliminates the difference by returning a KVM internal error when the guest performs MMIO during vectoring on both VMX and SVM.

Also, introduce a selftest test case which covers the error handling described above.
V1 -> V2:
- Make commit messages more brief, avoid using pronouns
- Extract SVM error handling into a separate commit
- Introduce a new X86EMUL_ return type and detect the unhandleable vectoring error in vendor-specific check_emulate_instruction instead of handling it in the common MMU code (which is specific to cached MMIO)
Ivan Orlov (6):
  KVM: x86: Add function for vectoring error generation
  KVM: x86: Add emulation status for vectoring during MMIO
  KVM: VMX: Handle vectoring error in check_emulate_instruction
  KVM: SVM: Handle MMIO during vectoring error
  selftests: KVM: extract lidt into helper function
  selftests: KVM: Add test case for MMIO during vectoring
 arch/x86/include/asm/kvm_host.h               | 12 ++++-
 arch/x86/kvm/kvm_emulate.h                    |  2 +
 arch/x86/kvm/svm/svm.c                        |  9 +++-
 arch/x86/kvm/vmx/vmx.c                        | 33 +++++-------
 arch/x86/kvm/x86.c                            | 27 ++++++++++
 .../selftests/kvm/include/x86_64/processor.h  |  7 +++
 .../selftests/kvm/set_memory_region_test.c    | 53 ++++++++++++++++++-
 .../selftests/kvm/x86_64/sev_smoke_test.c     |  2 +-
 8 files changed, 119 insertions(+), 26 deletions(-)
Extract the VMX code for handling an unhandleable VM-Exit during event vectoring into a vendor-agnostic function so that the boilerplate can be shared by SVM.

Report the actual GPA for an EPT misconfig, or INVALID_GPA for any other exit reason, in internal.data[3].
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
V1 -> V2:
- Return GPA for any exit reason, using the reported GPA when it is valid or INVALID_GPA otherwise.
- Rename the error preparation function
- Fix indentation

 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 16 ++++------------
 arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6d9f763a7bb9..eb413079b7c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2060,6 +2060,8 @@ void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
 					  u64 *data, u8 ndata);
 void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);

+void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
+
 void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
 int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f6900bec4874..f92740e7e107 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6452,6 +6452,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	union vmx_exit_reason exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
 	u16 exit_handler_index;
+	gpa_t gpa;

 	/*
 	 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
@@ -6550,19 +6551,10 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
 	     exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
 	     exit_reason.basic != EXIT_REASON_NOTIFY)) {
-		int ndata = 3;
+		gpa = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG
+		      ? vmcs_read64(GUEST_PHYSICAL_ADDRESS) : INVALID_GPA;

-		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
-		vcpu->run->internal.data[0] = vectoring_info;
-		vcpu->run->internal.data[1] = exit_reason.full;
-		vcpu->run->internal.data[2] = vmx_get_exit_qual(vcpu);
-		if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) {
-			vcpu->run->internal.data[ndata++] =
-				vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-		}
-		vcpu->run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
-		vcpu->run->internal.ndata = ndata;
+		kvm_prepare_event_vectoring_exit(vcpu, gpa);
 		return 0;
 	}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..e338d583f48f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8828,6 +8828,28 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);

+void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	u32 reason, intr_info, error_code;
+	struct kvm_run *run = vcpu->run;
+	u64 info1, info2;
+	int ndata = 0;
+
+	kvm_x86_call(get_exit_info)(vcpu, &reason, &info1, &info2,
+				    &intr_info, &error_code);
+
+	run->internal.data[ndata++] = info2;
+	run->internal.data[ndata++] = reason;
+	run->internal.data[ndata++] = info1;
+	run->internal.data[ndata++] = (u64)gpa;
+	run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
+
+	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
+	run->internal.ndata = ndata;
+}
+EXPORT_SYMBOL_GPL(kvm_prepare_event_vectoring_exit);
+
 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
 	struct kvm *kvm = vcpu->kvm;
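For reference, userspace observes the result of kvm_prepare_event_vectoring_exit() as a KVM_EXIT_INTERNAL_ERROR exit. Below is a minimal decode sketch against the KVM UAPI; the helper is illustrative and not part of the series, and the data[] layout mirrors the function above:

#include <linux/kvm.h>
#include <stdio.h>

/* Illustrative only: decode the vectoring internal error from a vCPU's
 * mmapped kvm_run structure. */
static void report_vectoring_error(const struct kvm_run *run)
{
	if (run->exit_reason != KVM_EXIT_INTERNAL_ERROR ||
	    run->internal.suberror != KVM_INTERNAL_ERROR_DELIVERY_EV)
		return;

	/* data[] order: info2, exit reason, info1, GPA, last vm-entry CPU */
	fprintf(stderr,
		"event delivery failed: info2=0x%llx reason=0x%llx info1=0x%llx gpa=0x%llx cpu=%llu\n",
		(unsigned long long)run->internal.data[0],
		(unsigned long long)run->internal.data[1],
		(unsigned long long)run->internal.data[2],
		(unsigned long long)run->internal.data[3],
		(unsigned long long)run->internal.data[4]);
}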
On Mon, Nov 11, 2024, Ivan Orlov wrote:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f6900bec4874..f92740e7e107 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6452,6 +6452,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	union vmx_exit_reason exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
 	u16 exit_handler_index;
+	gpa_t gpa;
I've gone back and forth on where to declare scoped variables, but in this case, I think it makes sense to declare "gpa" inside the if-statement. Making it visible at the function scope when it's valid in a _super_ limited case is bound to cause issues.
Of course, this code goes away by the end of the series, so that point is moot. But on the other hand, declaring the variable in the if-statement is desirable as the churn is precisely limited to the code that's being changed.
 	/*
 	 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
@@ -6550,19 +6551,10 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
 	     exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
 	     exit_reason.basic != EXIT_REASON_NOTIFY)) {
-		int ndata = 3;
+		gpa = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG
+		      ? vmcs_read64(GUEST_PHYSICAL_ADDRESS) : INVALID_GPA;
Again a moot point, but IMO using a ternary operator here makes it unnecessarily difficult to see that gpa is valid if and only if the exit was an EPT misconfig.
	gpa_t gpa = INVALID_GPA;

	if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG)
		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
-		vcpu->run->internal.data[0] = vectoring_info;
-		vcpu->run->internal.data[1] = exit_reason.full;
-		vcpu->run->internal.data[2] = vmx_get_exit_qual(vcpu);
-		if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) {
-			vcpu->run->internal.data[ndata++] =
-				vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-		}
-		vcpu->run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
-		vcpu->run->internal.ndata = ndata;
+		kvm_prepare_event_vectoring_exit(vcpu, gpa);
 		return 0;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83fe0a78146f..e338d583f48f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8828,6 +8828,28 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit); +void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa) +{
+	u32 reason, intr_info, error_code;
+	struct kvm_run *run = vcpu->run;
+	u64 info1, info2;
+	int ndata = 0;
+
+	kvm_x86_call(get_exit_info)(vcpu, &reason, &info1, &info2,
+				    &intr_info, &error_code);
+
+	run->internal.data[ndata++] = info2;
+	run->internal.data[ndata++] = reason;
+	run->internal.data[ndata++] = info1;
+	run->internal.data[ndata++] = (u64)gpa;
No need for the cast.
+	run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
+
+	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
+	run->internal.ndata = ndata;
+}
+EXPORT_SYMBOL_GPL(kvm_prepare_event_vectoring_exit);
+
 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
 	struct kvm *kvm = vcpu->kvm;
--
2.43.0
Add an emulation status for a vectoring error due to MMIO. Such a situation can occur if the guest sets the IDT descriptor base to point to an MMIO region and triggers an exception after that.

Exit to userspace with an event delivery error when MMIO happens during vectoring.
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
V1 -> V2:
- This patch wasn't included in V1.
 arch/x86/kvm/kvm_emulate.h | 2 ++
 arch/x86/kvm/x86.c         | 5 +++++
 2 files changed, 7 insertions(+)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 55a18e2f2dcd..f856bc979bdb 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -88,6 +88,8 @@ struct x86_instruction_info {
 #define X86EMUL_CMPXCHG_FAILED  4 /* cmpxchg did not see expected value */
 #define X86EMUL_IO_NEEDED       5 /* IO is needed to complete emulation */
 #define X86EMUL_INTERCEPTED     6 /* Intercepted by nested VMCB/VMCS */
+/* Vectoring requires MMIO and can't be emulated */
+#define X86EMUL_UNHANDLEABLE_VECTORING_IO 7
 /* x86-specific emulation flags */
 #define X86EMUL_F_WRITE			BIT(0)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e338d583f48f..4ba371040685 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9122,6 +9122,11 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		if (r == X86EMUL_RETRY_INSTR || r == X86EMUL_PROPAGATE_FAULT)
 			return 1;

+		if (r == X86EMUL_UNHANDLEABLE_VECTORING_IO) {
+			kvm_prepare_event_vectoring_exit(vcpu, cr2_or_gpa);
+			return 0;
+		}
+
 		WARN_ON_ONCE(r != X86EMUL_UNHANDLEABLE);
 		return handle_emulation_failure(vcpu, emulation_type);
 	}
Move detection of the unhandleable VM-Exit caused by MMIO during event vectoring into check_emulate_instruction. Implement a function which checks whether emul_type indicates MMIO so it can be used for both VMX and SVM.

Fix the comment about EMULTYPE_PF, as this flag doesn't necessarily mean MMIO anymore: it can also be set due to a write protection violation.
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
V1 -> V2:
- Detect the unhandleable vectoring error in vmx_check_emulate_instruction instead of handling it in the common MMU code (which is specific for cached MMIO)
 arch/x86/include/asm/kvm_host.h | 10 ++++++++--
 arch/x86/kvm/vmx/vmx.c          | 25 ++++++++++++-------------
 2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index eb413079b7c6..3de9702a9135 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2017,8 +2017,8 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
  *			     VMware backdoor emulation handles select instructions
  *			     and reinjects the #GP for all other cases.
  *
- * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
- *		 case the CR2/GPA value pass on the stack is valid.
+ * EMULTYPE_PF - Set when an intercepted #PF triggers the emulation, in which case
+ *		 the CR2/GPA value pass on the stack is valid.
  *
  * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
  *				 state and inject single-step #DBs after skipping
@@ -2053,6 +2053,12 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
 #define EMULTYPE_WRITE_PF_TO_SP	    (1 << 8)

+static inline bool kvm_is_emul_type_mmio(int emul_type)
+{
+	return (emul_type & EMULTYPE_PF) &&
+	       !(emul_type & EMULTYPE_WRITE_PF_TO_SP);
+}
+
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
 					void *insn, int insn_len);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f92740e7e107..a10f35d9704b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1693,6 +1693,8 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 int vmx_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
 				  void *insn, int insn_len)
 {
+	bool is_vect;
+
 	/*
 	 * Emulation of instructions in SGX enclaves is impossible as RIP does
 	 * not point at the failing instruction, and even if it did, the code
@@ -1704,6 +1706,13 @@ int vmx_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return X86EMUL_PROPAGATE_FAULT;
 	}
+
+	is_vect = to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+
+	/* Emulation is not possible when MMIO happens during event vectoring. */
+	if (kvm_is_emul_type_mmio(emul_type) && is_vect)
+		return X86EMUL_UNHANDLEABLE_VECTORING_IO;
+
 	return X86EMUL_CONTINUE;
 }
@@ -6452,7 +6461,6 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	union vmx_exit_reason exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
 	u16 exit_handler_index;
-	gpa_t gpa;

 	/*
 	 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
@@ -6537,24 +6545,15 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 		return 0;
 	}

-	/*
-	 * Note:
-	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
-	 * delivery event since it indicates guest is accessing MMIO.
-	 * The vm-exit can be triggered again after return to guest that
-	 * will cause infinite loop.
-	 */
 	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
 	    (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
 	     exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
 	     exit_reason.basic != EXIT_REASON_PML_FULL &&
 	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
 	     exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
-	     exit_reason.basic != EXIT_REASON_NOTIFY)) {
-		gpa = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG
-		      ? vmcs_read64(GUEST_PHYSICAL_ADDRESS) : INVALID_GPA;
-
-		kvm_prepare_event_vectoring_exit(vcpu, gpa);
+	     exit_reason.basic != EXIT_REASON_NOTIFY &&
+	     exit_reason.basic != EXIT_REASON_EPT_MISCONFIG)) {
+		kvm_prepare_event_vectoring_exit(vcpu, INVALID_GPA);
 		return 0;
 	}
On Mon, Nov 11, 2024, Ivan Orlov wrote:
Move detection of the unhandleable VM-Exit caused by MMIO during event vectoring into check_emulate_instruction. Implement a function which checks whether emul_type indicates MMIO so it can be used for both VMX and SVM.

Fix the comment about EMULTYPE_PF, as this flag doesn't necessarily mean MMIO anymore: it can also be set due to a write protection violation.
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
V1 -> V2:
- Detect the unhandleable vectoring error in vmx_check_emulate_instruction
instead of handling it in the common MMU code (which is specific for cached MMIO)
 arch/x86/include/asm/kvm_host.h | 10 ++++++++--
 arch/x86/kvm/vmx/vmx.c          | 25 ++++++++++++-------------
 2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index eb413079b7c6..3de9702a9135 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2017,8 +2017,8 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
  *			     VMware backdoor emulation handles select instructions
  *			     and reinjects the #GP for all other cases.
  *
- * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
- *		 case the CR2/GPA value pass on the stack is valid.
+ * EMULTYPE_PF - Set when an intercepted #PF triggers the emulation, in which case
+ *		 the CR2/GPA value pass on the stack is valid.
  *
  * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
  *				 state and inject single-step #DBs after skipping
@@ -2053,6 +2053,12 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
 #define EMULTYPE_WRITE_PF_TO_SP	    (1 << 8)

+static inline bool kvm_is_emul_type_mmio(int emul_type)
Hmm, this should probably be "pf_mmio", not just "mmio". E.g. if KVM is emulating large swaths of guest code because unrestricted guest is disabled, then KVM can end up emulating an MMIO access for "normal" emulation.
Hmm, actually, what if we go with this?
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & EMULTYPE_WRITE_PF_TO_SP);
}
The MMIO aspect isn't unique to VMX or SVM, and the above would allow handling other incompatible emulation types, should they ever arise (unlikely).
More importantly, it provides a single location to document (via comment) exactly why KVM can't deal with MMIO emulation during event vectoring (and why KVM allows emulation of write-protect page faults).
It would require using X86EMUL_UNHANDLEABLE_VECTORING instead of X86EMUL_UNHANDLEABLE_VECTORING_IO, but I don't think that's necessarily a bad thing.
+{
+	return (emul_type & EMULTYPE_PF) &&
+	       !(emul_type & EMULTYPE_WRITE_PF_TO_SP);
+}
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
 					void *insn, int insn_len);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f92740e7e107..a10f35d9704b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1693,6 +1693,8 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 int vmx_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
 				  void *insn, int insn_len)
 {
+	bool is_vect;
+
 	/*
 	 * Emulation of instructions in SGX enclaves is impossible as RIP does
 	 * not point at the failing instruction, and even if it did, the code
@@ -1704,6 +1706,13 @@ int vmx_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return X86EMUL_PROPAGATE_FAULT;
 	}
+
+	is_vect = to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+
+	/* Emulation is not possible when MMIO happens during event vectoring. */
+	if (kvm_is_emul_type_mmio(emul_type) && is_vect)
I definitely prefer to omit the local variable.
	if ((to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
	    !kvm_can_emulate_event_vectoring(emul_type))
		return X86EMUL_UNHANDLEABLE_VECTORING;
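E.g. with the rationale captured in the comment, something along these lines (the comment wording below is an illustrative sketch, not final):

/*
 * Emulation is unsafe if a VM-Exit occurred during event delivery and
 * the access that triggered emulation is MMIO: KVM can't complete the
 * access, and re-entering the guest would retrigger event delivery,
 * i.e. livelock. A #PF on a write-protected page that hosts a shadowed
 * page table (EMULTYPE_WRITE_PF_TO_SP) is known to not be MMIO and so
 * is safe to emulate.
 */
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & EMULTYPE_WRITE_PF_TO_SP);
}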
On 12/11/24 18:15, Sean Christopherson wrote:
On Mon, Nov 11, 2024, Ivan Orlov wrote:
Move detection of the unhandleable VM-Exit caused by MMIO during event vectoring into check_emulate_instruction. Implement a function which checks whether emul_type indicates MMIO so it can be used for both VMX and SVM.

Fix the comment about EMULTYPE_PF, as this flag doesn't necessarily mean MMIO anymore: it can also be set due to a write protection violation.
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
V1 -> V2:
- Detect the unhandleable vectoring error in vmx_check_emulate_instruction
instead of handling it in the common MMU code (which is specific for cached MMIO)
 arch/x86/include/asm/kvm_host.h | 10 ++++++++--
 arch/x86/kvm/vmx/vmx.c          | 25 ++++++++++++-------------
 2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index eb413079b7c6..3de9702a9135 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2017,8 +2017,8 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
  *			     VMware backdoor emulation handles select instructions
  *			     and reinjects the #GP for all other cases.
  *
- * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
- *		 case the CR2/GPA value pass on the stack is valid.
+ * EMULTYPE_PF - Set when an intercepted #PF triggers the emulation, in which case
+ *		 the CR2/GPA value pass on the stack is valid.
  *
  * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
  *				 state and inject single-step #DBs after skipping
@@ -2053,6 +2053,12 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
 #define EMULTYPE_WRITE_PF_TO_SP	    (1 << 8)

+static inline bool kvm_is_emul_type_mmio(int emul_type)
Hmm, this should probably be "pf_mmio", not just "mmio". E.g. if KVM is emulating large swaths of guest code because unrestricted guest is disabled, then KVM can end up emulating an MMIO access for "normal" emulation.
Hmm, actually, what if we go with this?
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & EMULTYPE_WRITE_PF_TO_SP);
}
I don't mind using either option here; in fact, both of them would require an update if there is a new emulated page fault type, so let's use the more generic one (kvm_can_emulate_event_vectoring) :)
I'm thinking about a static assert we could add to it, to be sure the condition gets updated if such an EMUL_TYPE is introduced, but I can't think of something neat... Anyway, it can be done in a separate patch I guess (if we really need it).
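E.g. something along these lines, though it's only a rough illustration; EMULTYPE_LAST is a made-up marker that would have to be kept pointing at the highest EMULTYPE_* bit, and keeping it updated is exactly the convention the assert can't enforce by itself:

/* Hypothetical marker, not an existing define. */
#define EMULTYPE_LAST	EMULTYPE_WRITE_PF_TO_SP

static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	/* Force a revisit of this helper when a new EMULTYPE_* flag bumps the marker. */
	BUILD_BUG_ON(EMULTYPE_LAST != (1 << 8));

	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & EMULTYPE_WRITE_PF_TO_SP);
}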
--
Kind regards,
Ivan Orlov
On 12/11/24 18:15, Sean Christopherson wrote:
Hmm, this should probably be "pf_mmio", not just "mmio". E.g. if KVM is emulating large swaths of guest code because unrestricted guest is disabled, then KVM can end up emulating an MMIO access for "normal" emulation.
Hmm, actually, what if we go with this?
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & EMULTYPE_WRITE_PF_TO_SP);
}
Hm, by the way, what is the desired behaviour if EMULTYPE_ALLOW_RETRY_PF is set? Is it correct that we return an internal error if it is set during vectoring? Or KVM may try to unprotect the page and re-execute?
If so, we may need something like
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & ~(EMULTYPE_PF));
}
So it returns true if EMULTYPE_PF is not set or if it's not the only set bit.
On Wed, Dec 11, 2024, Ivan Orlov wrote:
On 12/11/24 18:15, Sean Christopherson wrote:
Hmm, this should probably be "pf_mmio", not just "mmio". E.g. if KVM is emulating large swaths of guest code because unrestricted guest is disabled, then KVM can end up emulating an MMIO access for "normal" emulation.
Hmm, actually, what if we go with this?
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & EMULTYPE_WRITE_PF_TO_SP);
}
Hm, by the way, what is the desired behaviour if EMULTYPE_ALLOW_RETRY_PF is set? Is it correct that we return an internal error if it is set during vectoring? Or KVM may try to unprotect the page and re-execute?
Heh, it's sneaky, but EMULTYPE_ALLOW_RETRY_PF can be set if and only if RET_PF_WRITE_PROTECTED is set. Hmm, that makes me think we should do the below (EMULTYPE_WRITE_PF_TO_SP was a recent addition).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2e713480933a..de5f6985d123 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9077,7 +9077,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
if ((emulation_type & EMULTYPE_ALLOW_RETRY_PF) && (WARN_ON_ONCE(is_guest_mode(vcpu)) || - WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))) + WARN_ON_ONCE(!(emulation_type & EMULTYPE_WRITE_PF_TO_SP)))) emulation_type &= ~EMULTYPE_ALLOW_RETRY_PF;
r = kvm_check_emulate_insn(vcpu, emulation_type, insn, insn_len);
That said, let me get back to you on this when my brain is less tired. I'm not sure emulating when an exit occurred during event delivery is _ever_ correct.
If so, we may need something like
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF) ||
	       (emul_type & ~(EMULTYPE_PF));
}
So it returns true if EMULTYPE_PF is not set or if it's not the only set bit.
On Wed, Dec 11, 2024 at 05:01:07PM -0800, Sean Christopherson wrote:
Hm, by the way, what is the desired behaviour if EMULTYPE_ALLOW_RETRY_PF is set? Is it correct that we return an internal error if it is set during vectoring? Or KVM may try to unprotect the page and re-execute?
Heh, it's sneaky, but EMULTYPE_ALLOW_RETRY_PF can be set if and only if RET_PF_WRITE_PROTECTED is set. Hmm, that makes me think we should do the below (EMULTYPE_WRITE_PF_TO_SP was a recent addition).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2e713480933a..de5f6985d123 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9077,7 +9077,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
if ((emulation_type & EMULTYPE_ALLOW_RETRY_PF) && (WARN_ON_ONCE(is_guest_mode(vcpu)) ||
WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF))))
WARN_ON_ONCE(!(emulation_type & EMULTYPE_WRITE_PF_TO_SP)))) emulation_type &= ~EMULTYPE_ALLOW_RETRY_PF;
What if we are handling a write to a write-protected page, but not a write to the page table? We can still retry after unprotecting the page, so EMULTYPE_ALLOW_RETRY_PF should be enabled, right?
r = kvm_check_emulate_insn(vcpu, emulation_type, insn, insn_len);
That said, let me get back to you on this when my brain is less tired. I'm not sure emulating when an exit occurred during event delivery is _ever_ correct.
I believe we can re-execute the instruction if the exit happened during vectoring due to an exception (and if the address is not MMIO, e.g. when it's a write to a write-protected page, for instance when the stack points to it).

KVM unprotects the page, executes the instruction one more time and (probably) gets this exception once again (but the page is already unprotected, so vectoring succeeds without a vmexit). If not (e.g. the exception conditions are not met anymore), the guest shouldn't really care and can continue execution.

However, I'm not sure what happens if vectoring is caused by an external interrupt: if we unprotect the page and re-execute the instruction, will the IRQ be delivered nonetheless, or will it be lost as the IRQ is already in the ISR? Do we need to re-inject it in such a case?
--
Kind regards,
Ivan Orlov
On Thu, Dec 12, 2024, Ivan Orlov wrote:
On Wed, Dec 11, 2024 at 05:01:07PM -0800, Sean Christopherson wrote:
Hm, by the way, what is the desired behaviour if EMULTYPE_ALLOW_RETRY_PF is set? Is it correct that we return an internal error if it is set during vectoring? Or KVM may try to unprotect the page and re-execute?
Heh, it's sneaky, but EMULTYPE_ALLOW_RETRY_PF can be set if and only if RET_PF_WRITE_PROTECTED is set. Hmm, that makes me think we should do the below (EMULTYPE_WRITE_PF_TO_SP was a recent addition).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2e713480933a..de5f6985d123 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9077,7 +9077,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
if ((emulation_type & EMULTYPE_ALLOW_RETRY_PF) && (WARN_ON_ONCE(is_guest_mode(vcpu)) ||
WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF))))
WARN_ON_ONCE(!(emulation_type & EMULTYPE_WRITE_PF_TO_SP)))) emulation_type &= ~EMULTYPE_ALLOW_RETRY_PF;
What if we are handling a write to a write-protected page, but not a write to the page table? We can still retry after unprotecting the page, so EMULTYPE_ALLOW_RETRY_PF should be enabled, right?
Gah, I got my enums mixed up. I conflated RET_PF_WRITE_PROTECTED with EMULTYPE_WRITE_PF_TO_SP. Ignore the above.
FWIW, KVM _can't_ unprotect and retry in the EMULTYPE_WRITE_PF_TO_SP case. From kvm_unprotect_and_retry_on_failure():
	/*
	 * If the failed instruction faulted on an access to page tables that
	 * are used to translate any part of the instruction, KVM can't resolve
	 * the issue by unprotecting the gfn, as zapping the shadow page will
	 * result in the instruction taking a !PRESENT page fault and thus put
	 * the vCPU into an infinite loop of page faults. E.g. KVM will create
	 * a SPTE and write-protect the gfn to resolve the !PRESENT fault, and
	 * then zap the SPTE to unprotect the gfn, and then do it all over
	 * again. Report the error to userspace.
	 */
	if (emulation_type & EMULTYPE_WRITE_PF_TO_SP)
		return false;
r = kvm_check_emulate_insn(vcpu, emulation_type, insn, insn_len);
That said, let me get back to you on this when my brain is less tired. I'm not sure emulating when an exit occurred during event delivery is _ever_ correct.
I believe we can re-execute the instruction if the exit happened during vectoring due to an exception (and if the address is not MMIO, e.g. when it's a write to a write-protected page, for instance when the stack points to it).
Unprotect and re-execute is fine, what I'm worried about is *successfully* emulating the instruction. E.g.
1. CPU executes instruction X and hits a #GP.
2. While vectoring the #GP, a shadow #PF is taken.
3. On VM-Exit, KVM re-injects the #GP (see __vmx_complete_interrupts()).
4. KVM emulates because of the write-protected page.
5. KVM "successfully" emulates and also detects the #GP.
6. KVM synthesizes a #GP, and because the vCPU already has an injected #GP, incorrectly escalates to a #DF.
The above is a bit contrived, but I think it could happen if the guest reused a page that _was_ a page table, for a vCPU's kernel stack.
KVM unprotects the page, executes the instruction one more time and (probably) gets this exception once again (but the page is already unprotected, so vectoring succeeds without a vmexit). If not (e.g. the exception conditions are not met anymore), the guest shouldn't really care and can continue execution.

However, I'm not sure what happens if vectoring is caused by an external interrupt: if we unprotect the page and re-execute the instruction, will the IRQ be delivered nonetheless, or will it be lost as the IRQ is already in the ISR? Do we need to re-inject it in such a case?
In all cases, the event that was being vectored is re-injected. Restarting from scratch would be a bug. E.g. if the cause of the initial exception was "fixed", say because the initial exception was a #BP and the guest finished patching out the INT3, then restarting would execute the _new_ instruction, and the INT3 would be lost.

In most cases, the guest would never notice, but it's still undesirable for KVM to effectively rewrite history.
As far as unprotect+retry being viable, I think we're on the same page. What I'm getting at is that I think KVM should never allow emulating on #PF when the #PF occurred while vectoring. E.g. this:
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF);
}
and then I believe this? Where this diff can be a separate prep patch (though I'm pretty sure it's technically pointless without the vectoring angle, because shadow #PF can't coincide with any of the failure paths for kvm_check_emulate_insn()).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 07c6f1d5323d..63361b2da450 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9107,6 +9107,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, if (r == X86EMUL_RETRY_INSTR || r == X86EMUL_PROPAGATE_FAULT) return 1;
+ if (kvm_unprotect_and_retry_on_failure(vcpu, cr2_or_gpa, + emulation_type)) + return 1; + if (r == X86EMUL_UNHANDLEABLE_VECTORING_IO) { kvm_prepare_event_vectoring_exit(vcpu, cr2_or_gpa); return 0;
On Thu, Dec 12, 2024 at 11:42:37AM -0800, Sean Christopherson wrote:
Gah, I got my enums mixed up. I conflated RET_PF_WRITE_PROTECTED with EMULTYPE_WRITE_PF_TO_SP. Ignore the above.
FWIW, KVM _can't_ unprotect and retry in the EMULTYPE_WRITE_PF_TO_SP case. From kvm_unprotect_and_retry_on_failure():
Ah alright, I guess I get it now :) So, EMULTYPE_ALLOW_RETRY_PF is set whenever there is a #PF when accessing a write-protected page, and EMULTYPE_WRITE_PF_TO_SP is set when this access touches an SPTE used to translate the write itself. If both are set, we can't unprotect and retry, and the latter can be set only if the former is set.
Unprotect and re-execute is fine, what I'm worried about is *successfully* emulating the instruction. E.g.
1. CPU executes instruction X and hits a #GP.
2. While vectoring the #GP, a shadow #PF is taken.
3. On VM-Exit, KVM re-injects the #GP (see __vmx_complete_interrupts()).
4. KVM emulates because of the write-protected page.
5. KVM "successfully" emulates and also detects the #GP.
6. KVM synthesizes a #GP, and because the vCPU already has an injected #GP, incorrectly escalates to a #DF.
The above is a bit contrived, but I think it could happen if the guest reused a page that _was_ a page table, for a vCPU's kernel stack.
Does it work like that only for contributory exceptions / page faults? In case it's not a #GP but (for instance) a #UD, (as far as I understand) KVM will queue only one of them without causing a #DF, so it's going to be valid?
However, I'm not sure what happens if vectoring is caused by an external interrupt: if we unprotect the page and re-execute the instruction, will the IRQ be delivered nonetheless, or will it be lost as the IRQ is already in the ISR? Do we need to re-inject it in such a case?

In all cases, the event that was being vectored is re-injected. Restarting from scratch would be a bug. E.g. if the cause of the initial exception was "fixed", say because the initial exception was a #BP and the guest finished patching out the INT3, then restarting would execute the _new_ instruction, and the INT3 would be lost.
Cool, that is what I was concerned about, glad that it is already implemented :)
As far as unprotect+retry being viable, I think we're on the same page. What I'm getting at is that I think KVM should never allow emulating on #PF when the #PF occurred while vectoring. E.g. this:
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF);
}
Yeah, I agree. I'll post a V3 with suggested fixes (after running all of the selftests to be sure that it doesn't break anything).
and then I believe this? Where this diff can be a separate prep patch (though I'm pretty sure it's technically pointless without the vectoring angle, because shadow #PF can't coincide with any of the failure paths for kvm_check_emulate_insn()).
Looks good. If you don't mind, I could add this patch to the series with a `Suggested-by` tag since it's necessary to allow unprotect+retry in case of a shadow #PF during vectoring.
--
Kind regards,
Ivan Orlov
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 07c6f1d5323d..63361b2da450 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9107,6 +9107,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, if (r == X86EMUL_RETRY_INSTR || r == X86EMUL_PROPAGATE_FAULT) return 1;
if (kvm_unprotect_and_retry_on_failure(vcpu, cr2_or_gpa,
emulation_type))
return 1;
if (r == X86EMUL_UNHANDLEABLE_VECTORING_IO) { kvm_prepare_event_vectoring_exit(vcpu, cr2_or_gpa); return 0;
On Fri, Dec 13, 2024, Ivan Orlov wrote:
On Thu, Dec 12, 2024 at 11:42:37AM -0800, Sean Christopherson wrote:
Unprotect and re-execute is fine, what I'm worried about is *successfully* emulating the instruction. E.g.
1. CPU executes instruction X and hits a #GP.
2. While vectoring the #GP, a shadow #PF is taken.
3. On VM-Exit, KVM re-injects the #GP (see __vmx_complete_interrupts()).
4. KVM emulates because of the write-protected page.
5. KVM "successfully" emulates and also detects the #GP.
6. KVM synthesizes a #GP, and because the vCPU already has an injected #GP, incorrectly escalates to a #DF.
The above is a bit contrived, but I think it could happen if the guest reused a page that _was_ a page table, for a vCPU's kernel stack.
Does it work like that only for contributory exceptions / page faults?
The #DF case, yes.
In case it's not a #GP but (for instance) a #UD, (as far as I understand) KVM will queue only one of them without causing a #DF, so it's going to be valid?
No, it can still be invalid. E.g. initially hit a #BP, replace it with a #UD, but there may be guest-visible side effects from the original #BP.
However, I'm not sure what happens if vectoring is caused by an external interrupt: if we unprotect the page and re-execute the instruction, will the IRQ be delivered nonetheless, or will it be lost as the IRQ is already in the ISR? Do we need to re-inject it in such a case?

In all cases, the event that was being vectored is re-injected. Restarting from scratch would be a bug. E.g. if the cause of the initial exception was "fixed", say because the initial exception was a #BP and the guest finished patching out the INT3, then restarting would execute the _new_ instruction, and the INT3 would be lost.
Cool, that is what I was concerned about, glad that it is already implemented :)
As far as unprotect+retry being viable, I think we're on the same page. What I'm getting at is that I think KVM should never allow emulating on #PF when the #PF occurred while vectoring. E.g. this:
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
	return !(emul_type & EMULTYPE_PF);
}
Yeah, I agree. I'll post a V3 with suggested fixes (after running all of the selftests to be sure that it doesn't break anything).
and then I believe this? Where this diff can be a separate prep patch (though I'm pretty sure it's technically pointless without the vectoring angle, because shadow #PF can't coincide with any of the failure paths for kvm_check_emulate_insn()).
Looks good. If you don't mind, I could add this patch to the series with a `Suggested-by` tag since it's necessary to allow unprotect+retry in case of a shadow #PF during vectoring.
Ya, go for it.
Handle the MMIO-during-vectoring error in check_emulate_instruction to prevent an infinite loop on SVM, and to eliminate the difference in how a guest MMIO access during event vectoring is handled on SVM and VMX.
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
V1 -> V2:
- Detect the unhandleable vectoring error in svm_check_emulate_instruction instead of handling it in the common MMU code (which is specific for cached MMIO)
 arch/x86/kvm/svm/svm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c1e29307826b..b69f0f98c576 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4797,9 +4797,16 @@ static void svm_enable_smi_window(struct kvm_vcpu *vcpu)
 static int svm_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
 					 void *insn, int insn_len)
 {
-	bool smep, smap, is_user;
+	bool smep, smap, is_user, is_vect;
 	u64 error_code;

+	is_vect = to_svm(vcpu)->vmcb->control.exit_int_info &
+		  SVM_EXITINTINFO_TYPE_MASK;
+
+	/* Emulation is not possible when MMIO happens during event vectoring. */
+	if (kvm_is_emul_type_mmio(emul_type) && is_vect)
+		return X86EMUL_UNHANDLEABLE_VECTORING_IO;
+
 	/* Emulation is always possible when KVM has access to all guest state. */
 	if (!sev_guest(vcpu->kvm))
 		return X86EMUL_CONTINUE;
On Mon, Nov 11, 2024, Ivan Orlov wrote:
Handle the MMIO-during-vectoring error in check_emulate_instruction to prevent an infinite loop on SVM, and to eliminate the difference in how a guest MMIO access during event vectoring is handled on SVM and VMX.
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
V1 -> V2:
- Detect the unhandleable vectoring error in svm_check_emulate_instruction
instead of handling it in the common MMU code (which is specific for cached MMIO)
 arch/x86/kvm/svm/svm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c1e29307826b..b69f0f98c576 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4797,9 +4797,16 @@ static void svm_enable_smi_window(struct kvm_vcpu *vcpu)
 static int svm_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
 					 void *insn, int insn_len)
 {
-	bool smep, smap, is_user;
+	bool smep, smap, is_user, is_vect;
 	u64 error_code;

+	is_vect = to_svm(vcpu)->vmcb->control.exit_int_info &
+		  SVM_EXITINTINFO_TYPE_MASK;
+
+	/* Emulation is not possible when MMIO happens during event vectoring. */
+	if (kvm_is_emul_type_mmio(emul_type) && is_vect)
Same nit here, omit the local variable.
+		return X86EMUL_UNHANDLEABLE_VECTORING_IO;
+
 	/* Emulation is always possible when KVM has access to all guest state. */
 	if (!sev_guest(vcpu->kvm))
 		return X86EMUL_CONTINUE;

--
2.43.0
Implement a function for setting the IDT descriptor from the guest code. Replace the existing lidt occurrences with calls to this function as `lidt` is used in multiple places.
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
V1 -> V2:
- This patch wasn't included in V1.
 tools/testing/selftests/kvm/include/x86_64/processor.h | 5 +++++
 tools/testing/selftests/kvm/set_memory_region_test.c   | 2 +-
 tools/testing/selftests/kvm/x86_64/sev_smoke_test.c    | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index e247f99e0473..1a60c99b5833 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -571,6 +571,11 @@ static inline void set_cr4(uint64_t val)
 	__asm__ __volatile__("mov %0, %%cr4" : : "r" (val) : "memory");
 }

+static inline void set_idt(const struct desc_ptr *idt_desc)
+{
+	__asm__ __volatile__("lidt %0"::"m"(*idt_desc));
+}
+
 static inline u64 xgetbv(u32 index)
 {
 	u32 eax, edx;
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index a8267628e9ed..a1c53cc854a5 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -235,7 +235,7 @@ static void guest_code_delete_memory_region(void)
 	 * in the guest will never succeed, and so isn't an option.
 	 */
 	memset(&idt, 0, sizeof(idt));
-	__asm__ __volatile__("lidt %0" :: "m"(idt));
+	set_idt(&idt);

 	GUEST_SYNC(0);

diff --git a/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c b/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
index 2e9197eb1652..8c33e02a3183 100644
--- a/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
@@ -166,7 +166,7 @@ static void guest_shutdown_code(void)

 	/* Clobber the IDT so that #UD is guaranteed to trigger SHUTDOWN. */
 	memset(&idt, 0, sizeof(idt));
-	__asm__ __volatile__("lidt %0" :: "m"(idt));
+	set_idt(&idt);

 	__asm__ __volatile__("ud2");
 }
Extend the 'set_memory_region_test' with a test case which covers the MMIO during vectoring error handling. The test case:

1) Sets an IDT descriptor base to point to an MMIO address
2) Generates a #GP in the guest
3) Verifies that we got a correct exit reason and suberror code
4) Verifies that we got the correct reported GPA in internal.data[3]
Also, add a definition of non-canonical address to processor.h
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
V1 -> V2:
- Get rid of pronouns, redundant comments and incorrect wording
- Define noncanonical address in processor.h
- Fix indentation and wrap lines at 80 columns
 .../selftests/kvm/include/x86_64/processor.h |  2 +
 .../selftests/kvm/set_memory_region_test.c   | 51 +++++++++++++++++++
 2 files changed, 53 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 1a60c99b5833..997df5003edb 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1165,6 +1165,8 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector,
 /* If a toddler were to say "abracadabra". */
 #define KVM_EXCEPTION_MAGIC 0xabacadabaULL

+#define NONCANONICAL 0xaaaaaaaaaaaaaaaaull
+
 /*
  * KVM selftest exception fixup uses registers to coordinate with the exception
  * handler, versus the kernel's in-memory tables and KVM-Unit-Tests's in-memory
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index a1c53cc854a5..d65a9f20aa1a 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -553,6 +553,56 @@ static void test_add_overlapping_private_memory_regions(void)
 	close(memfd);
 	kvm_vm_free(vm);
 }
+
+static void guest_code_mmio_during_vectoring(void)
+{
+	const struct desc_ptr idt_desc = {
+		.address = MEM_REGION_GPA,
+		.size = 0xFFF,
+	};
+
+	set_idt(&idt_desc);
+
+	/* Generate a #GP by dereferencing a non-canonical address */
+	*((uint8_t *)NONCANONICAL) = 0x1;
+
+	GUEST_ASSERT(0);
+}
+
+/*
+ * This test points the IDT descriptor base to an MMIO address. It should cause
+ * a KVM internal error when an event occurs in the guest.
+ */
+static void test_mmio_during_vectoring(void)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	u64 expected_gpa;
+
+	pr_info("Testing MMIO during vectoring error handling\n");
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code_mmio_during_vectoring);
+	virt_map(vm, MEM_REGION_GPA, MEM_REGION_GPA, 1);
+
+	run = vcpu->run;
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_INTERNAL_ERROR);
+	TEST_ASSERT(run->internal.suberror == KVM_INTERNAL_ERROR_DELIVERY_EV,
+		    "Unexpected suberror = %d", vcpu->run->internal.suberror);
+	TEST_ASSERT(run->internal.ndata != 4, "Unexpected internal error data array size = %d",
+		    run->internal.ndata);
+
+	/* The reported GPA should be IDT base + offset of the #GP vector */
+	expected_gpa = MEM_REGION_GPA + GP_VECTOR * sizeof(struct idt_entry);
+
+	TEST_ASSERT(run->internal.data[3] == expected_gpa,
+		    "Unexpected GPA = %llx (expected %lx)",
+		    vcpu->run->internal.data[3], expected_gpa);
+
+	kvm_vm_free(vm);
+}
 #endif

 int main(int argc, char *argv[])
@@ -568,6 +618,7 @@ int main(int argc, char *argv[])
 	 * KVM_RUN fails with ENOEXEC or EFAULT.
 	 */
 	test_zero_memory_regions();
+	test_mmio_during_vectoring();
 #endif
test_invalid_memory_region_flags();
On Mon, Nov 11, 2024, Ivan Orlov wrote:
Extend the 'set_memory_region_test' with a test case which covers the MMIO during vectoring error handling. The test case:

1) Sets an IDT descriptor base to point to an MMIO address
2) Generates a #GP in the guest
3) Verifies that we got a correct exit reason and suberror code
4) Verifies that we got the correct reported GPA in internal.data[3]
Also, add a definition of non-canonical address to processor.h
Signed-off-by: Ivan Orlov <iorlov@amazon.com>
V1 -> V2:
- Get rid of pronouns, redundant comments and incorrect wording
- Define noncanonical address in processor.h
- Fix indentation and wrap lines at 80 columns
 .../selftests/kvm/include/x86_64/processor.h |  2 +
 .../selftests/kvm/set_memory_region_test.c   | 51 +++++++++++++++++++
 2 files changed, 53 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 1a60c99b5833..997df5003edb 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1165,6 +1165,8 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector,
 /* If a toddler were to say "abracadabra". */
 #define KVM_EXCEPTION_MAGIC 0xabacadabaULL

+#define NONCANONICAL 0xaaaaaaaaaaaaaaaaull
+
Uber nit, I much prefer to place this definition at the top of the header. More specifically, it needs to not be in the middle of the selftest's exception fixup code.
 /*
  * KVM selftest exception fixup uses registers to coordinate with the exception
  * handler, versus the kernel's in-memory tables and KVM-Unit-Tests's in-memory
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index a1c53cc854a5..d65a9f20aa1a 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -553,6 +553,56 @@ static void test_add_overlapping_private_memory_regions(void)
 	close(memfd);
 	kvm_vm_free(vm);
 }
+static void guest_code_mmio_during_vectoring(void)
+{
+	const struct desc_ptr idt_desc = {
+		.address = MEM_REGION_GPA,
+		.size = 0xFFF,
+	};
+
+	set_idt(&idt_desc);
+
+	/* Generate a #GP by dereferencing a non-canonical address */
+	*((uint8_t *)NONCANONICAL) = 0x1;
Now I'm curious what happens if this uses vcpu_arch_put_guest(), i.e. if the test forces KVM to emulate the write.
No action needed, the test is a-ok as-is. I'm really just curious :-)
On Wed, Dec 11, 2024 at 10:19:40AM -0800, Sean Christopherson wrote:
+static void guest_code_mmio_during_vectoring(void)
+{
+	const struct desc_ptr idt_desc = {
+		.address = MEM_REGION_GPA,
+		.size = 0xFFF,
+	};
+
+	set_idt(&idt_desc);
+
+	/* Generate a #GP by dereferencing a non-canonical address */
+	*((uint8_t *)NONCANONICAL) = 0x1;
Now I'm curious what happens if this uses vcpu_arch_put_guest(), i.e. if the test forces KVM to emulate the write.
No action needed, the test is a-ok as-is. I'm really just curious :-)
:) Just tried enabling the `force_emulation_prefix` kvm parameter and replacing the write with
vcpu_arch_put_guest(*((uint8_t *)NONCANONICAL), 0x1);
And the test simply passes (so it returns the same internal error with suberror=3, patches applied)
--
Kind regards,
Ivan Orlov
On Mon, Nov 11, 2024, Ivan Orlov wrote:
Currently, the situation when a guest accesses MMIO during event vectoring is handled differently on VMX and SVM: on VMX, KVM returns an internal error, while on SVM, KVM goes into an infinite loop trying to deliver the event again and again.

This patch series eliminates the difference by returning a KVM internal error when the guest performs MMIO during vectoring on both VMX and SVM.

Also, introduce a selftest test case which covers the error handling described above.
V1 -> V2:
- Make commit messages more brief, avoid using pronouns
- Extract SVM error handling into a separate commit
- Introduce a new X86EMUL_ return type and detect the unhandleable
vectoring error in vendor-specific check_emulate_instruction instead of handling it in the common MMU code (which is specific for cached MMIO)
Ivan Orlov (6):
  KVM: x86: Add function for vectoring error generation
  KVM: x86: Add emulation status for vectoring during MMIO
  KVM: VMX: Handle vectoring error in check_emulate_instruction
  KVM: SVM: Handle MMIO during vectoring error
  selftests: KVM: extract lidt into helper function
  selftests: KVM: Add test case for MMIO during vectoring
Minor nits throughout, but unless you disagree with my suggestions, I'll fix them up when applying, i.e. no need to post a v3.
On 12/11/24 18:20, Sean Christopherson wrote:
On Mon, Nov 11, 2024, Ivan Orlov wrote:
Currently, the situation when a guest accesses MMIO during event vectoring is handled differently on VMX and SVM: on VMX, KVM returns an internal error, while on SVM, KVM goes into an infinite loop trying to deliver the event again and again.

This patch series eliminates the difference by returning a KVM internal error when the guest performs MMIO during vectoring on both VMX and SVM.

Also, introduce a selftest test case which covers the error handling described above.
V1 -> V2:
- Make commit messages more brief, avoid using pronouns
- Extract SVM error handling into a separate commit
- Introduce a new X86EMUL_ return type and detect the unhandleable
vectoring error in vendor-specific check_emulate_instruction instead of handling it in the common MMU code (which is specific for cached MMIO)
Ivan Orlov (6):
  KVM: x86: Add function for vectoring error generation
  KVM: x86: Add emulation status for vectoring during MMIO
  KVM: VMX: Handle vectoring error in check_emulate_instruction
  KVM: SVM: Handle MMIO during vectoring error
  selftests: KVM: extract lidt into helper function
  selftests: KVM: Add test case for MMIO during vectoring
Minor nits throughout, but unless you disagree with my suggestions, I'll fix them up when applying, i.e. no need to post a v3.
Hi Sean,
Thanks a lot for the review :)
I don't have any conceptual disagreement with your suggestions, so please feel free to fix them when applying the patches. Thanks!