On Thu, Dec 12, 2024, Ivan Orlov wrote:
On Wed, Dec 11, 2024 at 05:01:07PM -0800, Sean Christopherson wrote:
Hm, by the way, what is the desired behaviour if EMULTYPE_ALLOW_RETRY_PF is set? Is it correct that we return an internal error if it is set during vectoring? Or KVM may try to unprotect the page and re-execute?
Heh, it's sneaky, but EMULTYPE_ALLOW_RETRY_PF can be set if and only if RET_PF_WRITE_PROTECTED is set. Hmm, that makes me think we should do the below (EMULTYPE_WRITE_PF_TO_SP was a recent addition).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2e713480933a..de5f6985d123 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9077,7 +9077,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
if ((emulation_type & EMULTYPE_ALLOW_RETRY_PF) && (WARN_ON_ONCE(is_guest_mode(vcpu)) ||
WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF))))
WARN_ON_ONCE(!(emulation_type & EMULTYPE_WRITE_PF_TO_SP)))) emulation_type &= ~EMULTYPE_ALLOW_RETRY_PF;
What if we are handling a write to write-protected page, but not a write to the page table? We still can retry after unprotecting the page, so EMULTYPE_ALLOW_RETRY_PF should be enabled, right?
Gah, I got my enums mixed up. I conflated RET_PF_WRITE_PROTECTED with EMULTYPE_WRITE_PF_TO_SP. Ignore the above.
FWIW, KVM _can't_ unprotect and retry in the EMULTYPE_WRITE_PF_TO_SP case. From kvm_unprotect_and_retry_on_failure():
/* * If the failed instruction faulted on an access to page tables that * are used to translate any part of the instruction, KVM can't resolve * the issue by unprotecting the gfn, as zapping the shadow page will * result in the instruction taking a !PRESENT page fault and thus put * the vCPU into an infinite loop of page faults. E.g. KVM will create * a SPTE and write-protect the gfn to resolve the !PRESENT fault, and * then zap the SPTE to unprotect the gfn, and then do it all over * again. Report the error to userspace. */ if (emulation_type & EMULTYPE_WRITE_PF_TO_SP) return false;
r = kvm_check_emulate_insn(vcpu, emulation_type, insn, insn_len);
That said, let me get back to you on this when my brain is less tired. I'm not sure emulating when an exit occurred during event delivery is _ever_ correct.
I believe we can re-execute the instruction if exit happened during vectoring due to exception (and if the address is not MMIO, e.g. when it's a write to write-protected page, for instance when stack points to it).
Unprotect and re-execute is fine, what I'm worried about is *successfully* emulating the instruction. E.g.
1. CPU executes instruction X and hits a #GP. 2. While vectoring the #GP, a shadow #PF is taken. 3. On VM-Exit, KVM re-injects the #GP (see __vmx_complete_interrupts()). 4. KVM emulates because of the write-protected page. 5. KVM "successfully" emulates and also detects the #GP 6. KVM synthesizes a #GP, and because the vCPU already has injected #GP, incorrectly escalates to a #DF.
The above is a bit contrived, but I think it could happen if the guest reused a page that _was_ a page table, for a vCPU's kernel stack.
KVM unprotects the page, executes the instruction one more time and (probably) gets this exception once again (but the page is already unprotected, so vectoring succeeds without vmexit). If not (e.g. exception conditions are not met anymore), guest shouldn't really care and it can continue execution.
However, I'm not sure what happens if vectoring is caused by external interrupt: if we unprotect the page and re-execute the instruction, will IRQ be delivered nonetheless, or it will be lost as irq is already in ISR? Do we need to re-inject it in such a case?
In all cases, the event that was being vectored is re-injected. Restarting from scratch would be a bug. E.g. if the cause of initial exception was "fixed", say because the initial exception was #BP, and the guest finished patching out the INT3, then restarting would execute the _new_ instruction, and the INT3 would be lost.
In most cases, the guest would never notice, but it's still undesriable for KVM to effectively rewrite history.
As far as unprotect+retry being viable, I think we're on the same page. What I'm getting at is that I think KVM should never allow emulating on #PF when the #PF occurred while vectoring. E.g. this:
static inline bool kvm_can_emulate_event_vectoring(int emul_type) { return !(emul_type & EMULTYPE_PF); }
and then I believe this? Where this diff can be a separate prep patch (though I'm pretty sure it's technically pointless without the vectoring angle, because shadow #PF can't coincide with any of the failure paths for kvm_check_emulate_insn()).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 07c6f1d5323d..63361b2da450 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9107,6 +9107,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, if (r == X86EMUL_RETRY_INSTR || r == X86EMUL_PROPAGATE_FAULT) return 1;
+ if (kvm_unprotect_and_retry_on_failure(vcpu, cr2_or_gpa, + emulation_type)) + return 1; + if (r == X86EMUL_UNHANDLEABLE_VECTORING_IO) { kvm_prepare_event_vectoring_exit(vcpu, cr2_or_gpa); return 0;