On 09/04/2020 05:50, Andy Lutomirski wrote:
On Wed, Apr 8, 2020 at 11:01 AM Thomas Gleixner <tglx@linutronix.de> wrote:
Paolo Bonzini <pbonzini@redhat.com> writes:
On 08/04/20 17:34, Sean Christopherson wrote:
On Wed, Apr 08, 2020 at 10:23:58AM +0200, Paolo Bonzini wrote:
Page-not-present async page faults are almost a perfect match for the hardware use of #VE (and it might even be possible to let the processor deliver the exceptions).
My "async" page fault knowledge is limited, but if the desired behavior is to reflect a fault into the guest for select EPT Violations, then yes, enabling EPT Violation #VEs in hardware is doable. The big gotcha is that KVM needs to set the suppress #VE bit for all EPTEs when allocating a new MMU page, otherwise not-present faults on zero-initialized EPTEs will get reflected.
Attached a patch that does the prep work in the MMU. The VMX usage would be:
kvm_mmu_set_spte_init_value(VMX_EPT_SUPPRESS_VE_BIT);
when EPT Violation #VEs are enabled. It's 64-bit only as it uses stosq to initialize EPTEs. 32-bit could also be supported by doing memcpy() from a static page.
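To make the gotcha above concrete, here is a minimal user-space sketch, not the attached patch. It assumes the suppress-#VE bit is bit 63 of an EPT entry and that a page holds 512 eight-byte entries; the names fill_sptes(), fill_sptes_from_template() and would_reflect_not_present() are made up for illustration and are not KVM functions. It shows both the store-loop variant (what stosq would do on 64-bit) and the memcpy()-from-a-static-page variant mentioned for 32-bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EPTES_PER_PAGE 512u            /* 4096-byte page / 8-byte entries */
#define SUPPRESS_VE    (1ULL << 63)    /* assumed: suppress-#VE is EPTE bit 63 */

/* Hypothetical helper: set every entry of a fresh MMU page to init_value. */
static void fill_sptes(uint64_t *page, uint64_t init_value)
{
    assert(page != NULL);
    for (size_t i = 0; i < EPTES_PER_PAGE; i++)
        page[i] = init_value;
}

/* 32-bit style alternative: copy from a pre-initialized static page. */
static uint64_t template_page[EPTES_PER_PAGE];

static void init_template(uint64_t init_value)
{
    fill_sptes(template_page, init_value);
}

static void fill_sptes_from_template(uint64_t *page)
{
    assert(page != NULL);
    memcpy(page, template_page, sizeof(template_page));
}

/*
 * Simplified model: a not-present fault on this entry is reflected into
 * the guest as #VE only if the suppress-#VE bit is clear.
 */
static int would_reflect_not_present(uint64_t epte)
{
    return (epte & SUPPRESS_VE) == 0;
}
```

A zero-initialized entry has the bit clear, so would_reflect_not_present() returns 1 for it; after fill_sptes(page, SUPPRESS_VE) every entry returns 0.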
The complication is that (at least according to the current ABI) we would not want #VE to kick if the guest currently has IF=0 (and possibly CPL=0). But the ABI is not set in stone, and anyway the #VE protocol is a decent one and worth using as a base for whatever PV protocol we design.
Forget the current async PF semantics (or the lack thereof). You really want to start from scratch and ignore the whole thing.
The charm of #VE is that the hardware can inject it and it does not nest until the guest has cleared the second word in the VE information area. If that word is not 0 then you get a regular vmexit where you suspend the vcpu until the nested problem is solved.
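The delivery rule described here can be sketched as a small state model. This is an illustration under stated assumptions, not hardware or KVM code: the struct follows the information-area layout as I read it in the SDM (exit reason at offset 0, the "second word" at offset 4, then exit qualification and addresses), the field and function names are invented, and other conditions for a convertible EPT violation (such as not being in the middle of event delivery) are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SUPPRESS_VE        (1ULL << 63)  /* assumed: suppress-#VE is EPTE bit 63 */
#define EXIT_EPT_VIOLATION 48u           /* basic exit reason for EPT violation */

/* Hypothetical view of the virtualization-exception information area. */
struct ve_info {
    uint32_t exit_reason;   /* offset 0 */
    uint32_t busy;          /* offset 4: the "second word" */
    uint64_t exit_qual;     /* offset 8 */
    uint64_t gla;           /* offset 16: guest-linear address */
    uint64_t gpa;           /* offset 24: guest-physical address */
    uint16_t eptp_index;    /* offset 32 */
};

enum ve_outcome { DELIVER_VE, VM_EXIT };

/*
 * Simplified model of the decision: the fault becomes a #VE only if the
 * entry does not suppress it and the guest has cleared the second word.
 */
static enum ve_outcome ept_violation(struct ve_info *info, uint64_t epte,
                                     uint64_t gpa)
{
    assert(info != 0);
    if ((epte & SUPPRESS_VE) || info->busy != 0)
        return VM_EXIT;             /* hypervisor handles it, e.g. by blocking the vcpu */

    info->exit_reason = EXIT_EPT_VIOLATION;
    info->busy = 0xFFFFFFFFu;       /* set on delivery; blocks nesting */
    info->gpa = gpa;
    return DELIVER_VE;
}

/* Guest side: clearing the word re-arms #VE delivery. */
static void guest_ve_done(struct ve_info *info)
{
    info->busy = 0;
}
```

In this model a second fault that arrives before guest_ve_done() takes the VM_EXIT path, which is the point where the hypervisor could suspend the vcpu.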
Can you point me at where the SDM says this?
Vol3 25.5.6.1 Convertible EPT Violations
Anyway, I see two problems with #VE, one big and one small. The small (or maybe small) one is that any fancy protocol where the guest returns from an exception by doing, logically:
    Hey I'm done; /* MOV somewhere, hypercall, MOV to CR4, whatever */
    IRET;
is fundamentally racy. After we say we're done and before IRET, we can be recursively reentered. Hi, NMI!
Correct. There is no way to atomically end the #VE handler. (This causes "fun" even when using #VE for its intended purpose.)
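The window can be shown with a toy model. Everything here is hypothetical (struct vcpu_model and the function names are invented for illustration), and the NMI is reduced to "another fault arrives between the done-signal and IRET". The only thing it demonstrates is that once the handler has said it is done, a second #VE can be delivered while the first handler's frame is still live.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one vcpu's #VE state. */
struct vcpu_model {
    uint32_t busy;      /* second word of the #VE information area */
    int depth;          /* #VE handler frames currently on the stack */
    int max_depth;      /* deepest nesting observed */
};

/* Returns 1 if the fault is delivered as #VE, 0 if it would be a VM exit. */
static int fault(struct vcpu_model *v)
{
    if (v->busy != 0)
        return 0;
    v->busy = 0xFFFFFFFFu;
    v->depth++;
    if (v->depth > v->max_depth)
        v->max_depth = v->depth;
    return 1;
}

/* "Hey I'm done": re-arm delivery. The handler frame is NOT yet popped. */
static void handler_done(struct vcpu_model *v)
{
    v->busy = 0;
}

/* IRET: pop one handler frame. */
static void iret(struct vcpu_model *v)
{
    assert(v->depth > 0);
    v->depth--;
}

/* Outer #VE, done-signal, then an NMI-triggered fault before IRET. */
static int run_racy_sequence(void)
{
    struct vcpu_model v = {0, 0, 0};

    fault(&v);          /* outer #VE delivered */
    handler_done(&v);   /* handler signals completion */
    fault(&v);          /* NMI hits here and faults: #VE is delivered again */
    handler_done(&v);
    iret(&v);           /* inner return */
    iret(&v);           /* outer return, finally */
    return v.max_depth;
}
```

In this model run_racy_sequence() reports a nesting depth of 2; swapping the order so that iret() precedes handler_done() is not an option on real hardware, since nothing can run after IRET on behalf of the handler.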
~Andrew