On 09/04/20 16:13, Andrew Cooper wrote:
On 09/04/2020 13:47, Paolo Bonzini wrote:
On 09/04/20 06:50, Andy Lutomirski wrote:
The small (or maybe small) one is that any fancy protocol where the guest returns from an exception by doing, logically:
Hey I'm done; /* MOV somewhere, hypercall, MOV to CR4, whatever */ IRET;
is fundamentally racy. After we say we're done and before IRET, we can be recursively reentered. Hi, NMI!
That's possible in theory. In practice there would be only two levels of nesting, one for the original page being loaded and one for the tail of the #VE handler. The nested #VE would see IF=0, resolve the EPT violation synchronously and both handlers would finish. For the tail page to be swapped out again, leading to more nesting, the host's LRU must be seriously messed up.
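To make the nesting argument concrete, here is a minimal user-space sketch in C. It is a toy model, not KVM or Linux code: the struct, the `reentrancy_word` field, and all function names are hypothetical. It assumes the host injects an asynchronous #VE only when the word is 0, and that each fault arriving in the window between "I'm done" and IRET adds one level of nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the guest/host handshake; not real KVM code. */
struct ve_model {
    int reentrancy_word;   /* nonzero: guest is busy, host must not inject */
    int depth;             /* current handler nesting level */
    int max_depth;         /* deepest nesting observed so far */
    int faults_in_window;  /* faults that hit between "done" and IRET */
};

static void ve_handler(struct ve_model *m);

/* Host side: inject only if the guest has signalled that it is ready. */
static bool host_try_inject(struct ve_model *m)
{
    if (m->reentrancy_word != 0)
        return false;      /* would be resolved synchronously instead */
    m->reentrancy_word = 1;
    ve_handler(m);
    return true;
}

/* Guest side: the handler says "done" before it actually returns. */
static void ve_handler(struct ve_model *m)
{
    m->depth++;
    if (m->depth > m->max_depth)
        m->max_depth = m->depth;

    /* ... fault handling would happen here ... */

    m->reentrancy_word = 0;            /* "Hey I'm done" */

    /* Race window: still on the handler's stack, but injectable again. */
    if (m->faults_in_window > 0) {
        m->faults_in_window--;
        host_try_inject(m);
    }

    m->depth--;                        /* IRET */
}

/* Returns the deepest nesting reached for a given number of window faults. */
static int simulate_max_depth(int faults_in_window)
{
    struct ve_model m = { 0, 0, 0, faults_in_window };
    host_try_inject(&m);
    return m.max_depth;
}
```

In this model, `simulate_max_depth(1)` corresponds to the two-level case described above (the original page plus the tail of the handler). Deeper nesting needs a further fault inside each window, which is the "seriously messed up LRU" scenario.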
With IST it would be much messier, and I haven't quite understood why you believe the #VE handler should have an IST.
Any interrupt/exception which can possibly occur between a SYSCALL and re-establishing a kernel stack (several instructions), must be IST to avoid taking said exception on a user stack and being a trivial privilege escalation.
Doh, of course. I always confuse SYSCALL and SYSENTER.
Therefore, it doesn't really matter if KVM's paravirt use of #VE does respect the interrupt flag. It is not sensible to build a paravirt interface using #VE whose safety depends on never turning on hardware-induced #VEs.
No, I think we wouldn't use a paravirt #VE at this point, we would use the real thing if available.
It would still be possible to switch from the IST to the main kernel stack before writing 0 to the reentrancy word.
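A rough sketch of why the ordering matters, again as a hypothetical user-space model in C rather than real kernel code. It assumes the usual IST behaviour, where every delivery starts at the same fixed stack top, so a nested delivery that arrives while the IST frame is still live overwrites it. All names here are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one IST slot shared by every #VE delivery. */
struct ist_model {
    int  reentrancy_word;   /* nonzero: host must not inject */
    bool ist_in_use;        /* the IST frame still holds live state */
    bool clobbered;         /* a nested delivery overwrote a live frame */
    int  faults_in_window;  /* faults arriving after the word is cleared */
};

static void handler(struct ist_model *m, bool switch_first);

/* Delivery always lands on the same IST frame. */
static void deliver(struct ist_model *m, bool switch_first)
{
    if (m->reentrancy_word != 0)
        return;                     /* not injectable right now */
    m->reentrancy_word = 1;
    if (m->ist_in_use)
        m->clobbered = true;        /* live frame overwritten */
    m->ist_in_use = true;
    handler(m, switch_first);
}

static void handler(struct ist_model *m, bool switch_first)
{
    if (switch_first)
        m->ist_in_use = false;      /* frame moved to the main kernel stack */

    m->reentrancy_word = 0;         /* write 0 to the reentrancy word */

    /* Window in which another #VE may be delivered. */
    if (m->faults_in_window > 0) {
        m->faults_in_window--;
        deliver(m, switch_first);
    }

    m->ist_in_use = false;          /* IRET */
}

/* Returns true if any live IST frame was overwritten. */
static bool run_model(bool switch_first, int faults_in_window)
{
    struct ist_model m = { 0, false, false, faults_in_window };
    deliver(&m, switch_first);
    return m.clobbered;
}
```

In this model, `run_model(false, 1)` reports a clobbered frame, while `run_model(true, 1)` does not, because the IST slot is already free by the time the word is cleared.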
Paolo