On Sat, Mar 7, 2020 at 11:01 AM Thomas Gleixner tglx@linutronix.de wrote:
Andy Lutomirski luto@kernel.org writes:
On Sat, Mar 7, 2020 at 7:47 AM Thomas Gleixner tglx@linutronix.de wrote:
The host knows exactly when it injects an async #PF, and it can store CR2 and the reason for that async #PF while it is in flight.
On the next VMEXIT it checks whether apf_reason is 0. If apf_reason is 0, then it knows that the guest has read CR2 and apf_reason. All good, nothing to worry about.
If not it needs to be careful.
As long as the apf_reason of the last async #PF is not cleared by the guest, no new async #PF can be injected. That's already correct because in that case IF==0, which prevents a nested async #PF.
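The check described above can be sketched roughly as follows. This is only a minimal model in C, not KVM code: struct apf_model, apf_inject() and apf_on_vmexit() are hypothetical names invented for illustration, and the real shared-memory layout is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-vCPU async #PF state; not KVM's real layout. */
struct apf_model {
	uint32_t apf_reason;	/* shared with the guest; guest writes 0 after reading it */
	uint64_t cr2;		/* CR2 value the host set up for the async #PF */
	bool in_flight;		/* host side: injected but not yet consumed by the guest */
};

/* Inject an async #PF unless the previous one is still unconsumed. */
static bool apf_inject(struct apf_model *m, uint32_t reason, uint64_t token)
{
	if (m->apf_reason != 0)
		return false;	/* guest has not cleared apf_reason yet */

	m->apf_reason = reason;
	m->cr2 = token;
	m->in_flight = true;
	return true;
}

/* On each VMEXIT: apf_reason == 0 means the guest has read CR2 and apf_reason. */
static void apf_on_vmexit(struct apf_model *m)
{
	if (m->in_flight && m->apf_reason == 0)
		m->in_flight = false;
}
```

In this model the guest "consuming" the async #PF is just a store of 0 to apf_reason, which is the only signal the host looks at.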
If an MCE or NMI triggers a real page fault, then the #PF injection needs to clear apf_reason and set the correct CR2. When that #PF returns, the old CR2 and apf_reason need to be restored.
How is the host supposed to know when the #PF returns? Intercepting IRET sounds like a bad idea and, in any case, is not actually a reliable indication that #PF returned.
The host does not care about the IRET. It solely has to check whether apf_reason is 0 or not. That way it knows that the guest has read CR2 and apf_reason.
/me needs actual details
Suppose the host delivers an async #PF. apf_reason != 0 and CR2 contains something meaningful. Host resumes the guest.
The guest does whatever (gets NMI, and does perf stuff, for example). The guest gets a normal #PF. Somehow the host needs to do:
    if (apf_reason != 0) {
            prev_apf_reason = apf_reason;
            prev_cr2 = cr2;
            apf_reason = 0;
            cr2 = actual fault address;
    }
    resume guest;
Obviously this can only happen if the host intercepts #PF. Let's pretend for now that this is even possible on SEV-ES. (It may well be, but I would also believe that it's not. SEV-ES intercepts are weird and I don't have the whole manual in my head. I'm not sure the host has any way to read CR2 for a SEV-ES guest.)
So now the guest runs some more and finishes handling the inner #PF. Some time between doing that and running the outer #PF code that reads apf_reason, the host needs to do:
    apf_reason = prev_apf_reason;
    cr2 = prev_cr2;
    prev_apf_reason = 0;
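To make the two steps concrete, here is a minimal sketch in C. It is a toy model, not KVM code: struct apf_nest_model, host_inject_real_pf() and host_restore_outer_apf() are hypothetical names, and unlike the pseudocode above the sketch sets cr2 to the fault address unconditionally, on the assumption that a real #PF always needs the real address.

```c
#include <stdint.h>

/* Hypothetical host-side state for one vCPU; names are illustrative only. */
struct apf_nest_model {
	uint32_t apf_reason;		/* nonzero while an async #PF is unconsumed */
	uint32_t prev_apf_reason;	/* stashed reason of the outer async #PF */
	uint64_t cr2;
	uint64_t prev_cr2;		/* stashed CR2 of the outer async #PF */
};

/* Step 1: a real #PF is injected while an async #PF is still pending. */
static void host_inject_real_pf(struct apf_nest_model *m, uint64_t fault_addr)
{
	if (m->apf_reason != 0) {
		m->prev_apf_reason = m->apf_reason;
		m->prev_cr2 = m->cr2;
		m->apf_reason = 0;
	}
	m->cr2 = fault_addr;	/* assumption: always expose the real address */
}

/* Step 2: put the outer async #PF state back. Nothing in this model says
 * when this should run; the caller has to know that the inner #PF is done. */
static void host_restore_outer_apf(struct apf_nest_model *m)
{
	if (m->prev_apf_reason != 0) {
		m->apf_reason = m->prev_apf_reason;
		m->cr2 = m->prev_cr2;
		m->prev_apf_reason = 0;
	}
}
```

The save side has an obvious trigger (the host is injecting the #PF itself). The restore side has none in this sketch: it needs some guest-visible event to hang off of.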
How is the host supposed to know when to do that?
--Andy