Re: [PATCH v2] x86/kvm: Disable KVM_ASYNC_PF_SEND_ALWAYS

7 Apr 2020

      On Tue, Apr 07, 2020 at 02:41:02PM -0700, Andy Lutomirski wrote:
...
...
On Apr 7, 2020, at 1:20 PM, Thomas Gleixner tglx@linutronix.de wrote:
Andy Lutomirski luto@amacapital.net writes:
...
...
...
On Apr 7, 2020, at 10:21 AM, Vivek Goyal vgoyal@redhat.com wrote:
Whether interrupts are enabled or not check only happens before we decide
if async pf protocol should be followed or not. Once we decide to
send PAGE_NOT_PRESENT, later notification PAGE_READY does not check
if interrupts are enabled or not. And it kind of makes sense otherwise
guest process will wait infinitely to receive PAGE_READY.
I modified the code a bit to disable interrupt and wait 10 seconds (after
getting PAGE_NOT_PRESENT message). And I noticed that error async pf
got delivered after 10 seconds after enabling interrupts. So error
async pf was not lost because interrupts were disabled.
Async PF is not the same as a real #PF. It just hijacked the #PF vector
because someone thought this is a brilliant idea.
...
...
Havind said that, I thought disabling interrupts does not mask exceptions.
So page fault exception should have been delivered even with interrupts
disabled. Is that correct? May be there was no vm exit/entry during
those 10 seconds and that's why.
No. Async PF is not a real exception. It has interrupt semantics and it
can only be injected when the guest has interrupts enabled. It's bad
design.
...
My point is that the entire async pf is nonsense. There are two types of events right now:
“Page not ready”: normally this isn’t even visible to the guest — the
guest just waits. With async pf, the idea is to try to tell the guest
that a particular instruction would block and the guest should do
something else instead. Sending a normal exception is a poor design,
though: the guest may not expect this instruction to cause an
exception. I think KVM should try to deliver an *interrupt* and, if it
can’t, then just block the guest.
That's pretty much what it does, just that it runs this through #PF and
has the checks for interrupts disabled - i.e can't right now' around
that. If it can't then KVM schedules the guest out until the situation
has been resolved.
...
“Page ready”: this is a regular asynchronous notification just like,
say, a virtio completion. It should be an ordinary interrupt.  Some in
memory data structure should indicate which pages are ready.
“Page is malfunctioning” is tricky because you *must* deliver the
event. x86’s #MC is not exactly a masterpiece, but it does kind of
work.
Nooooo. This does not need #MC at all. Don't even think about it.
Yessssssssssss.  Please do think about it. :)
...
The point is that the access to such a page is either happening in user
space or in kernel space with a proper exception table fixup.
That means a real #PF is perfectly fine. That can be injected any time
and does not have the interrupt semantics of async PF.
The hypervisor has no way to distinguish between MOV-and-has-valid-stack-and-extable-entry and MOV-definitely-can’t-fault-here.  Or, for that matter, MOV-in-do_page_fault()-will-recurve-if-it-faults.
...
So now lets assume we distangled async PF from #PF and made it a regular
interrupt, then the following situation still needs to be dealt with:
guest -> access faults
host -> injects async fault
guest -> handles and blocks the task
host figures out that the page does not exist anymore and now needs to
fixup the situation.
host -> injects async wakeup
guest -> returns from aysnc PF interrupt and retries the instruction
           which faults again.
host -> knows by now that this is a real fault and injects a proper #PF
guest -> #PF runs and either sends signal to user space or runs
           the exception table fixup for a kernel fault.
Or guest blows up because the fault could not be recovered using #PF.
I can see two somewhat sane ways to make this work.

Access to bad memory results in an async-page-not-present, except that,  it’s not deliverable, the guest is killed. Either that async-page-not-present has a special flag saying “memory failure” or the eventual wakeup says “memory failure”.

Access to bad memory results in #MC.  Sure, #MC is a turd, but it’s an *architectural* turd. By all means, have a nice simple PV mechanism to tell the #MC code exactly what went wrong, but keep the overall flow the same as in the native case.

Should we differentiate between two error cases. In this case memory is
not bad. Just that page being asked for can't be faulted in anymore. And
another error case is real memory failure. So for the first case it
could be injected into guest using #PF or #VE or something paravirt
specific and real hardware failures take #MC path (if they are not
already doing so).
Thanks
Vivek

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v2] x86/kvm: Disable KVM_ASYNC_PF_SEND_ALWAYS