Re: [PATCH v2] x86/kvm: Disable KVM_ASYNC_PF_SEND_ALWAYS

8 Apr 2020

Andy Lutomirski <luto@amacapital.net> writes:
...
...
On Apr 7, 2020, at 3:48 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
  Inject #MC
No, not what I meant. Host has two sane choices here IMO:

1. Tell the guest that the page is gone as part of the wakeup. No #PF or #MC.

2. Tell guest that it’s resolved and inject #MC when the guest
   retries.  The #MC is a real fault, RIP points to the right place, etc.
Ok, that makes sense.
...
...
...

Access to bad memory results in an async-page-not-present, except
that, if it’s not deliverable, the guest is killed.
That's incorrect. The proper reaction is a real #PF. Simply because this
is part of the contract of sharing some file backed stuff between host
and guest in a well defined "virtio" scenario and not a random access to
memory which might be there or not.
The problem is that the host doesn’t know when #PF is safe. It’s sort
of the same problem that async pf has now.  The guest kernel could
access the problematic page in the middle of an NMI, under
pagefault_disable(), etc — getting #PF as a result of CPL0 access to a
page with a valid guest PTE is simply not part of the x86
architecture.
Fair enough.
...
Replace copy_to_user() with some access to a gup-ed mapping with no
extable handler and it doesn’t look so good any more.
In this case the guest needs to die.
...
Of course, the guest will oops if this happens, but the guest needs to
be able to oops cleanly. #PF is too fragile for this because it’s not
IST, and #PF is the wrong thing anyway — #PF is all about
guest-virtual-to-guest-physical mappings.  Heck, what would CR2 be?
The host might not even know the guest virtual address.
It knows, but I can see your point.
...
...
...

Access to bad memory results in #MC.  Sure, #MC is a turd, but it’s
an *architectural* turd. By all means, have a nice simple PV mechanism
to tell the #MC code exactly what went wrong, but keep the overall
flow the same as in the native case.
It's a completely different flow as you evaluate PV turd instead of
analysing the MCE banks and the other error reporting facilities.
I’m fine with the flow being different. do_machine_check() could have
entirely different logic to decide the error in PV.  But I think we
should reuse the overall flow: kernel gets #MC with RIP pointing to
the offending instruction. If there’s an extable entry that can handle
memory failure, handle it. If it’s a user access, handle it.  If it’s
an unrecoverable error because it was a non-extable kernel access,
oops or panic.
The actual PV part could be extremely simple: the host just needs to
tell the guest “this #MC is due to memory failure at this guest
physical address”.  No banks, no DIMM slot, no rendezvous crap (LMCE),
no other nonsense.  It would be nifty if the host also told the guest
what the guest virtual address was if the host knows it.
It does. The EPT violations store:
  - guest-linear address
  - guest-physical address
That's also part of the #VE exception to which Paolo was referring.
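
For illustration only, the flow discussed above could be sketched roughly as below. This is a minimal sketch, not existing kernel or KVM code: struct pv_mce_info, struct fault_ctx, enum mce_action and pv_mce_classify() are hypothetical names made up for this example, and the record layout is an assumption. It only shows the decision order (extable fixup, then user access, else oops/panic) and the two addresses the host could hand over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PV record filled in by the host; not an existing ABI. */
struct pv_mce_info {
	uint64_t gpa;        /* guest-physical address of the failed page */
	uint64_t gva;        /* guest-linear address, meaningful only if gva_valid */
	bool     gva_valid;  /* host knew the guest-linear address */
};

/* Possible outcomes of the guest-side #MC handling in this sketch. */
enum mce_action {
	MCE_FIXUP_EXTABLE,   /* kernel access with an extable entry: run the fixup */
	MCE_HANDLE_USER,     /* user access: handle it in task context */
	MCE_OOPS_OR_PANIC,   /* non-extable kernel access: unrecoverable */
};

/* Simplified stand-in for what the real handler would derive from pt_regs. */
struct fault_ctx {
	bool user_mode;            /* the #MC hit while running at CPL3 */
	bool has_extable_fixup;    /* RIP has an extable entry that can handle
	                            * memory failure */
	struct pv_mce_info info;   /* what the host told us about the failure */
};

/*
 * Decide what to do with a PV-reported memory failure. The order follows
 * the description in the thread: extable first, then user access, and
 * everything else is an unrecoverable kernel access.
 */
enum mce_action pv_mce_classify(const struct fault_ctx *ctx)
{
	assert(ctx != NULL);

	if (ctx->has_extable_fixup)
		return MCE_FIXUP_EXTABLE;
	if (ctx->user_mode)
		return MCE_HANDLE_USER;
	return MCE_OOPS_OR_PANIC;
}
```

No banks are read anywhere in this sketch; the only input besides the trap context is the address pair from the host.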
Thanks,
tglx
