New subject: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

13 Nov 2024


      On Wed, Nov 13, 2024 at 2:31 AM Paolo Bonzini pbonzini@redhat.com wrote:
...
Il mar 12 nov 2024, 21:44 Doug Covelli doug.covelli@broadcom.com ha scritto:
...
...
Split irqchip should be the best tradeoff. Without it, moves from cr8
stay in the kernel, but moves to cr8 always go to userspace with a
KVM_EXIT_SET_TPR exit. You also won't be able to use Intel
flexpriority (in-processor accelerated TPR) because KVM does not know
which bits are set in IRR. So it will be *really* every move to cr8
that goes to userspace.
Sorry to hijack this thread but is there a technical reason not to allow CR8
based accesses to the TPR (not MMIO accesses) when the in-kernel local APIC is
not in use?
No worries, you're not hijacking :) The only reason is that it would be more code for a seldom used feature and anyway with worse performance. (To be clear, CR8 based accesses are allowed, but stores cause an exit in order to check the new TPR against IRR. That's because KVM's API does not have an equivalent of the TPR threshold as you point out below).
I have not really looked at the code but it seems like it could also
simplify things as CR8 would be handled more uniformly regardless of
who is virtualizing the local APIC.
...
...
Also I could not find these documented anywhere but with MSFT's APIC our monitor
relies on extensions for trapping certain events such as INIT/SIPI plus LINT0
and SVR writes:
UINT64 X64ApicInitSipiExitTrap    : 1; // WHvRunVpExitReasonX64ApicInitSipiTrap
UINT64 X64ApicWriteLint0ExitTrap  : 1; // WHvRunVpExitReasonX64ApicWriteTrap
UINT64 X64ApicWriteLint1ExitTrap  : 1; // WHvRunVpExitReasonX64ApicWriteTrap
UINT64 X64ApicWriteSvrExitTrap    : 1; // WHvRunVpExitReasonX64ApicWriteTrap
There's no need for this in KVM's in-kernel APIC model. INIT and SIPI are handled in the hypervisor and you can get the current state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR and NMI blocking respectively, plus the interrupt shadow; so there's no need for userspace to know when LINT0/LINT1 themselves change. The spurious interrupt vector register is also handled completely in kernel.
I realize that KVM can handle LINT0/SVR updates themselves but our
interrupt subsystem relies on knowing the current values of these
registers even when not virtualizing the local APIC.  I suppose we
could use KVM_GET_LAPIC to sync things up on demand but that seems
like it might nor be great from a performance point of view.
...
...
I did not see any similar functionality for KVM.  Does anything like that exist?
In any case we would be happy to add support for handling CR8 accesses w/o
exiting w/o the in-kernel APIC along with some sort of a way to configure the
TPR threshold if folks are not opposed to that.
As far I know everybody who's using KVM (whether proprietary or open source) has had no need for that, so I don't think it's a good idea to make the API more complex. Performance of Windows guests is going to be bad anyway with userspace APIC.
From what I have seen the exit cost with KVM is significantly lower
than with WHP/Hyper-V.  I don't think performance of Windows guests
with userspace APIC emulation would be bad if CR8 exits could be
avoided (Linux guests perf isn't bad from what I have observed and the
main difference is the astronomical number of CR8 exits).  It seems
like it would be pretty decent although I agree if you want the
absolute best performance then you would want to use the in kernel
APIC to speed up handling of ICR/EOI writes but those are relatively
infrequent compared to CR8 accesses .
Anyway I just saw Sean's response while writing this and it seems he
is not in favor of avoiding CR8 exits w/o the in kernel APIC either so
I suppose we will have to look into making use of the in kernel APIC.
Doug
...
Paolo
...
Doug
...
...
For now I think it makes sense to handle BDOOR_CMD_GET_VCPU_INFO at userlevel
like we do on Windows and macOS.
BDOOR_CMD_GETTIME/BDOOR_CMD_GETTIMEFULL are similar with the former being
deprecated in favor of the latter.  Both do essentially the same thing which is
to return the host OS's time - on Linux this is obtained via gettimeofday.  I
believe this is mainly used by tools to fix up the VM's time when resuming from
suspend.  I think it is fine to continue handling these at userlevel.
As long as the TSC is not involved it should be okay.
Paolo
...
...
...
> Anyway, one question apart from this: is the API the same for the I/O
> port and hypercall backdoors?
Yeah the calls and arguments are the same.  The hypercall based
interface is an attempt to modernize the backdoor since as you pointed
out the I/O based interface is kind of hacky as it bypasses the normal
checks for an I/O port access at CPL3.  It would be nice to get rid of
it but unfortunately I don't think that will happen in the foreseeable
future as there are a lot of existing VMs out there with older SW that
still uses this interface.
Yeah, but I think it still justifies that the KVM_ENABLE_CAP API can
enable the hypercall but not the I/O port.
Paolo
--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.
-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.

Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls