On Wed, Nov 13, 2024, Paolo Bonzini wrote:
On Tue, Nov 12, 2024, 21:44 Doug Covelli doug.covelli@broadcom.com wrote:
Split irqchip should be the best tradeoff. Without it, moves from cr8 stay in the kernel, but moves to cr8 always go to userspace with a KVM_EXIT_SET_TPR exit. You also won't be able to use Intel flexpriority (in-processor accelerated TPR) because KVM does not know which bits are set in IRR. So it will be *really* every move to cr8 that goes to userspace.
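For reference, a minimal sketch of how a VMM might turn on split irqchip mode, assuming vm_fd comes from KVM_CREATE_VM and that no vCPU has been created yet. The helper name enable_split_irqchip and the route count passed by the caller are illustrative assumptions, not something taken from this thread.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: ask KVM to keep the local APIC in the kernel while
 * leaving the IOAPIC/PIC/PIT to userspace. args[0] is the number of GSI
 * routes reserved for the userspace IOAPIC. Returns 0 on success, -1 on
 * failure with errno set by ioctl().
 */
static int enable_split_irqchip(int vm_fd, unsigned int ioapic_routes)
{
    struct kvm_enable_cap cap;

    memset(&cap, 0, sizeof(cap));
    cap.cap = KVM_CAP_SPLIT_IRQCHIP;
    cap.args[0] = ioapic_routes;

    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

A caller would typically invoke something like enable_split_irqchip(vm_fd, 24) right after creating the VM and treat a failure as "fall back or bail out".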
Sorry to hijack this thread but is there a technical reason not to allow CR8 based accesses to the TPR (not MMIO accesses) when the in-kernel local APIC is not in use?
No worries, you're not hijacking :) The only reason is that it would be more code for a seldom used feature and anyway with worse performance. (To be clear, CR8 based accesses are allowed, but stores cause an exit in order to check the new TPR against IRR. That's because KVM's API does not have an equivalent of the TPR threshold as you point out below).
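To make the "check the new TPR against IRR" step concrete, here is a rough sketch of the comparison involved. It assumes a 256-bit IRR image stored as eight 32-bit words and deliberately ignores ISR/PPR, so it illustrates the idea only and is not KVM's actual code. Both function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: highest vector set in a 256-bit IRR image, or -1. */
static int highest_pending_vector(const uint32_t irr[8])
{
    for (int word = 7; word >= 0; word--) {
        if (irr[word] == 0)
            continue;
        for (int bit = 31; bit >= 0; bit--) {
            if (irr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
        }
    }
    return -1;
}

/*
 * CR8 holds the 4-bit task priority class (TPR bits 7:4). A pending vector
 * becomes deliverable when its priority class (vector >> 4) is strictly
 * greater than that value. Simplification: in-service interrupts (ISR) are
 * not taken into account here.
 */
static bool cr8_store_unmasks_interrupt(const uint32_t irr[8], unsigned int cr8)
{
    assert(cr8 <= 15); /* CR8 only has four architecturally defined bits */

    int vector = highest_pending_vector(irr);
    if (vector < 0)
        return false;
    return (unsigned int)(vector >> 4) > cr8;
}
```

With a TPR threshold the hardware can do this comparison itself and only exit when the answer is "yes". Without knowledge of IRR, every store has to be checked by whoever owns the APIC state.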
Also I could not find these documented anywhere but with MSFT's APIC our monitor relies on extensions for trapping certain events such as INIT/SIPI plus LINT0 and SVR writes:
    UINT64 X64ApicInitSipiExitTrap : 1;    // WHvRunVpExitReasonX64ApicInitSipiTrap
    UINT64 X64ApicWriteLint0ExitTrap : 1;  // WHvRunVpExitReasonX64ApicWriteTrap
    UINT64 X64ApicWriteLint1ExitTrap : 1;  // WHvRunVpExitReasonX64ApicWriteTrap
    UINT64 X64ApicWriteSvrExitTrap : 1;    // WHvRunVpExitReasonX64ApicWriteTrap
There's no need for this in KVM's in-kernel APIC model. INIT and SIPI are handled in the hypervisor and you can get the current state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR and NMI blocking respectively, plus the interrupt shadow; so there's no need for userspace to know when LINT0/LINT1 themselves change. The spurious interrupt vector register is also handled completely in kernel.
I did not see any similar functionality for KVM. Does anything like that exist? In any case we would be happy to add support for handling CR8 accesses without exiting when the in-kernel APIC is not in use, along with some way to configure the TPR threshold, if folks are not opposed to that.
As far as I know everybody who's using KVM (whether proprietary or open source) has had no need for that, so I don't think it's a good idea to make the API more complex.
+1
Performance of Windows guests is going to be bad anyway with userspace APIC.
Heh, on modern hardware, performance of any guest is going to suck with a userspace APIC, compared to what is possible with an in-kernel APIC.
More importantly, I really, really don't want to encourage non-trivial usage of a local APIC in userspace. KVM's support for a userspace local APIC is very poorly tested these days. I have zero desire to spend any amount of time reviewing and fixing issues that are unique to emulating the local APIC in userspace. And long term, I would love to force an in-kernel local APIC, though I don't know if that's entirely feasible.