On Mon, Feb 3, 2025 at 1:22 PM Paolo Bonzini pbonzini@redhat.com wrote:
On Mon, Feb 3, 2025 at 5:35 PM Doug Covelli doug.covelli@broadcom.com wrote:
OK. It seems like fully embracing the in-kernel APIC is the way to go especially considering it really simplifies using KVM's support for nested virtualization. Speaking of nested virtualization we have been working on adding support for that and would like to propose a couple of changes:
- Add an option for L0 to handle backdoor accesses from CPL3 code running in L2.
On a #GP nested_vmx_l0_wants_exit can check if this option is enabled and KVM can handle the #GP like it would if it had been from L1 (exit to userlevel iff it is a backdoor access otherwwise deliver the fault to L2). When combined with enable_vmware_backdoor this will allow L0 to optionally handle backdoor accesses from CPL3 code running in L2. This is needed for cases such as running VMware tools in a Windows VM with VBS enabled. For other cases such as running tools in a Windows VM in an ESX VM we still want L1 to handle the backdoor accesses from L2.
I think this makes sense and could be an argument to KVM_ENABLE_CAP.
- Extend KVM_EXIT_MEMORY_FAULT for permission faults (e.g the guest attempting
to write to a page that has been protected by userlevel calling mprotect). This is useful for cases where we want synchronous detection of guest writes such as lazy snapshots (dirty page tracking is no good for this case). Currently permission faults result in KVM_RUN returning EFAULT which we handle by interpreting the instruction as we do not know the guest physical address associated with the fault.
Yes, this makes sense too, though you might want to look into userfaultfd as well.
We had something planned using attributes, but I don't see any issue extending it to EFAULT. Maybe it would have to be yet another KVM_ENABLE_CAP; considering that it would break your existing code, there might be someone else in the wild doing it.
It looks like KVM_EXIT_MEMORY_FAULT was implemented in such a way that it won't break existing code:
Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it accompanies a return code of ‘-1’, not ‘0’! errno will always be set to EFAULT or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT, userspace should assume kvm_run.exit_reason is stale/undefined for all other error numbers.
That being said we could certainly make this opt-in if that is preferable.
Doug
Paolo