On Wed, Oct 30, 2024 at 4:35 AM Zack Rusin zack.rusin@broadcom.com wrote:
VMware products handle hypercalls in userspace. Give KVM the ability to run VMware guests unmodified by forwarding all hypercalls to userspace.
Enabling the KVM_CAP_X86_VMWARE_HYPERCALL capability turns the feature on; it is off by default. This allows vmx's built on top of KVM to support VMware-specific hypercalls.
Hi Zack,
is there a spec of the hypercalls that are supported by userspace? I would like to understand if there's anything that's best handled in the kernel.
If we allow forwarding _all_ hypercalls to userspace, then people will use it for things other than VMware and there goes all hope of accelerating stuff in the kernel in the future.
So even having _some_ checks in the kernel before going out to userspace would keep that door open, or at least try.
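To illustrate the kind of check meant here, a rough sketch follows; it is not part of the posted patch. It assumes that VMware guests load the backdoor magic 0x564D5868 (the value Linux guest code calls VMWARE_HYPERVISOR_MAGIC) into EAX before executing VMCALL/VMMCALL, so only hypercalls carrying that magic would be forwarded. The helper name and the macro name are hypothetical, and the code is written as standalone C rather than kernel code so that it compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed magic value ("VMXh"); the macro name is made up for this sketch. */
#define VMWARE_MAGIC_SKETCH 0x564D5868u

/*
 * Hypothetical filter: forward a hypercall to userspace only if the
 * feature is enabled and the low 32 bits of RAX hold the VMware magic.
 * Everything else would stay on the existing in-kernel hypercall path.
 */
static bool vmware_hypercall_should_forward(bool enabled, uint64_t rax)
{
	return enabled && (uint32_t)rax == VMWARE_MAGIC_SKETCH;
}
```

With a filter of this shape, a non-VMware hypercall number in RAX would never reach userspace through this capability, which keeps the option of handling it in the kernel later.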
Patch 1, on the other hand, looks good from an API point of view.
Paolo
Signed-off-by: Zack Rusin zack.rusin@broadcom.com
Cc: Doug Covelli doug.covelli@broadcom.com
Cc: Paolo Bonzini pbonzini@redhat.com
Cc: Jonathan Corbet corbet@lwn.net
Cc: Sean Christopherson seanjc@google.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@redhat.com
Cc: Borislav Petkov bp@alien8.de
Cc: Dave Hansen dave.hansen@linux.intel.com
Cc: x86@kernel.org
Cc: "H. Peter Anvin" hpa@zytor.com
Cc: Shuah Khan shuah@kernel.org
Cc: Namhyung Kim namhyung@kernel.org
Cc: Arnaldo Carvalho de Melo acme@redhat.com
Cc: Isaku Yamahata isaku.yamahata@intel.com
Cc: Joel Stanley joel@jms.id.au
Cc: Zack Rusin zack.rusin@broadcom.com
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
 Documentation/virt/kvm/api.rst  | 41 +++++++++++++++++++++++++++++----
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 33 ++++++++++++++++++++++++++
 include/uapi/linux/kvm.h        |  1 +
 4 files changed, 72 insertions(+), 4 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 33ef3cc785e4..5a8c7922f64f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6601,10 +6601,11 @@ to the byte array.
 .. note::

       For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
-      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
-      operations are complete (and guest state is consistent) only after userspace
-      has re-entered the kernel with KVM_RUN.  The kernel side will first finish
-      incomplete operations and then check for pending signals.
+      KVM_EXIT_EPR, KVM_EXIT_HYPERCALL, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR
+      the corresponding operations are complete (and guest state is consistent)
+      only after userspace has re-entered the kernel with KVM_RUN.  The kernel
+      side will first finish incomplete operations and then check for pending
+      signals.

       The pending state of the operation is not preserved in state which is
       visible to userspace, thus userspace should ensure that the operation is
@@ -8201,6 +8202,38 @@ default value for it is set via the kvm.enable_vmware_backdoor
 kernel parameter (false when not set). Must be set before any
 VCPUs have been created.

+7.38 KVM_CAP_X86_VMWARE_HYPERCALL
+---------------------------------
+
+:Architectures: x86
+:Parameters: args[0] whether the feature should be enabled or not
+:Returns: 0 on success.
+
+Capability allows userspace to handle hypercalls. When enabled
+whenever the vcpu has executed a VMCALL(Intel) or a VMMCALL(AMD)
+instruction kvm will exit to userspace with KVM_EXIT_HYPERCALL.
+
+On exit the hypercall structure of the kvm_run structure will
+look as follows:
+
+::
+		/* KVM_EXIT_HYPERCALL */
+		struct {
+			__u64 nr;       // rax
+			__u64 args[6];  // rbx, rcx, rdx, rsi, rdi, rbp
+			__u64 ret;      // cpl, whatever userspace
+			                // sets this to on return will be
+			                // written to the rax
+			__u64 flags;    // KVM_EXIT_HYPERCALL_LONG_MODE if
+			                // the hypercall was executed in
+			                // 64bit mode, 0 otherwise
+		} hypercall;
+
+Except when running in compatibility mode with VMware hypervisors
+userspace handling of hypercalls is discouraged. To implement
+such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO
+(all except s390).
+
 8. Other capabilities.
 ======================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7fcf185e337f..7fbb11682517 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1404,6 +1404,7 @@ struct kvm_arch {
 	struct kvm_xen xen;
 #endif
 	bool vmware_backdoor_enabled;
+	bool vmware_hypercall_enabled;
 	bool backwards_tsc_observed;
 	bool boot_vcpu_runs_old_kvmclock;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7071907d6a5..b676c54266e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4689,6 +4689,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MEMORY_FAULT_INFO:
 	case KVM_CAP_X86_GUEST_MODE:
 	case KVM_CAP_X86_VMWARE_BACKDOOR:
+	case KVM_CAP_X86_VMWARE_HYPERCALL:
 		r = 1;
 		break;
 	case KVM_CAP_PRE_FAULT_MEMORY:
@@ -6784,6 +6785,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_X86_VMWARE_HYPERCALL:
+		r = -EINVAL;
+		if (cap->args[0] & ~1)
+			break;
+		kvm->arch.vmware_hypercall_enabled = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -10127,6 +10135,28 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }

+static int kvm_vmware_hypercall(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+	bool is_64_bit = is_64_bit_hypercall(vcpu);
+	u64 mask = is_64_bit ? U64_MAX : U32_MAX;
+
+	vcpu->run->hypercall.flags = is_64_bit ? KVM_EXIT_HYPERCALL_LONG_MODE : 0;
+	run->hypercall.nr = kvm_rax_read(vcpu) & mask;
+	run->hypercall.args[0] = kvm_rbx_read(vcpu) & mask;
+	run->hypercall.args[1] = kvm_rcx_read(vcpu) & mask;
+	run->hypercall.args[2] = kvm_rdx_read(vcpu) & mask;
+	run->hypercall.args[3] = kvm_rsi_read(vcpu) & mask;
+	run->hypercall.args[4] = kvm_rdi_read(vcpu) & mask;
+	run->hypercall.args[5] = kvm_rbp_read(vcpu) & mask;
+	run->hypercall.ret = kvm_x86_call(get_cpl)(vcpu);
+
+	run->exit_reason = KVM_EXIT_HYPERCALL;
+	vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+
+	return 0;
+}
+
 unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
 				      unsigned long a0, unsigned long a1,
 				      unsigned long a2, unsigned long a3,
@@ -10225,6 +10255,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	int op_64_bit;
 	int cpl;

+	if (vcpu->kvm->arch.vmware_hypercall_enabled)
+		return kvm_vmware_hypercall(vcpu);
+
 	if (kvm_xen_hypercall_enabled(vcpu->kvm))
 		return kvm_xen_hypercall(vcpu);

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c7b5f1c2ee1c..4c2cc6ed29a0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -934,6 +934,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_X86_VMWARE_BACKDOOR 239
+#define KVM_CAP_X86_VMWARE_HYPERCALL 240

 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.43.0
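
For reference, the userspace side of the interface proposed in this patch could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the series: the capability number 240 is taken from the patch and is not in released uapi headers, the two function names are hypothetical, and the handler only stores a placeholder result where a real VMM would dispatch on the register values.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Value taken from the patch; assumed absent from installed headers. */
#ifndef KVM_CAP_X86_VMWARE_HYPERCALL
#define KVM_CAP_X86_VMWARE_HYPERCALL 240
#endif

/* Turn the feature on for a VM; returns the ioctl() result (0 on success). */
static int enable_vmware_hypercalls(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X86_VMWARE_HYPERCALL;
	cap.args[0] = 1;	/* the patch rejects any bit other than bit 0 */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

/*
 * Hypothetical handler for a KVM_EXIT_HYPERCALL exit.
 * Returns 1 if the exit was consumed, 0 if it was some other exit.
 */
static int handle_vmware_hypercall_exit(struct kvm_run *run)
{
	if (run->exit_reason != KVM_EXIT_HYPERCALL)
		return 0;

	/*
	 * With the patch applied, nr holds RAX, args[] hold RBX, RCX, RDX,
	 * RSI, RDI and RBP, and ret initially holds the guest CPL. A real
	 * VMM would dispatch on those values here.
	 */

	/* Placeholder result; KVM writes ret back to RAX on the next KVM_RUN. */
	run->hypercall.ret = 0;
	return 1;
}
```

Because the operation completes only after userspace re-enters the kernel with KVM_RUN, the handler has to fill in ret before the next KVM_RUN call on that vCPU.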