On Thu, Nov 7, 2024 at 5:32 PM Sean Christopherson <seanjc@google.com> wrote:
On Mon, Nov 04, 2024, Zack Rusin wrote:
On Mon, Nov 4, 2024 at 5:13 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
On Wed, Oct 30, 2024 at 4:35 AM Zack Rusin <zack.rusin@broadcom.com> wrote:
VMware products handle hypercalls in userspace. Give KVM the ability to run VMware guests unmodified by forwarding all hypercalls to userspace.
Enabling of the KVM_CAP_X86_VMWARE_HYPERCALL_ENABLE capability turns the feature on - it's off by default. This allows vmx's built on top of KVM to support VMware specific hypercalls.
Hi Zack,
Hi, Paolo.
Thank you for looking at this.
Is there a spec of the hypercalls that are supported by userspace? I would like to understand if there's anything that's best handled in the kernel.
There's no spec but we have open headers listing the hypercalls. There are about 100 of them (a few were deprecated), the full list starts here: https://github.com/vmware/open-vm-tools/blob/739c5a2f4bfd4cdda491e6a6f6869d8... They're not well documented, but the names are pretty self-explanatory.
At a quick glance, this one needs to be handled in KVM:
BDOOR_CMD_VCPU_MMIO_HONORS_PAT
and these probably should be in KVM:
BDOOR_CMD_GETTIME
BDOOR_CMD_SIDT
BDOOR_CMD_SGDT
BDOOR_CMD_SLDT_STR
BDOOR_CMD_GETTIMEFULL
BDOOR_CMD_VCPU_LEGACY_X2APIC_OK
BDOOR_CMD_STEALCLOCK
and these maybe? (it's not clear what they do, from the name alone)
BDOOR_CMD_GET_VCPU_INFO
BDOOR_CMD_VCPU_RESERVED
I'm not sure if there's any value in implementing a few of them. IIRC there are 101 of them (as I mentioned, a lot have been deprecated, but that's for userspace; on the host we still have to do something for old guests using them). If we implement 100 of those 101 in the kernel then, as far as this patch is concerned, it's no different than if we had implemented 0 of 101, because we'd still have to exit to userspace to handle the one remaining.
Unless you're saying that those would be useful to you. In which case I'd be glad to implement them for you, but I'd put them behind some kind of a cap or a kernel config because we wouldn't be using them. Besides what Doug mentioned, we already maintain the shared code for them that's used on Windows, macOS, ESX and Linux. So even if we had them in the Linux kernel, it would still make more sense to use the code that's shared with the other OSes to lessen the maintenance burden (so that a change within that code applies consistently across all the OSes).
If we allow forwarding _all_ hypercalls to userspace, then people will use it for things other than VMware and there goes all hope of accelerating stuff in the kernel in the future.
To some extent, that ship has sailed, no? E.g. do KVM_XEN_HVM_CONFIG with KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL set, and userspace can intercept pretty much all hypercalls with very few side effects.
So even having _some_ checks in the kernel before going out to userspace would keep that door open, or at least try.
Doug just looked at this and I think I might have an idea on how to limit the scope at least a bit: if you think it would help, we could limit forwarding of hypercalls to userspace to only those that come with a BDOOR_MAGIC (which is 0x564D5868) in eax. Would that help?
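For illustration only, the BDOOR_MAGIC filter suggested above could look roughly like the sketch below. The helper name vmware_hypercall_should_forward() is hypothetical and is not taken from the patch or from KVM; the sketch also assumes that only the low 32 bits of RAX (i.e. EAX) are compared against the magic value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Magic value quoted in the thread; a VMware guest places it in EAX. */
#define BDOOR_MAGIC 0x564D5868u

/*
 * Hypothetical helper (not from the patch): returns true if a hypercall
 * should be forwarded to userspace, false if it should take the normal
 * in-kernel path.
 *
 * Assumption: only EAX is checked, so the upper 32 bits of RAX are ignored.
 */
static inline bool vmware_hypercall_should_forward(uint64_t rax)
{
	return (uint32_t)rax == BDOOR_MAGIC;
}
```

With a check like this, a hypercall without the magic in eax would never reach the VMware forwarding path, which is what would let other hypercall ABIs coexist in the same VM.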
I don't think it addresses Paolo's concern (if I understood Paolo's concern correctly), but it would help from the perspective of allowing KVM to support VMware hypercalls and Xen/Hyper-V/KVM hypercalls in the same VM.
Yea, I just don't think there's any realistic way we could handle all of those hypercalls in the kernel so I'm trying to offer some ideas on how to lessen the scope to make it as painless as possible. Unless you think we could somehow parlay my piercing blue eyes into getting those patches in as is, in which case let's do that ;)
I also think we should add CONFIG_KVM_VMWARE from the get-go, and if we're feeling lucky, maybe even retroactively bury KVM_CAP_X86_VMWARE_BACKDOOR behind that Kconfig. That would allow limiting the exposure to VMware specific code, e.g. if KVM does end up handling hypercalls in-kernel. And it might deter abuse to some extent.
I thought about that too. I was worried that even if we make it on by default it will require quite a bit of handholding to make sure all the distros include it, or otherwise on desktops Workstation still wouldn't work with KVM by default. I also felt a little silly trying to add a kernel config for those few lines that would be on pretty much everywhere, and since we didn't implement the VMware backdoor functionality I didn't want to presume and try to shield a feature that might be in production by others with a new kernel config.
z