On Fri, Dec 10, 2021, Michael Roth wrote:
To summarize, x86 relies on a ucall based on using PIO instructions to generate an exit to userspace and provide the GVA of a dynamically-allocated ucall struct that resides in guest memory and contains information about how to handle/interpret the exit. This doesn't work for SEV guests for 3 main reasons:
- The guest memory is generally encrypted during run-time, so the guest needs to ensure the ucall struct is allocated in shared memory.
- The guest page table is also encrypted, so the address would need to be a GPA instead of a GVA.
- The guest vCPU registers may also be encrypted in the case of SEV-ES/SEV-SNP, so the approach of examining vCPU register state has additional requirements such as requiring guest code to implement a #VC handler that can provide the appropriate registers via a vmgexit.
To address these issues, the SEV selftest RFC1 patchset introduced a set of new SEV-specific interfaces that closely mirrored the functionality of ucall()/get_ucall(), but relied on a pre-allocated/static ucall buffer in shared guest memory so that guest code could pass messages/state to the host by simply writing to this pre-arranged shared memory region and then generating an exit to userspace (via a halt instruction).
Paolo suggested instead implementing support for test/guest-specific ucall implementations that could be used as an alternative to the default PIO-based ucall implementations as needed based on test/guest requirements, while still allowing for tests to use a common set of interfaces like ucall()/get_ucall().
This all seems way more complicated than it needs to be. HLT is _worse_ than PIO on x86 because it triggers a userspace exit if and only if the local APIC is not in-kernel. That is bound to bite someone. The only issue with SEV is the address, not the VM-Exit mechanism. That doesn't change with SEV-ES, SEV-SNP, or TDX, as PIO and HLT will both get reflected as #VC/#VE, i.e. the guest side needs to be updated to use VMGEXIT/TDCALL no matter what, at which point having the hypercall request PIO emulation is just as easy as requesting HLT.
I also don't like having to differentiate between a "shared" and "regular" ucall. I kind of like having to explicitly pass the ucall object being used, but that puts undue burden on simple single-vCPU tests.
The inability to read guest private memory is really the only issue, and that can be easily solved without completely revamping the ucall framework, and without having to update a huge pile of tests to make them play nice with private memory.
This would also be a good opportunity to clean up the stupidity of tests having to manually call ucall_init(), drop the unused/pointless @arg from ucall_init(), and maybe even fix arm64's lurking landmine of not being SMP safe (the address is shared by all vCPUs).
To reduce the burden on tests and avoid ordering issues with creating vCPUs, allocate a ucall struct for every possible vCPU when the VM is created and stuff the GPA of the struct in the struct itself so that the guest can communicate the GPA instead of the GVA. Then confidential VMs just need to make all structs shared.
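As a rough userspace illustration of "stuff the GPA of the struct in the struct itself", the following sketch may help. Everything named demo_* is hypothetical, and the fixed-offset translation is only a stand-in for a real GVA-to-GPA page table walk such as addr_gva2gpa(); it is not the selftests API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_VCPUS 4
/* Assumption: a fixed offset stands in for a real GVA->GPA translation. */
#define DEMO_GVA_TO_GPA_OFFSET 0x100000ull

struct demo_ucall {
	uint64_t cmd;
	uint64_t args[6];
	uint64_t gpa;	/* guest-physical address of this struct */
};

/* One struct per possible vCPU, allocated up front. */
static struct demo_ucall demo_pool[DEMO_MAX_VCPUS];

/* Hypothetical stand-in for addr_gva2gpa(). */
static uint64_t demo_gva2gpa(const void *gva)
{
	return (uint64_t)(uintptr_t)gva + DEMO_GVA_TO_GPA_OFFSET;
}

/* Host side, at VM creation: record each struct's GPA inside the struct. */
static void demo_ucall_init(void)
{
	for (size_t i = 0; i < DEMO_MAX_VCPUS; i++)
		demo_pool[i].gpa = demo_gva2gpa(&demo_pool[i]);
}

/* Guest side: the value handed to the host is the stored GPA, not &uc. */
static uint64_t demo_ucall_addr(const struct demo_ucall *uc)
{
	return uc->gpa;
}
```

The point of the design is that the guest never has to translate anything itself: it only reads a field the host filled in before the guest started running.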
If all architectures have a way to access a vCPU ID, the ucall structs could be stored as a simple array. If not, a list-based allocator would probably suffice.
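For the array variant, a minimal sketch could look like the following, assuming the guest can obtain its own vCPU ID by some means. The slot_* names and the fixed vCPU count are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_NR_VCPUS 8	/* assumption: stands in for the VM's max vCPU count */

struct slot_ucall {
	uint64_t cmd;
	uint64_t gpa;
};

static struct slot_ucall slot_ucalls[SLOT_NR_VCPUS];

/*
 * Each vCPU owns exactly one slot, so lookup is a bounds-checked index
 * and needs neither a lock nor a free list.
 */
static struct slot_ucall *slot_ucall_for_vcpu(uint32_t vcpu_id)
{
	if (vcpu_id >= SLOT_NR_VCPUS)
		return NULL;
	return &slot_ucalls[vcpu_id];
}
```

Because no two vCPUs ever share a slot, this variant sidesteps the question of which lock primitive is available in the guest.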
E.g. something like this, except the list management is in common code instead of x86, and also delete all the per-test ucall_init() calls.
diff --git a/tools/testing/selftests/kvm/lib/x86_64/ucall.c b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
index a3489973e290..9aab6407bd42 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/ucall.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
@@ -8,19 +8,59 @@

 #define UCALL_PIO_PORT ((uint16_t)0x1000)

-void ucall_init(struct kvm_vm *vm, void *arg)
+static struct list_head *ucall_list;
+
+void ucall_init(struct kvm_vm *vm)
 {
+	struct ucall *ucalls;
+	int nr_cpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
+	int i;
+
+	TEST_ASSERT(!ucall_list, "ucall() can only be used by one VM at a time");
+
+	INIT_LIST_HEAD(&vm->ucall_list);
+
+	ucalls = vm_vaddr_alloc(nr_cpus * sizeof(struct ucall));
+	ucall_make_shared(ucall_list, <size>);
+
+	for (i = 0; i < nr_cpus; i++) {
+		ucalls[i].gpa = addr_gva2gpa(vm, &ucalls[i]);
+
+		list_add(&vm->ucall_list, &ucalls[i].list)
+	}
+
+	ucall_list = &vm->ucall_list;
+	sync_global_to_guest(vm, ucall_list);
 }

 void ucall_uninit(struct kvm_vm *vm)
 {
+	ucall_list = NULL;
+	sync_global_to_guest(vm, ucall_list);
+}
+
+static struct ucall *ucall_alloc(void)
+{
+	struct ucall *uc;
+
+	/* Is there a lock primitive for the guest? */
+	lock_something(&ucall_lock);
+	uc = list_first_entry(ucall_list, struct ucall, list);
+
+	list_del(&uc->list);
+	unlock_something(&ucall_lock);
+}
+
+static void ucall_free(struct ucall *uc)
+{
+	lock_something(&ucall_lock);
+	list_add(&uc->list, ucall_list);
+	unlock_something(&ucall_lock);
 }

 void ucall(uint64_t cmd, int nargs, ...)
 {
-	struct ucall uc = {
-		.cmd = cmd,
-	};
+	struct ucall *uc = ucall_alloc();
 	va_list va;
 	int i;

@@ -32,7 +72,9 @@ void ucall(uint64_t cmd, int nargs, ...)
 	va_end(va);

 	asm volatile("in %[port], %%al"
-		: : [port] "d" (UCALL_PIO_PORT), "D" (&uc) : "rax", "memory");
+		: : [port] "d" (UCALL_PIO_PORT), "D" (uc->gpa) : "rax", "memory");
+
+	ucall_free(uc);
 }

 uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
@@ -47,7 +89,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
 		struct kvm_regs regs;

 		vcpu_regs_get(vm, vcpu_id, &regs);
-		memcpy(&ucall, addr_gva2hva(vm, (vm_vaddr_t)regs.rdi),
+		memcpy(&ucall, addr_gpa2hva(vm, (vm_paddr_t)regs.rdi),
 		       sizeof(ucall));

 		vcpu_run_complete_io(vm, vcpu_id);
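For reference, the alloc/free pattern from the sketch above can be exercised in plain userspace C roughly as follows. This is only an illustration under stated assumptions: the pool_* names are hypothetical, a C11 atomic_flag spinlock stands in for whatever lock primitive the guest would actually use, and a hand-rolled singly linked free list replaces the kernel-style list_head. Note that the allocator returns the entry it removed, which a real ucall_alloc() would also need to do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4

struct pool_ucall {
	uint64_t cmd;
	uint64_t gpa;
	struct pool_ucall *next;	/* free-list link */
};

static struct pool_ucall pool_slots[POOL_SIZE];
static struct pool_ucall *pool_free_head;
/* Assumption: a trivial spinlock in place of a real guest lock primitive. */
static atomic_flag pool_lock = ATOMIC_FLAG_INIT;

static void pool_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&pool_lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void pool_lock_release(void)
{
	atomic_flag_clear_explicit(&pool_lock, memory_order_release);
}

/* Push every slot onto the free list; run once before any alloc. */
static void pool_init(void)
{
	pool_free_head = NULL;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		pool_slots[i].next = pool_free_head;
		pool_free_head = &pool_slots[i];
	}
}

/* Pop the first free entry, or return NULL if the pool is exhausted. */
static struct pool_ucall *pool_alloc(void)
{
	struct pool_ucall *uc;

	pool_lock_acquire();
	uc = pool_free_head;
	if (uc)
		pool_free_head = uc->next;
	pool_lock_release();
	return uc;
}

/* Push an entry back onto the head of the free list. */
static void pool_free(struct pool_ucall *uc)
{
	pool_lock_acquire();
	uc->next = pool_free_head;
	pool_free_head = uc;
	pool_lock_release();
}
```

Sizing the pool to the maximum number of vCPUs means an allocation should never fail in practice, since each vCPU holds at most one entry at a time; the NULL check is there only to keep the sketch safe on its own.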