On Wed, Jan 05, 2022 at 05:43:21PM +0000, Sean Christopherson wrote:
On Wed, Jan 05, 2022, Michael Roth wrote:
On Wed, Jan 05, 2022 at 12:17:33AM +0000, Sean Christopherson wrote:
PIO shouldn't require instruction decoding or a #VC handler. What I was thinking is that the guest in the selftest would make a direct #VMGEXIT/TDCALL to request PIO instead of executing an OUT.
That seems like a nicer approach. But it sort of lends itself to having test-specific ucall implementations in some form. How are you thinking vm_create() should decide what implementation to use? With this series in place it could be something like:
  vm_create(..., struct ucall_ops *ops)
    ucall_init_ops(ops)
and with the SEV selftests in their current form it would look something like:
  sev_vm_create(...)
    vm_create_with_ucall(..., ops=ucall_ops_pio_vmgexit)
      ucall_init_ops(ops)
is that sort of what you're thinking, or something else?
I keep forgetting ucall() doesn't have access to the VM. But, since we're restricting ucall() to a single VM, we can have a global that sets ucall_ops during ucall_init() based on the VM type, or skip the ops and just open code the behavior in x86's ucall() by snapshotting the VM type. Either way, the goal is to avoid having to pass in ucall_ops at the test level.
Yeah, I was thinking it could be done at the lowest level vm_create() helper. We'll need to expand vm_create() (or add yet another layer to avoid modifying a pile of tests) to allow opting out of initializing ucall, e.g. sev_migrate_tests.c needs to create multiple concurrent VMs, but happily doesn't need ucall support.
Why does sev_migrate_tests need to opt out? Couldn't it use ucall_ops_pio_vmgexit like that SEV case above?
Because it uses multiple VMs, and my rough sketch only allows for a single VM to use ucall. Though I suppose we could simply keep appending to the ucall list for every VM. The requirement would then be that all VMs are of the same type, i.e. utilize the same ucall_ops.
Hmm, maybe I misread your patch. Not supporting multiple VMs was the reason I gave up on having the ucall structs allocated on-demand and went with requiring them to be passed as arguments to ucall().
I thought with your patch you had solved that by having each vm have it's own pool, via vm->ucall_list, and then mapping each pool into each guest separately via:
  ucall_init(vm):
    ucall_list = vm->ucall_list
    sync_global_to_guest(ucall_list)
then as long as that ucall_init() is done *after* kvm_vm_elf_load() for that guest, it will end up with a 'ucall_list' global that points to its own specific vm->ucall_list. Then on the test side it doesn't matter what the 'ucall_list' global is currently set to, since you have the GPA and know which VM exited.
Or am I missing something there?
Although even if that is the case, now that we're proposing doing the ucall_init() inside vm_create(), we run the risk of a test calling kvm_vm_elf_load() afterwards, which might clobber the guest's copy of the ucall_list global if ucall_init() has since been called for another VM. But that could maybe be worked around by having whatever vm_create() variant we use also call kvm_vm_elf_load() unconditionally as part of creation.
I ask because there is a ucall() in the exception handling code where some unhandled exceptions result in the guest automatically issuing a ucall(UCALL_UNHANDLED), so even tests that don't use ucall() directly might still rely on it if they enable exception handling. So that might be an argument for always setting up at least the default ucall_ops_pio implementation and creating a pool just in case (or an argument for dropping the UCALL_UNHANDLED handling).
The sev_migrate_tests don't even run a guest, hence the quick-and-dirty "solution". Though thinking toward the future, that may be too dirty as it would prevent tests from having multiple "real" VMs.
To reduce the burden on tests and avoid ordering issues with creating vCPUs, allocate a ucall struct for every possible vCPU when the VM is created and stuff the GPA of the struct in the struct itself so that the guest can communicate the GPA instead of the GVA. Then confidential VMs just need to make all structs shared.
So a separate call like:
ucall_make_shared(vm->ucall_list)
? Might need some good documentation/assertions to make sure it gets called at the right place for confidential VMs, and may need some extra hooks in the SEV selftest implementation for switching from private to shared after the memory has already been allocated, but that seems reasonable.
Again, I was thinking that it would be done unconditionally by ucall_init(), i.e. would be automatically handled by the selftest framework and would Just Work for individual tests.
Ok, I'll have to think that through more. Currently with the SEV selftests as they are, we have:
  sev_vm_create(policy, npages)
    vm = vm_create(...)
    vm_set_memory_encryption(vm, encrypt_by_default, enc_bit)
    //vm_vaddr_alloc_shared() can be used now
The ucall struct allocations would need to go through vm_vaddr_alloc_shared() to make sure the selftest library tracks/maps the pages as shared, but that vm_set_memory_encryption() happens too late if the ucall_init() stuff is done in vm_create(). It should be possible to pass the vm_set_memory_encryption() arguments directly to vm_create() to allow for what you're proposing, but I guess we'd need a new vm_create() wrapper that handles both the vm_set_memory_encryption() args, along with the ucall_ops above, something like:
  sev_vm_create(policy, npages)
    vm = vm_create_coco(..., encrypt_by_default, enc_bit/shared_bit, ucall_ops)
Or were you thinking something else? Just trying to get an idea of how this will all need to tie in with the SEV selftests and what needs to change on that end.
Hmm, I was thinking the selftest framework would only need to be told the VM type, e.g. DEFAULT, SEV, SEV-ES, SEV-SNP, or TDX, and would then handle setting everything up, e.g. enumerating the C-bit location and encrypting memory as needed.
One thought would be to extend "enum vm_guest_mode" with flags above NUM_VM_MODES to specify the VM type. That way tests that use VM_MODE_DEFAULT would continue to work without any updates.
Ok, let me see what that approach looks like on the SEV selftest side.