On Mon, Jun 13, 2022 at 12:49:28PM -0500, Michael Roth wrote:
On Fri, Jun 10, 2022 at 02:01:41PM -0700, Vishal Annapurve wrote:
....
I ended up adding a KVM_CAP_UNMAPPED_PRIVATE_MEM to distinguish between the two modes. With UPM-mode enabled, it basically means KVM can/should enforce that all private guest pages are backed by private memslots, and it enables a couple of platform-specific hooks to handle MAP_GPA_RANGE and to answer queries from the MMU about whether an NPT fault is for a private page. SEV uses these hooks to manage its encryption bitmap, and uses that bitmap as the authority on whether or not a page is encrypted. SNP uses GHCB page-state-change requests, so MAP_GPA_RANGE is a no-op there, but it uses the MMU hook to indicate whether a fault is private based on the page fault flags.
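Roughly what I mean by the fault-type query, as pseudo-code (the names here are illustrative, not the actual code in the tree):

/*
 * Illustrative sketch only: the MMU asks the platform whether a fault
 * is for a private page. SEV answers from its encryption bitmap, SNP
 * answers from the hardware-provided page fault error code.
 */
static bool sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code)
{
        struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

        /* for SEV/SEV-ES the encryption bitmap is the authority */
        return test_bit(gpa_to_gfn(gpa), sev->enc_bitmap);
}

static bool snp_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code)
{
        /* for SNP the fault flags tell us directly */
        return !!(error_code & PFERR_GUEST_ENC_MASK);
}

(sev->enc_bitmap and PFERR_GUEST_ENC_MASK are stand-ins for whatever the
bitmap/flag ends up being called.)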
When UPM-mode isn't enabled, MAP_GPA_RANGE just gets passed on to userspace as before, and the platform-specific hooks above are no-ops. That's the mode your SEV self-tests ran in initially. I added a test that runs PrivateMemoryPrivateAccess in UPM-mode, where the guest's OS memory is also backed by a private memslot and the platform hooks are enabled, and things seem to still work okay there. So far I've only added a UPM-mode test for the PrivateMemoryPrivateAccess case, though. I suppose we'd want to make sure it works exactly as it did with UPM-mode disabled, but I don't see why it wouldn't.
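For reference, the UPM-mode test variants just enable the new cap on the VM before setting up the private memslots, something along these lines (sketch only, using the raw KVM_ENABLE_CAP ioctl):

/* Sketch only: turn on UPM-mode for a selftest VM. */
static void enable_upm_mode(struct kvm_vm *vm)
{
        struct kvm_enable_cap cap = {
                .cap = KVM_CAP_UNMAPPED_PRIVATE_MEM,
                .args[0] = 1,
        };

        vm_ioctl(vm, KVM_ENABLE_CAP, &cap);
}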
Thanks Michael for the update. Yeah, using the bitmap to track the private/shared state of gfn ranges should be a better way to go than the limited approach I used, which only tracked a single contiguous pfn range. I spent some time getting the SEV/SEV-ES priv memfd selftests to execute from the private fd as well, and ended up making similar changes as part of this github tree: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com....
But probably worth having some discussion on how exactly we should define this mode, and whether that meshes with what TDX folks are planning.
I've pushed my UPM-mode selftest additions here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com...
And the UPM SEV/SEV-SNP tree I'm running them against (DISCLAIMER: EXPERIMENTAL): https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com...
Thanks for the references here. These help give a clear picture of the status of priv memfd integration with SEV-SNP VMs, and this work will be the base for future SEV-specific priv memfd selftest patches as things get more stable.
I see pwrite() being used to populate the initial private memory contents. Does it make sense to have SEV_VM_LAUNCH_UPDATE_DATA handle the private fd population as well? I tried to prototype it via: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com...
Thanks for the pointer and for taking a stab at this approach (I hadn't realized you were looking into this, so sorry for the overlap with your code).
as I got this suggestion from Erdem Aktas (erdemaktas@google) while discussing how to execute guest code from the private fd.
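(For context, the pwrite()-based population I'm referring to looks roughly like this -- a sketch from memory rather than the exact test code:)

/*
 * Sketch only: copy the initial guest payload into the private memfd at
 * the GPA-based offset before launch (selftest-style helpers assumed).
 */
static void populate_private_memfd(int memfd, loff_t gpa_offset,
                                   const void *payload, size_t len)
{
        ssize_t ret = pwrite(memfd, payload, len, gpa_offset);

        TEST_ASSERT(ret == (ssize_t)len,
                    "pwrite() into private memfd failed");
}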
The way we have the host patches implemented currently is sort of based around the idea that userspace handles all private/shared conversion via allocations/deallocations from the private backing store, since I thought that was one of the design goals. For SNP that means allocating a page from the backing store will trigger the hooks in the kernel needed to do additional bookkeeping like RMP updates and removal from the direct map, which I'm doing via a platform-specific callback I've added to the KVM memfile notifier.
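Conceptually, that platform callback ends up doing something like this (names are approximate, just to illustrate the bookkeeping involved, not the actual patch):

/*
 * Approximate sketch: when the memfile notifier reports a page newly
 * allocated from the private backing store, the SNP-specific callback
 * assigns it to the guest in the RMP table and pulls it out of the
 * kernel direct map.
 */
static void snp_private_page_added(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
{
        int rc;

        rc = rmp_make_private(pfn, gfn_to_gpa(gfn), PG_LEVEL_4K,
                              sev_get_asid(kvm), true);
        if (rc)
                pr_err("SNP: failed to make pfn 0x%llx private, rc %d\n",
                       pfn, rc);

        /* keep the host from touching guest-private memory via the direct map */
        set_direct_map_invalid_noflush(pfn_to_page(pfn));
}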
There was some talk of allowing a sort of pre-boot stage for the MFD_INACCESSIBLE protections, where writes would be allowed up until a certain point. The kernel hack to allow pwrite() was sort of a placeholder for that support.
Handling pre-population as part of SNP_LAUNCH_UPDATE seems sort of incompatible with this, since it reads from shared memory and writes into private memory.
Well, no, actually it wouldn't be, since your code handles it the same way as mine: kvm_private_mem_get_pfn() allocates a page but doesn't generate a notifier event, so you can defer things like RMP updates until after the memory is populated. So this might be a reasonable approach as well. But it's still worth exploring whether a more general KVM ioctl is the better approach.
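In other words, the ordering both trees end up with is roughly the following (pseudo-code; the kvm_private_mem_get_pfn() signature is from memory, and snp_launch_update_page()/snp_private_page_added() are placeholder names):

/*
 * Rough sketch of the deferred-bookkeeping flow being described: grab
 * the private pfn without firing a notifier event, let LAUNCH_UPDATE
 * populate/encrypt the page, then do the RMP/direct map bookkeeping.
 */
static int snp_prepopulate_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
                               gfn_t gfn, const void *src, size_t len)
{
        kvm_pfn_t pfn;
        int order;
        int ret;

        ret = kvm_private_mem_get_pfn(slot, gfn, &pfn, &order);
        if (ret)
                return ret;

        ret = snp_launch_update_page(kvm, pfn, src, len);  /* copy + encrypt */
        if (!ret)
                snp_private_page_added(kvm, gfn, pfn);      /* deferred RMP update */

        return ret;
}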