On Fri, Jul 29, 2022, Sean Christopherson wrote:
On Mon, Jul 25, 2022, Chao Peng wrote:
On Thu, Jul 21, 2022 at 05:58:50PM +0000, Sean Christopherson wrote:
On Thu, Jul 21, 2022, Chao Peng wrote:
On Thu, Jul 21, 2022 at 03:34:59PM +0800, Wei Wang wrote:
On 7/21/22 00:21, Sean Christopherson wrote:
Maybe you could tag it with cgs for all the confidential guest support related stuff, e.g. kvm_vm_ioctl_set_cgs_mem():

  bool is_private = ioctl == KVM_MEMORY_ENCRYPT_REG_REGION;
  ...
  kvm_vm_ioctl_set_cgs_mem(..., is_private);
If we plan to use such an abbreviation widely throughout KVM (i.e. it becomes well known), I'm fine with it.
I'd prefer to stay away from "confidential guest", and away from any VM-scoped name for that matter. User-unmappable memory has use cases beyond hiding guest state from the host, e.g. userspace could use inaccessible/unmappable memory to harden itself against unintentional access to guest memory.
I actually use mem_attr in this patch: https://lkml.org/lkml/2022/7/20/610. But I don't quite like it either; it's so generic that it says almost nothing.
But I do want a name that can cover future usages other than just private/shared (pKVM, for example, may have a third state).
I don't think there can be a third top-level state. Memory is either private to the guest or it's not. There can be sub-states, e.g. memory could be selectively shared or encrypted with a different key, in which case we'd need metadata to track that state.
Though that begs the question of whether or not private_fd is the correct terminology. E.g. if guest memory is backed by a memfd that can't be mapped by userspace (currently F_SEAL_INACCESSIBLE), but something else in the kernel plugs that memory into a device or another VM, then arguably that memory is shared, especially in the multi-VM scenario.
For TDX and SNP "private vs. shared" is likely the correct terminology given the current specs, but for generic KVM it's probably better to align with whatever terminology is used for memfd. "inaccessible_fd" and "user_inaccessible_fd" are a bit odd since the fd itself is accessible.
What about "user_unmappable"? E.g.
F_SEAL_USER_UNMAPPABLE, MFD_USER_UNMAPPABLE, KVM_HAS_USER_UNMAPPABLE_MEMORY, MEMFILE_F_USER_INACCESSIBLE, user_unmappable_fd, etc...
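To make the naming concrete, a rough userspace sketch assuming the MFD_USER_UNMAPPABLE/F_SEAL_USER_UNMAPPABLE spellings (purely illustrative, neither flag exists today):

  /* Illustrative only: the flag/seal names are the proposal, not existing uAPI. */
  int fd = memfd_create("guest_mem", MFD_ALLOW_SEALING | MFD_USER_UNMAPPABLE);

  /* With the seal in place, mmap() of the fd would presumably be rejected. */
  fcntl(fd, F_ADD_SEALS, F_SEAL_USER_UNMAPPABLE);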
For KVM I also think user_unmappable looks better than 'private', e.g. user_unmappable_fd/KVM_HAS_USER_UNMAPPABLE_MEMORY sound like more appropriate names. For memfd, however, I don't feel strongly about changing the current 'inaccessible' to 'user_unmappable'. One of the reasons is that the memory is not just unmappable, it is also inaccessible through direct syscalls like read()/write().
Heh, I _knew_ there had to be a catch. I agree that INACCESSIBLE is better for memfd.
Thought about this some more...
I think we should avoid UNMAPPABLE even on the KVM side of things for the core memslots functionality and instead be very literal, e.g.
  KVM_HAS_FD_BASED_MEMSLOTS
  KVM_MEM_FD_VALID
We'll still need KVM_HAS_USER_UNMAPPABLE_MEMORY, but it won't be tied directly to the memslot. Decoupling the two things will require a bit of extra work, but the code impact should be quite small, e.g. explicitly query and propagate MEMFILE_F_USER_INACCESSIBLE to kvm_memory_slot to track whether a memslot can be private. And unless I'm missing something, it won't require an additional memslot flag. The biggest oddity (if we don't also add KVM_MEM_PRIVATE) is that KVM would effectively ignore the hva for fd-based memslots for VM types that don't support private memory, i.e. userspace can't opt out of using the fd-based backing, but that doesn't seem like a deal breaker.
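To make the propagation idea concrete, a minimal sketch, assuming made-up slot fields and a made-up memfile_get_flags() helper (only MEMFILE_F_USER_INACCESSIBLE comes from the proposal):

  /*
   * Hypothetical sketch: capture whether the backing store is user
   * inaccessible when the fd-based memslot is created, so that "can
   * this slot be private?" is tracked without a new memslot flag.
   * backing_file, private_capable and memfile_get_flags() are all
   * invented names, used purely for illustration.
   */
  static void kvm_memslot_bind_fd(struct kvm_memory_slot *slot,
                                  struct file *file)
  {
          slot->backing_file = file;
          slot->private_capable = memfile_get_flags(file) &
                                  MEMFILE_F_USER_INACCESSIBLE;
  }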
Decoupling private memory from fd-based memslots will allow using fd-based memslots for backing VMs even if the memory is user mappable, which opens up potentially interesting use cases. It would also allow testing some parts of fd-based memslots with existing VMs.
The big advantage of KVM's hva-based memslots is that KVM doesn't care what's backing a memslot, and so (in theory) enabling new backing stores for KVM is free. It's not always free, but at this point I think we've eliminated most of the hiccups, e.g. x86's MMU should no longer require additional enlightenment to support huge pages for new backing types.
On the flip side, a big disadvantage of hva-based memslots is that KVM doesn't _know_ what's backing a memslot. This is one of the major reasons, if not _the_ main reason at this point, why KVM binds a VM to a single virtual address space. Running with different hva=>pfn mappings would either be completely unsafe, or prohibitively expensive (nearly impossible?) to do safely.
With fd-based memslots, KVM essentially binds a memslot directly to the backing store. This allows KVM to do a "deep" comparison of a memslot between two address spaces simply by checking that the backing store is the same. For intra-host/copyless migration (to upgrade the userspace VMM), being able to do a deep comparison would theoretically allow transferring KVM's page tables between VMs instead of forcing the target VM to rebuild the page tables. There are memcg complications (and probably many others) for transferring page tables, but I'm pretty sure it could work.
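For illustration, the "deep" comparison could boil down to something like the below; backing_file and fd_offset are invented field names (only base_gfn/npages exist in today's kvm_memory_slot):

  /*
   * Sketch only: two fd-based slots are considered "the same" if they
   * map the same range of the same backing file at the same GPA range.
   */
  static bool kvm_memslots_same_backing(const struct kvm_memory_slot *a,
                                        const struct kvm_memory_slot *b)
  {
          return a->backing_file && a->backing_file == b->backing_file &&
                 a->fd_offset == b->fd_offset &&
                 a->base_gfn == b->base_gfn &&
                 a->npages == b->npages;
  }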
I don't have a concrete use case (this is a recent idea on my end), but since we're already adding fd-based memory, I can't think of a good reason not to make it more generic for not much extra cost. And there are definitely classes of VMs for which fd-based memory would Just Work, e.g. large VMs that are never oversubscribed on memory don't need to support reclaim, so the fact that fd-based memslots won't support page aging (among other things) right away is a non-issue.