On Wed, Oct 16, 2024 at 08:54:24PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 16, 2024 at 07:49:31PM -0400, Peter Xu wrote:
> > On Wed, Oct 16, 2024 at 07:51:57PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Oct 16, 2024 at 04:16:17PM -0400, Peter Xu wrote:
> > > > Is there a chance that, once !CoCo is supported, external modules (e.g. VFIO) can reuse the old user mappings, just like before gmemfd?
> > > > To support CoCo, I understand gmem+offset is required all over the place. However, in a non-CoCo context, I wonder whether the other modules are required to stick with gmem+offset, or whether they can reuse the old VA way, because it can work fundamentally the same as before, except that the folios will now be managed by gmemfd.
> > > My intention with iommufd was to see fd + offset as the "new" way to refer to all guest memory and discourage people from using VMA handles.
> > Does that mean anonymous-memory guests will not be supported at all by iommufd?
> No, they can still use the "old" way with normal VMAs, or they can use an anonymous memfd with the new way.
> I just don't expect new complex stuff to be built on the VMA interface; I don't expect guestmemfd VMAs to work.
Yes, if we already have guestmemfd, we probably don't need to bother with the VA interface.
Likewise, since guestmemfd already supports KVM_SET_USER_MEMORY_REGION2, using fd+offset for that KVM API is not a problem at all.
My question was more about whether gmemfd could still be exposed in VA form to other modules that may not support fd+offset yet. And I assume that by "VMA" you mean "VA ranges", while a "gmemfd VMA" on its own, as proposed in this series with the fault handler, is probably OK?
It may not be a problem for many cloud providers, but where QEMU is involved, things are much more flexible, and QEMU would need to add fd+offset support to many existing interfaces that are mostly based on VAs or VA ranges. I believe that includes QEMU itself, i.e. the userspace hypervisor (how the user app accesses the shared pages that KVM allows to be faulted), vhost-kernel (more GUP-oriented), vhost-user (similar to the user-app side), etc.
I think that as long as we can provide gmemfd VMAs like the ones this series provides, it should be possible to reuse the old VA interfaces before the CoCo interfaces are ready, so that people can already start leveraging gmemfd-backed pages.
The idea sounds good to me in general. QEMU used to have a requirement for strict vIOMMU semantics between QEMU and another process that runs the device emulation (i.e. vhost-user). We didn't want to map all guest RAM all the time because, even today, an OVS bug can corrupt QEMU memory even when a vIOMMU is present (which, logically, should prevent this). We had the idea of sending one fd to the vhost-user process, through which we could control what is mapped and what can be zapped.
In the gmemfd case, that is mostly what we were already pursuing:
- It allows mmap() of a guest memory region (without yet the ability to access all of it; otherwise it could bypass the protection, whether that protection is for CoCo or, in this case, a vIOMMU)
- It allows the main process (here QEMU/KVM, or anything/KVM) to control how pages are faulted in; in this case gmemfd lazily faults in pages only if they are faultable / shared
- It allows remote tearing down of pages that are no longer faultable / shared, which guarantees that the other process cannot access any page that was not authorized
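The three properties above can be illustrated with a toy userspace model. This is a hypothetical sketch, not gmemfd's implementation or API: `struct toy_gmem`, `toy_fault()` and `toy_set_shared()` are made-up names, and in the kernel a fault on a non-shared page would end in SIGBUS rather than an error return. The model only tracks, per page, whether the owner has marked it shared and whether the remote mapper currently has it mapped.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of owner-controlled faulting; not gmemfd code. */
#define TOY_NPAGES 8

struct toy_gmem {
	bool shared[TOY_NPAGES]; /* owner allows this page to be faulted in */
	bool mapped[TOY_NPAGES]; /* remote process currently has it mapped */
};

/* Remote access path: only shared pages may be faulted in, lazily. */
static int toy_fault(struct toy_gmem *g, size_t pgoff)
{
	if (pgoff >= TOY_NPAGES)
		return -EINVAL;
	if (!g->shared[pgoff])
		return -EFAULT; /* a real kernel fault path would raise SIGBUS */
	g->mapped[pgoff] = true;
	return 0;
}

/* Owner path (e.g. QEMU/KVM): revoking sharing zaps the remote mapping. */
static int toy_set_shared(struct toy_gmem *g, size_t pgoff, bool shared)
{
	if (pgoff >= TOY_NPAGES)
		return -EINVAL;
	g->shared[pgoff] = shared;
	if (!shared)
		g->mapped[pgoff] = false; /* zap: next access must fault again */
	return 0;
}
```

In this model a remote process can fault page 0 in only after the owner calls `toy_set_shared(&g, 0, true)`; flipping it back to private drops the mapping, so the next `toy_fault()` on that page fails again.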
I wonder whether that is good enough even for the CoCo use case, where any illegal access to a page would simply crash.
Besides that, we can definitely make good use of non-CoCo 1G pages for either the postcopy solution (which James worked on for HGM) or hwpoisoning (the latter, I believe, is still a common issue for all of us: making hwpoison work for hugetlbfs at PAGE_SIZE granularity [1]). At least for QEMU, the former will still require leveraging the split-ability of gmemfd huge folios.
Then, even if both the KVM ioctls and the iommufd ioctls only support fd+offset, as long as the shared portion of the gmemfd folios can be faulted and GUPed, gmemfd could start being considered as a replacement for hugetlb to overcome those difficulties, even before CoCo is supported everywhere. There's also the question of whether all the known modules will eventually support fd+offset, which I'm not sure about. If some module won't support it, maybe it can still work with gmemfd via VA ranges, so that it can still benefit from what gmemfd provides.
So, in short, I wonder whether the use case could use a combination of (fd, offset) interfaces on some modules, like KVM/iommufd, and VA ranges, as before, on others.
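The mix of (fd, offset) and VA addressing can be illustrated with an ordinary anonymous memfd (the case Jason mentions above): one consumer refers to memory as (fd, offset), another through an mmap()ed VA, and both see the same pages. This is a sketch assuming Linux with glibc 2.27+ for memfd_create(); it does not use guest_memfd, whose mmap() support is what this series proposes, and `guest_mem_create()`, `guest_mem_map()` and `guest_mem_read()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an anonymous memfd of 'size' bytes; returns the fd, or -1. */
static int guest_mem_create(size_t size)
{
	int fd = memfd_create("guest-ram", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* "Old" VA-based consumer: map the whole fd and hand out a VA. */
static void *guest_mem_map(int fd, size_t size)
{
	void *va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	return va == MAP_FAILED ? NULL : va;
}

/* "New" (fd, offset) consumer: access memory without any VA. */
static ssize_t guest_mem_read(int fd, off_t off, void *buf, size_t len)
{
	return pread(fd, buf, len, off);
}
```

For example, writing "hello" through the VA at offset 4096 and then calling `guest_mem_read(fd, 4096, buf, 6)` returns the same bytes, because the MAP_SHARED mapping and pread() both operate on the memfd's single set of pages.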
Thanks,
[1] https://lore.kernel.org/all/20240924043924.3562257-1-jiaqiyan@google.com/