Unmapping virtual machine guest memory from the host kernel's direct map is an effective mitigation against Spectre-style transient execution issues: if the kernel page tables do not contain entries pointing to guest memory, then any attempted speculative read through the direct map will necessarily be blocked by the MMU before any observable microarchitectural side effects happen. This means that Spectre gadgets and similar cannot be used to target virtual machine memory. Roughly 60% of speculative execution issues fall into this category [1, Table 1].
This patch series extends guest_memfd with the ability to remove its memory from the host kernel's direct map, so that the above protection can be attained for KVM guests running inside guest_memfd.
=== Changes to RFC v3 ===
- Settle relationship between direct map removal and shared/private
  memory in guest_memfd (David H.)
- Omit TLB flushes upon direct map removal again
- Settle uABI for how KVM accesses guest memory in non-CoCo
  guest_memfd VMs (upstream guest_memfd calls)
- Add selftests that exercise the codepaths of non-CoCo guest_memfd VMs
Lastly, this series is rebased on top of Fuad's v4 for shared mapping of guest_memfd [2]. The KVM parts should also apply on top of 0ad2507d5d93 ("Linux 6.14-rc3"), but the selftest patches need Fuad's series as base.
=== Overview ===
guest_memfd should be usable for "non-CoCo" VMs - virtual machines where host userspace is trusted (i.e. may have access to all of guest memory), but which should still be hardened against speculative execution attacks (Spectre, etc.) staged through potentially existing gadgets in the host kernel.
To attain this hardening, unmap guest memory from the host kernel's address space (i.e. zap direct map entries), while allowing KVM to continue accessing guest memory through userspace mappings. This works because KVM already uses userspace mappings almost everywhere it needs to access guest memory - the only parts that require direct map entries (because they use GUP) are KVM's MMU, and kvm-clock on x86.
Building on top of guest_memfd sidesteps the MMU problem, as for memslots with KVM_MEM_GUEST_MEMFD set, the MMU consumes fd + offset directly without going through any VMAs. kvm-clock on the other hand is not strictly needed (guests boot fine without it), so ignore it for now.
=== Implementation ===
Make KVM_CREATE_GUEST_MEMFD accept a flag (KVM_GMEM_NO_DIRECT_MAP) that instructs it to remove newly allocated folios from the host kernel's direct map immediately after preparation.
Nothing further is needed to make non-CoCo VMs work - particularly, KVM does not need to be taught any special ways of accessing guest memory if it is in guest_memfd. Userspace can simply mmap guest_memfd (via KVM_GMEM_SHARED_MEM added in Fuad's series), and set the memslot's userspace_addr to this userspace mapping of guest_memfd.
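For illustration, the intended userspace flow looks roughly like the sketch below (error handling omitted; it assumes that KVM_GMEM_SHARED_MEM from Fuad's series [2] is the KVM_CREATE_GUEST_MEMFD flag that enables mmap support):

    struct kvm_create_guest_memfd gmem = {
        .size  = mem_size,
        .flags = KVM_GMEM_SHARED_MEM | KVM_GMEM_NO_DIRECT_MAP,
    };
    int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

    /* KVM accesses guest memory through this userspace mapping. */
    void *mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, gmem_fd, 0);

    struct kvm_userspace_memory_region2 region = {
        .slot = 0,
        .flags = KVM_MEM_GUEST_MEMFD,
        .guest_phys_addr = 0,
        .memory_size = mem_size,
        .userspace_addr = (__u64)mem,
        .guest_memfd = gmem_fd,
        .guest_memfd_offset = 0,
    };
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);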
=== Open Questions ===
In this patch series, stale TLB entries do not get flushed after direct map entries are marked as not present. This is fine from a functional point of view (the mapping is still valid, it's just temporarily not supposed to be used), but it pokes a theoretical hole in the speculation protection: an attacker could try to keep stale TLB entries for specific pages alive until the guest starts using those pages for sensitive information, and then stage a Spectre attack on that memory despite it being unmapped. In practice, this would require knowing in advance, at gmem fault-time, which pages will eventually contain information of interest, and then preventing these specific TLB entries from being naturally evicted (where the number of pages that can be targeted like this is limited by the size of the TLB). These seem to be fairly difficult prerequisites to fulfill, but we were wondering what the community thinks.
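For reference, closing this hole would amount to flushing the direct map range after each removal, along the lines of the following sketch (deliberately not done in this series; see the performance numbers in patch 3):

    struct page *page = folio_page(folio, 0);
    unsigned long addr = (unsigned long)page_address(page);

    set_direct_map_valid_noflush(page, folio_nr_pages(folio), false);
    /* The flush this series omits: */
    flush_tlb_kernel_range(addr, addr + folio_size(folio));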
=== Summary ===
Patch 1 adds a struct address_space flag that indicates that folios in a mapping are direct map removed, and threads it through mm code to ensure direct map removed folios don't end up in places where they can cause mayhem (particularly, we reject them in get_user_pages). Since these checks end up duplicating already existing checks for secretmem folios, patch 2 unifies the two by using the new address_space flag for secretmem mappings. Patches 3 through 5 add support for direct map removal to guest_memfd, while patches 6 through 12 are about testing the non-CoCo setup in KVM selftests, with patches 6 through 9 being preparatory, and patches 10 through 12 adding the actual test cases.
[1]: https://download.vusec.net/papers/quarantine_raid23.pdf
[2]: https://lore.kernel.org/kvm/20250218172500.807733-1-tabba@google.com/
[RFC v1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/
[RFC v2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/
[RFC v3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/
Patrick Roy (12):
  mm: introduce AS_NO_DIRECT_MAP
  mm/secretmem: set AS_NO_DIRECT_MAP instead of special-casing
  KVM: guest_memfd: Add flag to remove from direct map
  KVM: Add capability to discover KVM_GMEM_NO_DIRECT_MAP support
  KVM: Documentation: document KVM_GMEM_NO_DIRECT_MAP flag
  KVM: selftests: load elf via bounce buffer
  KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1
  KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
  KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
  KVM: selftests: adjust test_create_guest_memfd_invalid
  KVM: selftests: set KVM_GMEM_NO_DIRECT_MAP in mem conversion tests
  KVM: selftests: Test guest execution from direct map removed gmem
 Documentation/virt/kvm/api.rst                     | 13 ++++
 include/linux/pagemap.h                            | 16 +++++
 include/linux/secretmem.h                          | 18 ------
 include/uapi/linux/kvm.h                           |  3 +
 lib/buildid.c                                      |  4 +-
 mm/gup.c                                           | 14 +---
 mm/mlock.c                                         |  2 +-
 mm/secretmem.c                                     |  6 +-
 .../testing/selftests/kvm/guest_memfd_test.c       |  2 +-
 .../testing/selftests/kvm/include/kvm_util.h       | 29 ++++++---
 .../testing/selftests/kvm/include/test_util.h      |  8 +++
 tools/testing/selftests/kvm/lib/elf.c              |  8 +--
 tools/testing/selftests/kvm/lib/io.c               | 23 +++++++
 tools/testing/selftests/kvm/lib/kvm_util.c         | 64 +++++++++++--------
 tools/testing/selftests/kvm/lib/test_util.c        |  8 +++
 tools/testing/selftests/kvm/lib/x86/sev.c          |  1 +
 .../selftests/kvm/pre_fault_memory_test.c          |  1 +
 .../selftests/kvm/set_memory_region_test.c         | 40 ++++++++++++
 .../kvm/x86/private_mem_conversions_test.c         |  7 +-
 virt/kvm/guest_memfd.c                             | 24 ++++++-
 virt/kvm/kvm_main.c                                |  5 ++
 21 files changed, 214 insertions(+), 82 deletions(-)
base-commit: da40655874b54a2b563f8ceb3ed839c6cd38e0b4
Add AS_NO_DIRECT_MAP for mappings whose folios have their direct map entries set to not present. Currently, the only mappings matching this description are secretmem mappings (memfd_secret()). Later, some guest_memfd configurations will also fall into this category.
Reject this new type of mapping in all locations that currently reject secretmem mappings, on the assumption that if secretmem mappings are rejected somewhere, it is precisely because of an inability to deal with folios without direct map entries.
Use a new flag instead of overloading AS_INACCESSIBLE (which is already set by guest_memfd) because not all guest_memfd mappings will end up being direct map removed (e.g. in pKVM setups, parts of guest_memfd that can be mapped to userspace should also be GUP-able, and generally not have restrictions on who can access them).
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/linux/pagemap.h | 16 ++++++++++++++++
 lib/buildid.c           |  4 ++--
 mm/gup.c                |  6 +++++-
 mm/mlock.c              |  3 ++-
 4 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 47bfc6b1b632..903b41e89cf8 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
				   folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
+	AS_NO_DIRECT_MAP = 9,	/* Folios in the mapping are not in the direct map */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -335,6 +336,21 @@ static inline bool mapping_inaccessible(struct address_space *mapping)
 	return test_bit(AS_INACCESSIBLE, &mapping->flags);
 }
 
+static inline void mapping_set_no_direct_map(struct address_space *mapping)
+{
+	set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
+}
+
+static inline bool mapping_no_direct_map(struct address_space *mapping)
+{
+	return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
+}
+
+static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
+{
+	return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;
diff --git a/lib/buildid.c b/lib/buildid.c
index c4b0f376fb34..80b5d805067f 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -65,8 +65,8 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
 
 	freader_put_folio(r);
 
-	/* reject secretmem folios created with memfd_secret() */
-	if (secretmem_mapping(r->file->f_mapping))
+	/* reject secretmem folios created with memfd_secret() or guest_memfd() */
+	if (secretmem_mapping(r->file->f_mapping) || mapping_no_direct_map(r->file->f_mapping))
 		return -EFAULT;
 
 	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780e..7ddaf93c5b6a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1283,7 +1283,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
-	if (vma_is_secretmem(vma))
+	if (vma_is_secretmem(vma) || vma_is_no_direct_map(vma))
 		return -EFAULT;
 
 	if (write) {
@@ -2849,6 +2849,10 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 	 */
 	if (check_secretmem && secretmem_mapping(mapping))
 		return false;
+
+	if (mapping_no_direct_map(mapping))
+		return false;
+
 	/* The only remaining allowed file system is shmem. */
 	return !reject_file_backed || shmem_mapping(mapping);
 }
diff --git a/mm/mlock.c b/mm/mlock.c
index cde076fa7d5e..07a351491d9d 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -474,7 +474,8 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
 	if (newflags == oldflags || (oldflags & VM_SPECIAL) ||
 	    is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
-	    vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE))
+	    vma_is_dax(vma) || vma_is_secretmem(vma) || vma_is_no_direct_map(vma) ||
+	    (oldflags & VM_DROPPABLE))
 		/* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
 		goto out;
base-commit: da40655874b54a2b563f8ceb3ed839c6cd38e0b4
Make secretmem set AS_NO_DIRECT_MAP on its struct address_space, to drop all the vma_is_secretmem()/secretmem_mapping() checks that are based on checking explicitly for the secretmem ops structures.
This drops an optimization in gup_fast_folio_allowed() where secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem has been enabled by default since commit b758fe6df50d ("mm/secretmem: make it on by default"), so the secretmem check was not actually elided in most configurations anyway.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/linux/secretmem.h | 18 ------------------
 lib/buildid.c             |  2 +-
 mm/gup.c                  | 14 +-------------
 mm/mlock.c                |  3 +--
 mm/secretmem.c            |  6 +-----
 5 files changed, 4 insertions(+), 39 deletions(-)
diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index e918f96881f5..0ae1fb057b3d 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -4,28 +4,10 @@
 
 #ifdef CONFIG_SECRETMEM
 
-extern const struct address_space_operations secretmem_aops;
-
-static inline bool secretmem_mapping(struct address_space *mapping)
-{
-	return mapping->a_ops == &secretmem_aops;
-}
-
-bool vma_is_secretmem(struct vm_area_struct *vma);
 bool secretmem_active(void);
 
 #else
 
-static inline bool vma_is_secretmem(struct vm_area_struct *vma)
-{
-	return false;
-}
-
-static inline bool secretmem_mapping(struct address_space *mapping)
-{
-	return false;
-}
-
 static inline bool secretmem_active(void)
 {
 	return false;
diff --git a/lib/buildid.c b/lib/buildid.c
index 80b5d805067f..33f173a607ad 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -66,7 +66,7 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
 	freader_put_folio(r);
 
 	/* reject secretmem folios created with memfd_secret() or guest_memfd() */
-	if (secretmem_mapping(r->file->f_mapping) || mapping_no_direct_map(r->file->f_mapping))
+	if (mapping_no_direct_map(r->file->f_mapping))
 		return -EFAULT;
 
 	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
diff --git a/mm/gup.c b/mm/gup.c
index 7ddaf93c5b6a..b1483a876740 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1283,7 +1283,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
-	if (vma_is_secretmem(vma) || vma_is_no_direct_map(vma))
+	if (vma_is_no_direct_map(vma))
 		return -EFAULT;
 
 	if (write) {
@@ -2786,7 +2786,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 {
 	bool reject_file_backed = false;
 	struct address_space *mapping;
-	bool check_secretmem = false;
 	unsigned long mapping_flags;
 
 	/*
@@ -2798,14 +2797,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 		reject_file_backed = true;
 
 	/* We hold a folio reference, so we can safely access folio fields. */
-
-	/* secretmem folios are always order-0 folios. */
-	if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
-		check_secretmem = true;
-
-	if (!reject_file_backed && !check_secretmem)
-		return true;
-
 	if (WARN_ON_ONCE(folio_test_slab(folio)))
 		return false;
 
@@ -2847,9 +2838,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 	 * At this point, we know the mapping is non-null and points to an
 	 * address_space object.
 	 */
-	if (check_secretmem && secretmem_mapping(mapping))
-		return false;
-
 	if (mapping_no_direct_map(mapping))
 		return false;
 
diff --git a/mm/mlock.c b/mm/mlock.c
index 07a351491d9d..a43f308be70d 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -474,8 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
 	if (newflags == oldflags || (oldflags & VM_SPECIAL) ||
 	    is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
-	    vma_is_dax(vma) || vma_is_secretmem(vma) || vma_is_no_direct_map(vma) ||
-	    (oldflags & VM_DROPPABLE))
+	    vma_is_dax(vma) || vma_is_no_direct_map(vma) || (oldflags & VM_DROPPABLE))
 		/* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
 		goto out;
 
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 1b0a214ee558..ea4c04d469b1 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -136,11 +136,6 @@ static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-bool vma_is_secretmem(struct vm_area_struct *vma)
-{
-	return vma->vm_ops == &secretmem_vm_ops;
-}
-
 static const struct file_operations secretmem_fops = {
 	.release	= secretmem_release,
 	.mmap		= secretmem_mmap,
@@ -214,6 +209,7 @@ static struct file *secretmem_file_create(unsigned long flags)
 
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
 	mapping_set_unevictable(inode->i_mapping);
+	mapping_set_no_direct_map(inode->i_mapping);
 
 	inode->i_op = &secretmem_iops;
 	inode->i_mapping->a_ops = &secretmem_aops;
Add a KVM_GMEM_NO_DIRECT_MAP flag for the KVM_CREATE_GUEST_MEMFD() ioctl. When set, guest_memfd folios will be removed from the direct map after preparation, with direct map entries only restored when the folios are freed.
To ensure these folios do not end up in places where the kernel cannot deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct address_space if KVM_GMEM_NO_DIRECT_MAP is requested.
Note that this flag causes removal of direct map entries for all guest_memfd folios, independent of whether they are "shared" or "private" (although current guest_memfd only supports either all folios in the "shared" state, or all folios in the "private" state if !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)). The use case for removing direct map entries of even the shared parts of guest_memfd is a special type of non-CoCo VM where host userspace is trusted to have access to all of guest memory, but where Spectre-style transient execution attacks through the host kernel's direct map should still be mitigated.
Note that KVM retains access to guest memory via userspace mappings of guest_memfd, which are reflected back into KVM's memslots via userspace_addr. This is needed for things like MMIO emulation on x86_64 to work. Previous iterations attempted to instead have KVM temporarily restore direct map entries whenever such an access to guest memory was needed, but this turned out to have a significant performance impact, as well as additional complexity due to needing to refcount direct map reinsertion operations and make them play nicely with gmem truncations.
This iteration also doesn't have KVM perform TLB flushes after direct map manipulations. This is because TLB flushes resulted in an up to 40x elongation of page faults in guest_memfd (scaling with the number of CPU cores), or a 5x elongation of memory population. On the one hand, TLB flushes are not needed for functional correctness (the virt->phys mapping technically stays "correct", the kernel is simply not supposed to use it for a while), so this is a valid optimization to make. On the other hand, it means that the desired protection from Spectre-style attacks is not perfect, as an attacker could try to prevent a stale TLB entry from getting evicted, keeping it alive until the page it refers to is used by the guest for some sensitive data, and then target it using a Spectre gadget.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/uapi/linux/kvm.h |  2 ++
 virt/kvm/guest_memfd.c   | 28 +++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 117937a895da..4654c01a0a01 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1573,6 +1573,8 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_GMEM_NO_DIRECT_MAP		(1ULL << 0)
+
 #define KVM_PRE_FAULT_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
 
 struct kvm_pre_fault_memory {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 30b47ff0e6d2..bd7d361c9bb7 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#include <linux/set_memory.h>
 
 #include "kvm_mm.h"
 
@@ -42,8 +43,23 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+static bool kvm_gmem_test_no_direct_map(struct inode *inode)
+{
+	return ((unsigned long)inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
+}
+
 static inline void kvm_gmem_mark_prepared(struct folio *folio)
 {
+	struct inode *inode = folio_inode(folio);
+
+	if (kvm_gmem_test_no_direct_map(inode)) {
+		int r = set_direct_map_valid_noflush(folio_page(folio, 0),
+						     folio_nr_pages(folio), false);
+
+		if (!r)
+			folio_set_private(folio);
+	}
+
 	folio_mark_uptodate(folio);
 }
 
@@ -479,6 +495,10 @@ static void kvm_gmem_free_folio(struct folio *folio)
 	kvm_pfn_t pfn = page_to_pfn(page);
 	int order = folio_order(folio);
 
+	if (folio_test_private(folio))
+		WARN_ON_ONCE(set_direct_map_valid_noflush(folio_page(folio, 0),
+							  folio_nr_pages(folio), true));
+
 	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
 }
 #endif
@@ -552,6 +572,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
+	if (flags & KVM_GMEM_NO_DIRECT_MAP)
+		mapping_set_no_direct_map(inode->i_mapping);
+
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
@@ -571,7 +594,10 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 {
 	loff_t size = args->size;
 	u64 flags = args->flags;
-	u64 valid_flags = 0;
+	u64 valid_flags = KVM_GMEM_NO_DIRECT_MAP;
+
+	if (!can_set_direct_map())
+		valid_flags &= ~KVM_GMEM_NO_DIRECT_MAP;
 
 	if (flags & ~valid_flags)
 		return -EINVAL;
Add a capability to let userspace discover whether guest_memfd supports removing its folios from the direct map. Support depends on guest_memfd itself being supported, but also on whether KVM can manipulate the direct map at page granularity at all. This is possible most of the time; arm64 is a notable outlier, where it is impossible if the direct map has been set up using hugepages, as arm64 cannot break these apart due to break-before-make semantics.
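For example, userspace that wants to use the flag opportunistically could gate it on the capability along these lines (a sketch using the standard KVM_CHECK_EXTENSION pattern):

    uint64_t gmem_flags = 0;

    /* Only request direct map removal where the host supports it. */
    if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_NO_DIRECT_MAP) > 0)
        gmem_flags |= KVM_GMEM_NO_DIRECT_MAP;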
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/uapi/linux/kvm.h | 1 +
 virt/kvm/kvm_main.c      | 5 +++++
 2 files changed, 6 insertions(+)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4654c01a0a01..fb02a93546d8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -930,6 +930,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_GMEM_SHARED_MEM 239
+#define KVM_CAP_GMEM_NO_DIRECT_MAP 240
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3e40acb9f5c0..32ca1c921ab0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -65,6 +65,7 @@
 #include <trace/events/kvm.h>
 
 #include <linux/kvm_dirty_ring.h>
+#include <linux/set_memory.h>
 
 /* Worst case buffer size needed for holding an integer. */
@@ -4823,6 +4824,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return kvm_supported_mem_attributes(kvm);
 #endif
 #ifdef CONFIG_KVM_PRIVATE_MEM
+	case KVM_CAP_GMEM_NO_DIRECT_MAP:
+		if (!can_set_direct_map())
+			return false;
+		fallthrough;
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 Documentation/virt/kvm/api.rst | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2b52eb77e29c..fc0d2314fae6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6388,6 +6388,19 @@ a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+The following flags are defined:
+
+KVM_GMEM_NO_DIRECT_MAP
+  Ensure memory backing this guest_memfd inode is unmapped from the kernel's
+  address space. Check KVM_CAP_GMEM_NO_DIRECT_MAP for support.
+
+Errors:
+
+  ========== ===============================================================
+  EINVAL     The specified `flags` were invalid or not supported.
+  ========== ===============================================================
+
+
 4.143 KVM_PRE_FAULT_MEMORY
 --------------------------
If guest memory is backed by a VMA that does not allow GUP (e.g. a userspace mapping of guest_memfd when the fd was allocated using KVM_GMEM_NO_DIRECT_MAP), then directly loading the test ELF binary into it via read(2) potentially does not work. To nevertheless support loading binaries in this case, do the read(2) syscall using a bounce buffer, and then memcpy from the bounce buffer into guest memory.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../testing/selftests/kvm/include/test_util.h |  1 +
 tools/testing/selftests/kvm/lib/elf.c         |  8 +++----
 tools/testing/selftests/kvm/lib/io.c          | 23 +++++++++++++++++++
 3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 3e473058849f..51f34c34b5a2 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -46,6 +46,7 @@ do {								\
 
 ssize_t test_write(int fd, const void *buf, size_t count);
 ssize_t test_read(int fd, void *buf, size_t count);
+ssize_t test_read_bounce(int fd, void *buf, size_t count);
 int test_seq_read(const char *path, char **bufp, size_t *sizep);
 
 void __printf(5, 6) test_assert(bool exp, const char *exp_str,
diff --git a/tools/testing/selftests/kvm/lib/elf.c b/tools/testing/selftests/kvm/lib/elf.c
index f34d926d9735..e829fbe0a11e 100644
--- a/tools/testing/selftests/kvm/lib/elf.c
+++ b/tools/testing/selftests/kvm/lib/elf.c
@@ -31,7 +31,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
 	 * the real size of the ELF header.
 	 */
 	unsigned char ident[EI_NIDENT];
-	test_read(fd, ident, sizeof(ident));
+	test_read_bounce(fd, ident, sizeof(ident));
 	TEST_ASSERT((ident[EI_MAG0] == ELFMAG0) && (ident[EI_MAG1] == ELFMAG1) &&
 		(ident[EI_MAG2] == ELFMAG2) && (ident[EI_MAG3] == ELFMAG3),
 		"ELF MAGIC Mismatch,\n"
@@ -79,7 +79,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
 	offset_rv = lseek(fd, 0, SEEK_SET);
 	TEST_ASSERT(offset_rv == 0, "Seek to ELF header failed,\n"
 		"  rv: %zi expected: %i", offset_rv, 0);
-	test_read(fd, hdrp, sizeof(*hdrp));
+	test_read_bounce(fd, hdrp, sizeof(*hdrp));
 	TEST_ASSERT(hdrp->e_phentsize == sizeof(Elf64_Phdr),
 		"Unexpected physical header size,\n"
 		"  hdrp->e_phentsize: %x\n"
@@ -146,7 +146,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
 
 		/* Read in the program header. */
 		Elf64_Phdr phdr;
-		test_read(fd, &phdr, sizeof(phdr));
+		test_read_bounce(fd, &phdr, sizeof(phdr));
 
 		/* Skip if this header doesn't describe a loadable segment. */
 		if (phdr.p_type != PT_LOAD)
@@ -187,7 +187,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
 			" expected: 0x%jx",
 			n1, errno, (intmax_t) offset_rv,
 			(intmax_t) phdr.p_offset);
-		test_read(fd, addr_gva2hva(vm, phdr.p_vaddr),
+		test_read_bounce(fd, addr_gva2hva(vm, phdr.p_vaddr),
 			phdr.p_filesz);
 	}
 }
diff --git a/tools/testing/selftests/kvm/lib/io.c b/tools/testing/selftests/kvm/lib/io.c
index fedb2a741f0b..a89b43cc2ebc 100644
--- a/tools/testing/selftests/kvm/lib/io.c
+++ b/tools/testing/selftests/kvm/lib/io.c
@@ -155,3 +155,26 @@ ssize_t test_read(int fd, void *buf, size_t count)
 
 	return num_read;
 }
+
+/* Test read via intermediary buffer
+ *
+ * Same as test_read, except read(2)s happen into a bounce buffer that is
+ * memcpy'd to buf. For use with buffers that cannot be GUP'd (e.g.
+ * guest_memfd VMAs if guest_memfd was allocated with KVM_GMEM_NO_DIRECT_MAP).
+ */
+ssize_t test_read_bounce(int fd, void *buf, size_t count)
+{
+	void *bounce_buffer;
+	ssize_t num_read;
+
+	TEST_ASSERT(count >= 0, "Unexpected count, count: %li", count);
+
+	bounce_buffer = malloc(count);
+	TEST_ASSERT(bounce_buffer != NULL, "Failed to allocate bounce buffer");
+
+	num_read = test_read(fd, bounce_buffer, count);
+	memcpy(buf, bounce_buffer, num_read);
+	free(bounce_buffer);
+
+	return num_read;
+}
Have vm_mem_add() always set KVM_MEM_GUEST_MEMFD in the memslot flags if a guest_memfd is passed in as an argument. This eliminates the case where a guest_memfd instance is passed to vm_mem_add() but ends up being ignored because the flags argument does not specify KVM_MEM_GUEST_MEMFD at the same time.
This makes it easy to support more scenarios in which vm_mem_add() is not passed a guest_memfd instance, but is expected to allocate one. Currently, this only happens if guest_memfd == -1 but flags & KVM_MEM_GUEST_MEMFD != 0, but later vm_mem_add() will gain support for loading the test code itself into guest_memfd (via KVM_GMEM_SHARED_MEM) if requested via a special vm_mem_backing_src_type, at which point having to keep src_type and flags in sync becomes cumbersome.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 33fefeb3ca44..ebdf38e2983b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1017,22 +1017,26 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 
 	region->backing_src_type = src_type;
 
-	if (flags & KVM_MEM_GUEST_MEMFD) {
-		if (guest_memfd < 0) {
+	if (guest_memfd < 0) {
+		if (flags & KVM_MEM_GUEST_MEMFD) {
 			uint32_t guest_memfd_flags = 0;
 			TEST_ASSERT(!guest_memfd_offset,
 				    "Offset must be zero when creating new guest_memfd");
 			guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
-		} else {
-			/*
-			 * Install a unique fd for each memslot so that the fd
-			 * can be closed when the region is deleted without
-			 * needing to track if the fd is owned by the framework
-			 * or by the caller.
-			 */
-			guest_memfd = dup(guest_memfd);
-			TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
 		}
+	} else {
+		/*
+		 * Install a unique fd for each memslot so that the fd
+		 * can be closed when the region is deleted without
+		 * needing to track if the fd is owned by the framework
+		 * or by the caller.
+		 */
+		guest_memfd = dup(guest_memfd);
+		TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
+	}
+
+	if (guest_memfd > 0) {
+		flags |= KVM_MEM_GUEST_MEMFD;
 
 		region->region.guest_memfd = guest_memfd;
 		region->region.guest_memfd_offset = guest_memfd_offset;
Allow selftests to configure their memslots such that userspace_addr is set to a MAP_SHARED mapping of the guest_memfd that's associated with the memslot. This is the configuration for non-CoCo VMs, where all guest memory is backed by a guest_memfd whose folios are all marked shared, but KVM is still able to access guest memory to provide functionality such as MMIO emulation on x86.
Add backing types for normal guest_memfd, as well as direct map removed guest_memfd.
If KVM_CAP_MEMORY_ATTRIBUTES is available, explicitly set gmem-enabled memslots to private. Otherwise, guest page faults would be resolved by GUP-ing the guest_memfd VMA (instead of using the special VMA-less guest_memfd fault code in the KVM MMU), which is not always supported (e.g. if direct map entries are not available).
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../testing/selftests/kvm/include/kvm_util.h  | 10 +++
 .../testing/selftests/kvm/include/test_util.h |  7 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 67 ++++++++++---------
 tools/testing/selftests/kvm/lib/test_util.c   |  8 +++
 4 files changed, 62 insertions(+), 30 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 4c4e5a847f67..baeddec7c264 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -544,6 +544,16 @@ static inline uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name)
 
 void vm_create_irqchip(struct kvm_vm *vm);
 
+static inline uint64_t backing_src_guest_memfd_flags(enum vm_mem_backing_src_type t)
+{
+	switch (t) {
+	case VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP:
+		return KVM_GMEM_NO_DIRECT_MAP;
+	default:
+		return 0;
+	}
+}
+
 static inline int __vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
					uint64_t flags)
 {
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 51f34c34b5a2..2469df886d7a 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -133,6 +133,8 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_GUEST_MEMFD,
+	VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
 	NUM_SRC_TYPES,
 };
 
@@ -164,6 +166,11 @@ static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t)
 	return vm_mem_backing_src_alias(t)->flag & MAP_SHARED;
 }
 
+static inline bool backing_src_is_guest_memfd(enum vm_mem_backing_src_type t)
+{
+	return t == VM_MEM_SRC_GUEST_MEMFD || t == VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP;
+}
+
 static inline bool backing_src_can_be_huge(enum vm_mem_backing_src_type t)
 {
 	return t != VM_MEM_SRC_ANONYMOUS && t != VM_MEM_SRC_SHMEM;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index ebdf38e2983b..0900809bf6ac 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -970,6 +970,34 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	alignment = 1;
 #endif
 
+	if (guest_memfd < 0) {
+		if ((flags & KVM_MEM_GUEST_MEMFD) || backing_src_is_guest_memfd(src_type)) {
+			uint32_t guest_memfd_flags = backing_src_guest_memfd_flags(src_type);
+
+			TEST_ASSERT(!guest_memfd_offset,
+				    "Offset must be zero when creating new guest_memfd");
+			guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
+		}
+	} else {
+		/*
+		 * Install a unique fd for each memslot so that the fd
+		 * can be closed when the region is deleted without
+		 * needing to track if the fd is owned by the framework
+		 * or by the caller.
+		 */
+		guest_memfd = dup(guest_memfd);
+		TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
+	}
+
+	if (guest_memfd > 0) {
+		flags |= KVM_MEM_GUEST_MEMFD;
+
+		region->region.guest_memfd = guest_memfd;
+		region->region.guest_memfd_offset = guest_memfd_offset;
+	} else {
+		region->region.guest_memfd = -1;
+	}
+
 	/*
 	 * When using THP mmap is not guaranteed to returned a hugepage aligned
 	 * address so we have to pad the mmap. Padding is not needed for HugeTLB
@@ -985,10 +1013,13 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	if (alignment > 1)
 		region->mmap_size += alignment;
 
-	region->fd = -1;
-	if (backing_src_is_shared(src_type))
+	if (backing_src_is_guest_memfd(src_type))
+		region->fd = guest_memfd;
+	else if (backing_src_is_shared(src_type))
 		region->fd = kvm_memfd_alloc(region->mmap_size,
 					     src_type == VM_MEM_SRC_SHARED_HUGETLB);
+	else
+		region->fd = -1;
 
 	region->mmap_start = mmap(NULL, region->mmap_size,
				  PROT_READ | PROT_WRITE,
@@ -1016,34 +1047,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	}
 
 	region->backing_src_type = src_type;
-
-	if (guest_memfd < 0) {
-		if (flags & KVM_MEM_GUEST_MEMFD) {
-			uint32_t guest_memfd_flags = 0;
-			TEST_ASSERT(!guest_memfd_offset,
-				    "Offset must be zero when creating new guest_memfd");
-			guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
-		}
-	} else {
-		/*
-		 * Install a unique fd for each memslot so that the fd
-		 * can be closed when the region is deleted without
-		 * needing to track if the fd is owned by the framework
-		 * or by the caller.
-		 */
-		guest_memfd = dup(guest_memfd);
-		TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
-	}
-
-	if (guest_memfd > 0) {
-		flags |= KVM_MEM_GUEST_MEMFD;
-
-		region->region.guest_memfd = guest_memfd;
-		region->region.guest_memfd_offset = guest_memfd_offset;
-	} else {
-		region->region.guest_memfd = -1;
-	}
-
 	region->unused_phy_pages = sparsebit_alloc();
 	if (vm_arch_has_protected_memory(vm))
 		region->protected_phy_pages = sparsebit_alloc();
@@ -1063,6 +1066,10 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 		guest_paddr, (uint64_t) region->region.memory_size,
 		region->region.guest_memfd);
 
+	if (region->region.guest_memfd != -1 && kvm_has_cap(KVM_CAP_MEMORY_ATTRIBUTES))
+		vm_set_memory_attributes(vm, region->region.guest_phys_addr,
+					 region->region.memory_size,
+					 KVM_MEMORY_ATTRIBUTE_PRIVATE);
+
 	/* Add to quick lookup data structures */
 	vm_userspace_mem_region_gpa_insert(&vm->regions.gpa_tree, region);
 	vm_userspace_mem_region_hva_insert(&vm->regions.hva_tree, region);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 8ed0b74ae837..1a5b0d5d5f91 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -279,6 +279,14 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 		 */
 		.flag = MAP_SHARED,
 	},
+	[VM_MEM_SRC_GUEST_MEMFD] = {
+		.name = "guest_memfd",
+		.flag = MAP_SHARED,
+	},
+	[VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP] = {
+		.name = "guest_memfd_no_direct_map",
+		.flag = MAP_SHARED,
+	}
 };
 _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
	       "Missing new backing src types?");
Use one of the padding fields in struct vm_shape to carry an enum vm_mem_backing_src_type value, to give tests the option to override the default of VM_MEM_SRC_ANONYMOUS in __vm_create().

Overriding this default allows tests to create VMs where the test code is backed by mmap'd guest_memfd instead of anonymous memory.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../testing/selftests/kvm/include/kvm_util.h  | 19 ++++++++++---------
 tools/testing/selftests/kvm/lib/kvm_util.c    |  2 +-
 tools/testing/selftests/kvm/lib/x86/sev.c     |  1 +
 .../selftests/kvm/pre_fault_memory_test.c     |  1 +
 4 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index baeddec7c264..170e43d0bdf1 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -180,7 +180,7 @@ enum vm_guest_mode {
 struct vm_shape {
 	uint32_t type;
 	uint8_t  mode;
-	uint8_t  pad0;
+	uint8_t  src_type;
 	uint16_t pad1;
 };
 
@@ -188,14 +188,15 @@ kvm_static_assert(sizeof(struct vm_shape) == sizeof(uint64_t));
 
 #define VM_TYPE_DEFAULT			0
 
-#define VM_SHAPE(__mode)			\
-({						\
-	struct vm_shape shape = {		\
-		.mode = (__mode),		\
-		.type = VM_TYPE_DEFAULT		\
-	};					\
-						\
-	shape;					\
+#define VM_SHAPE(__mode)				\
+({							\
+	struct vm_shape shape = {			\
+		.mode = (__mode),			\
+		.type = VM_TYPE_DEFAULT,		\
+		.src_type = VM_MEM_SRC_ANONYMOUS	\
+	};						\
+							\
+	shape;						\
 })
 
 #if defined(__aarch64__)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 0900809bf6ac..43c7af269beb 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -420,7 +420,7 @@ struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
 
 	vm = ____vm_create(shape);
 
-	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
+	vm_userspace_mem_region_add(vm, shape.src_type, 0, 0, nr_pages, 0);
 	for (i = 0; i < NR_MEM_REGIONS; i++)
 		vm->memslots[i] = 0;
 
diff --git a/tools/testing/selftests/kvm/lib/x86/sev.c b/tools/testing/selftests/kvm/lib/x86/sev.c
index e9535ee20b7f..802e9db18235 100644
--- a/tools/testing/selftests/kvm/lib/x86/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86/sev.c
@@ -118,6 +118,7 @@ struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t type, void *guest_code,
 	struct vm_shape shape = {
 		.mode = VM_MODE_DEFAULT,
 		.type = type,
+		.src_type = VM_MEM_SRC_ANONYMOUS,
 	};
 	struct kvm_vm *vm;
 	struct kvm_vcpu *cpus[1];
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 0350a8896a2f..d403f8d2f26f 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -68,6 +68,7 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	const struct vm_shape shape = {
 		.mode = VM_MODE_DEFAULT,
 		.type = vm_type,
+		.src_type = VM_MEM_SRC_ANONYMOUS,
 	};
 	struct kvm_vcpu *vcpu;
 	struct kvm_run *run;
BIT(0) is now a valid flag, corresponding to KVM_GMEM_NO_DIRECT_MAP, so adjust test_create_guest_memfd_invalid to no longer assert it as an invalid flag.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 tools/testing/selftests/kvm/guest_memfd_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 38c501e49e0e..b2e7d4c96802 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -170,7 +170,7 @@ static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 			    size);
 	}
 
-	for (flag = BIT(0); flag; flag <<= 1) {
+	for (flag = BIT(1); flag; flag <<= 1) {
 		fd = __vm_create_guest_memfd(vm, page_size, flag);
 		TEST_ASSERT(fd == -1 && errno == EINVAL,
			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
Cover the scenario where the guest can fault in and write to gmem-backed guest memory even when its direct map entries have been removed.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../selftests/kvm/x86/private_mem_conversions_test.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 82a8d88b5338..dfc78781e93b 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -367,7 +367,7 @@ static void *__test_mem_conversions(void *__vcpu)
 }
 
 static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus,
-				 uint32_t nr_memslots)
+				 uint32_t nr_memslots, uint64_t gmem_flags)
 {
 	/*
 	 * Allocate enough memory so that each vCPU's chunk of memory can be
@@ -394,7 +394,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
 
 	vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
 
-	memfd = vm_create_guest_memfd(vm, memfd_size, 0);
+	memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
 
 	for (i = 0; i < nr_memslots; i++)
 		vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
@@ -477,7 +477,8 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	test_mem_conversions(src_type, nr_vcpus, nr_memslots);
+	test_mem_conversions(src_type, nr_vcpus, nr_memslots, 0);
+	test_mem_conversions(src_type, nr_vcpus, nr_memslots, KVM_GMEM_NO_DIRECT_MAP);
 
 	return 0;
 }
Add a selftest that loads itself into guest_memfd (via KVM_GMEM_SHARED_MEM) and triggers an MMIO exit when executed. This exercises x86 MMIO emulation code inside KVM for guest_memfd-backed memslots where the guest_memfd folios are direct map removed. Particularly, it validates that x86 MMIO emulation code (guest page table walks + instruction fetch) correctly accesses gmem through the VMA that's been reflected into the memslot's userspace_addr field (instead of trying to do direct map accesses).
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../selftests/kvm/set_memory_region_test.c | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index bc440d5aba57..16bbfe117a17 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -603,6 +603,42 @@ static void test_mmio_during_vectoring(void)
 
 	kvm_vm_free(vm);
 }
+
+static void guest_code_trigger_mmio(void)
+{
+	/*
+	 * Read some GPA that is not backed by a memslot. KVM considers this
+	 * MMIO and tells userspace to emulate the read.
+	 */
+	READ_ONCE(*((uint64_t *)MEM_REGION_GPA));
+
+	GUEST_DONE();
+}
+
+static void test_guest_memfd_mmio(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct vm_shape shape = {
+		.mode = VM_MODE_DEFAULT,
+		.type = KVM_X86_SW_PROTECTED_VM,
+		.src_type = VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
+	};
+	pthread_t vcpu_thread;
+
+	pr_info("Testing MMIO emulation for instructions in gmem\n");
+
+	vm = __vm_create_shape_with_one_vcpu(shape, &vcpu, 0, guest_code_trigger_mmio);
+
+	virt_map(vm, MEM_REGION_GPA, MEM_REGION_GPA, 1);
+
+	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
+
+	/* If the MMIO read was successfully emulated, the vcpu thread will exit */
+	pthread_join(vcpu_thread, NULL);
+
+	kvm_vm_free(vm);
+}
 #endif
 
 int main(int argc, char *argv[])
@@ -630,6 +666,10 @@ int main(int argc, char *argv[])
 	    (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))) {
 		test_add_private_memory_region();
 		test_add_overlapping_private_memory_regions();
+		if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM) && kvm_has_cap(KVM_CAP_GMEM_NO_DIRECT_MAP))
+			test_guest_memfd_mmio();
+		else
+			pr_info("Skipping tests requiring KVM_CAP_GMEM_SHARED_MEM | KVM_CAP_GMEM_NO_DIRECT_MAP\n");
 	} else {
 		pr_info("Skipping tests for KVM_MEM_GUEST_MEMFD memory regions\n");
 	}