This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/lkml/20221202061347.1070246-10-chao.p.peng@linux.int...

The changes below aim to test the fd-based approach for guest private memory in the context of normal (non-confidential) VMs executing on non-confidential platforms.

private_mem_test.c adds a selftest that accesses private memory from the guest via private/shared accesses and checks whether the contents can be leaked to/accessed by the VMM via the shared memory view before/after conversions.
Updates in V2:
1) Simplified vcpu run loop implementation API
2) Removed VM creation logic from private mem library

Updates in V1 (compared to the RFC v3 patches):
1) Incorporated suggestions from Sean around simplifying KVM changes
2) Addressed comments from Sean
3) Added a private mem test with shared memory backed by 2MB hugepages
V1 series: https://lore.kernel.org/lkml/20221111014244.1714148-1-vannapurve@google.com/...
This series depends on the following patches: 1) The v10 series from Chao mentioned above.
Github link for the patches posted as part of this series: https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_v2
Vishal Annapurve (6):
  KVM: x86: Add support for testing private memory
  KVM: Selftests: Add support for private memory
  KVM: selftests: x86: Add IS_ALIGNED/IS_PAGE_ALIGNED helpers
  KVM: selftests: x86: Add helpers to execute VMs with private memory
  KVM: selftests: Add get_free_huge_2m_pages
  KVM: selftests: x86: Add selftest for private memory
 arch/x86/kvm/mmu/mmu_internal.h               |   6 +-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   2 +
 .../selftests/kvm/include/kvm_util_base.h     |  15 +-
 .../testing/selftests/kvm/include/test_util.h |   5 +
 .../kvm/include/x86_64/private_mem.h          |  24 ++
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  58 ++++-
 tools/testing/selftests/kvm/lib/test_util.c   |  29 +++
 .../selftests/kvm/lib/x86_64/private_mem.c    | 139 ++++++++++++
 .../selftests/kvm/x86_64/private_mem_test.c   | 212 ++++++++++++++++++
 virt/kvm/Kconfig                              |   4 +
 virt/kvm/kvm_main.c                           |   3 +-
 13 files changed, 490 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/private_mem.h
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/private_mem.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_test.c
Introduce a HAVE_KVM_PRIVATE_MEM_TESTING config option to allow testing the fd-based approach to private memory with non-confidential selftest VMs. To support this testing, a few important aspects need to be considered from the perspective of selftests:

* KVM needs to know whether an access from the guest VM is private or shared. Confidential VMs (SNP/TDX) carry a dedicated bit in the gpa that KVM can use to deduce the nature of the access. Non-confidential VMs have no mechanism to carry/convey such information to KVM, so KVM simply relies on the attributes set by the userspace VMM, keeping the userspace VMM in the TCB for testing purposes.
* arch_private_mem_supported is updated to allow the private memory logic to work with non-confidential VM selftests.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 6 +++++-
 virt/kvm/Kconfig                | 4 ++++
 virt/kvm/kvm_main.c             | 3 ++-
 3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 5ccf08183b00..e2f508db0b6e 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -263,6 +263,8 @@ enum {
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefetch)
 {
+	bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
+
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
 		.error_code = err,
@@ -272,13 +274,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.rsvd = err & PFERR_RSVD_MASK,
 		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
-		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+		.is_tdp = is_tdp,
 		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),

 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
+		.is_private = IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) && is_tdp &&
+			      kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
 	};
 	int r;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index d605545d6dd1..1e326367a21c 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -92,3 +92,7 @@ config HAVE_KVM_PM_NOTIFIER

 config HAVE_KVM_RESTRICTED_MEM
 	bool
+
+config HAVE_KVM_PRIVATE_MEM_TESTING
+	bool "Private memory selftests"
+	depends on HAVE_KVM_MEMORY_ATTRIBUTES && HAVE_KVM_RESTRICTED_MEM
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ac835fc77273..d2d829d23442 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1262,7 +1262,8 @@ int __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)

 bool __weak kvm_arch_has_private_mem(struct kvm *kvm)
 {
-	return false;
+	return IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING);
 }
static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
Introduce HAVE_KVM_PRIVATE_MEM_TESTING config to be able to test fd based

@@ -272,13 +274,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.rsvd = err & PFERR_RSVD_MASK,
 		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
-		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+		.is_tdp = is_tdp,
 		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
+		.is_private = IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) && is_tdp &&
+			      kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
After looking at the SNP+UPM series, I think we should forego a dedicated Kconfig for testing and instead add a new VM type for UPM-capable guests. The new type, e.g. KVM_X86_PROTECTED_VM, can then be used to leverage UPM for "legacy" SEV and SEV-ES guests, as well as for UPM-capable guests that don't utilize per-VM memory encryption, e.g. KVM selftests.
Carrying test-only behavior is obviously never ideal, and it would pretty much have to be mutually exclusive with "real" usage of UPM, otherwise the KVM logic gets unnecessarily complex.
On Tue, Jan 17, 2023 at 1:39 PM Sean Christopherson seanjc@google.com wrote:
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
Introduce HAVE_KVM_PRIVATE_MEM_TESTING config to be able to test fd based

@@ -272,13 +274,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.rsvd = err & PFERR_RSVD_MASK,
 		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
-		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+		.is_tdp = is_tdp,
 		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
+		.is_private = IS_ENABLED(CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING) && is_tdp &&
+			      kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
After looking at the SNP+UPM series, I think we should forego a dedicated Kconfig for testing and instead add a new VM type for UPM-capable guests. The new type, e.g. KVM_X86_PROTECTED_VM, can then be used to leverage UPM for "legacy" SEV and SEV-ES guests, as well as for UPM-capable guests that don't utilize per-VM memory encryption, e.g. KVM selftests.
Carrying test-only behavior is obviously never ideal, and it would pretty much have to be mutually exclusive with "real" usage of UPM, otherwise the KVM logic gets unnecessarily complex.
Ack, the newly added VM type fits better here with SEV/SEV-ES and non-confidential selftests being able to share this framework.
Add support for registering private memory with KVM using the KVM_SET_USER_MEMORY_REGION ioctl.

A helper function to query the extended userspace mem region is introduced to allow memory conversion.

The vm_mem_backing_src types are extended with additional guest memory source types to cover the cases where guest memory is backed by both anonymous memory and a restricted memfd.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 .../selftests/kvm/include/kvm_util_base.h     | 12 +++-
 .../testing/selftests/kvm/include/test_util.h |  4 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 58 +++++++++++++++++--
 tools/testing/selftests/kvm/lib/test_util.c   | 11 ++++
 4 files changed, 78 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index c7685c7038ff..4ad99f295f2a 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -31,7 +31,10 @@ typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
 typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */

 struct userspace_mem_region {
-	struct kvm_userspace_memory_region region;
+	union {
+		struct kvm_userspace_memory_region region;
+		struct kvm_userspace_memory_region_ext region_ext;
+	};
 	struct sparsebit *unused_phy_pages;
 	int fd;
 	off_t offset;
@@ -196,7 +199,7 @@ static inline bool kvm_has_cap(long cap)

 #define kvm_do_ioctl(fd, cmd, arg)						\
 ({										\
-	static_assert(!_IOC_SIZE(cmd) || sizeof(*arg) == _IOC_SIZE(cmd), "");	\
+	static_assert(!_IOC_SIZE(cmd) || sizeof(*arg) >= _IOC_SIZE(cmd), "");	\
 	ioctl(fd, cmd, arg);							\
 })

@@ -384,6 +387,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags);
 void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
 void vm_mem_region_delete(struct kvm_vm *vm, uint32_t slot);
+
 struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id);
 vm_vaddr_t vm_vaddr_unused_gap(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
 vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
@@ -715,6 +719,10 @@ struct kvm_userspace_memory_region *
 kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
 				 uint64_t end);

+struct kvm_userspace_memory_region_ext *
+kvm_userspace_memory_region_ext_find(struct kvm_vm *vm, uint64_t start,
+				     uint64_t end);
+
 #define sync_global_to_guest(vm, g) ({				\
 	typeof(g) *_p = addr_gva2hva(vm, (vm_vaddr_t)&(g));	\
 	memcpy(_p, &(g), sizeof(g));				\
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 80d6416f3012..aea80071f2b8 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -103,6 +103,8 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD,
+	VM_MEM_SRC_ANON_HTLB2M_AND_RESTRICTED_MEMFD,
 	NUM_SRC_TYPES,
 };

@@ -110,7 +112,9 @@ enum vm_mem_backing_src_type {

 struct vm_mem_backing_src_alias {
 	const char *name;
+	/* Flags applicable for normal host accessible guest memory */
 	uint32_t flag;
+	uint32_t need_restricted_memfd;
 };

 #define MIN_RUN_DELAY_NS 200000UL
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 1d26a2160178..dba693d6446a 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -32,6 +32,11 @@ int open_path_or_exit(const char *path, int flags)
 	return fd;
 }

+static int memfd_restricted(unsigned int flags)
+{
+	return syscall(__NR_memfd_restricted, flags);
+}
+
 /*
  * Open KVM_DEV_PATH if available, otherwise exit the entire program.
  *
@@ -582,6 +587,35 @@ __weak void vcpu_arch_free(struct kvm_vcpu *vcpu)

 }

+/*
+ * KVM Userspace Memory Region Ext Find
+ *
+ * Input Args:
+ *   vm - Virtual Machine
+ *   start - Starting VM physical address
+ *   end - Ending VM physical address, inclusive.
+ *
+ * Output Args: None
+ *
+ * Return:
+ *   Pointer to overlapping ext region, NULL if no such region.
+ *
+ * Public interface to userspace_mem_region_find. Allows tests to look up
+ * the memslot datastructure for a given range of guest physical memory.
+ */
+struct kvm_userspace_memory_region_ext *
+kvm_userspace_memory_region_ext_find(struct kvm_vm *vm, uint64_t start,
+				     uint64_t end)
+{
+	struct userspace_mem_region *region;
+
+	region = userspace_mem_region_find(vm, start, end);
+	if (!region)
+		return NULL;
+
+	return &region->region_ext;
+}
+
 /*
  * VM VCPU Remove
  *
@@ -881,6 +915,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	struct userspace_mem_region *region;
 	size_t backing_src_pagesz = get_backing_src_pagesz(src_type);
 	size_t alignment;
+	int restricted_memfd = -1;

 	TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages,
 		"Number of guest pages is not compatible with the host. "
@@ -978,14 +1013,24 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,

 	/* As needed perform madvise */
 	if ((src_type == VM_MEM_SRC_ANONYMOUS ||
-	     src_type == VM_MEM_SRC_ANONYMOUS_THP) && thp_configured()) {
+	     src_type == VM_MEM_SRC_ANONYMOUS_THP ||
+	     src_type == VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD) &&
+	    thp_configured()) {
 		ret = madvise(region->host_mem, npages * vm->page_size,
-			      src_type == VM_MEM_SRC_ANONYMOUS ? MADV_NOHUGEPAGE : MADV_HUGEPAGE);
+			      (src_type == VM_MEM_SRC_ANONYMOUS_THP) ?
+			      MADV_HUGEPAGE : MADV_NOHUGEPAGE);
 		TEST_ASSERT(ret == 0, "madvise failed, addr: %p length: 0x%lx src_type: %s",
 			    region->host_mem, npages * vm->page_size,
 			    vm_mem_backing_src_alias(src_type)->name);
 	}

+	if (vm_mem_backing_src_alias(src_type)->need_restricted_memfd) {
+		restricted_memfd = memfd_restricted(0);
+		TEST_ASSERT(restricted_memfd != -1,
+			    "Failed to create restricted memfd");
+		flags |= KVM_MEM_PRIVATE;
+	}
+
 	region->unused_phy_pages = sparsebit_alloc();
 	sparsebit_set_num(region->unused_phy_pages,
 		guest_paddr >> vm->page_shift, npages);
@@ -994,13 +1039,16 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	region->region.guest_phys_addr = guest_paddr;
 	region->region.memory_size = npages * vm->page_size;
 	region->region.userspace_addr = (uintptr_t) region->host_mem;
-	ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region);
+	region->region_ext.restricted_fd = restricted_memfd;
+	region->region_ext.restricted_offset = 0;
+	ret = ioctl(vm->fd, KVM_SET_USER_MEMORY_REGION, &region->region_ext);
 	TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION IOCTL failed,\n"
 		"  rc: %i errno: %i\n"
 		"  slot: %u flags: 0x%x\n"
-		"  guest_phys_addr: 0x%lx size: 0x%lx",
+		"  guest_phys_addr: 0x%lx size: 0x%lx restricted fd: %d\n",
 		ret, errno, slot, flags,
-		guest_paddr, (uint64_t) region->region.memory_size);
+		guest_paddr, (uint64_t) region->region.memory_size,
+		restricted_memfd);

 	/* Add to quick lookup data structures */
 	vm_userspace_mem_region_gpa_insert(&vm->regions.gpa_tree, region);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 5c22fa4c2825..d33b98bfe8a3 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -271,6 +271,16 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 		 */
 		.flag = MAP_SHARED,
 	},
+	[VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD] = {
+		.name = "anonymous_and_restricted_memfd",
+		.flag = ANON_FLAGS,
+		.need_restricted_memfd = 1,
+	},
+	[VM_MEM_SRC_ANON_HTLB2M_AND_RESTRICTED_MEMFD] = {
+		.name = "anonymous_hugetlb_2mb_and_restricted_memfd",
+		.flag = ANON_HUGE_FLAGS | MAP_HUGE_2MB,
+		.need_restricted_memfd = 1,
+	},
 };
 _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
 	       "Missing new backing src types?");
@@ -289,6 +299,7 @@ size_t get_backing_src_pagesz(uint32_t i)
 	switch (i) {
 	case VM_MEM_SRC_ANONYMOUS:
 	case VM_MEM_SRC_SHMEM:
+	case VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD:
 		return getpagesize();
 	case VM_MEM_SRC_ANONYMOUS_THP:
 		return get_trans_hugepagesz();
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
Add support for registering private memory with KVM using the KVM_SET_USER_MEMORY_REGION ioctl.

A helper function to query the extended userspace mem region is introduced to allow memory conversion.

The vm_mem_backing_src types are extended with additional guest memory source types to cover the cases where guest memory is backed by both anonymous memory and a restricted memfd.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
 .../selftests/kvm/include/kvm_util_base.h     | 12 +++-
 .../testing/selftests/kvm/include/test_util.h |  4 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 58 +++++++++++++++++--
 tools/testing/selftests/kvm/lib/test_util.c   | 11 ++++
 4 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index c7685c7038ff..4ad99f295f2a 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -31,7 +31,10 @@ typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
 typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */

 struct userspace_mem_region {
-	struct kvm_userspace_memory_region region;
+	union {
+		struct kvm_userspace_memory_region region;
+		struct kvm_userspace_memory_region_ext region_ext;
As discussed in the UPM series, we're trending towards adding an entirely new struct+ioctl(), kvm_userspace_memory_region2, instead of extending the existing struct. The == -> >= hack you had to add in kvm_do_ioctl() below is one of the reason for that change.
+	};
 	struct sparsebit *unused_phy_pages;
 	int fd;
 	off_t offset;
@@ -196,7 +199,7 @@ static inline bool kvm_has_cap(long cap)

 #define kvm_do_ioctl(fd, cmd, arg)						\
 ({										\
-	static_assert(!_IOC_SIZE(cmd) || sizeof(*arg) == _IOC_SIZE(cmd), "");	\
+	static_assert(!_IOC_SIZE(cmd) || sizeof(*arg) >= _IOC_SIZE(cmd), "");	\
 	ioctl(fd, cmd, arg);							\
 })
@@ -384,6 +387,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags);
 void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
 void vm_mem_region_delete(struct kvm_vm *vm, uint32_t slot);
+
 struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id);
 vm_vaddr_t vm_vaddr_unused_gap(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
 vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
@@ -715,6 +719,10 @@ struct kvm_userspace_memory_region *
 kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
 				 uint64_t end);
+struct kvm_userspace_memory_region_ext *
+kvm_userspace_memory_region_ext_find(struct kvm_vm *vm, uint64_t start,
+				     uint64_t end);
+
 #define sync_global_to_guest(vm, g) ({				\
 	typeof(g) *_p = addr_gva2hva(vm, (vm_vaddr_t)&(g));	\
 	memcpy(_p, &(g), sizeof(g));				\
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 80d6416f3012..aea80071f2b8 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -103,6 +103,8 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD,
+	VM_MEM_SRC_ANON_HTLB2M_AND_RESTRICTED_MEMFD,
There's no need for a dedicated flag in the backing type, vm_userspace_mem_region_add() already takes the memslot's flags and can simply key off KVM_MEM_PRIVATE.
@@ -881,6 +915,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	struct userspace_mem_region *region;
 	size_t backing_src_pagesz = get_backing_src_pagesz(src_type);
 	size_t alignment;
+	int restricted_memfd = -1;
No need to initialize to -1, KVM is supposed to ignore the restrictedmem fd if !KVM_MEM_PRIVATE, and if KVM_MEM_PRIVATE is set, selftests must provide a valid fd.
TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages, "Number of guest pages is not compatible with the host. "
This is what I ended up with after splitting out the conversion to KVM_SET_USER_MEMORY_REGION2 to a separate patch.
--
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 7c1f81f93ba3..26c6830c1aa1 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -32,6 +32,11 @@ int open_path_or_exit(const char *path, int flags)
 	return fd;
 }

+static int memfd_restricted(unsigned int flags)
+{
+	return syscall(__NR_memfd_restricted, flags);
+}
+
 /*
  * Open KVM_DEV_PATH if available, otherwise exit the entire program.
  *
@@ -980,6 +985,15 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	}

 	region->backing_src_type = src_type;
+
+	if (flags & KVM_MEM_PRIVATE) {
+		region->region.restricted_fd = memfd_restricted(0);
+		region->region.restricted_offset = 0;
+
+		TEST_ASSERT(region->region.restricted_fd >= 0,
+			    "Failed to create restricted memfd");
+	}
+
 	region->unused_phy_pages = sparsebit_alloc();
 	sparsebit_set_num(region->unused_phy_pages,
 		guest_paddr >> vm->page_shift, npages);
@@ -992,9 +1006,10 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
 		"  rc: %i errno: %i\n"
 		"  slot: %u flags: 0x%x\n"
-		"  guest_phys_addr: 0x%lx size: 0x%lx",
+		"  guest_phys_addr: 0x%lx size: 0x%lx restricted fd: %d\n",
 		ret, errno, slot, flags,
-		guest_paddr, (uint64_t) region->region.memory_size);
+		guest_paddr, (uint64_t) region->region.memory_size,
+		region->region.restricted_fd);

 	/* Add to quick lookup data structures */
 	vm_userspace_mem_region_gpa_insert(&vm->regions.gpa_tree, region);
On Tue, Jan 17, 2023 at 1:46 PM Sean Christopherson seanjc@google.com wrote:
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
Add support for registering private memory with KVM using the KVM_SET_USER_MEMORY_REGION ioctl.

A helper function to query the extended userspace mem region is introduced to allow memory conversion.

The vm_mem_backing_src types are extended with additional guest memory source types to cover the cases where guest memory is backed by both anonymous memory and a restricted memfd.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
 .../selftests/kvm/include/kvm_util_base.h     | 12 +++-
 .../testing/selftests/kvm/include/test_util.h |  4 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 58 +++++++++++++++++--
 tools/testing/selftests/kvm/lib/test_util.c   | 11 ++++
 4 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index c7685c7038ff..4ad99f295f2a 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -31,7 +31,10 @@ typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
 typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */

 struct userspace_mem_region {
-	struct kvm_userspace_memory_region region;
+	union {
+		struct kvm_userspace_memory_region region;
+		struct kvm_userspace_memory_region_ext region_ext;
As discussed in the UPM series, we're trending towards adding an entirely new struct+ioctl(), kvm_userspace_memory_region2, instead of extending the existing struct. The == -> >= hack you had to add in kvm_do_ioctl() below is one of the reason for that change.
Ack.
+	};
 	struct sparsebit *unused_phy_pages;
 	int fd;
 	off_t offset;
@@ -196,7 +199,7 @@ static inline bool kvm_has_cap(long cap)

 #define kvm_do_ioctl(fd, cmd, arg)						\
 ({										\
-	static_assert(!_IOC_SIZE(cmd) || sizeof(*arg) == _IOC_SIZE(cmd), "");	\
+	static_assert(!_IOC_SIZE(cmd) || sizeof(*arg) >= _IOC_SIZE(cmd), "");	\
 	ioctl(fd, cmd, arg);							\
 })
@@ -384,6 +387,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags);
 void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
 void vm_mem_region_delete(struct kvm_vm *vm, uint32_t slot);
+
 struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id);
 vm_vaddr_t vm_vaddr_unused_gap(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
 vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
@@ -715,6 +719,10 @@ struct kvm_userspace_memory_region *
 kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
 				 uint64_t end);
+struct kvm_userspace_memory_region_ext *
+kvm_userspace_memory_region_ext_find(struct kvm_vm *vm, uint64_t start,
+				     uint64_t end);
+
 #define sync_global_to_guest(vm, g) ({				\
 	typeof(g) *_p = addr_gva2hva(vm, (vm_vaddr_t)&(g));	\
 	memcpy(_p, &(g), sizeof(g));				\
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 80d6416f3012..aea80071f2b8 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -103,6 +103,8 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD,
+	VM_MEM_SRC_ANON_HTLB2M_AND_RESTRICTED_MEMFD,
There's no need for a dedicated flag in the backing type, vm_userspace_mem_region_add() already takes the memslot's flags and can simply key off KVM_MEM_PRIVATE.
I switched to using a dedicated flag thinking that it might be handy when private memory can be backed by huge pages. For now it makes sense to avoid adding it.
@@ -881,6 +915,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	struct userspace_mem_region *region;
 	size_t backing_src_pagesz = get_backing_src_pagesz(src_type);
 	size_t alignment;
+	int restricted_memfd = -1;
No need to initialize to -1, KVM is supposed to ignore the restrictedmem fd if !KVM_MEM_PRIVATE, and if KVM_MEM_PRIVATE is set, selftests must provide a valid fd.
TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages, "Number of guest pages is not compatible with the host. "
This is what I ended up with after splitting out the conversion to KVM_SET_USER_MEMORY_REGION2 to a separate patch.
--
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 7c1f81f93ba3..26c6830c1aa1 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -32,6 +32,11 @@ int open_path_or_exit(const char *path, int flags)
 	return fd;
 }

+static int memfd_restricted(unsigned int flags)
+{
+	return syscall(__NR_memfd_restricted, flags);
+}
+
 /*
  * Open KVM_DEV_PATH if available, otherwise exit the entire program.
  *
@@ -980,6 +985,15 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	}

 	region->backing_src_type = src_type;
+
+	if (flags & KVM_MEM_PRIVATE) {
+		region->region.restricted_fd = memfd_restricted(0);
+		region->region.restricted_offset = 0;
+
+		TEST_ASSERT(region->region.restricted_fd >= 0,
+			    "Failed to create restricted memfd");
+	}
+
 	region->unused_phy_pages = sparsebit_alloc();
 	sparsebit_set_num(region->unused_phy_pages,
 		guest_paddr >> vm->page_shift, npages);
@@ -992,9 +1006,10 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
 		"  rc: %i errno: %i\n"
 		"  slot: %u flags: 0x%x\n"
-		"  guest_phys_addr: 0x%lx size: 0x%lx",
+		"  guest_phys_addr: 0x%lx size: 0x%lx restricted fd: %d\n",
 		ret, errno, slot, flags,
-		guest_paddr, (uint64_t) region->region.memory_size);
+		guest_paddr, (uint64_t) region->region.memory_size,
+		region->region.restricted_fd);

 	/* Add to quick lookup data structures */
 	vm_userspace_mem_region_gpa_insert(&vm->regions.gpa_tree, region);
Ack.
Add IS_ALIGNED/IS_PAGE_ALIGNED helpers for selftests.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 tools/testing/selftests/kvm/include/kvm_util_base.h    | 3 +++
 tools/testing/selftests/kvm/include/x86_64/processor.h | 1 +
 2 files changed, 4 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 4ad99f295f2a..7ba32471df50 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -170,6 +170,9 @@ extern enum vm_guest_mode vm_mode_default;
 #define MIN_PAGE_SIZE		(1U << MIN_PAGE_SHIFT)
 #define PTES_PER_MIN_PAGE	ptes_per_page(MIN_PAGE_SIZE)

+/* @a is a power of 2 value */
+#define IS_ALIGNED(x, a)	(((x) & ((typeof(x))(a) - 1)) == 0)
+
 struct vm_guest_mode_params {
 	unsigned int pa_bits;
 	unsigned int va_bits;
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 5d310abe6c3f..4d5dd9a467e1 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -279,6 +279,7 @@ static inline unsigned int x86_model(unsigned int eax)
 #define PAGE_SHIFT		12
 #define PAGE_SIZE		(1ULL << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1) & PHYSICAL_PAGE_MASK)
+#define IS_PAGE_ALIGNED(x)	IS_ALIGNED(x, PAGE_SIZE)

 #define HUGEPAGE_SHIFT(x)	(PAGE_SHIFT + (((x) - 1) * 9))
 #define HUGEPAGE_SIZE(x)	(1UL << HUGEPAGE_SHIFT(x))
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
Add IS_ALIGNED/IS_PAGE_ALIGNED helpers for selftests.
Signed-off-by: Vishal Annapurve vannapurve@google.com
 tools/testing/selftests/kvm/include/kvm_util_base.h    | 3 +++
 tools/testing/selftests/kvm/include/x86_64/processor.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 4ad99f295f2a..7ba32471df50 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -170,6 +170,9 @@ extern enum vm_guest_mode vm_mode_default;
 #define MIN_PAGE_SIZE		(1U << MIN_PAGE_SHIFT)
 #define PTES_PER_MIN_PAGE	ptes_per_page(MIN_PAGE_SIZE)

+/* @a is a power of 2 value */
+#define IS_ALIGNED(x, a)	(((x) & ((typeof(x))(a) - 1)) == 0)
IS_ALIGNED() is provided by tools/include/linux/bitmap.h
 struct vm_guest_mode_params {
 	unsigned int pa_bits;
 	unsigned int va_bits;
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 5d310abe6c3f..4d5dd9a467e1 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -279,6 +279,7 @@ static inline unsigned int x86_model(unsigned int eax)
 #define PAGE_SHIFT		12
 #define PAGE_SIZE		(1ULL << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1) & PHYSICAL_PAGE_MASK)
+#define IS_PAGE_ALIGNED(x)	IS_ALIGNED(x, PAGE_SIZE)
I certainly don't object to adding IS_PAGE_ALIGNED(), but it's not needed for this series. Verifying that KVM doesn't allow an unaligned page conversion during KVM_HC_MAP_GPA_RANGE belongs in a separate test+series, as that doesn't have a strict dependency on UPM.
TL;DR: this patch can be dropped, for now at least.
#define HUGEPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9))
#define HUGEPAGE_SIZE(x) (1UL << HUGEPAGE_SHIFT(x))
2.39.0.rc0.267.gcb52ba06e7-goog
On Tue, Jan 17, 2023 at 1:48 PM Sean Christopherson seanjc@google.com wrote:
... I certainly don't object to adding IS_PAGE_ALIGNED(), but it's not needed for this series. Verifying that KVM doesn't allow an unaligned page conversion during KVM_HC_MAP_GPA_RANGE belongs in a separate test+series, as that doesn't have a strict dependency on UPM.
TL;DR: this patch can be dropped, for now at least.
Makes sense.
#define HUGEPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9))
#define HUGEPAGE_SIZE(x) (1UL << HUGEPAGE_SHIFT(x))
2.39.0.rc0.267.gcb52ba06e7-goog
Introduce a set of APIs to execute VMs with private memslots.

Host userspace APIs for:
1) Executing a vcpu run loop that handles the MAPGPA hypercall
2) Backing/unbacking guest private memory

Guest APIs for:
1) Changing the memory mapping type
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/include/x86_64/private_mem.h          |  24 +++
 .../selftests/kvm/lib/x86_64/private_mem.c    | 139 ++++++++++++++++++
 3 files changed, 164 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/private_mem.h
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/private_mem.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 2275ba861e0e..97f7d52c553b 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -55,6 +55,7 @@ LIBKVM_x86_64 += lib/x86_64/apic.c LIBKVM_x86_64 += lib/x86_64/handlers.S LIBKVM_x86_64 += lib/x86_64/hyperv.c LIBKVM_x86_64 += lib/x86_64/memstress.c +LIBKVM_x86_64 += lib/x86_64/private_mem.c LIBKVM_x86_64 += lib/x86_64/processor.c LIBKVM_x86_64 += lib/x86_64/svm.c LIBKVM_x86_64 += lib/x86_64/ucall.c diff --git a/tools/testing/selftests/kvm/include/x86_64/private_mem.h b/tools/testing/selftests/kvm/include/x86_64/private_mem.h new file mode 100644 index 000000000000..3aa6b4d11b28 --- /dev/null +++ b/tools/testing/selftests/kvm/include/x86_64/private_mem.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2022, Google LLC. + */ + +#ifndef SELFTEST_KVM_PRIVATE_MEM_H +#define SELFTEST_KVM_PRIVATE_MEM_H + +#include <stdint.h> +#include <kvm_util.h> + +void kvm_hypercall_map_shared(uint64_t gpa, uint64_t size); +void kvm_hypercall_map_private(uint64_t gpa, uint64_t size); + +void vm_unback_private_mem(struct kvm_vm *vm, uint64_t gpa, uint64_t size); + +void vm_allocate_private_mem(struct kvm_vm *vm, uint64_t gpa, uint64_t size); + +void handle_vm_exit_map_gpa_hypercall(struct kvm_vm *vm, uint64_t gpa, + uint64_t npages, uint64_t attrs); + +void vcpu_run_and_handle_mapgpa(struct kvm_vm *vm, struct kvm_vcpu *vcpu); + +#endif /* SELFTEST_KVM_PRIVATE_MEM_H */ diff --git a/tools/testing/selftests/kvm/lib/x86_64/private_mem.c b/tools/testing/selftests/kvm/lib/x86_64/private_mem.c new file mode 100644 index 000000000000..2b97fc34ec4a --- /dev/null +++ b/tools/testing/selftests/kvm/lib/x86_64/private_mem.c @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022, Google LLC. 
+ */ +#define _GNU_SOURCE /* for program_invocation_name */ +#include <fcntl.h> +#include <limits.h> +#include <sched.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ioctl.h> + +#include <linux/compiler.h> +#include <linux/kernel.h> +#include <linux/kvm_para.h> + +#include <test_util.h> +#include <kvm_util.h> +#include <private_mem.h> +#include <processor.h> + +static inline uint64_t __kvm_hypercall_map_gpa_range(uint64_t gpa, uint64_t size, + uint64_t flags) +{ + return kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, size >> PAGE_SHIFT, flags, 0); +} + +static inline void kvm_hypercall_map_gpa_range(uint64_t gpa, uint64_t size, + uint64_t flags) +{ + uint64_t ret; + + GUEST_ASSERT_2(IS_PAGE_ALIGNED(gpa) && IS_PAGE_ALIGNED(size), gpa, size); + + ret = __kvm_hypercall_map_gpa_range(gpa, size, flags); + GUEST_ASSERT_1(!ret, ret); +} + +void kvm_hypercall_map_shared(uint64_t gpa, uint64_t size) +{ + kvm_hypercall_map_gpa_range(gpa, size, KVM_MAP_GPA_RANGE_DECRYPTED); +} + +void kvm_hypercall_map_private(uint64_t gpa, uint64_t size) +{ + kvm_hypercall_map_gpa_range(gpa, size, KVM_MAP_GPA_RANGE_ENCRYPTED); +} + +static void vm_update_private_mem(struct kvm_vm *vm, uint64_t gpa, uint64_t size, + bool unback_mem) +{ + int restricted_fd; + uint64_t restricted_fd_offset, guest_phys_base, fd_offset; + struct kvm_memory_attributes attr; + struct kvm_userspace_memory_region_ext *region_ext; + struct kvm_userspace_memory_region *region; + int fallocate_mode = 0; + int ret; + + region_ext = kvm_userspace_memory_region_ext_find(vm, gpa, gpa + size); + TEST_ASSERT(region_ext != NULL, "Region not found"); + region = &region_ext->region; + TEST_ASSERT(region->flags & KVM_MEM_PRIVATE, + "Can not update private memfd for non-private memslot\n"); + restricted_fd = region_ext->restricted_fd; + restricted_fd_offset = region_ext->restricted_offset; + guest_phys_base = region->guest_phys_addr; + fd_offset = restricted_fd_offset + (gpa -
guest_phys_base); + + if (unback_mem) + fallocate_mode = (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE); + + printf("restricted_fd %d fallocate_mode 0x%x for offset 0x%lx size 0x%lx\n", + restricted_fd, fallocate_mode, fd_offset, size); + ret = fallocate(restricted_fd, fallocate_mode, fd_offset, size); + TEST_ASSERT(ret == 0, "fallocate failed\n"); + attr.attributes = unback_mem ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE; + attr.address = gpa; + attr.size = size; + attr.flags = 0; + if (unback_mem) + printf("undoing encryption for gpa 0x%lx size 0x%lx\n", gpa, size); + else + printf("doing encryption for gpa 0x%lx size 0x%lx\n", gpa, size); + + vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, &attr); +} + +void vm_unback_private_mem(struct kvm_vm *vm, uint64_t gpa, uint64_t size) +{ + vm_update_private_mem(vm, gpa, size, true); +} + +void vm_allocate_private_mem(struct kvm_vm *vm, uint64_t gpa, uint64_t size) +{ + vm_update_private_mem(vm, gpa, size, false); +} + +void handle_vm_exit_map_gpa_hypercall(struct kvm_vm *vm, uint64_t gpa, + uint64_t npages, uint64_t attrs) +{ + uint64_t size; + + size = npages << MIN_PAGE_SHIFT; + pr_info("Explicit conversion off 0x%lx size 0x%lx to %s\n", gpa, size, + (attrs & KVM_MAP_GPA_RANGE_ENCRYPTED) ? "private" : "shared"); + + if (attrs & KVM_MAP_GPA_RANGE_ENCRYPTED) + vm_allocate_private_mem(vm, gpa, size); + else + vm_unback_private_mem(vm, gpa, size); +} + +void vcpu_run_and_handle_mapgpa(struct kvm_vm *vm, struct kvm_vcpu *vcpu) +{ + /* + * Loop until the guest exits with any reason other than + * KVM_HC_MAP_GPA_RANGE hypercall. 
+ */ + + while (true) { + vcpu_run(vcpu); + + if ((vcpu->run->exit_reason == KVM_EXIT_HYPERCALL) && + (vcpu->run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)) { + uint64_t gpa = vcpu->run->hypercall.args[0]; + uint64_t npages = vcpu->run->hypercall.args[1]; + uint64_t attrs = vcpu->run->hypercall.args[2]; + + handle_vm_exit_map_gpa_hypercall(vm, gpa, npages, attrs); + vcpu->run->hypercall.ret = 0; + continue; + } + + return; + } +}
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
Introduce a set of APIs to execute VM with private memslots.
Host userspace APIs for:
- Executing a vcpu run loop that handles MAPGPA hypercall
- Backing/unbacking guest private memory
Guest APIs for:
- Changing memory mapping type
Signed-off-by: Vishal Annapurve vannapurve@google.com
tools/testing/selftests/kvm/Makefile | 1 + .../kvm/include/x86_64/private_mem.h | 24 +++ .../selftests/kvm/lib/x86_64/private_mem.c | 139 ++++++++++++++++++ 3 files changed, 164 insertions(+) create mode 100644 tools/testing/selftests/kvm/include/x86_64/private_mem.h create mode 100644 tools/testing/selftests/kvm/lib/x86_64/private_mem.c
Given that we _know_ private memory isn't always going to be x86 specific, I don't want to bury any helpers in x86_64 that aren't strictly x86 only. E.g. helpers for doing Intel+AMD hypercalls is ok, but "generic" private memory helpers belong elsewhere.
I experimented with extracting memslots/mmu helpers out of kvm_util.c as prep work, e.g. to avoid bloating kvm_util.c, but I couldn't come up with an obviously "good" naming scheme and/or split. At this time, the common bits are fairly small, so I think the best approach for now is to simply put vm_mem_map_shared_or_private() in kvm_util.c.
+static inline uint64_t __kvm_hypercall_map_gpa_range(uint64_t gpa, uint64_t size,
- uint64_t flags)
+{
- return kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, size >> PAGE_SHIFT, flags, 0);
+}
This can go in tools/testing/selftests/kvm/include/x86_64/processor.h.
+static inline void kvm_hypercall_map_gpa_range(uint64_t gpa, uint64_t size,
- uint64_t flags)
+{
- uint64_t ret;
- GUEST_ASSERT_2(IS_PAGE_ALIGNED(gpa) && IS_PAGE_ALIGNED(size), gpa, size);
- ret = __kvm_hypercall_map_gpa_range(gpa, size, flags);
- GUEST_ASSERT_1(!ret, ret);
+}
+void kvm_hypercall_map_shared(uint64_t gpa, uint64_t size) +{
- kvm_hypercall_map_gpa_range(gpa, size, KVM_MAP_GPA_RANGE_DECRYPTED);
+}
+void kvm_hypercall_map_private(uint64_t gpa, uint64_t size) +{
- kvm_hypercall_map_gpa_range(gpa, size, KVM_MAP_GPA_RANGE_ENCRYPTED);
+}
+static void vm_update_private_mem(struct kvm_vm *vm, uint64_t gpa, uint64_t size,
- bool unback_mem)
s/unback_mem/map_shared. "unback memory" is going to be super confusing to someone who isn't familiar with UPM. map_private would be the obvious alternative, but I like not having to invert the param in the helper.
+{
- int restricted_fd;
- uint64_t restricted_fd_offset, guest_phys_base, fd_offset;
- struct kvm_memory_attributes attr;
- struct kvm_userspace_memory_region_ext *region_ext;
- struct kvm_userspace_memory_region *region;
- int fallocate_mode = 0;
- int ret;
- region_ext = kvm_userspace_memory_region_ext_find(vm, gpa, gpa + size);
I forget if I've already mentioned this somewhere, but I'd prefer to use the "private" userspace_mem_region_find() and delete the existing kvm_userspace_memory_region_find().
- TEST_ASSERT(region_ext != NULL, "Region not found");
- region = &region_ext->region;
- TEST_ASSERT(region->flags & KVM_MEM_PRIVATE,
"Can not update private memfd for non-private memslot\n");
- restricted_fd = region_ext->restricted_fd;
- restricted_fd_offset = region_ext->restricted_offset;
- guest_phys_base = region->guest_phys_addr;
- fd_offset = restricted_fd_offset + (gpa - guest_phys_base);
- if (unback_mem)
fallocate_mode = (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE);
- printf("restricted_fd %d fallocate_mode 0x%x for offset 0x%lx size 0x%lx\n",
restricted_fd, fallocate_mode, fd_offset, size);
Don't put prints in common helpers, except _maybe_ for error paths. It's fine for development and/or debug, but for the final product it ends up being noise 99% of the time. If you really, really want printing checked in, then pr_debug() is an option, but I would generally discourage even that for selftests. E.g. strace can give you all the information printed here without needing to rebuild the binary, and without maintenance burden.
- ret = fallocate(restricted_fd, fallocate_mode, fd_offset, size);
- TEST_ASSERT(ret == 0, "fallocate failed\n");
Use whitespace to differentiate operations/blocks.
- attr.attributes = unback_mem ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE;
- attr.address = gpa;
- attr.size = size;
- attr.flags = 0;
Add a helper to do KVM_SET_MEMORY_ATTRIBUTES, e.g. to fill the appropriate struct.
- if (unback_mem)
printf("undoing encryption for gpa 0x%lx size 0x%lx\n", gpa, size);
- else
printf("doing encryption for gpa 0x%lx size 0x%lx\n", gpa, size);
- vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, &attr);
+}
void vm_mem_map_shared_or_private(struct kvm_vm *vm, uint64_t gpa,
				  uint64_t size, bool map_shared)
{
	struct userspace_mem_region *region;
	uint64_t end = gpa + size - 1;
	off_t fd_offset;
	int mode, ret;

	region = userspace_mem_region_find(vm, gpa, gpa);
	TEST_ASSERT(region && region->region.flags & KVM_MEM_PRIVATE,
		    "Private memory region not found for GPA 0x%lx", gpa);

	TEST_ASSERT(region == userspace_mem_region_find(vm, end, end),
		    "Private/Shared conversions must act on a single memslot");

	fd_offset = region->region.restricted_offset +
		    (gpa - region->region.guest_phys_addr);

	/* To map shared, punch a hole.  To map private, allocate (no flags). */
	mode = map_shared ? (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) : 0;

	ret = fallocate(region->region.restricted_fd, mode, fd_offset, size);
	TEST_ASSERT(!ret, "fallocate() failed to map %lx[%lu] %s, fd = %d, mode = %x, offset = %lx\n",
		    gpa, size, map_shared ? "shared" : "private",
		    region->region.restricted_fd, mode, fd_offset);

	vm_set_memory_attributes(vm, gpa, size,
				 map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
}
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
+void vcpu_run_and_handle_mapgpa(struct kvm_vm *vm, struct kvm_vcpu *vcpu) +{
- /*
* Loop until the guest exits with any reason other than
* KVM_HC_MAP_GPA_RANGE hypercall.
*/
- while (true) {
vcpu_run(vcpu);
if ((vcpu->run->exit_reason == KVM_EXIT_HYPERCALL) &&
(vcpu->run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)) {
I get what you're trying to do, and I completely agree that we need better helpers and/or macros to reduce this type of boilerplate, but adding a one-off helper like this is going to be a net negative overall. This helper services exactly one use case, and also obfuscates what a test does.
In other words, this is yet another thing that needs broad, generic support (_vcpu_run() is a very special case). E.g. something like this to make it easy for tests to run a guest and handle ucalls plus specific exits (just a strawman, I think we can do better for handling ucalls).
#define vcpu_run_loop(vcpu, handlers, ucalls)					\
do {										\
	uint32_t __exit;							\
	int __r = 0;								\
										\
	while (!__r) {								\
		vcpu_run(vcpu);							\
										\
		__exit = vcpu->run->exit_reason;				\
										\
		if (__exit < ARRAY_SIZE(handlers) && handlers[__exit])		\
			__r = handlers[__exit](vcpu);				\
		else if (__exit == KVM_EXIT_IO && ucalls)			\
			__r = handle_exit_ucall(vcpu, ucalls,			\
						ARRAY_SIZE(ucalls));		\
		else								\
			TEST_FAIL(...)						\
	}									\
} while (0)
For this series, I think it makes sense to just open code yet another test. It really doesn't end up being much code, which is partly why we haven't added helpers :-/
Add an API to query free 2MB hugepages in the system.
Signed-off-by: Vishal Annapurve vannapurve@google.com --- .../testing/selftests/kvm/include/test_util.h | 1 + tools/testing/selftests/kvm/lib/test_util.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index aea80071f2b8..3d1cc215940a 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -122,6 +122,7 @@ struct vm_mem_backing_src_alias { bool thp_configured(void); size_t get_trans_hugepagesz(void); size_t get_def_hugetlb_pagesz(void); +size_t get_free_huge_2mb_pages(void); const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i); size_t get_backing_src_pagesz(uint32_t i); bool is_backing_src_hugetlb(uint32_t i); diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index d33b98bfe8a3..745573023b57 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -162,6 +162,24 @@ size_t get_trans_hugepagesz(void) return size; }
+size_t get_free_huge_2mb_pages(void) +{ + size_t free_pages; + FILE *f; + int ret; + + f = fopen("/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages", "r"); + TEST_ASSERT(f != NULL, "Error in opening hugepages-2048kB/free_hugepages"); + + do { + ret = fscanf(f, "%zu", &free_pages); + } while (errno == EINTR); + TEST_ASSERT(ret == 1, "Error reading hugepages-2048kB/free_hugepages"); + fclose(f); + + return free_pages; +} + size_t get_def_hugetlb_pagesz(void) { char buf[64];
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
Add an API to query free 2MB hugepages in the system.
Signed-off-by: Vishal Annapurve vannapurve@google.com
.../testing/selftests/kvm/include/test_util.h | 1 + tools/testing/selftests/kvm/lib/test_util.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index aea80071f2b8..3d1cc215940a 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -122,6 +122,7 @@ struct vm_mem_backing_src_alias { bool thp_configured(void); size_t get_trans_hugepagesz(void); size_t get_def_hugetlb_pagesz(void); +size_t get_free_huge_2mb_pages(void); const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i); size_t get_backing_src_pagesz(uint32_t i); bool is_backing_src_hugetlb(uint32_t i); diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index d33b98bfe8a3..745573023b57 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -162,6 +162,24 @@ size_t get_trans_hugepagesz(void) return size; } +size_t get_free_huge_2mb_pages(void)
I strongly prefer to follow the precedent set by other tests, which at this point means defaulting to non-huge pages. I do think we need to make it easier and/or automatic to test hugepages, but I would like to tackle that problem separately. E.g. a kernel built without hugepage support will fail the fopen() below.
+{
- size_t free_pages;
- FILE *f;
- int ret;
- f = fopen("/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages", "r");
- TEST_ASSERT(f != NULL, "Error in opening hugepages-2048kB/free_hugepages");
- do {
ret = fscanf(f, "%zu", &free_pages);
- } while (errno == EINTR);
- TEST_ASSERT(ret == 1, "Error reading hugepages-2048kB/free_hugepages");
- fclose(f);
- return free_pages;
+}
size_t get_def_hugetlb_pagesz(void) { char buf[64];
Add a selftest to exercise implicit/explicit conversion functionality within KVM and verify: 1) Shared memory is visible to host userspace after conversion 2) Private memory is not visible to host userspace before/after conversion 3) Host userspace and guest can communicate over shared memory
Signed-off-by: Vishal Annapurve vannapurve@google.com --- tools/testing/selftests/kvm/.gitignore | 1 + tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/private_mem_test.c | 212 ++++++++++++++++++ 3 files changed, 214 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_test.c
diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 082855d94c72..19cdcde2ed08 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -34,6 +34,7 @@ /x86_64/nested_exceptions_test /x86_64/nx_huge_pages_test /x86_64/platform_info_test +/x86_64/private_mem_test /x86_64/pmu_event_filter_test /x86_64/set_boot_cpu_id /x86_64/set_sregs_test diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 97f7d52c553b..beb793dd3e1c 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -99,6 +99,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test +TEST_GEN_PROGS_x86_64 += x86_64/private_mem_test TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_test.c new file mode 100644 index 000000000000..015ada2e3d54 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/private_mem_test.c @@ -0,0 +1,212 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022, Google LLC. 
+ */ +#define _GNU_SOURCE /* for program_invocation_short_name */ +#include <fcntl.h> +#include <limits.h> +#include <sched.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ioctl.h> + +#include <linux/compiler.h> +#include <linux/kernel.h> +#include <linux/kvm_para.h> +#include <linux/memfd.h> + +#include <test_util.h> +#include <kvm_util.h> +#include <private_mem.h> +#include <processor.h> + +#define TEST_AREA_SLOT 10 +#define TEST_AREA_GPA 0xC0000000 +#define TEST_AREA_SIZE (2 * 1024 * 1024) +#define GUEST_TEST_MEM_OFFSET (1 * 1024 * 1024) +#define GUEST_TEST_MEM_SIZE (10 * 4096) + +#define VM_STAGE_PROCESSED(x) pr_info("Processed stage %s\n", #x) + +#define TEST_MEM_DATA_PATTERN1 0x66 +#define TEST_MEM_DATA_PATTERN2 0x99 +#define TEST_MEM_DATA_PATTERN3 0x33 +#define TEST_MEM_DATA_PATTERN4 0xaa +#define TEST_MEM_DATA_PATTERN5 0x12 + +static bool verify_mem_contents(void *mem, uint32_t size, uint8_t pattern) +{ + uint8_t *buf = (uint8_t *)mem; + + for (uint32_t i = 0; i < size; i++) { + if (buf[i] != pattern) + return false; + } + + return true; +} + +static void populate_test_area(void *test_area_base, uint64_t pattern) +{ + memset(test_area_base, pattern, TEST_AREA_SIZE); +} + +static void populate_guest_test_mem(void *guest_test_mem, uint64_t pattern) +{ + memset(guest_test_mem, pattern, GUEST_TEST_MEM_SIZE); +} + +static bool verify_test_area(void *test_area_base, uint64_t area_pattern, + uint64_t guest_pattern) +{ + void *guest_test_mem = test_area_base + GUEST_TEST_MEM_OFFSET; + void *test_area2_base = guest_test_mem + GUEST_TEST_MEM_SIZE; + uint64_t test_area2_size = (TEST_AREA_SIZE - (GUEST_TEST_MEM_OFFSET + + GUEST_TEST_MEM_SIZE)); + + return (verify_mem_contents(test_area_base, GUEST_TEST_MEM_OFFSET, area_pattern) && + verify_mem_contents(guest_test_mem, GUEST_TEST_MEM_SIZE, guest_pattern) && + verify_mem_contents(test_area2_base, test_area2_size, area_pattern)); +} + +#define GUEST_STARTED 0 +#define 
GUEST_PRIVATE_MEM_POPULATED 1 +#define GUEST_SHARED_MEM_POPULATED 2 +#define GUEST_PRIVATE_MEM_POPULATED2 3 + +/* + * Run memory conversion tests with explicit conversion: + * Execute KVM hypercall to map/unmap gpa range which will cause userspace exit + * to back/unback private memory. Subsequent accesses by guest to the gpa range + * will not cause exit to userspace. + * + * Test memory conversion scenarios with following steps: + * 1) Access private memory using private access and verify that memory contents + * are not visible to userspace. + * 2) Convert memory to shared using explicit conversions and ensure that + * userspace is able to access the shared regions. + * 3) Convert memory back to private using explicit conversions and ensure that + * userspace is again not able to access converted private regions. + */ +static void guest_conv_test_fn(void) +{ + void *test_area_base = (void *)TEST_AREA_GPA; + void *guest_test_mem = (void *)(TEST_AREA_GPA + GUEST_TEST_MEM_OFFSET); + uint64_t guest_test_size = GUEST_TEST_MEM_SIZE; + + GUEST_SYNC(GUEST_STARTED); + + populate_test_area(test_area_base, TEST_MEM_DATA_PATTERN1); + GUEST_SYNC(GUEST_PRIVATE_MEM_POPULATED); + GUEST_ASSERT(verify_test_area(test_area_base, TEST_MEM_DATA_PATTERN1, + TEST_MEM_DATA_PATTERN1)); + + kvm_hypercall_map_shared((uint64_t)guest_test_mem, guest_test_size); + + populate_guest_test_mem(guest_test_mem, TEST_MEM_DATA_PATTERN2); + + GUEST_SYNC(GUEST_SHARED_MEM_POPULATED); + GUEST_ASSERT(verify_test_area(test_area_base, TEST_MEM_DATA_PATTERN1, + TEST_MEM_DATA_PATTERN5)); + + kvm_hypercall_map_private((uint64_t)guest_test_mem, guest_test_size); + + populate_guest_test_mem(guest_test_mem, TEST_MEM_DATA_PATTERN3); + GUEST_SYNC(GUEST_PRIVATE_MEM_POPULATED2); + + GUEST_ASSERT(verify_test_area(test_area_base, TEST_MEM_DATA_PATTERN1, + TEST_MEM_DATA_PATTERN3)); + GUEST_DONE(); +} + +#define ASSERT_CONV_TEST_EXIT_IO(vcpu, stage) \ + { \ + struct ucall uc; \ + ASSERT_EQ(vcpu->run->exit_reason, 
KVM_EXIT_IO); \ + ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC); \ + ASSERT_EQ(uc.args[1], stage); \ + } + +#define ASSERT_GUEST_DONE(vcpu) \ + { \ + struct ucall uc; \ + ASSERT_EQ(vcpu->run->exit_reason, KVM_EXIT_IO); \ + ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_DONE); \ + } + +static void host_conv_test_fn(struct kvm_vm *vm, struct kvm_vcpu *vcpu) +{ + void *test_area_hva = addr_gpa2hva(vm, TEST_AREA_GPA); + void *guest_test_mem_hva = (test_area_hva + GUEST_TEST_MEM_OFFSET); + + vcpu_run_and_handle_mapgpa(vm, vcpu); + ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_STARTED); + populate_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4); + VM_STAGE_PROCESSED(GUEST_STARTED); + + vcpu_run_and_handle_mapgpa(vm, vcpu); + ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_PRIVATE_MEM_POPULATED); + TEST_ASSERT(verify_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4, + TEST_MEM_DATA_PATTERN4), "failed"); + VM_STAGE_PROCESSED(GUEST_PRIVATE_MEM_POPULATED); + + vcpu_run_and_handle_mapgpa(vm, vcpu); + ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_SHARED_MEM_POPULATED); + TEST_ASSERT(verify_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4, + TEST_MEM_DATA_PATTERN2), "failed"); + populate_guest_test_mem(guest_test_mem_hva, TEST_MEM_DATA_PATTERN5); + VM_STAGE_PROCESSED(GUEST_SHARED_MEM_POPULATED); + + vcpu_run_and_handle_mapgpa(vm, vcpu); + ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_PRIVATE_MEM_POPULATED2); + TEST_ASSERT(verify_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4, + TEST_MEM_DATA_PATTERN5), "failed"); + VM_STAGE_PROCESSED(GUEST_PRIVATE_MEM_POPULATED2); + + vcpu_run_and_handle_mapgpa(vm, vcpu); + ASSERT_GUEST_DONE(vcpu); +} + +static void execute_vm_with_private_test_mem( + enum vm_mem_backing_src_type test_mem_src) +{ + struct kvm_vm *vm; + struct kvm_enable_cap cap; + struct kvm_vcpu *vcpu; + + vm = vm_create_with_one_vcpu(&vcpu, guest_conv_test_fn); + + vm_check_cap(vm, KVM_CAP_EXIT_HYPERCALL); + cap.cap = KVM_CAP_EXIT_HYPERCALL; + cap.flags = 0; + cap.args[0] = (1 << KVM_HC_MAP_GPA_RANGE); + vm_ioctl(vm, 
KVM_ENABLE_CAP, &cap); + + vm_userspace_mem_region_add(vm, test_mem_src, TEST_AREA_GPA, + TEST_AREA_SLOT, TEST_AREA_SIZE / vm->page_size, KVM_MEM_PRIVATE); + vm_allocate_private_mem(vm, TEST_AREA_GPA, TEST_AREA_SIZE); + + virt_map(vm, TEST_AREA_GPA, TEST_AREA_GPA, TEST_AREA_SIZE/vm->page_size); + + host_conv_test_fn(vm, vcpu); + + kvm_vm_free(vm); +} + +int main(int argc, char *argv[]) +{ + execute_vm_with_private_test_mem( + VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD); + + /* Needs 2MB Hugepages */ + if (get_free_huge_2mb_pages() >= 1) { + printf("Running private mem test with 2M pages\n"); + execute_vm_with_private_test_mem( + VM_MEM_SRC_ANON_HTLB2M_AND_RESTRICTED_MEMFD); + } else + printf("Skipping private mem test with 2M pages\n"); + + return 0; +}
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
+#define TEST_AREA_SLOT 10
I vote using "DATA" instead of "TEST_AREA" just because it's shorter.
+#define TEST_AREA_GPA 0xC0000000
Put the GPA above 4GiB, that way it'll never run afoul of the PCI hole, e.g. won't overlap with the local APIC.
+#define TEST_AREA_SIZE (2 * 1024 * 1024)
Use the sizes from <linux/sizes.h>, e.g. SZ_2M.
+#define GUEST_TEST_MEM_OFFSET (1 * 1024 * 1024)
Same here.
+#define GUEST_TEST_MEM_SIZE (10 * 4096)
Why 10*4KiB? SZ_2M + PAGE_SIZE seems like the best compromise until selftests doesn't explode on mapping huge memslots in the guest. That lets us test boundary conditions with hugepages.
+#define VM_STAGE_PROCESSED(x) pr_info("Processed stage %s\n", #x)
Comments on this in the context of host_conv_test_fn().
+#define TEST_MEM_DATA_PATTERN1 0x66
Regarding the values of the patterns, the patterns themselves are also not at all interesting. The goal of the test is to verify that KVM/kernel plumbs through the correct memory. Verifying that hardware doesn't barf on a pattern is very much a non-goal, i.e. we aren't writing a test for DRAM.
While debugging my own goofs, I found it much, much easier to triage failures by simply using the pattern number as the value, e.g. 1 = 0x11, 2 = 0x22, etc.
+#define TEST_MEM_DATA_PATTERN2 0x99 +#define TEST_MEM_DATA_PATTERN3 0x33 +#define TEST_MEM_DATA_PATTERN4 0xaa +#define TEST_MEM_DATA_PATTERN5 0x12
While I like using macros instead of copy+pasting magic numbers, in this case the macros make the code really, really hard to read. Longish names with just one character different all look the same when glancing through the code, e.g. identifying when the test switches between patterns is difficult.
And IMO, having the guest explicitly tell the host what to expect yields a more maintainable, easier to understand test overall. And by having the guest say what pattern(s) to expect/write, there's no need for the guest to have a priori knowledge of the patterns, and thus no need for macros.
+static bool verify_mem_contents(void *mem, uint32_t size, uint8_t pattern) +{
- uint8_t *buf = (uint8_t *)mem;
- for (uint32_t i = 0; i < size; i++) {
if (buf[i] != pattern)
return false;
It took me blundering into a SHUTDOWN before I figured out these return booleans instead of simply asserting: using TEST_ASSERT() in the guest obviously doesn't fare well.
While I like deduplicating code, in this case it does more harm than good because the context of exactly what fails is lost, e.g. expected vs. actual pattern, offset, etc... We could use macros to dedup code, i.e. have only the assertion be different, but I don't think that ends up being a net positive either.
- }
- return true;
+}
+static void populate_test_area(void *test_area_base, uint64_t pattern) +{
- memset(test_area_base, pattern, TEST_AREA_SIZE);
+}
+static void populate_guest_test_mem(void *guest_test_mem, uint64_t pattern) +{
- memset(guest_test_mem, pattern, GUEST_TEST_MEM_SIZE);
Generally speaking, don't add one-line wrappers that don't provide novel functionality. E.g. there's nothing fancy here, and so the wrappers just end up obfuscating the size of a memset().
+}
+static bool verify_test_area(void *test_area_base, uint64_t area_pattern,
- uint64_t guest_pattern)
+{
- void *guest_test_mem = test_area_base + GUEST_TEST_MEM_OFFSET;
- void *test_area2_base = guest_test_mem + GUEST_TEST_MEM_SIZE;
- uint64_t test_area2_size = (TEST_AREA_SIZE - (GUEST_TEST_MEM_OFFSET +
GUEST_TEST_MEM_SIZE));
- return (verify_mem_contents(test_area_base, GUEST_TEST_MEM_OFFSET, area_pattern) &&
verify_mem_contents(guest_test_mem, GUEST_TEST_MEM_SIZE, guest_pattern) &&
verify_mem_contents(test_area2_base, test_area2_size, area_pattern));
+}
+#define GUEST_STARTED 0 +#define GUEST_PRIVATE_MEM_POPULATED 1 +#define GUEST_SHARED_MEM_POPULATED 2 +#define GUEST_PRIVATE_MEM_POPULATED2 3
+/*
- Run memory conversion tests with explicit conversion:
- Execute KVM hypercall to map/unmap gpa range which will cause userspace exit
- to back/unback private memory. Subsequent accesses by guest to the gpa range
- will not cause exit to userspace.
- Test memory conversion scenarios with following steps:
- Access private memory using private access and verify that memory contents
- are not visible to userspace.
- Convert memory to shared using explicit conversions and ensure that
- userspace is able to access the shared regions.
- Convert memory back to private using explicit conversions and ensure that
- userspace is again not able to access converted private regions.
- */
+static void guest_conv_test_fn(void) +{
- void *test_area_base = (void *)TEST_AREA_GPA;
- void *guest_test_mem = (void *)(TEST_AREA_GPA + GUEST_TEST_MEM_OFFSET);
- uint64_t guest_test_size = GUEST_TEST_MEM_SIZE;
- GUEST_SYNC(GUEST_STARTED);
- populate_test_area(test_area_base, TEST_MEM_DATA_PATTERN1);
Tangentially related to my "the patterns themselves are uninteresting" comment, and very related to the "avoid global defines for the patterns", I would like to structure this test so that it's easy to test GPA+size combinations. E.g. to test that KVM does the right thing when a conversion spans regions that KVM is likely to map with hugepages, or is misaligned with respect to hugepages, etc.
If the guest explicitly tells the host which patterns to expect/write, then testing combinations of addresses is just a matter of looping in the guest, e.g.
	struct {
		uint64_t offset;
		uint64_t size;
	} stages[] = {
		GUEST_STAGE(0, PAGE_SIZE),
		GUEST_STAGE(0, SZ_2M),
		GUEST_STAGE(PAGE_SIZE, PAGE_SIZE),
		GUEST_STAGE(PAGE_SIZE, SZ_2M),
		GUEST_STAGE(SZ_2M, PAGE_SIZE),
	};
	const uint8_t init_p = 0xcc;
	uint64_t j;
	int i;

	/* Memory should be shared by default. */
	memset((void *)DATA_GPA, ~init_p, DATA_SIZE);
	GUEST_SYNC4(DATA_GPA, DATA_SIZE, ~init_p, init_p);
	memcmp_g(DATA_GPA, init_p, DATA_SIZE);

	for (i = 0; i < ARRAY_SIZE(stages); i++) {
		blah blah blah
	}
- GUEST_SYNC(GUEST_PRIVATE_MEM_POPULATED);
As above, use the sync to effectively tell/request the host to do something, as opposed to having the host infer what it's supposed to do based on the current stage.
Aside from wanting to be able to iterate on GPA+size, I really, really dislike the GUEST_SYNC(stage) pattern. It works ok for small tests, but the pattern doesn't scale, e.g. see xen_shinfo_test.c. Even at smaller scales, the resulting magic numbers can be unnecessarily difficult to understand, e.g. smm_test.c isn't doing anything _that_ complex, but every time I look at the test I spend several minutes relearning what it does. Using macros instead of magic numbers helps a little, but it doesn't fix the underlying issue of bundling a bunch of testcases into a single monolithic sequences.
+	GUEST_ASSERT(verify_test_area(test_area_base, TEST_MEM_DATA_PATTERN1,
+		TEST_MEM_DATA_PATTERN1));
Align params to help delineate the boundaries between the assert and the function call. E.g. if we ended up with this code:
	GUEST_ASSERT(verify_test_area(test_area_base, TEST_MEM_DATA_PATTERN1,
				      TEST_MEM_DATA_PATTERN1));
But as (much further) above, just assert in the comparison helper to avoid the problem entirely.
+	kvm_hypercall_map_shared((uint64_t)guest_test_mem, guest_test_size);
+	populate_guest_test_mem(guest_test_mem, TEST_MEM_DATA_PATTERN2);
+	GUEST_SYNC(GUEST_SHARED_MEM_POPULATED);
+	GUEST_ASSERT(verify_test_area(test_area_base, TEST_MEM_DATA_PATTERN1,
+		TEST_MEM_DATA_PATTERN5));
+
+	kvm_hypercall_map_private((uint64_t)guest_test_mem, guest_test_size);
+	populate_guest_test_mem(guest_test_mem, TEST_MEM_DATA_PATTERN3);
+	GUEST_SYNC(GUEST_PRIVATE_MEM_POPULATED2);
+	GUEST_ASSERT(verify_test_area(test_area_base, TEST_MEM_DATA_PATTERN1,
+		TEST_MEM_DATA_PATTERN3));
+
+	GUEST_DONE();
+}
+#define ASSERT_CONV_TEST_EXIT_IO(vcpu, stage)				\
+	{								\
+		struct ucall uc;					\
+		ASSERT_EQ(vcpu->run->exit_reason, KVM_EXIT_IO);		\
+		ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC);		\
+		ASSERT_EQ(uc.args[1], stage);				\
+	}
+
+#define ASSERT_GUEST_DONE(vcpu)						\
+	{								\
+		struct ucall uc;					\
+		ASSERT_EQ(vcpu->run->exit_reason, KVM_EXIT_IO);		\
+		ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_DONE);		\
+	}
+static void host_conv_test_fn(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
For future reference, "fn" is completely redundant. It's a function, no need for the label. When a function pointer is a parameter, and isn't obviously such, then "fn" can be helpful, but here it's just noise.
+{
+	void *test_area_hva = addr_gpa2hva(vm, TEST_AREA_GPA);
+	void *guest_test_mem_hva = (test_area_hva + GUEST_TEST_MEM_OFFSET);
+
+	vcpu_run_and_handle_mapgpa(vm, vcpu);
+	ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_STARTED);
+	populate_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4);
+	VM_STAGE_PROCESSED(GUEST_STARTED);
Again for future reference since I think it's better to avoid named stages, if you add macros to wrap trivial code in order to do something clever (pretty print the name of the stage), use a name that more precisely captures the triviality of the code. VM_STAGE_PROCESSED() sounds like the macro is doing bookkeeping, e.g. advancing a stage/counter or something, whereas something like print_stage() makes it fairly clear that the helper/macro is doing nothing more than printing, i.e. saves the reader from having to go look at the implementation to understand the code.
Regarding the printing itself, I suspect one of the reasons why you added the pretty printing of stages was to help debug. While there is a time and place for printf debugging, when it comes to KVM tests, the "need" to resort to printing out every step is usually the symptom of unhelpful assertions and error messages.
E.g. if pattern "encodes" its number (p1 = 0x11), capturing the line number, expected versus actual pattern, and the GPA provides pretty much all the debug info needed to figure out what failed.
Seeing the test actually make progress can be useful, e.g. as a heartbeat for long-running tests, but again, outside of development/debug it's mostly noise. E.g. if a test fails in CI, earlier messages may or may not be captured depending on the whims of the CI instance/robot, and so we want as much information as possible in the error message itself.
+	vcpu_run_and_handle_mapgpa(vm, vcpu);
+	ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_PRIVATE_MEM_POPULATED);
+	TEST_ASSERT(verify_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4,
+		TEST_MEM_DATA_PATTERN4), "failed");
+	VM_STAGE_PROCESSED(GUEST_PRIVATE_MEM_POPULATED);
+
+	vcpu_run_and_handle_mapgpa(vm, vcpu);
+	ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_SHARED_MEM_POPULATED);
+	TEST_ASSERT(verify_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4,
+		TEST_MEM_DATA_PATTERN2), "failed");
+	populate_guest_test_mem(guest_test_mem_hva, TEST_MEM_DATA_PATTERN5);
+	VM_STAGE_PROCESSED(GUEST_SHARED_MEM_POPULATED);
+
+	vcpu_run_and_handle_mapgpa(vm, vcpu);
+	ASSERT_CONV_TEST_EXIT_IO(vcpu, GUEST_PRIVATE_MEM_POPULATED2);
+	TEST_ASSERT(verify_test_area(test_area_hva, TEST_MEM_DATA_PATTERN4,
+		TEST_MEM_DATA_PATTERN5), "failed");
+	VM_STAGE_PROCESSED(GUEST_PRIVATE_MEM_POPULATED2);
+
+	vcpu_run_and_handle_mapgpa(vm, vcpu);
+	ASSERT_GUEST_DONE(vcpu);
+}
+static void execute_vm_with_private_test_mem(
+				enum vm_mem_backing_src_type test_mem_src)
+{
+	struct kvm_vm *vm;
+	struct kvm_enable_cap cap;
+	struct kvm_vcpu *vcpu;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_conv_test_fn);
+	vm_check_cap(vm, KVM_CAP_EXIT_HYPERCALL);
TEST_REQUIRE() so that the test is skipped, not failed.
+	cap.cap = KVM_CAP_EXIT_HYPERCALL;
+	cap.flags = 0;
+	cap.args[0] = (1 << KVM_HC_MAP_GPA_RANGE);
+	vm_ioctl(vm, KVM_ENABLE_CAP, &cap);
vm_enable_cap() will do most of this for you.
+	vm_userspace_mem_region_add(vm, test_mem_src, TEST_AREA_GPA,
+		TEST_AREA_SLOT, TEST_AREA_SIZE / vm->page_size, KVM_MEM_PRIVATE);
Align params.
+	vm_allocate_private_mem(vm, TEST_AREA_GPA, TEST_AREA_SIZE);
+	virt_map(vm, TEST_AREA_GPA, TEST_AREA_GPA, TEST_AREA_SIZE / vm->page_size);
+
+	host_conv_test_fn(vm, vcpu);
+	kvm_vm_free(vm);
+}
+int main(int argc, char *argv[])
+{
+	execute_vm_with_private_test_mem(
+		VM_MEM_SRC_ANONYMOUS_AND_RESTRICTED_MEMFD);
+
+	/* Needs 2MB Hugepages */
+	if (get_free_huge_2mb_pages() >= 1) {
+		printf("Running private mem test with 2M pages\n");
+		execute_vm_with_private_test_mem(
+			VM_MEM_SRC_ANON_HTLB2M_AND_RESTRICTED_MEMFD);
+	} else
+		printf("Skipping private mem test with 2M pages\n");
+
+	return 0;
+}
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/lkml/20221202061347.1070246-10-chao.p.peng@linux.int...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
private_mem_test.c file adds selftest to access private memory from the guest via private/shared accesses and checking if the contents can be leaked to/accessed by vmm via shared memory view before/after conversions.
Updates in V2:
- Simplified vcpu run loop implementation API
- Removed VM creation logic from private mem library
I pushed a rework version of this series to:
git@github.com:sean-jc/linux.git x86/upm_base_support
Can you take a look and make sure that I didn't completely botch anything, and preserved the spirit of what you are testing?
Going forward, no need to send a v3 at this time. Whoever sends v11 of the series will be responsible for including tests.
No need to respond to comments either, unless of course there's something you object to, want to clarify, etc., in which case definitely pipe up.
Beyond the SEV series, do you have additional UPM testcases written? If so, can you post them, even if they're in a less-than-perfect state? If they're in a "too embarrassing to post" state, feel free to send them off list :-)
Last question, do you have a list of testcases that you consider "required" for UPM? My off-the-cuff list of selftests I want to have before merging UPM is pretty short at this point:
- Negative testing of the memslot changes, e.g. bad alignment, bad fd, illegal memslot updates, etc.
- Negative testing of restrictedmem, e.g. various combinations of overlapping bindings of a single restrictedmem instance.
- Access vs. conversion stress, e.g. accessing a region in the guest while it's concurrently converted by the host, maybe with fancy guest code to try and detect TLB or ordering bugs?
On Tue, Jan 17, 2023 at 5:09 PM Sean Christopherson seanjc@google.com wrote:
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/lkml/20221202061347.1070246-10-chao.p.peng@linux.int...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
private_mem_test.c file adds selftest to access private memory from the guest via private/shared accesses and checking if the contents can be leaked to/accessed by vmm via shared memory view before/after conversions.
Updates in V2:
- Simplified vcpu run loop implementation API
- Removed VM creation logic from private mem library
I pushed a rework version of this series to:
git@github.com:sean-jc/linux.git x86/upm_base_support
Thanks for the review and spending time to rework this series. The revised version [1] looks cleaner and lighter.
Can you take a look and make sure that I didn't completely botch anything, and preserved the spirit of what you are testing?
Yeah, the reworked selftest structure [1] looks good to me in general. A few test cases are missing in the reworked version:
* Checking if contents of GPA ranges corresponding to private memory are not leaked to host userspace when accessing guest memory using HVA ranges
* Checking if private to shared conversion of memory affects nearby private pages.
Going forward, no need to send a v3 at this time. Whoever sends v11 of the series will be responsible for including tests.
Sounds good to me.
No need to respond to comments either, unless of course there's something you object to, want to clarify, etc., in which case definitely pipe up.
Beyond the SEV series, do you have additional UPM testcases written? If so, can you post them, even if they're in a less-than-perfect state? If they're in a "too embarrassing to post" state, feel free to send them off list :-)
Ackerley (ackerleytng@google.com) is working on publishing the rfc v3 version of TDX selftests that include UPM specific selftests. He plans to publish them this week.
Last question, do you have a list of testcases that you consider "required" for UPM? My off-the-cuff list of selftests I want to have before merging UPM is pretty short at this point:
- Negative testing of the memslot changes, e.g. bad alignment, bad fd, illegal memslot updates, etc.
- Negative testing of restrictedmem, e.g. various combinations of overlapping bindings of a single restrictedmem instance.
- Access vs. conversion stress, e.g. accessing a region in the guest while it's concurrently converted by the host, maybe with fancy guest code to try and detect TLB or ordering bugs?
List of testcases that I was tracking (covered by the current selftests) as required:
1) Ensure private memory contents are not accessible to host userspace using the HVA
2) Ensure shared memory contents are visible/accessible from both host userspace and the guest
3) Ensure 1 and 2 hold across explicit memory conversions
4) Exercise memory conversions with mixed shared/private memory pages in a huge page to catch issues like [2]
5) Ensure that explicit memory conversions don't affect nearby GPA ranges

Test cases that will be covered by TDX/SNP selftests (in addition to above scenarios):
6) Ensure 1 and 2 hold across implicit memory conversions
7) Ensure that implicit memory conversions don't affect nearby GPA ranges

Additional testcases possible:
8) Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
[1] - https://github.com/sean-jc/linux/commit/7e536bf3c45c623425bc84e8a96634efc3a6...
[2] - https://lore.kernel.org/linux-mm/CAGtprH82H_fjtRbL0KUxOkgOk4pgbaEbAydDYfZ0qx...
On Tue, Jan 17, 2023 at 7:11 PM Vishal Annapurve vannapurve@google.com wrote:
...
Last question, do you have a list of testcases that you consider "required" for UPM? My off-the-cuff list of selftests I want to have before merging UPM is pretty short at this point:
- Negative testing of the memslot changes, e.g. bad alignment, bad fd, illegal memslot updates, etc.
- Negative testing of restrictedmem, e.g. various combinations of overlapping bindings of a single restrictedmem instance.
- Access vs. conversion stress, e.g. accessing a region in the guest while it's concurrently converted by the host, maybe with fancy guest code to try and detect TLB or ordering bugs?
List of testcases that I was tracking (covered by the current selftests) as required:
1) Ensure private memory contents are not accessible to host userspace using the HVA
2) Ensure shared memory contents are visible/accessible from both host userspace and the guest
3) Ensure 1 and 2 hold across explicit memory conversions
4) Exercise memory conversions with mixed shared/private memory pages in a huge page to catch issues like [2]
5) Ensure that explicit memory conversions don't affect nearby GPA ranges

Test cases that will be covered by TDX/SNP selftests (in addition to above scenarios):
6) Ensure 1 and 2 hold across implicit memory conversions
7) Ensure that implicit memory conversions don't affect nearby GPA ranges

Additional testcases possible:
8) Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
[1] - https://github.com/sean-jc/linux/commit/7e536bf3c45c623425bc84e8a96634efc3a6...
[2] - https://lore.kernel.org/linux-mm/CAGtprH82H_fjtRbL0KUxOkgOk4pgbaEbAydDYfZ0qx...
List of additional testcases that could help increase basic coverage (including what Sean mentioned earlier):
1) restrictedmem functionality testing
   - read/write/mmap should not work
   - fstat/fallocate should work as expected
2) restrictedmem registration/modification testing with:
   - bad alignment, bad fd, modifying properties of existing memslot
   - Installing multiple memslots with ranges within the same restricted mem files
   - deleting memslots with restricted memfd while guests are being executed
3) Runtime restricted mem testing:
   - Access vs conversion testing from multiple vcpus
   - conversion and access to non-overlapping ranges from multiple vcpus
Regards, Vishal
On Fri, Feb 10, 2023 at 11:59:23AM -0800, Vishal Annapurve wrote:
On Tue, Jan 17, 2023 at 7:11 PM Vishal Annapurve vannapurve@google.com wrote:
...
Last question, do you have a list of testcases that you consider "required" for UPM? My off-the-cuff list of selftests I want to have before merging UPM is pretty short at this point:
- Negative testing of the memslot changes, e.g. bad alignment, bad fd, illegal memslot updates, etc.
- Negative testing of restrictedmem, e.g. various combinations of overlapping bindings of a single restrictedmem instance.
- Access vs. conversion stress, e.g. accessing a region in the guest while it's concurrently converted by the host, maybe with fancy guest code to try and detect TLB or ordering bugs?
List of testcases that I was tracking (covered by the current selftests) as required:
1) Ensure private memory contents are not accessible to host userspace using the HVA
2) Ensure shared memory contents are visible/accessible from both host userspace and the guest
3) Ensure 1 and 2 hold across explicit memory conversions
4) Exercise memory conversions with mixed shared/private memory pages in a huge page to catch issues like [2]
5) Ensure that explicit memory conversions don't affect nearby GPA ranges

Test cases that will be covered by TDX/SNP selftests (in addition to above scenarios):
6) Ensure 1 and 2 hold across implicit memory conversions
7) Ensure that implicit memory conversions don't affect nearby GPA ranges

Additional testcases possible:
8) Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
[1] - https://github.com/sean-jc/linux/commit/7e536bf3c45c623425bc84e8a96634efc3a6... [2] - https://lore.kernel.org/linux-mm/CAGtprH82H_fjtRbL0KUxOkgOk4pgbaEbAydDYfZ0qx...
List of additional testcases that could help increase basic coverage (including what Sean mentioned earlier):
- restrictedmem functionality testing
  - read/write/mmap should not work
  - fstat/fallocate should work as expected
- restrictedmem registration/modification testing with:
  - bad alignment, bad fd, modifying properties of existing memslot
  - Installing multiple memslots with ranges within the same restricted mem files
  - deleting memslots with restricted memfd while guests are being executed
In case you haven't started, I will work on 1) and 2) for the following days. As a start, I will first add restrictedmem tests (without KVM) then move to the new memslot-related tests.
Chao
- Runtime restricted mem testing:
- Access vs conversion testing from multiple vcpus
- conversion and access to non-overlapping ranges from multiple vcpus
Regards, Vishal
Chao Peng chao.p.peng@linux.intel.com writes:
On Fri, Feb 10, 2023 at 11:59:23AM -0800, Vishal Annapurve wrote:
On Tue, Jan 17, 2023 at 7:11 PM Vishal Annapurve vannapurve@google.com wrote:
...
Last question, do you have a list of testcases that you consider "required" for UPM? My off-the-cuff list of selftests I want to have before merging UPM is pretty short at this point:
- Negative testing of the memslot changes, e.g. bad alignment, bad fd, illegal memslot updates, etc.
- Negative testing of restrictedmem, e.g. various combinations of overlapping bindings of a single restrictedmem instance.
- Access vs. conversion stress, e.g. accessing a region in the guest while it's concurrently converted by the host, maybe with fancy guest code to try and detect TLB or ordering bugs?
List of testcases that I was tracking (covered by the current selftests) as required:
1) Ensure private memory contents are not accessible to host userspace using the HVA
2) Ensure shared memory contents are visible/accessible from both host userspace and the guest
3) Ensure 1 and 2 hold across explicit memory conversions
4) Exercise memory conversions with mixed shared/private memory pages in a huge page to catch issues like [2]
5) Ensure that explicit memory conversions don't affect nearby GPA ranges

Test cases that will be covered by TDX/SNP selftests (in addition to above scenarios):
6) Ensure 1 and 2 hold across implicit memory conversions
7) Ensure that implicit memory conversions don't affect nearby GPA ranges

Additional testcases possible:
8) Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
[1] - https://github.com/sean-jc/linux/commit/7e536bf3c45c623425bc84e8a96634efc3a6...
[2] - https://lore.kernel.org/linux-mm/CAGtprH82H_fjtRbL0KUxOkgOk4pgbaEbAydDYfZ0qx...
List of additional testcases that could help increase basic coverage (including what Sean mentioned earlier):
- restrictedmem functionality testing
  - read/write/mmap should not work
  - fstat/fallocate should work as expected
- restrictedmem registration/modification testing with:
  - bad alignment, bad fd, modifying properties of existing memslot
  - Installing multiple memslots with ranges within the same restricted mem files
  - deleting memslots with restricted memfd while guests are being executed
In case you haven't started, I will work on 1) and 2) for the following days. As a start, I will first add restrictedmem tests (without KVM) then move to the new memslot-related tests.
Chao
- Runtime restricted mem testing:
- Access vs conversion testing from multiple vcpus
- conversion and access to non-overlapping ranges from multiple vcpus
Regards, Vishal
Chao, I'll work on
+ Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
+ Deleting memslots with restricted memfd while guests are being executed
+ Installing multiple memslots with ranges within the same restricted mem files
this week.
On Mon, Mar 06, 2023 at 06:21:24PM +0000, Ackerley Tng wrote:
Chao Peng chao.p.peng@linux.intel.com writes:
On Fri, Feb 10, 2023 at 11:59:23AM -0800, Vishal Annapurve wrote:
On Tue, Jan 17, 2023 at 7:11 PM Vishal Annapurve vannapurve@google.com wrote:
...
Last question, do you have a list of testcases that you consider "required" for UPM? My off-the-cuff list of selftests I want to have before merging UPM is pretty short at this point:
- Negative testing of the memslot changes, e.g. bad alignment, bad fd, illegal memslot updates, etc.
- Negative testing of restrictedmem, e.g. various combinations of overlapping bindings of a single restrictedmem instance.
- Access vs. conversion stress, e.g. accessing a region in the guest while it's concurrently converted by the host, maybe with fancy guest code to try and detect TLB or ordering bugs?
List of testcases that I was tracking (covered by the current selftests) as required:
1) Ensure private memory contents are not accessible to host userspace using the HVA
2) Ensure shared memory contents are visible/accessible from both host userspace and the guest
3) Ensure 1 and 2 hold across explicit memory conversions
4) Exercise memory conversions with mixed shared/private memory pages in a huge page to catch issues like [2]
5) Ensure that explicit memory conversions don't affect nearby GPA ranges

Test cases that will be covered by TDX/SNP selftests (in addition to above scenarios):
6) Ensure 1 and 2 hold across implicit memory conversions
7) Ensure that implicit memory conversions don't affect nearby GPA ranges

Additional testcases possible:
8) Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
[1] - https://github.com/sean-jc/linux/commit/7e536bf3c45c623425bc84e8a96634efc3a6...
[2] - https://lore.kernel.org/linux-mm/CAGtprH82H_fjtRbL0KUxOkgOk4pgbaEbAydDYfZ0qx...
List of additional testcases that could help increase basic coverage (including what Sean mentioned earlier):
- restrictedmem functionality testing
  - read/write/mmap should not work
  - fstat/fallocate should work as expected
- restrictedmem registration/modification testing with:
  - bad alignment, bad fd, modifying properties of existing memslot
  - Installing multiple memslots with ranges within the same restricted mem files
  - deleting memslots with restricted memfd while guests are being executed
In case you haven't started, I will work on 1) and 2) for the following days. As a start, I will first add restrictedmem tests (without KVM) then move to the new memslot-related tests.
Chao
- Runtime restricted mem testing:
- Access vs conversion testing from multiple vcpus
- conversion and access to non-overlapping ranges from multiple vcpus
Regards, Vishal
Chao, I'll work on
- Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
- Deleting memslots with restricted memfd while guests are being executed
- Installing multiple memslots with ranges within the same restricted mem files
this week.
Thanks Ackerley. Looks good to me.
BTW, for anyone who may be interested, below are the testcases I added: https://github.com/chao-p/linux/commit/24dd1257d5c93acb8c8cc6c76c51cf6869970... https://github.com/chao-p/linux/commit/39a872ef09d539ce0c953451152eb05276b87... https://github.com/chao-p/linux/commit/ddd2c92b268a2fdc6158f82a6169ad1a57f2a...
Chao
Chao Peng chao.p.peng@linux.intel.com writes:
[...]
Chao, I'll work on
- Running conversion tests for non-overlapping GPA ranges of same/different memslots from multiple vcpus
- Deleting memslots with restricted memfd while guests are being executed
- Installing multiple memslots with ranges within the same restricted mem files
this week.
Thanks Ackerley. Looks good to me.
BTW, for anyone who may be interested, below are the testcases I added: https://github.com/chao-p/linux/commit/24dd1257d5c93acb8c8cc6c76c51cf6869970... https://github.com/chao-p/linux/commit/39a872ef09d539ce0c953451152eb05276b87... https://github.com/chao-p/linux/commit/ddd2c92b268a2fdc6158f82a6169ad1a57f2a...
Chao
Hi Chao,
While I was working on the selftests I noticed that this could perhaps be improved:
https://github.com/chao-p/linux/blob/ddd2c92b268a2fdc6158f82a6169ad1a57f2a01...
We should use a temporary variable to hold the result of fget(fd).
As it is now, if the user provides any invalid fd, like -1, slot->restrictedmem.file would be overwritten and lost.
We cannot update slot->restrictedmem.file until after the file_is_restrictedmem() check.
For now there isn't a big problem because kvm_restrictedmem_bind() is only called on a new struct kvm_memory_slot, but I think this should be changed in case the function is used elsewhere in future.
Ackerley
On Wed, Mar 08, 2023, Ackerley Tng wrote:
While I was working on the selftests I noticed that this could perhaps be improved:
https://github.com/chao-p/linux/blob/ddd2c92b268a2fdc6158f82a6169ad1a57f2a01...
We should use a temporary variable to hold the result of fget(fd).
As it is now, if the user provides any invalid fd, like -1, slot->restrictedmem.file would be overwritten and lost.
Meh, that can happen if and only if KVM has a bug elsewhere. If slot->restrictedmem.file is anything but NULL, KVM is hosed. E.g. waiting to set slot->restrictedmem.file until the very end wouldn't magically prevent a file descriptor leak if slot->restrictedmem.file is non-NULL.
We cannot update slot->restrictedmem.file until after the file_is_restrictedmem() check.
For now there isn't a big problem because kvm_restrictedmem_bind() is only called on a new struct kvm_memory_slot, but I think this should be changed in case the function is used elsewhere in future.
Nah, if anything we could add
	if (WARN_ON_ONCE(slot->restrictedmem.file))
		return -EIO;
but I don't see the point. There's exactly one caller and the entire scheme depends on binding the memslot to restricted memory when the memslot is created, i.e. this would be but one of many changes if KVM were to allowed re-binding a memslot.
On Wed, Jan 18, 2023 at 01:09:49AM +0000, Sean Christopherson wrote:
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/lkml/20221202061347.1070246-10-chao.p.peng@linux.int...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
private_mem_test.c file adds selftest to access private memory from the guest via private/shared accesses and checking if the contents can be leaked to/accessed by vmm via shared memory view before/after conversions.
Updates in V2:
- Simplified vcpu run loop implementation API
- Removed VM creation logic from private mem library
I pushed a rework version of this series to:
git@github.com:sean-jc/linux.git x86/upm_base_support
It still has build issue with lockdep enabled that I mentioned before:
https://lore.kernel.org/all/20230116134845.vboraky2nd56szos@box.shutemov.nam...
And when compiled with lockdep, it triggers a lot of warnings about missed locks, like this:
[   59.632024] kvm: FIXME: Walk the memory attributes of the slot and set the mixed status appropriately
[   59.684888] ------------[ cut here ]------------
[   59.690677] WARNING: CPU: 2 PID: 138 at include/linux/kvm_host.h:2307 kvm_mmu_do_page_fault+0x19a/0x1b0
[   59.693531] CPU: 2 PID: 138 Comm: private_mem_con Not tainted 6.1.0-rc4-00624-g7e536bf3c45c-dirty #1
[   59.696265] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   59.699586] RIP: 0010:kvm_mmu_do_page_fault+0x19a/0x1b0
[   59.700720] Code: d8 1c 00 00 eb e3 65 48 8b 0c 25 28 00 00 00 48 3b 4c 24 50 75 1b 48 83 c4 58 5b 41 5e 41 5f 5d c3 48 81 c0
[   59.704711] RSP: 0018:ffffc90000323c80 EFLAGS: 00010246
[   59.705830] RAX: 0000000000000000 RBX: ffff888103bc8000 RCX: ffffffff8107dff0
[   59.707353] RDX: 0000000000000001 RSI: ffffffff82549d49 RDI: ffffffff825abe77
[   59.708865] RBP: ffffc90000e59000 R08: 0000000000000000 R09: 0000000000000000
[   59.710369] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   59.711859] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000180
[   59.713338] FS:  00007f2e556de740(0000) GS:ffff8881f9d00000(0000) knlGS:0000000000000000
[   59.714978] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   59.716168] CR2: 0000000000000000 CR3: 0000000100e90005 CR4: 0000000000372ee0
[   59.717631] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   59.719086] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   59.721148] Call Trace:
[   59.722661]  <TASK>
[   59.723986]  ? lock_is_held_type+0xdb/0x150
[   59.726501]  kvm_mmu_page_fault+0x41/0x170
[   59.728946]  vmx_handle_exit+0x343/0x750
[   59.731007]  kvm_arch_vcpu_ioctl_run+0x1d12/0x2790
[   59.733319]  kvm_vcpu_ioctl+0x4a6/0x590
[   59.735195]  __se_sys_ioctl+0x6a/0xb0
[   59.736976]  do_syscall_64+0x3d/0x80
[   59.738698]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   59.740743] RIP: 0033:0x7f2e557d8f6b
[   59.741907] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 c7 04 24 10 00 00 00 b0
[   59.747836] RSP: 002b:00007ffe8b84eb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   59.750147] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2e557d8f6b
[   59.751754] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
[   59.753361] RBP: 000000000042f880 R08: 0000000000000007 R09: 0000000000430210
[   59.754952] R10: ca7f9f3d969d5d5c R11: 0000000000000246 R12: 000000000042d2a0
[   59.756596] R13: 0000000100000000 R14: 0000000000422e00 R15: 00007f2e558f7000
[   59.758231]  </TASK>
[   59.758752] irq event stamp: 8637
[   59.759540] hardirqs last  enabled at (8647): [<ffffffff8119ae18>] __up_console_sem+0x68/0x90
[   59.761309] hardirqs last disabled at (8654): [<ffffffff8119adfd>] __up_console_sem+0x4d/0x90
[   59.763022] softirqs last  enabled at (8550): [<ffffffff81123c7a>] __irq_exit_rcu+0xaa/0x130
[   59.764731] softirqs last disabled at (8539): [<ffffffff81123c7a>] __irq_exit_rcu+0xaa/0x130
[   59.766409] ---[ end trace 0000000000000000 ]---
On Wed, Jan 18, 2023, Kirill A. Shutemov wrote:
On Wed, Jan 18, 2023 at 01:09:49AM +0000, Sean Christopherson wrote:
On Mon, Dec 05, 2022, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/lkml/20221202061347.1070246-10-chao.p.peng@linux.int...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
private_mem_test.c file adds selftest to access private memory from the guest via private/shared accesses and checking if the contents can be leaked to/accessed by vmm via shared memory view before/after conversions.
Updates in V2:
- Simplified vcpu run loop implementation API
- Removed VM creation logic from private mem library
I pushed a rework version of this series to:
git@github.com:sean-jc/linux.git x86/upm_base_support
It still has build issue with lockdep enabled that I mentioned before:
Yeah, I haven't updated the branch since last Friday, i.e. I haven't fixed the known bugs yet. I pushed the selftests changes at the same time as everything else, just didn't get to typing up the emails until yesterday.
https://lore.kernel.org/all/20230116134845.vboraky2nd56szos@box.shutemov.nam...
And when compiled with lockdep, it triggers a lot of warnings about missed locks, like this:
Ah crud, I knew I was forgetting something. The lockdep assertion can simply be removed, mmu_lock doesn't need to be held to read attributes. I knew the assertion was wrong when I added it, but I wanted to remind myself to take a closer look at the usage of kvm_mem_is_private() and forgot to go back and delete the assertion.
The use of kvm_mem_is_private() in kvm_mmu_do_page_fault() is technically unsafe. Similar to getting the pfn, if mmu_lock isn't held, consuming the attributes (private vs. shared) needs MMU notifier protection, i.e. the attributes are safe to read only after mmu_invalidate_seq is snapshot.
However, is_private gets rechecked by __kvm_faultin_pfn(), which is protected by the sequence counter, and so the technically unsafe read is verified in the end. The obvious alternative is to make is_private an output of kvm_faultin_pfn(), but that's incorrect for SNP and TDX guests, in which case "is_private" is a property of the fault itself.
TL;DR: I'm going to delete the assertion and add a comment in kvm_mmu_do_page_fault() explaining why it's safe to consume kvm_mem_is_private() for "legacy" guests.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 35a339891aed..da0afe81cf10 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2310,8 +2310,6 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
 {
-	lockdep_assert_held(kvm->mmu_lock);
-
 	return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
 }