This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.in...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
Confidential platforms along with the confidentiality aware software stack support a notion of private/shared accesses from the confidential VMs. Generally, a bit in the GPA conveys the shared/private-ness of the access. Non-confidential platforms don't have a notion of private or shared accesses from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified to allow marking an access from a VM within a GPA range as always shared or private. Any suggestions regarding implementing this ioctl alternatively/cleanly are appreciated.
priv_memfd_test.c file adds a suite of two basic selftests to access private memory from the guest via private/shared access and checking if the contents can be leaked to/accessed by vmm via shared memory view.
Test results:
1) PMPAT - PrivateMemoryPrivateAccess test passes.
2) PMSAT - PrivateMemorySharedAccess test currently fails and needs more
   analysis to understand the reason of failure.
Important - Below patch is needed to ensure host kernel crash is avoided while running these tests: https://github.com/vishals4gh/linux/commit/b9adedf777ad84af39042e9c19899600a...
Github link for the patches posted as part of this series: https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_v1 Note that this series is dependent on Chao's v5 patches mentioned above applied on top of 5.17.
Vishal Annapurve (5):
  x86: kvm: HACK: Allow testing of priv memfd approach
  selftests: kvm: Fix inline assembly for hypercall
  selftests: kvm: Add a basic selftest to test priv memfd
  selftests: kvm: priv_memfd_test: Add support for memory conversion
  selftests: kvm: priv_memfd_test: Add shared access test
 arch/x86/include/uapi/asm/kvm_para.h          |   1 +
 arch/x86/kvm/mmu/mmu.c                        |   9 +-
 arch/x86/kvm/x86.c                            |  16 +-
 include/linux/kvm_host.h                      |   3 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 tools/testing/selftests/kvm/priv_memfd_test.c | 410 ++++++++++++++++++
 virt/kvm/kvm_main.c                           |   2 +-
 8 files changed, 436 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/priv_memfd_test.c
Add plumbing in KVM logic to allow private memfd series: https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.in... to be tested with non-confidential VMs.
1) Existing hypercall KVM_HC_MAP_GPA_RANGE is modified to support marking pages of the guest memory as privately accessed or accessed in a shared fashion.
2) kvm_vcpu_is_private_gfn is defined to allow guest accesses to be categorized as shared or private based on the values set by KVM_HC_MAP_GPA_RANGE hypercall.
3) KVM_MEM_PRIVATE flag for memslots is marked as always supported.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/mmu/mmu.c               |  9 +++++----
 arch/x86/kvm/x86.c                   | 16 ++++++++++++++--
 include/linux/kvm_host.h             |  3 +++
 virt/kvm/kvm_main.c                  |  2 +-
 5 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..3bc9add4095d 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -102,6 +102,7 @@ struct kvm_clock_pairing {
 #define KVM_MAP_GPA_RANGE_PAGE_SZ_2M	(1 << 0)
 #define KVM_MAP_GPA_RANGE_PAGE_SZ_1G	(1 << 1)
 #define KVM_MAP_GPA_RANGE_ENC_STAT(n)	(n << 4)
+#define KVM_MARK_GPA_RANGE_ENC_ACCESS	(1 << 8)
 #define KVM_MAP_GPA_RANGE_ENCRYPTED	KVM_MAP_GPA_RANGE_ENC_STAT(1)
 #define KVM_MAP_GPA_RANGE_DECRYPTED	KVM_MAP_GPA_RANGE_ENC_STAT(0)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b1a30a751db0..ee9bc36011de 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3895,10 +3895,11 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,

 static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
-	/*
-	 * At this time private gfn has not been supported yet. Other patch
-	 * that enables it should change this.
-	 */
+	gpa_t priv_gfn_end = vcpu->priv_gfn + vcpu->priv_pages;
+
+	if ((gfn >= vcpu->priv_gfn) && (gfn < priv_gfn_end))
+		return true;
+
 	return false;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11a949928a85..3b17fa7f2192 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9186,8 +9186,20 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE)))
 			break;

-		if (!PAGE_ALIGNED(gpa) || !npages ||
-		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+		if (!PAGE_ALIGNED(gpa) ||
+		    gpa_to_gfn(gpa) + npages < gpa_to_gfn(gpa)) {
+			ret = -KVM_EINVAL;
+			break;
+		}
+
+		if (attrs & KVM_MARK_GPA_RANGE_ENC_ACCESS) {
+			vcpu->priv_gfn = gpa_to_gfn(gpa);
+			vcpu->priv_pages = npages;
+			ret = 0;
+			break;
+		}
+
+		if (!npages) {
 			ret = -KVM_EINVAL;
 			break;
 		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0150e952a131..7c12a0bdb495 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -311,6 +311,9 @@ struct kvm_vcpu {
 	u64 requests;
 	unsigned long guest_debug;

+	uint64_t priv_gfn;
+	uint64_t priv_pages;
+
 	struct mutex mutex;
 	struct kvm_run *run;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index df5311755a40..a31a58aa1b79 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1487,7 +1487,7 @@ static void kvm_replace_memslot(struct kvm *kvm,

 bool __weak kvm_arch_private_memory_supported(struct kvm *kvm)
 {
-	return false;
+	return true;
 }

 static int check_memory_region_flags(struct kvm *kvm,
Fix the inline assembly for the hypercall to explicitly load eax with the hypercall number, so the implementation works even in cases where the compiler inlines the function.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f000dfb5594..4d88e1a553bf 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1461,7 +1461,7 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,

 	asm volatile("vmcall"
 		     : "=a"(r)
-		     : "b"(a0), "c"(a1), "d"(a2), "S"(a3));
+		     : "a"(nr), "b"(a0), "c"(a1), "d"(a2), "S"(a3));
 	return r;
 }
Add a KVM selftest that accesses private memory privately from the guest, to verify that memory updates from the guest and the userspace VMM don't affect each other.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 tools/testing/selftests/kvm/priv_memfd_test.c | 257 ++++++++++++++++++
 2 files changed, 258 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/priv_memfd_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 21c2dbd21a81..f2f9a8546c66 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -97,6 +97,7 @@ TEST_GEN_PROGS_x86_64 += max_guest_memory_test
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += memslot_perf_test
 TEST_GEN_PROGS_x86_64 += rseq_test
+TEST_GEN_PROGS_x86_64 += priv_memfd_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
diff --git a/tools/testing/selftests/kvm/priv_memfd_test.c b/tools/testing/selftests/kvm/priv_memfd_test.c
new file mode 100644
index 000000000000..11ccdb853a84
--- /dev/null
+++ b/tools/testing/selftests/kvm/priv_memfd_test.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/kvm_para.h>
+#include <linux/memfd.h>
+
+#include <test_util.h>
+#include <kvm_util.h>
+#include <processor.h>
+
+#define TEST_MEM_GPA		0xb0000000
+#define TEST_MEM_SIZE		0x2000
+#define TEST_MEM_END		(TEST_MEM_GPA + TEST_MEM_SIZE)
+#define SHARED_MEM_DATA_BYTE	0x66
+#define PRIV_MEM_DATA_BYTE	0x99
+
+#define TEST_MEM_SLOT		10
+
+#define VCPU_ID			0
+
+#define VM_STAGE_PROCESSED(x)	pr_info("Processed stage %s\n", #x)
+
+typedef bool (*vm_stage_handler_fn)(struct kvm_vm *,
+				void *, uint64_t);
+typedef void (*guest_code_fn)(void);
+struct test_run_helper {
+	char *test_desc;
+	vm_stage_handler_fn vmst_handler;
+	guest_code_fn guest_fn;
+	void *shared_mem;
+	int priv_memfd;
+};
+
+static bool verify_byte_pattern(void *mem, uint8_t byte, uint32_t size)
+{
+	uint8_t *buf = (uint8_t *)mem;
+
+	for (uint32_t i = 0; i < size; i++) {
+		if (buf[i] != byte)
+			return false;
+	}
+
+	return true;
+}
+
+/* Test to verify guest private accesses on private memory with following steps:
+ * 1) Upon entry, guest signals VMM that it has started.
+ * 2) VMM populates the shared memory with known pattern and continues guest
+ *    execution.
+ * 3) Guest writes a different pattern on the private memory and signals VMM
+ *    that it has updated private memory.
+ * 4) VMM verifies its shared memory contents to be same as the data populated
+ *    in step 2 and continues guest execution.
+ * 5) Guest verifies its private memory contents to be same as the data
+ *    populated in step 3 and marks the end of the guest execution.
+ */
+#define PMPAT_ID	0
+#define PMPAT_DESC	"PrivateMemoryPrivateAccessTest"
+
+/* Guest code execution stages for private mem access test */
+#define PMPAT_GUEST_STARTED		0ULL
+#define PMPAT_GUEST_PRIV_MEM_UPDATED	1ULL
+
+static bool pmpat_handle_vm_stage(struct kvm_vm *vm,
+			void *test_info,
+			uint64_t stage)
+{
+	void *shared_mem = ((struct test_run_helper *)test_info)->shared_mem;
+
+	switch (stage) {
+	case PMPAT_GUEST_STARTED: {
+		/* Initialize the contents of shared memory */
+		memset(shared_mem, SHARED_MEM_DATA_BYTE, TEST_MEM_SIZE);
+		VM_STAGE_PROCESSED(PMPAT_GUEST_STARTED);
+		break;
+	}
+	case PMPAT_GUEST_PRIV_MEM_UPDATED: {
+		/* verify host updated data is still intact */
+		TEST_ASSERT(verify_byte_pattern(shared_mem,
+			SHARED_MEM_DATA_BYTE, TEST_MEM_SIZE),
+			"Shared memory view mismatch");
+		VM_STAGE_PROCESSED(PMPAT_GUEST_PRIV_MEM_UPDATED);
+		break;
+	}
+	default:
+		printf("Unhandled VM stage %ld\n", stage);
+		return false;
+	}
+
+	return true;
+}
+
+static void pmpat_guest_code(void)
+{
+	void *priv_mem = (void *)TEST_MEM_GPA;
+	int ret;
+
+	GUEST_SYNC(PMPAT_GUEST_STARTED);
+
+	/* Mark the GPA range to be treated as always accessed privately */
+	ret = kvm_hypercall(KVM_HC_MAP_GPA_RANGE, TEST_MEM_GPA,
+		TEST_MEM_SIZE >> MIN_PAGE_SHIFT,
+		KVM_MARK_GPA_RANGE_ENC_ACCESS, 0);
+	GUEST_ASSERT_1(ret == 0, ret);
+
+	memset(priv_mem, PRIV_MEM_DATA_BYTE, TEST_MEM_SIZE);
+	GUEST_SYNC(PMPAT_GUEST_PRIV_MEM_UPDATED);
+
+	GUEST_ASSERT(verify_byte_pattern(priv_mem,
+		PRIV_MEM_DATA_BYTE, TEST_MEM_SIZE));
+
+	GUEST_DONE();
+}
+
+static struct test_run_helper priv_memfd_testsuite[] = {
+	[PMPAT_ID] = {
+		.test_desc = PMPAT_DESC,
+		.vmst_handler = pmpat_handle_vm_stage,
+		.guest_fn = pmpat_guest_code,
+	},
+};
+
+static void vcpu_work(struct kvm_vm *vm, uint32_t test_id)
+{
+	struct kvm_run *run;
+	struct ucall uc;
+	uint64_t cmd;
+
+	/*
+	 * Loop until the guest is done.
+	 */
+	run = vcpu_state(vm, VCPU_ID);
+
+	while (true) {
+		vcpu_run(vm, VCPU_ID);
+
+		if (run->exit_reason == KVM_EXIT_IO) {
+			cmd = get_ucall(vm, VCPU_ID, &uc);
+			if (cmd != UCALL_SYNC)
+				break;
+
+			if (!priv_memfd_testsuite[test_id].vmst_handler(
+				vm, &priv_memfd_testsuite[test_id], uc.args[1]))
+				break;
+
+			continue;
+		}
+
+		TEST_FAIL("Unhandled VCPU exit reason %d\n", run->exit_reason);
+		break;
+	}
+
+	if (run->exit_reason == KVM_EXIT_IO && cmd == UCALL_ABORT)
+		TEST_FAIL("%s at %s:%ld, val = %lu", (const char *)uc.args[0],
+			__FILE__, uc.args[1], uc.args[2]);
+}
+
+static void priv_memory_region_add(struct kvm_vm *vm, void *mem, uint32_t slot,
+				uint32_t size, uint64_t guest_addr,
+				uint32_t priv_fd, uint64_t priv_offset)
+{
+	struct kvm_userspace_memory_region_ext region_ext;
+	int ret;
+
+	region_ext.region.slot = slot;
+	region_ext.region.flags = KVM_MEM_PRIVATE;
+	region_ext.region.guest_phys_addr = guest_addr;
+	region_ext.region.memory_size = size;
+	region_ext.region.userspace_addr = (uintptr_t) mem;
+	region_ext.private_fd = priv_fd;
+	region_ext.private_offset = priv_offset;
+	ret = ioctl(vm_get_fd(vm), KVM_SET_USER_MEMORY_REGION, &region_ext);
+	TEST_ASSERT(ret == 0, "Failed to register user region for gpa 0x%lx\n",
+		guest_addr);
+}
+
+/* Do private access to the guest's private memory */
+static void setup_and_execute_test(uint32_t test_id)
+{
+	struct kvm_vm *vm;
+	int priv_memfd;
+	int ret;
+	void *shared_mem;
+	struct kvm_enable_cap cap;
+
+	vm = vm_create_default(VCPU_ID, 0,
+		priv_memfd_testsuite[test_id].guest_fn);
+
+	/* Allocate shared memory */
+	shared_mem = mmap(NULL, TEST_MEM_SIZE,
+		PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
+	TEST_ASSERT(shared_mem != MAP_FAILED, "Failed to mmap() host");
+
+	/* Allocate private memory */
+	priv_memfd = memfd_create("vm_private_mem", MFD_INACCESSIBLE);
+	TEST_ASSERT(priv_memfd != -1, "Failed to create priv_memfd");
+	ret = fallocate(priv_memfd, 0, 0, TEST_MEM_SIZE);
+	TEST_ASSERT(ret != -1, "fallocate failed");
+
+	priv_memory_region_add(vm, shared_mem,
+		TEST_MEM_SLOT, TEST_MEM_SIZE,
+		TEST_MEM_GPA, priv_memfd, 0);
+
+	pr_info("Mapping test memory pages 0x%x page_size 0x%x\n",
+		TEST_MEM_SIZE/vm_get_page_size(vm),
+		vm_get_page_size(vm));
+	virt_map(vm, TEST_MEM_GPA, TEST_MEM_GPA,
+		(TEST_MEM_SIZE/vm_get_page_size(vm)));
+
+	/* Enable exit on KVM_HC_MAP_GPA_RANGE */
+	pr_info("Enabling exit on map_gpa_range hypercall\n");
+	ret = ioctl(vm_get_fd(vm), KVM_CHECK_EXTENSION, KVM_CAP_EXIT_HYPERCALL);
+	TEST_ASSERT(ret & (1 << KVM_HC_MAP_GPA_RANGE),
+		"VM exit on MAP_GPA_RANGE HC not supported");
+	cap.cap = KVM_CAP_EXIT_HYPERCALL;
+	cap.flags = 0;
+	cap.args[0] = (1 << KVM_HC_MAP_GPA_RANGE);
+	ret = ioctl(vm_get_fd(vm), KVM_ENABLE_CAP, &cap);
+	TEST_ASSERT(ret == 0,
+		"Failed to enable exit on MAP_GPA_RANGE hypercall\n");
+
+	priv_memfd_testsuite[test_id].shared_mem = shared_mem;
+	priv_memfd_testsuite[test_id].priv_memfd = priv_memfd;
+	vcpu_work(vm, test_id);
+
+	munmap(shared_mem, TEST_MEM_SIZE);
+	priv_memfd_testsuite[test_id].shared_mem = NULL;
+	close(priv_memfd);
+	priv_memfd_testsuite[test_id].priv_memfd = -1;
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	/* Tell stdout not to buffer its content */
+	setbuf(stdout, NULL);
+
+	for (uint32_t i = 0; i < ARRAY_SIZE(priv_memfd_testsuite); i++) {
+		pr_info("=== Starting test %s... ===\n",
+			priv_memfd_testsuite[i].test_desc);
+		setup_and_execute_test(i);
+		pr_info("--- completed test %s ---\n\n",
+			priv_memfd_testsuite[i].test_desc);
+	}
+
+	return 0;
+}
Add handling of explicit private/shared memory conversion using KVM_HC_MAP_GPA_RANGE and implicit memory conversion by handling KVM_EXIT_MEMORY_ERROR.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 tools/testing/selftests/kvm/priv_memfd_test.c | 87 +++++++++++++++++++
 1 file changed, 87 insertions(+)
diff --git a/tools/testing/selftests/kvm/priv_memfd_test.c b/tools/testing/selftests/kvm/priv_memfd_test.c
index 11ccdb853a84..0e6c19501f27 100644
--- a/tools/testing/selftests/kvm/priv_memfd_test.c
+++ b/tools/testing/selftests/kvm/priv_memfd_test.c
@@ -129,6 +129,83 @@ static struct test_run_helper priv_memfd_testsuite[] = {
 	},
 };

+static void handle_vm_exit_hypercall(struct kvm_run *run,
+			uint32_t test_id)
+{
+	uint64_t gpa, npages, attrs;
+	int priv_memfd =
+		priv_memfd_testsuite[test_id].priv_memfd;
+	int ret;
+	int fallocate_mode;
+
+	if (run->hypercall.nr != KVM_HC_MAP_GPA_RANGE) {
+		TEST_FAIL("Unhandled Hypercall %lld\n",
+			run->hypercall.nr);
+	}
+
+	gpa = run->hypercall.args[0];
+	npages = run->hypercall.args[1];
+	attrs = run->hypercall.args[2];
+
+	if ((gpa < TEST_MEM_GPA) || ((gpa +
+		(npages << MIN_PAGE_SHIFT)) > TEST_MEM_END)) {
+		TEST_FAIL("Unhandled gpa 0x%lx npages %ld\n",
+			gpa, npages);
+	}
+
+	if (attrs & KVM_MAP_GPA_RANGE_ENCRYPTED)
+		fallocate_mode = 0;
+	else {
+		fallocate_mode = (FALLOC_FL_PUNCH_HOLE |
+			FALLOC_FL_KEEP_SIZE);
+	}
+	pr_info("Converting off 0x%lx pages 0x%lx to %s\n",
+		(gpa - TEST_MEM_GPA), npages,
+		fallocate_mode ?
+			"shared" : "private");
+	ret = fallocate(priv_memfd, fallocate_mode,
+		(gpa - TEST_MEM_GPA),
+		npages << MIN_PAGE_SHIFT);
+	TEST_ASSERT(ret != -1,
+		"fallocate failed in hc handling");
+	run->hypercall.ret = 0;
+}
+
+static void handle_vm_exit_memory_error(struct kvm_run *run,
+			uint32_t test_id)
+{
+	uint64_t gpa, size, flags;
+	int ret;
+	int priv_memfd =
+		priv_memfd_testsuite[test_id].priv_memfd;
+	int fallocate_mode;
+
+	gpa = run->memory.gpa;
+	size = run->memory.size;
+	flags = run->memory.flags;
+
+	if ((gpa < TEST_MEM_GPA) || ((gpa + size)
+		> TEST_MEM_END)) {
+		TEST_FAIL("Unhandled gpa 0x%lx size 0x%lx\n",
+			gpa, size);
+	}
+
+	if (flags & KVM_MEMORY_EXIT_FLAG_PRIVATE)
+		fallocate_mode = 0;
+	else {
+		fallocate_mode = (FALLOC_FL_PUNCH_HOLE |
+			FALLOC_FL_KEEP_SIZE);
+	}
+	pr_info("Converting off 0x%lx size 0x%lx to %s\n",
+		(gpa - TEST_MEM_GPA), size,
+		fallocate_mode ?
+			"shared" : "private");
+	ret = fallocate(priv_memfd, fallocate_mode,
+		(gpa - TEST_MEM_GPA), size);
+	TEST_ASSERT(ret != -1,
+		"fallocate failed in memory error handling");
+}
+
 static void vcpu_work(struct kvm_vm *vm, uint32_t test_id)
 {
 	struct kvm_run *run;
@@ -155,6 +232,16 @@ static void vcpu_work(struct kvm_vm *vm, uint32_t test_id)
 			continue;
 		}

+		if (run->exit_reason == KVM_EXIT_HYPERCALL) {
+			handle_vm_exit_hypercall(run, test_id);
+			continue;
+		}
+
+		if (run->exit_reason == KVM_EXIT_MEMORY_ERROR) {
+			handle_vm_exit_memory_error(run, test_id);
+			continue;
+		}
+
 		TEST_FAIL("Unhandled VCPU exit reason %d\n", run->exit_reason);
 		break;
 	}
Add a test that accesses private memory in a shared fashion, which should exercise the implicit memory conversion path using KVM_EXIT_MEMORY_ERROR.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 tools/testing/selftests/kvm/priv_memfd_test.c | 66 +++++++++++++++++++
 1 file changed, 66 insertions(+)
diff --git a/tools/testing/selftests/kvm/priv_memfd_test.c b/tools/testing/selftests/kvm/priv_memfd_test.c
index 0e6c19501f27..607fdc149c7d 100644
--- a/tools/testing/selftests/kvm/priv_memfd_test.c
+++ b/tools/testing/selftests/kvm/priv_memfd_test.c
@@ -121,12 +121,78 @@ static void pmpat_guest_code(void)
 	GUEST_DONE();
 }

+/* Test to verify guest shared accesses on private memory with following steps:
+ * 1) Upon entry, guest signals VMM that it has started.
+ * 2) VMM populates the shared memory with known pattern and continues guest
+ *    execution.
+ * 3) Guest reads private gpa range in a shared fashion and verifies that it
+ *    reads what VMM has written in step 2.
+ * 4) Guest writes a different pattern on the shared memory and signals VMM
+ *    that it has updated the shared memory.
+ * 5) VMM verifies shared memory contents to be same as the data populated
+ *    in step 4 and continues guest execution.
+ */
+#define PMSAT_ID	1
+#define PMSAT_DESC	"PrivateMemorySharedAccessTest"
+
+/* Guest code execution stages for shared mem access test */
+#define PMSAT_GUEST_STARTED		0ULL
+#define PMSAT_GUEST_TEST_MEM_UPDATED	1ULL
+
+static bool pmsat_handle_vm_stage(struct kvm_vm *vm,
+			void *test_info,
+			uint64_t stage)
+{
+	void *shared_mem = ((struct test_run_helper *)test_info)->shared_mem;
+
+	switch (stage) {
+	case PMSAT_GUEST_STARTED: {
+		/* Initialize the contents of shared memory */
+		memset(shared_mem, SHARED_MEM_DATA_BYTE, TEST_MEM_SIZE);
+		VM_STAGE_PROCESSED(PMSAT_GUEST_STARTED);
+		break;
+	}
+	case PMSAT_GUEST_TEST_MEM_UPDATED: {
+		/* verify data to be same as what guest wrote */
+		TEST_ASSERT(verify_byte_pattern(shared_mem,
+			PRIV_MEM_DATA_BYTE, TEST_MEM_SIZE),
+			"Shared memory view mismatch");
+		VM_STAGE_PROCESSED(PMSAT_GUEST_TEST_MEM_UPDATED);
+		break;
+	}
+	default:
+		printf("Unhandled VM stage %ld\n", stage);
+		return false;
+	}
+
+	return true;
+}
+
+static void pmsat_guest_code(void)
+{
+	void *shared_mem = (void *)TEST_MEM_GPA;
+
+	GUEST_SYNC(PMSAT_GUEST_STARTED);
+	GUEST_ASSERT(verify_byte_pattern(shared_mem,
+		SHARED_MEM_DATA_BYTE, TEST_MEM_SIZE));
+
+	memset(shared_mem, PRIV_MEM_DATA_BYTE, TEST_MEM_SIZE);
+	GUEST_SYNC(PMSAT_GUEST_TEST_MEM_UPDATED);
+
+	GUEST_DONE();
+}
+
 static struct test_run_helper priv_memfd_testsuite[] = {
 	[PMPAT_ID] = {
 		.test_desc = PMPAT_DESC,
 		.vmst_handler = pmpat_handle_vm_stage,
 		.guest_fn = pmpat_guest_code,
 	},
+	[PMSAT_ID] = {
+		.test_desc = PMSAT_DESC,
+		.vmst_handler = pmsat_handle_vm_stage,
+		.guest_fn = pmsat_guest_code,
+	},
 };
static void handle_vm_exit_hypercall(struct kvm_run *run,
On 4/9/2022 2:35 AM, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.in...
Thanks for working on this.
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
Confidential platforms along with the confidentiality aware software stack support a notion of private/shared accesses from the confidential VMs. Generally, a bit in the GPA conveys the shared/private-ness of the access. Non-confidential platforms don't have a notion of private or shared accesses from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified to allow marking an access from a VM within a GPA range as always shared or private. Any suggestions regarding implementing this ioctl alternatively/cleanly are appreciated.
priv_memfd_test.c file adds a suite of two basic selftests to access private memory from the guest via private/shared access and checking if the contents can be leaked to/accessed by vmm via shared memory view.
Test results:
- PMPAT - PrivateMemoryPrivateAccess test passes
- PMSAT - PrivateMemorySharedAccess test fails currently and needs more
analysis to understand the reason of failure.
That could be because of the return code (*r = -1) from the KVM_EXIT_MEMORY_ERROR. This gets interpreted as -EPERM in the VMM when the vcpu_run exits.
+		vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
+		vcpu->run->memory.flags = flags;
+		vcpu->run->memory.padding = 0;
+		vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+		vcpu->run->memory.size = PAGE_SIZE;
+		fault->pfn = -1;
+		*r = -1;
+		return true;
Regards Nikunj
[1] https://lore.kernel.org/all/20220310140911.50924-10-chao.p.peng@linux.intel....
On Mon, Apr 11, 2022 at 05:31:09PM +0530, Nikunj A. Dadhania wrote:
On 4/9/2022 2:35 AM, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.in...
Thanks for working on this.
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
Confidential platforms along with the confidentiality aware software stack support a notion of private/shared accesses from the confidential VMs. Generally, a bit in the GPA conveys the shared/private-ness of the access. Non-confidential platforms don't have a notion of private or shared accesses from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified to allow marking an access from a VM within a GPA range as always shared or private. Any suggestions regarding implementing this ioctl alternatively/cleanly are appreciated.
priv_memfd_test.c file adds a suite of two basic selftests to access private memory from the guest via private/shared access and checking if the contents can be leaked to/accessed by vmm via shared memory view.
Test results:
- PMPAT - PrivateMemoryPrivateAccess test passes
- PMSAT - PrivateMemorySharedAccess test fails currently and needs more
analysis to understand the reason of failure.
That could be because of the return code (*r = -1) from the KVM_EXIT_MEMORY_ERROR. This gets interpreted as -EPERM in the VMM when the vcpu_run exits.
+		vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
+		vcpu->run->memory.flags = flags;
+		vcpu->run->memory.padding = 0;
+		vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+		vcpu->run->memory.size = PAGE_SIZE;
+		fault->pfn = -1;
+		*r = -1;
+		return true;
That's true. The current private mem patch treats KVM_EXIT_MEMORY_ERROR as an error for KVM_RUN. That behavior needs to be discussed, but right now (v5) it hits the ASSERT in tools/testing/selftests/kvm/lib/kvm_util.c before you have a chance to handle KVM_EXIT_MEMORY_ERROR in this patch series.
void vcpu_run(struct kvm_vm *vm, uint32_t vcpuid)
{
	int ret = _vcpu_run(vm, vcpuid);

	TEST_ASSERT(ret == 0, "KVM_RUN IOCTL failed, "
		"rc: %i errno: %i", ret, errno);
}
Thanks, Chao
Regards Nikunj
[1] https://lore.kernel.org/all/20220310140911.50924-10-chao.p.peng@linux.intel....
On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.in...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
Confidential platforms along with the confidentiality aware software stack support a notion of private/shared accesses from the confidential VMs. Generally, a bit in the GPA conveys the shared/private-ness of the access. Non-confidential platforms don't have a notion of private or shared accesses from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified to allow marking an access from a VM within a GPA range as always shared or private. Any suggestions regarding implementing this ioctl alternatively/cleanly are appreciated.
This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
This series, if I understood it correctly, is like TDX except with no hardware security.
Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.
On Tue, Apr 12, 2022 at 05:16:22PM -0700, Andy Lutomirski wrote:
On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.in...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
Confidential platforms along with the confidentiality aware software stack support a notion of private/shared accesses from the confidential VMs. Generally, a bit in the GPA conveys the shared/private-ness of the access. Non-confidential platforms don't have a notion of private or shared accesses from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified to allow marking an access from a VM within a GPA range as always shared or private. Any suggestions regarding implementing this ioctl alternatively/cleanly are appreciated.
This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
This series, if I understood it correctly, is like TDX except with no hardware security.
Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.
I've been looking at porting the SEV-SNP hypervisor patches over to using memfd, and I hit an issue that I think is generally applicable to SEV/SEV-ES as well. Namely at guest init time we have something like the following flow:
VMM:
- allocate shared memory to back the guest and map it into guest address space
- initialize shared memory with the initial memory contents (namely the BIOS)
- ask KVM to encrypt these pages in-place and measure them to generate the
  initial measured payload for attestation, via KVM_SEV_LAUNCH_UPDATE with the
  GPA for each range of memory to encrypt.

KVM:
- issue the SEV_LAUNCH_UPDATE firmware command, which takes an HPA as input
  and does an in-place encryption/measure of the page.
With current v5 of the memfd/UPM series, I think the expected flow is that we would fallocate() these ranges from the private fd backend in advance of calling KVM_SEV_LAUNCH_UPDATE (if the VMM does it after, we'd destroy the initial guest payload, since the pages would be replaced by newly-allocated ones). But if the VMM does it before, the VMM has no way to initialize the guest memory contents, since mmap()/pwrite() are disallowed due to MFD_INACCESSIBLE.
I think something similar to your proposal[1] here of making pread()/pwrite() possible for private-fd-backed memory that's been flagged as "shareable" would work for this case. Although here the "shareable" flag could be removed immediately upon successful completion of the SEV_LAUNCH_UPDATE firmware command.
I think with TDX this isn't an issue because their analogous TDH.MEM.PAGE.ADD seamcall takes a pair of source/dest HPAs as input params, so the VMM wouldn't need write access to the dest HPA at any point, just the source HPA.
[1] https://lwn.net/ml/linux-kernel/eefc3c74-acca-419c-8947-726ce2458446@www.fas...
On Wed, Apr 13, 2022 at 08:42:00AM -0500, Michael Roth wrote:
On Tue, Apr 12, 2022 at 05:16:22PM -0700, Andy Lutomirski wrote:
On Fri, Apr 8, 2022, at 2:05 PM, Vishal Annapurve wrote:
This series implements selftests targeting the feature floated by Chao via: https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.in...
Below changes aim to test the fd based approach for guest private memory in context of normal (non-confidential) VMs executing on non-confidential platforms.
Confidential platforms along with the confidentiality aware software stack support a notion of private/shared accesses from the confidential VMs. Generally, a bit in the GPA conveys the shared/private-ness of the access. Non-confidential platforms don't have a notion of private or shared accesses from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified to allow marking an access from a VM within a GPA range as always shared or private. Any suggestions regarding implementing this ioctl alternatively/cleanly are appreciated.
This is fantastic. I do think we need to decide how this should work in general. We have a few platforms with somewhat different properties:
TDX: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. In principle, the same address could be *both* and be distinguished by only that bit, and the two addresses would refer to different pages.
SEV: The guest decides, per memory access (using a GPA bit), whether an access is private or shared. At any given time, a physical address (with that bit masked off) can be private, shared, or invalid, but it can't be valid as private and shared at the same time.
pKVM (currently, as I understand it): the guest decides by hypercall, in advance of an access, which addresses are private and which are shared.
This series, if I understood it correctly, is like TDX except with no hardware security.
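The shared/private GPA-bit scheme described above can be sketched in a few lines of C. The bit position used here (bit 51) is purely illustrative -- the real position is platform-dependent (e.g. TDX's shared bit depends on GPAW, and SEV uses the C-bit position reported by CPUID), and the helper names are made up for this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: real platforms place this bit elsewhere. */
#define GPA_SHARED_BIT (1ULL << 51)

/* Does this GPA encode a shared access? */
static bool gpa_is_shared(uint64_t gpa)
{
	return gpa & GPA_SHARED_BIT;
}

/* Mask off the shared bit to get the address KVM actually resolves. */
static uint64_t gpa_strip_shared_bit(uint64_t gpa)
{
	return gpa & ~GPA_SHARED_BIT;
}
```

Under the TDX-like model, `0x1000` and `0x1000 | GPA_SHARED_BIT` are two distinct aliases of the same masked address, distinguished only by that bit; under the SEV-like model only one of the two aliases is valid at any given time.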
Sean or Chao, do you have a clear sense of whether the current fd-based private memory proposal can cleanly support SEV and pKVM? What, if anything, needs to be done on the API side to get that working well? I don't think we need to support SEV or pKVM right away to get this merged, but I do think we should understand how the API can map to them.
I've been looking at porting the SEV-SNP hypervisor patches over to using memfd, and I hit an issue that I think is generally applicable to SEV/SEV-ES as well. Namely at guest init time we have something like the following flow:
VMM:
  - allocate shared memory to back the guest and map it into the guest address space
  - initialize the shared memory with the initial memory contents (namely the BIOS)
  - ask KVM to encrypt these pages in-place and measure them to generate the initial measured payload for attestation, via KVM_SEV_LAUNCH_UPDATE with the GPA for each range of memory to encrypt
KVM:
  - issue the SEV_LAUNCH_UPDATE firmware command, which takes an HPA as input and does an in-place encryption/measure of the page.
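The VMM-side step of that flow can be sketched as below. The struct layouts mirror the Linux uapi SEV launch-update command (arch/x86/include/uapi/asm/kvm.h) but are redefined here so the sketch is self-contained; the command id value and the ioctl name are shown only in a comment, since actually issuing it needs /dev/kvm and SEV hardware, and this is a sketch rather than a working VMM:

```c
#include <stdint.h>
#include <string.h>

/* Mirrors struct kvm_sev_launch_update_data from the uapi header. */
struct kvm_sev_launch_update_data {
	uint64_t uaddr;	/* userspace address of the pages to encrypt */
	uint32_t len;	/* length of the region, in bytes */
};

/* Mirrors struct kvm_sev_cmd from the uapi header. */
struct kvm_sev_cmd {
	uint32_t id;		/* which SEV command to run */
	uint64_t data;		/* pointer to the command payload */
	uint32_t error;		/* firmware error code, filled in by KVM */
	uint32_t sev_fd;	/* fd of /dev/sev */
};

#define KVM_SEV_LAUNCH_UPDATE_DATA 3	/* enum sev_cmd_id value in the uapi */

/* Populate the launch-update command for one guest memory range. */
static void sev_build_launch_update(struct kvm_sev_cmd *cmd,
				    struct kvm_sev_launch_update_data *region,
				    void *guest_mem, uint32_t len, int sev_fd)
{
	region->uaddr = (uint64_t)(uintptr_t)guest_mem;
	region->len = len;

	memset(cmd, 0, sizeof(*cmd));
	cmd->id = KVM_SEV_LAUNCH_UPDATE_DATA;
	cmd->data = (uint64_t)(uintptr_t)region;
	cmd->sev_fd = sev_fd;

	/*
	 * In a real VMM, after copying the BIOS into guest_mem, this
	 * would be submitted via:
	 *   ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, cmd);
	 * which encrypts and measures the pages in place -- the step
	 * that needs write access to the backing memory beforehand.
	 */
}
```

The point of the sketch is the ordering constraint: `guest_mem` must be writable by the VMM right up until the firmware command runs, which is exactly what MFD_INACCESSIBLE forbids.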
With current v5 of the memfd/UPM series, I think the expected flow is that we would fallocate() these ranges from the private fd backend in advance of calling KVM_SEV_LAUNCH_UPDATE (if VMM does it after we'd destroy the initial guest payload, since they'd be replaced by newly-allocated pages). But if VMM does it before, VMM has no way to initialize the guest memory contents, since mmap()/pwrite() are disallowed due to MFD_INACCESSIBLE.
OK, so for SEV, basically the VMM puts the vBIOS directly into guest memory and then does in-place measurement.
TDX has no problem because TDX temporarily uses a VMM buffer (vs. guest memory) to hold the vBIOS and then asks SEAM-MODULE to measure and copy that to guest memory.
Maybe something like SHM_LOCK should be used instead of the aggressive MFD_INACCESSIBLE. Before the VMM calls SHM_LOCK on the memfd, the content can be changed, but after that it is no longer visible to the userspace VMM. This gives userspace a chance to modify the data in the private pages.
Chao
I think something similar to your proposal[1] here of making pread()/pwrite() possible for private-fd-backed memory that's been flagged as "shareable" would work for this case. Although here the "shareable" flag could be removed immediately upon successful completion of the SEV_LAUNCH_UPDATE firmware command.
I think with TDX this isn't an issue because their analogous TDH.MEM.PAGE.ADD seamcall takes a pair of source/dest HPAs as input params, so the VMM wouldn't need write access to the dest HPA at any point, just the source HPA.
[1] https://lwn.net/ml/linux-kernel/eefc3c74-acca-419c-8947-726ce2458446@www.fas...