From: Jack Thomson <jackabt@amazon.com>
This patch series adds arm64 support for the KVM_PRE_FAULT_MEMORY feature, which was previously only available on x86 [1]. It allows us to reduce the number of stage-2 faults during execution, which benefits post-copy migration scenarios, particularly for memory-intensive applications, where we are experiencing high latencies due to the stage-2 faults.
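For context, the pre-fault flow is driven entirely from userspace. Below is a minimal sketch of how a VMM might invoke the ioctl; vcpu_fd, guest_base and region_size are illustrative names, and the retry loop relies on the documented behaviour that KVM passes the leftover range back in gpa/size:

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int prefault_range(int vcpu_fd, __u64 guest_base, __u64 region_size)
  {
  	struct kvm_pre_fault_memory range = {
  		.gpa  = guest_base,
  		.size = region_size,
  	};

  	while (range.size) {
  		if (ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range) < 0) {
  			if (errno == EINTR || errno == EAGAIN)
  				continue;	/* leftover range was passed back; retry */
  			return -errno;		/* e.g. -ENOENT, -EFAULT */
  		}
  	}
  	return 0;
  }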
Patch Overview:
- The first patch adds support for the KVM_PRE_FAULT_MEMORY ioctl on arm64.
- The second patch fixes an issue with unaligned mmap allocations in the selftests.
- The third patch updates the pre_fault_memory_test to support arm64.
- The last patch extends the pre_fault_memory_test to cover different vm memory backings.
=== Changes Since v1 [2] ===
Addressing feedback from Oliver:
- No pre-fault flag is passed to user_mem_abort() or gmem_abort() now that aborts are synthesized.
- Remove the retry loop from kvm_arch_vcpu_pre_fault_memory()
[1]: https://lore.kernel.org/kvm/20240710174031.312055-1-pbonzini@redhat.com
[2]: https://lore.kernel.org/all/20250911134648.58945-1-jackabt.amazon@gmail.com
Jack Thomson (4):
  KVM: arm64: Add pre_fault_memory implementation
  KVM: selftests: Fix unaligned mmap allocations
  KVM: selftests: Enable pre_fault_memory_test for arm64
  KVM: selftests: Add option for different backing in pre-fault tests
 Documentation/virt/kvm/api.rst             |   3 +-
 arch/arm64/kvm/Kconfig                     |   1 +
 arch/arm64/kvm/arm.c                       |   1 +
 arch/arm64/kvm/mmu.c                       |  73 +++++++++++-
 tools/testing/selftests/kvm/Makefile.kvm   |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c |  12 +-
 .../selftests/kvm/pre_fault_memory_test.c  | 110 +++++++++++++-----
 7 files changed, 163 insertions(+), 38 deletions(-)
base-commit: 42188667be387867d2bf763d028654cbad046f7b
From: Jack Thomson <jackabt@amazon.com>
Add kvm_arch_vcpu_pre_fault_memory() for arm64. The implementation hands off the stage-2 faulting logic to either gmem_abort() or user_mem_abort().
Add an optional page_size output parameter to user_mem_abort() to return the VMA page size, which is needed when pre-faulting.
Update the documentation to clarify x86-specific behaviour.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
 Documentation/virt/kvm/api.rst |  3 +-
 arch/arm64/kvm/Kconfig         |  1 +
 arch/arm64/kvm/arm.c           |  1 +
 arch/arm64/kvm/mmu.c           | 73 ++++++++++++++++++++++++++++++++--
 4 files changed, 73 insertions(+), 5 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c17a87a0a5ac..9e8cc4eb505d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6461,7 +6461,8 @@ Errors:
 KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
 for the current vCPU state. KVM maps memory as if the vCPU generated a
 stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
-CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
+CoW. However, on x86, KVM does not mark any newly created stage-2 PTE as
+Accessed.
 
 In the case of confidential VM types where there is an initial set up of
 private guest memory before the guest is 'finalized'/measured, this ioctl
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bff62e75d681..1ac0605f86cb 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -25,6 +25,7 @@ menuconfig KVM
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select KVM_MMIO
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
+	select KVM_GENERIC_PRE_FAULT_MEMORY
 	select KVM_XFER_TO_GUEST_WORK
 	select KVM_VFIO
 	select HAVE_KVM_DIRTY_RING_ACQ_REL
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 888f7c7abf54..65654a742864 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -322,6 +322,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_COUNTER_OFFSET:
 	case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
+	case KVM_CAP_PRE_FAULT_MEMORY:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a36426ccd9b5..82f122e4b08c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1597,8 +1597,8 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  bool fault_is_perm)
+			  struct kvm_memory_slot *memslot, long *page_size,
+			  unsigned long hva, bool fault_is_perm)
 {
 	int ret = 0;
 	bool topup_memcache;
@@ -1871,6 +1871,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	kvm_release_faultin_page(kvm, page, !!ret, writable);
 	kvm_fault_unlock(kvm);
 
+	if (page_size)
+		*page_size = vma_pagesize;
+
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret)
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
@@ -2069,8 +2072,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
 				 esr_fsc_is_permission_fault(esr));
 	else
-		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
-				     esr_fsc_is_permission_fault(esr));
+		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, NULL,
+				     hva, esr_fsc_is_permission_fault(esr));
 	if (ret == 0)
 		ret = 1;
 out:
@@ -2446,3 +2449,65 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
 
 	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
 }
+
+long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
+				    struct kvm_pre_fault_memory *range)
+{
+	int ret, idx;
+	hva_t hva;
+	phys_addr_t end;
+	struct kvm_memory_slot *memslot;
+	struct kvm_vcpu_fault_info stored_fault, *fault_info;
+
+	long page_size = PAGE_SIZE;
+	phys_addr_t ipa = range->gpa;
+	gfn_t gfn = gpa_to_gfn(range->gpa);
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	if (ipa >= kvm_phys_size(vcpu->arch.hw_mmu)) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	fault_info = &vcpu->arch.fault;
+	stored_fault = *fault_info;
+
+	/* Generate a synthetic abort for the pre-fault address */
+	fault_info->esr_el2 = FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_DABT_CUR);
+	fault_info->esr_el2 &= ~ESR_ELx_ISV;
+	fault_info->esr_el2 |= ESR_ELx_FSC_FAULT_L(KVM_PGTABLE_LAST_LEVEL);
+
+	fault_info->hpfar_el2 = HPFAR_EL2_NS |
+				FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
+
+	if (kvm_slot_has_gmem(memslot)) {
+		ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
+	} else {
+		hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
+		if (kvm_is_error_hva(hva)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
+				     false);
+	}
+
+	if (ret < 0)
+		goto out;
+
+	end = (range->gpa & ~(page_size - 1)) + page_size;
+	ret = min(range->size, end - range->gpa);
+
+out:
+	*fault_info = stored_fault;
+out_unlock:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	return ret;
+}
Hi,

On 13/10/2025 16:14, Jack Thomson wrote:
> From: Jack Thomson <jackabt@amazon.com>
>
> Add kvm_arch_vcpu_pre_fault_memory() for arm64. The implementation
> hands off the stage-2 faulting logic to either gmem_abort() or
> user_mem_abort().
>
> Add an optional page_size output parameter to user_mem_abort() to
> return the VMA page size, which is needed when pre-faulting.
>
> Update the documentation to clarify x86-specific behaviour.
Thanks for the patch! Do we care about faulting beyond the requested
range? I understand this doesn't happen for anything that is not backed
by gmem (which might change with hugetlbfs support) or for normal VMs.
But for CoCo VMs this might affect the measurement, or even cause a
failure in "pre-faulting" because of the extra security checks (e.g.,
trying to fault a range in twice because it is backed by, say, a 1G
page).

Of course, these could be addressed via a separate patch when this
becomes a real requirement.

One way to solve this could be to pass the "pagesize" as an input
parameter, which could force the backend to limit the vma_pagesize that
gets used for the stage-2 mapping.
> Signed-off-by: Jack Thomson <jackabt@amazon.com>
> ---
>  Documentation/virt/kvm/api.rst |  3 +-
>  arch/arm64/kvm/Kconfig         |  1 +
>  arch/arm64/kvm/arm.c           |  1 +
>  arch/arm64/kvm/mmu.c           | 73 ++++++++++++++++++++++++++++++++--
>  4 files changed, 73 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c17a87a0a5ac..9e8cc4eb505d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6461,7 +6461,8 @@ Errors:
>  KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
>  for the current vCPU state. KVM maps memory as if the vCPU generated a
>  stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
> -CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
> +CoW. However, on x86, KVM does not mark any newly created stage-2 PTE as
> +Accessed.
>  
>  In the case of confidential VM types where there is an initial set up of
>  private guest memory before the guest is 'finalized'/measured, this ioctl
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index bff62e75d681..1ac0605f86cb 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -25,6 +25,7 @@ menuconfig KVM
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>  	select KVM_MMIO
>  	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> +	select KVM_GENERIC_PRE_FAULT_MEMORY
>  	select KVM_XFER_TO_GUEST_WORK
>  	select KVM_VFIO
>  	select HAVE_KVM_DIRTY_RING_ACQ_REL
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 888f7c7abf54..65654a742864 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -322,6 +322,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_IRQFD_RESAMPLE:
>  	case KVM_CAP_COUNTER_OFFSET:
>  	case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
> +	case KVM_CAP_PRE_FAULT_MEMORY:
>  		r = 1;
>  		break;
>  	case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index a36426ccd9b5..82f122e4b08c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1597,8 +1597,8 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_s2_trans *nested,
> -			  struct kvm_memory_slot *memslot, unsigned long hva,
> -			  bool fault_is_perm)
> +			  struct kvm_memory_slot *memslot, long *page_size,
> +			  unsigned long hva, bool fault_is_perm)
>  {
>  	int ret = 0;
>  	bool topup_memcache;
> @@ -1871,6 +1871,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	kvm_release_faultin_page(kvm, page, !!ret, writable);
>  	kvm_fault_unlock(kvm);
>  
> +	if (page_size)
> +		*page_size = vma_pagesize;
> +
>  	/* Mark the page dirty only if the fault is handled successfully */
>  	if (writable && !ret)
>  		mark_page_dirty_in_slot(kvm, memslot, gfn);
> @@ -2069,8 +2072,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
>  				 esr_fsc_is_permission_fault(esr));
>  	else
> -		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -				     esr_fsc_is_permission_fault(esr));
> +		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, NULL,
> +				     hva, esr_fsc_is_permission_fault(esr));
>  	if (ret == 0)
>  		ret = 1;
>  out:
> @@ -2446,3 +2449,65 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
>  
>  	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
>  }
> +
> +long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> +				    struct kvm_pre_fault_memory *range)
> +{
> +	int ret, idx;
> +	hva_t hva;
> +	phys_addr_t end;
> +	struct kvm_memory_slot *memslot;
> +	struct kvm_vcpu_fault_info stored_fault, *fault_info;
> +
> +	long page_size = PAGE_SIZE;
> +	phys_addr_t ipa = range->gpa;
> +	gfn_t gfn = gpa_to_gfn(range->gpa);
> +
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> +
> +	if (ipa >= kvm_phys_size(vcpu->arch.hw_mmu)) {
> +		ret = -ENOENT;
> +		goto out_unlock;
> +	}
> +
> +	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> +	if (!memslot) {
> +		ret = -ENOENT;
> +		goto out_unlock;
> +	}
> +
> +	fault_info = &vcpu->arch.fault;
> +	stored_fault = *fault_info;
> +
> +	/* Generate a synthetic abort for the pre-fault address */
> +	fault_info->esr_el2 = FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_DABT_CUR);
minor nit: Any reason why we don't use ESR_ELx_EC_DABT_LOW? We always
get that for a data abort from the guest.

Otherwise, this looks good to me.

Suzuki
> +	fault_info->esr_el2 &= ~ESR_ELx_ISV;
> +	fault_info->esr_el2 |= ESR_ELx_FSC_FAULT_L(KVM_PGTABLE_LAST_LEVEL);
> +
> +	fault_info->hpfar_el2 = HPFAR_EL2_NS |
> +				FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
> +
> +	if (kvm_slot_has_gmem(memslot)) {
> +		ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
> +	} else {
> +		hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
> +		if (kvm_is_error_hva(hva)) {
> +			ret = -EFAULT;
> +			goto out;
> +		}
> +		ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
> +				     false);
> +	}
> +
> +	if (ret < 0)
> +		goto out;
> +
> +	end = (range->gpa & ~(page_size - 1)) + page_size;
> +	ret = min(range->size, end - range->gpa);
> +
> +out:
> +	*fault_info = stored_fault;
> +out_unlock:
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +	return ret;
> +}
From: Jack Thomson <jackabt@amazon.com>
When creating a VM using mmap with huge pages, and the memory amount does not align with the underlying page size, the stored mmap_size value does not account for the fact that mmap() automatically rounds the length up to a multiple of the underlying page size. During test teardown, munmap() is used; however, munmap() requires the length to be a multiple of the underlying page size, so the teardown fails.
Update the vm_mem_add method to ensure the mmap_size is aligned to the underlying page size.
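For illustration, outside the selftest harness the failure mode looks like this (a standalone sketch; the 2M hugetlb size and an available hugetlb pool are assumptions):

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/mman.h>

  #define HUGEPAGE_SZ	(2UL << 20)

  int main(void)
  {
  	size_t len = HUGEPAGE_SZ + 4096;	/* not a hugepage multiple */
  	void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
  			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  	if (mem == MAP_FAILED)
  		return 1;

  	/* mmap silently rounded the mapping up to 2 * HUGEPAGE_SZ, so
  	 * unmapping with the original length fails with EINVAL... */
  	if (munmap(mem, len) != 0)
  		perror("munmap, unaligned length");

  	/* ...whereas the rounded-up length succeeds, which is what
  	 * aligning mmap_size in vm_mem_add achieves. */
  	return munmap(mem, 2 * HUGEPAGE_SZ);
  }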
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c3f5142b0a54..b106fbed999c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1051,7 +1051,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	/* Allocate and initialize new mem region structure. */
 	region = calloc(1, sizeof(*region));
 	TEST_ASSERT(region != NULL, "Insufficient Memory");
-	region->mmap_size = mem_size;
 
 #ifdef __s390x__
 	/* On s390x, the host address must be aligned to 1M (due to PGSTEs) */
@@ -1060,6 +1059,11 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	alignment = 1;
 #endif
 
+	alignment = max(backing_src_pagesz, alignment);
+	region->mmap_size = align_up(mem_size, alignment);
+
+	TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
+
 	/*
 	 * When using THP mmap is not guaranteed to returned a hugepage aligned
 	 * address so we have to pad the mmap. Padding is not needed for HugeTLB
	 * as mmap will always return an address aligned to the HugeTLB page
 	 * size.
 	 */
 	if (src_type == VM_MEM_SRC_ANONYMOUS_THP)
-		alignment = max(backing_src_pagesz, alignment);
-
-	TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
-
-	/* Add enough memory to align up if necessary */
-	if (alignment > 1)
 		region->mmap_size += alignment;
 
 	region->fd = -1;
From: Jack Thomson <jackabt@amazon.com>
Enable the pre_fault_memory_test to run on arm64 by making it work with different guest page sizes and testing multiple guest configurations.
Update the exit-reason assertion to compare against UCALL_EXIT_REASON for portability, as arm64 ucalls exit with KVM_EXIT_MMIO while x86 uses KVM_EXIT_IO.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |  1 +
 .../selftests/kvm/pre_fault_memory_test.c     | 79 ++++++++++++++-----
 2 files changed, 59 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 90f03f00cb04..4db1737fad04 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -180,6 +180,7 @@ TEST_GEN_PROGS_arm64 += memslot_perf_test
 TEST_GEN_PROGS_arm64 += mmu_stress_test
 TEST_GEN_PROGS_arm64 += rseq_test
 TEST_GEN_PROGS_arm64 += steal_time
+TEST_GEN_PROGS_arm64 += pre_fault_memory_test
 
 TEST_GEN_PROGS_s390 = $(TEST_GEN_PROGS_COMMON)
 TEST_GEN_PROGS_s390 += s390/memop
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 0350a8896a2f..ed9848a8af60 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -10,19 +10,29 @@
 #include <test_util.h>
 #include <kvm_util.h>
 #include <processor.h>
+#include <guest_modes.h>
 
 /* Arbitrarily chosen values */
-#define TEST_SIZE		(SZ_2M + PAGE_SIZE)
-#define TEST_NPAGES		(TEST_SIZE / PAGE_SIZE)
+#define TEST_BASE_SIZE		SZ_2M
 #define TEST_SLOT		10
 
+/* Storage of test info to share with guest code */
+struct test_config {
+	int page_size;
+	uint64_t test_size;
+	uint64_t test_num_pages;
+};
+
+struct test_config test_config;
+
 static void guest_code(uint64_t base_gpa)
 {
 	volatile uint64_t val __used;
+	struct test_config *config = &test_config;
 	int i;
 
-	for (i = 0; i < TEST_NPAGES; i++) {
-		uint64_t *src = (uint64_t *)(base_gpa + i * PAGE_SIZE);
+	for (i = 0; i < config->test_num_pages; i++) {
+		uint64_t *src = (uint64_t *)(base_gpa + i * config->page_size);
 
 		val = *src;
 	}
@@ -63,11 +73,17 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
 		    "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
 }
 
-static void __test_pre_fault_memory(unsigned long vm_type, bool private)
+struct test_params {
+	unsigned long vm_type;
+	bool private;
+};
+
+static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
 {
+	struct test_params *p = arg;
 	const struct vm_shape shape = {
-		.mode = VM_MODE_DEFAULT,
-		.type = vm_type,
+		.mode = guest_mode,
+		.type = p->vm_type,
 	};
 	struct kvm_vcpu *vcpu;
 	struct kvm_run *run;
@@ -78,10 +94,17 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	uint64_t guest_test_virt_mem;
 	uint64_t alignment, guest_page_size;
 
+	pr_info("Testing guest mode: %s\n", vm_guest_mode_string(guest_mode));
+
 	vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
 
-	alignment = guest_page_size = vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
-	guest_test_phys_mem = (vm->max_gfn - TEST_NPAGES) * guest_page_size;
+	guest_page_size = vm_guest_mode_params[guest_mode].page_size;
+
+	test_config.page_size = guest_page_size;
+	test_config.test_size = TEST_BASE_SIZE + test_config.page_size;
+	test_config.test_num_pages = vm_calc_num_guest_pages(vm->mode, test_config.test_size);
+
+	guest_test_phys_mem = (vm->max_gfn - test_config.test_num_pages) * test_config.page_size;
 #ifdef __s390x__
 	alignment = max(0x100000UL, guest_page_size);
 #else
@@ -91,22 +114,31 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	guest_test_virt_mem = guest_test_phys_mem & ((1ULL << (vm->va_bits - 1)) - 1);
 
 	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
-				    guest_test_phys_mem, TEST_SLOT, TEST_NPAGES,
-				    private ? KVM_MEM_GUEST_MEMFD : 0);
-	virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, TEST_NPAGES);
-
-	if (private)
-		vm_mem_set_private(vm, guest_test_phys_mem, TEST_SIZE);
-	pre_fault_memory(vcpu, guest_test_phys_mem, SZ_2M, 0);
-	pre_fault_memory(vcpu, guest_test_phys_mem + SZ_2M, PAGE_SIZE * 2, PAGE_SIZE);
-	pre_fault_memory(vcpu, guest_test_phys_mem + TEST_SIZE, PAGE_SIZE, PAGE_SIZE);
+				    guest_test_phys_mem, TEST_SLOT, test_config.test_num_pages,
+				    p->private ? KVM_MEM_GUEST_MEMFD : 0);
+	virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, test_config.test_num_pages);
+
+	if (p->private)
+		vm_mem_set_private(vm, guest_test_phys_mem, test_config.test_size);
+	pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0);
+	/* Test pre-faulting over an already faulted range */
+	pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0);
+	pre_fault_memory(vcpu, guest_test_phys_mem + TEST_BASE_SIZE,
+			 test_config.page_size * 2, test_config.page_size);
+	pre_fault_memory(vcpu, guest_test_phys_mem + test_config.test_size,
+			 test_config.page_size, test_config.page_size);
 
 	vcpu_args_set(vcpu, 1, guest_test_virt_mem);
+
+	/* Export the shared variables to the guest. */
+	sync_global_to_guest(vm, test_config);
+
 	vcpu_run(vcpu);
 
 	run = vcpu->run;
-	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
-		    "Wanted KVM_EXIT_IO, got exit reason: %u (%s)",
+	TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
+		    "Wanted %s, got exit reason: %u (%s)",
+		    exit_reason_str(UCALL_EXIT_REASON),
 		    run->exit_reason, exit_reason_str(run->exit_reason));
 
 	switch (get_ucall(vcpu, &uc)) {
@@ -130,7 +162,12 @@ static void test_pre_fault_memory(unsigned long vm_type, bool private)
 		return;
 	}
 
-	__test_pre_fault_memory(vm_type, private);
+	struct test_params p = {
+		.vm_type = vm_type,
+		.private = private,
+	};
+
+	for_each_guest_mode(__test_pre_fault_memory, &p);
 }
 
 int main(int argc, char *argv[])
From: Jack Thomson <jackabt@amazon.com>
Add a -m option to specify different memory backing types for the pre-fault tests (e.g., anonymous, hugetlb), allowing testing of the pre-fault functionality across different memory configurations.
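For example, to exercise the pre-fault path with hugetlb backing (assuming the standard backing-type names understood by parse_backing_src_type(); backing_src_help() prints the full list):

  ./pre_fault_memory_test -m anonymous_hugetlb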
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
 .../selftests/kvm/pre_fault_memory_test.c | 31 ++++++++++++++++---
 1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index ed9848a8af60..22e2e53945d9 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -76,6 +76,7 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
 struct test_params {
 	unsigned long vm_type;
 	bool private;
+	enum vm_mem_backing_src_type mem_backing_src;
 };
 
 static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
@@ -94,7 +95,11 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
 	uint64_t guest_test_virt_mem;
 	uint64_t alignment, guest_page_size;
 
+	size_t backing_src_pagesz = get_backing_src_pagesz(p->mem_backing_src);
+
 	pr_info("Testing guest mode: %s\n", vm_guest_mode_string(guest_mode));
+	pr_info("Testing memory backing src type: %s\n",
+		vm_mem_backing_src_alias(p->mem_backing_src)->name);
 
 	vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
 
@@ -110,10 +115,11 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
 #else
 	alignment = SZ_2M;
 #endif
+	alignment = max(alignment, backing_src_pagesz);
 	guest_test_phys_mem = align_down(guest_test_phys_mem, alignment);
 	guest_test_virt_mem = guest_test_phys_mem & ((1ULL << (vm->va_bits - 1)) - 1);
 
-	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+	vm_userspace_mem_region_add(vm, p->mem_backing_src,
 				    guest_test_phys_mem, TEST_SLOT, test_config.test_num_pages,
 				    p->private ? KVM_MEM_GUEST_MEMFD : 0);
 	virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, test_config.test_num_pages);
@@ -155,7 +161,8 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
 	kvm_vm_free(vm);
 }
 
-static void test_pre_fault_memory(unsigned long vm_type, bool private)
+static void test_pre_fault_memory(unsigned long vm_type, enum vm_mem_backing_src_type backing_src,
+				  bool private)
 {
 	if (vm_type && !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type))) {
 		pr_info("Skipping tests for vm_type 0x%lx\n", vm_type);
@@ -165,6 +172,7 @@ static void test_pre_fault_memory(unsigned long vm_type, bool private)
 	struct test_params p = {
 		.vm_type = vm_type,
 		.private = private,
+		.mem_backing_src = backing_src,
 	};
 
 	for_each_guest_mode(__test_pre_fault_memory, &p);
@@ -174,10 +182,23 @@ int main(int argc, char *argv[])
 {
 	TEST_REQUIRE(kvm_check_cap(KVM_CAP_PRE_FAULT_MEMORY));
 
-	test_pre_fault_memory(0, false);
+	int opt;
+	enum vm_mem_backing_src_type backing = VM_MEM_SRC_ANONYMOUS;
+
+	while ((opt = getopt(argc, argv, "m:")) != -1) {
+		switch (opt) {
+		case 'm':
+			backing = parse_backing_src_type(optarg);
+			break;
+		default:
+			break;
+		}
+	}
+
+	test_pre_fault_memory(0, backing, false);
 #ifdef __x86_64__
-	test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, false);
-	test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, true);
+	test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, backing, false);
+	test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, backing, true);
 #endif
 	return 0;
 }