Base ====
These patches are based upon Andrew Morton's v5.13-rc1-mmots-2021-05-13-17-23 tag. This is because this series depends on:
- UFFD minor fault support for hugetlbfs (in v5.13-rc1) [1] - UFFD minor fault support for shmem (in Andrew's tree) [2]
[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@... [2] https://lore.kernel.org/patchwork/cover/1420967/
Changelog =========
v1->v2: - Picked up Reviewed-by's. - Change backing_src_is_shared() to check the flags, instead of the type. This makes it robust to adding new backing source types in the future. - Add another commit which refactors setup_demand_paging() error handling. - Print UFFD ioctl type once in setup_demand_paging, instead of on every page-in operation. - Expand comment on why we use MFD_HUGETLB instead of MAP_HUGETLB. - Reworded comment on addr_gpa2alias. - Moved demand_paging_test.c timing calls outside of the if (), deduplicating them. - Split trivial comment / logging fixups into a separate commit. - Add another commit which prints a clarifying message on test skip. - Split the commit allowing backing src_type to be modified in two. - Split the commit adding the shmem backing type in two. - Rebased onto v5.13-rc1-mmots-2021-05-13-17-23.
Overview ========
Minor fault handling is a new userfaultfd feature whose goal is generally to improve performance. In particular, it is intended for use with demand paging. There are more details in the cover letters for this new feature (linked above), but at a high level the idea is that we think of these three phases of live migration of a VM:
1. Precopy, where we copy "some" pages from the source to the target, while the VM is still running on the source machine. 2. Blackout, where execution stops on the source, and begins on the target. 3. Postcopy, where the VM is running on the target, some pages are already up to date, and others are not (because they weren't copied, or were modified after being copied).
During postcopy, the first time the guest touches memory, we intercept a minor fault. Userspace checks whether or not the page is already up to date. If needed, we copy the final version of the page from the soure machine. This could be done with RDMA for example, to do it truly in place / with no copying. At this point, all that's left is to setup PTEs for the guest: so we issue UFFDIO_CONTINUE. No copying or page allocation needed.
Because of this use case, it's useful to exercise this as part of the demand paging test. It lets us ensure the use case works correctly end-to-end, and also gives us an in-tree way to profile the end-to-end flow for future performance improvements.
Axel Rasmussen (10): KVM: selftests: trivial comment/logging fixes KVM: selftests: simplify setup_demand_paging error handling KVM: selftests: print a message when skipping KVM tests KVM: selftests: compute correct demand paging size KVM: selftests: allow different backing source types KVM: selftests: refactor vm_mem_backing_src_type flags KVM: selftests: add shmem backing source type KVM: selftests: create alias mappings when using shared memory KVM: selftests: allow using UFFD minor faults for demand paging KVM: selftests: add shared hugetlbfs backing source type
.../selftests/kvm/demand_paging_test.c | 175 +++++++++++------- .../testing/selftests/kvm/include/kvm_util.h | 1 + .../testing/selftests/kvm/include/test_util.h | 12 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 84 ++++++++- .../selftests/kvm/lib/kvm_util_internal.h | 2 + tools/testing/selftests/kvm/lib/test_util.c | 51 +++-- 6 files changed, 238 insertions(+), 87 deletions(-)
-- 2.31.1.751.gd2f1c929bd-goog
Some trivial fixes I found while touching related code in this series, factored out into a separate commit for easier reviewing:
- s/gor/got/ and add a newline in demand_paging_test.c - s/backing_src/src_type/ in a comment to be consistent with the real function signature in kvm_util.c
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- tools/testing/selftests/kvm/demand_paging_test.c | 2 +- tools/testing/selftests/kvm/lib/kvm_util.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 5f7a229c3af1..9398ba6ef023 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -169,7 +169,7 @@ static void *uffd_handler_thread_fn(void *arg) if (r == -1) { if (errno == EAGAIN) continue; - pr_info("Read of uffd gor errno %d", errno); + pr_info("Read of uffd got errno %d\n", errno); return NULL; }
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index fc83f6c5902d..f05ca919cccb 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -663,8 +663,8 @@ int kvm_memcmp_hva_gva(void *hva, struct kvm_vm *vm, vm_vaddr_t gva, size_t len) * * Input Args: * vm - Virtual Machine - * backing_src - Storage source for this region. - * NULL to use anonymous memory. + * src_type - Storage source for this region. + * NULL to use anonymous memory. * guest_paddr - Starting guest physical address * slot - KVM region slot * npages - Number of physical pages
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
Some trivial fixes I found while touching related code in this series, factored out into a separate commit for easier reviewing:
- s/gor/got/ and add a newline in demand_paging_test.c
- s/backing_src/src_type/ in a comment to be consistent with the real function signature in kvm_util.c
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Ben Gardon bgardon@google.com
Thanks for doing this!
tools/testing/selftests/kvm/demand_paging_test.c | 2 +- tools/testing/selftests/kvm/lib/kvm_util.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 5f7a229c3af1..9398ba6ef023 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -169,7 +169,7 @@ static void *uffd_handler_thread_fn(void *arg) if (r == -1) { if (errno == EAGAIN) continue;
pr_info("Read of uffd gor errno %d", errno);
pr_info("Read of uffd got errno %d\n", errno); return NULL; }
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index fc83f6c5902d..f05ca919cccb 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -663,8 +663,8 @@ int kvm_memcmp_hva_gva(void *hva, struct kvm_vm *vm, vm_vaddr_t gva, size_t len)
- Input Args:
- vm - Virtual Machine
- backing_src - Storage source for this region.
NULL to use anonymous memory.
- src_type - Storage source for this region.
NULL to use anonymous memory.
- guest_paddr - Starting guest physical address
- slot - KVM region slot
- npages - Number of physical pages
-- 2.31.1.751.gd2f1c929bd-goog
A small cleanup. Our caller writes:
r = setup_demand_paging(...); if (r < 0) exit(-r);
Since we're just going to exit anyway, instead of returning an error we can just re-use TEST_ASSERT. This makes the caller simpler, as well as the function itself - no need to write our branches, etc.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- .../selftests/kvm/demand_paging_test.c | 51 +++++++------------ 1 file changed, 19 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 9398ba6ef023..601a1df24dd2 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -9,6 +9,8 @@
#define _GNU_SOURCE /* for pipe2 */
+#include <inttypes.h> +#include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <time.h> @@ -198,42 +200,32 @@ static void *uffd_handler_thread_fn(void *arg) return NULL; }
-static int setup_demand_paging(struct kvm_vm *vm, - pthread_t *uffd_handler_thread, int pipefd, - useconds_t uffd_delay, - struct uffd_handler_args *uffd_args, - void *hva, uint64_t len) +static void setup_demand_paging(struct kvm_vm *vm, + pthread_t *uffd_handler_thread, int pipefd, + useconds_t uffd_delay, + struct uffd_handler_args *uffd_args, + void *hva, uint64_t len) { int uffd; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register;
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); - if (uffd == -1) { - pr_info("uffd creation failed\n"); - return -1; - } + TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno);
uffdio_api.api = UFFD_API; uffdio_api.features = 0; - if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) { - pr_info("ioctl uffdio_api failed\n"); - return -1; - } + TEST_ASSERT(ioctl(uffd, UFFDIO_API, &uffdio_api) != -1, + "ioctl UFFDIO_API failed: %" PRIu64, + (uint64_t)uffdio_api.api);
uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len; uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; - if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) { - pr_info("ioctl uffdio_register failed\n"); - return -1; - } - - if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) != - UFFD_API_RANGE_IOCTLS) { - pr_info("unexpected userfaultfd ioctl set\n"); - return -1; - } + TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1, + "ioctl UFFDIO_REGISTER failed"); + TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) == + UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set");
uffd_args->uffd = uffd; uffd_args->pipefd = pipefd; @@ -243,8 +235,6 @@ static int setup_demand_paging(struct kvm_vm *vm,
PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n", hva, hva + len); - - return 0; }
struct test_params { @@ -321,13 +311,10 @@ static void run_test(enum vm_guest_mode mode, void *arg) O_CLOEXEC | O_NONBLOCK); TEST_ASSERT(!r, "Failed to set up pipefd");
- r = setup_demand_paging(vm, - &uffd_handler_threads[vcpu_id], - pipefds[vcpu_id * 2], - p->uffd_delay, &uffd_args[vcpu_id], - vcpu_hva, vcpu_mem_size); - if (r < 0) - exit(-r); + setup_demand_paging(vm, &uffd_handler_threads[vcpu_id], + pipefds[vcpu_id * 2], p->uffd_delay, + &uffd_args[vcpu_id], vcpu_hva, + vcpu_mem_size); } }
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
A small cleanup. Our caller writes:
r = setup_demand_paging(...); if (r < 0) exit(-r);
Since we're just going to exit anyway, instead of returning an error we can just re-use TEST_ASSERT. This makes the caller simpler, as well as the function itself - no need to write our branches, etc.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../selftests/kvm/demand_paging_test.c | 51 +++++++------------ 1 file changed, 19 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 9398ba6ef023..601a1df24dd2 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -9,6 +9,8 @@
#define _GNU_SOURCE /* for pipe2 */
+#include <inttypes.h> +#include <stdint.h>
Why do the includes need to change in this commit? Is it for the PRIu64 below?
#include <stdio.h> #include <stdlib.h> #include <time.h> @@ -198,42 +200,32 @@ static void *uffd_handler_thread_fn(void *arg) return NULL; }
-static int setup_demand_paging(struct kvm_vm *vm,
pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
+static void setup_demand_paging(struct kvm_vm *vm,
pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
{ int uffd; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register;
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
if (uffd == -1) {
pr_info("uffd creation failed\n");
return -1;
}
TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno); uffdio_api.api = UFFD_API; uffdio_api.features = 0;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) {
pr_info("ioctl uffdio_api failed\n");
return -1;
}
TEST_ASSERT(ioctl(uffd, UFFDIO_API, &uffdio_api) != -1,
"ioctl UFFDIO_API failed: %" PRIu64,
(uint64_t)uffdio_api.api); uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len; uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
pr_info("ioctl uffdio_register failed\n");
return -1;
}
if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) !=
UFFD_API_RANGE_IOCTLS) {
pr_info("unexpected userfaultfd ioctl set\n");
return -1;
}
TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1,
"ioctl UFFDIO_REGISTER failed");
TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) ==
UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set"); uffd_args->uffd = uffd; uffd_args->pipefd = pipefd;
@@ -243,8 +235,6 @@ static int setup_demand_paging(struct kvm_vm *vm,
PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n", hva, hva + len);
return 0;
}
struct test_params { @@ -321,13 +311,10 @@ static void run_test(enum vm_guest_mode mode, void *arg) O_CLOEXEC | O_NONBLOCK); TEST_ASSERT(!r, "Failed to set up pipefd");
r = setup_demand_paging(vm,
&uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2],
p->uffd_delay, &uffd_args[vcpu_id],
vcpu_hva, vcpu_mem_size);
if (r < 0)
exit(-r);
setup_demand_paging(vm, &uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2], p->uffd_delay,
&uffd_args[vcpu_id], vcpu_hva,
vcpu_mem_size); } }
-- 2.31.1.751.gd2f1c929bd-goog
On Wed, May 19, 2021 at 2:45 PM Ben Gardon bgardon@google.com wrote:
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
A small cleanup. Our caller writes:
r = setup_demand_paging(...); if (r < 0) exit(-r);
Since we're just going to exit anyway, instead of returning an error we can just re-use TEST_ASSERT. This makes the caller simpler, as well as the function itself - no need to write our branches, etc.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../selftests/kvm/demand_paging_test.c | 51 +++++++------------ 1 file changed, 19 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 9398ba6ef023..601a1df24dd2 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -9,6 +9,8 @@
#define _GNU_SOURCE /* for pipe2 */
+#include <inttypes.h> +#include <stdint.h>
Why do the includes need to change in this commit? Is it for the PRIu64 below?
Right, I didn't actually try compiling without these, but inttypes.h defines PRIu64 and stdint.h defines uint64_t. In general I tend to prefer including things like this because we're using their definitions directly, even if we might be picking them up transiently some other way.
#include <stdio.h> #include <stdlib.h> #include <time.h> @@ -198,42 +200,32 @@ static void *uffd_handler_thread_fn(void *arg) return NULL; }
-static int setup_demand_paging(struct kvm_vm *vm,
pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
+static void setup_demand_paging(struct kvm_vm *vm,
pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
{ int uffd; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register;
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
if (uffd == -1) {
pr_info("uffd creation failed\n");
return -1;
}
TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno); uffdio_api.api = UFFD_API; uffdio_api.features = 0;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) {
pr_info("ioctl uffdio_api failed\n");
return -1;
}
TEST_ASSERT(ioctl(uffd, UFFDIO_API, &uffdio_api) != -1,
"ioctl UFFDIO_API failed: %" PRIu64,
(uint64_t)uffdio_api.api); uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len; uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
pr_info("ioctl uffdio_register failed\n");
return -1;
}
if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) !=
UFFD_API_RANGE_IOCTLS) {
pr_info("unexpected userfaultfd ioctl set\n");
return -1;
}
TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1,
"ioctl UFFDIO_REGISTER failed");
TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) ==
UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set"); uffd_args->uffd = uffd; uffd_args->pipefd = pipefd;
@@ -243,8 +235,6 @@ static int setup_demand_paging(struct kvm_vm *vm,
PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n", hva, hva + len);
return 0;
}
struct test_params { @@ -321,13 +311,10 @@ static void run_test(enum vm_guest_mode mode, void *arg) O_CLOEXEC | O_NONBLOCK); TEST_ASSERT(!r, "Failed to set up pipefd");
r = setup_demand_paging(vm,
&uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2],
p->uffd_delay, &uffd_args[vcpu_id],
vcpu_hva, vcpu_mem_size);
if (r < 0)
exit(-r);
setup_demand_paging(vm, &uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2], p->uffd_delay,
&uffd_args[vcpu_id], vcpu_hva,
vcpu_mem_size); } }
-- 2.31.1.751.gd2f1c929bd-goog
On Wed, May 19, 2021 at 3:14 PM Axel Rasmussen axelrasmussen@google.com wrote:
On Wed, May 19, 2021 at 2:45 PM Ben Gardon bgardon@google.com wrote:
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
A small cleanup. Our caller writes:
r = setup_demand_paging(...); if (r < 0) exit(-r);
Since we're just going to exit anyway, instead of returning an error we can just re-use TEST_ASSERT. This makes the caller simpler, as well as the function itself - no need to write our branches, etc.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Ben Gardon bgardon@google.com
.../selftests/kvm/demand_paging_test.c | 51 +++++++------------ 1 file changed, 19 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 9398ba6ef023..601a1df24dd2 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -9,6 +9,8 @@
#define _GNU_SOURCE /* for pipe2 */
+#include <inttypes.h> +#include <stdint.h>
Why do the includes need to change in this commit? Is it for the PRIu64 below?
Right, I didn't actually try compiling without these, but inttypes.h defines PRIu64 and stdint.h defines uint64_t. In general I tend to prefer including things like this because we're using their definitions directly, even if we might be picking them up transiently some other way.
Makes sense to me.
#include <stdio.h> #include <stdlib.h> #include <time.h> @@ -198,42 +200,32 @@ static void *uffd_handler_thread_fn(void *arg) return NULL; }
-static int setup_demand_paging(struct kvm_vm *vm,
pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
+static void setup_demand_paging(struct kvm_vm *vm,
pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
{ int uffd; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register;
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
if (uffd == -1) {
pr_info("uffd creation failed\n");
return -1;
}
TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno); uffdio_api.api = UFFD_API; uffdio_api.features = 0;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) {
pr_info("ioctl uffdio_api failed\n");
return -1;
}
TEST_ASSERT(ioctl(uffd, UFFDIO_API, &uffdio_api) != -1,
"ioctl UFFDIO_API failed: %" PRIu64,
(uint64_t)uffdio_api.api); uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len; uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
pr_info("ioctl uffdio_register failed\n");
return -1;
}
if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) !=
UFFD_API_RANGE_IOCTLS) {
pr_info("unexpected userfaultfd ioctl set\n");
return -1;
}
TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1,
"ioctl UFFDIO_REGISTER failed");
TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) ==
UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set"); uffd_args->uffd = uffd; uffd_args->pipefd = pipefd;
@@ -243,8 +235,6 @@ static int setup_demand_paging(struct kvm_vm *vm,
PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n", hva, hva + len);
return 0;
}
struct test_params { @@ -321,13 +311,10 @@ static void run_test(enum vm_guest_mode mode, void *arg) O_CLOEXEC | O_NONBLOCK); TEST_ASSERT(!r, "Failed to set up pipefd");
r = setup_demand_paging(vm,
&uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2],
p->uffd_delay, &uffd_args[vcpu_id],
vcpu_hva, vcpu_mem_size);
if (r < 0)
exit(-r);
setup_demand_paging(vm, &uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2], p->uffd_delay,
&uffd_args[vcpu_id], vcpu_hva,
vcpu_mem_size); } }
-- 2.31.1.751.gd2f1c929bd-goog
On 20/05/21 00:14, Axel Rasmussen wrote:
On Wed, May 19, 2021 at 2:45 PM Ben Gardon bgardon@google.com wrote:
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
A small cleanup. Our caller writes:
r = setup_demand_paging(...); if (r < 0) exit(-r);
Since we're just going to exit anyway, instead of returning an error we can just re-use TEST_ASSERT. This makes the caller simpler, as well as the function itself - no need to write our branches, etc.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../selftests/kvm/demand_paging_test.c | 51 +++++++------------ 1 file changed, 19 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 9398ba6ef023..601a1df24dd2 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -9,6 +9,8 @@
#define _GNU_SOURCE /* for pipe2 */
+#include <inttypes.h> +#include <stdint.h>
Why do the includes need to change in this commit? Is it for the PRIu64 below?
Right, I didn't actually try compiling without these, but inttypes.h defines PRIu64 and stdint.h defines uint64_t. In general I tend to prefer including things like this because we're using their definitions directly, even if we might be picking them up transiently some other way.
inttypes.h is defined to include stdint.h (stdint.h is mostly useful in freestanding environments and is usually provided by the C compiler, while inttypes.h is provided by libc).
Paolo
Previously, if this check failed, we'd just exit quietly with no output. This can be confusing, so print out a short message indicating why the test is being skipped.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- tools/testing/selftests/kvm/lib/kvm_util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index f05ca919cccb..0d6ddee429b9 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -53,8 +53,10 @@ int kvm_check_cap(long cap) int kvm_fd;
kvm_fd = open(KVM_DEV_PATH, O_RDONLY); - if (kvm_fd < 0) + if (kvm_fd < 0) { + print_skip("KVM not available, errno: %d", errno); exit(KSFT_SKIP); + }
ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap); TEST_ASSERT(ret != -1, "KVM_CHECK_EXTENSION IOCTL failed,\n"
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
Previously, if this check failed, we'd just exit quietly with no output. This can be confusing, so print out a short message indicating why the test is being skipped.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Ben Gardon bgardon@google.com
tools/testing/selftests/kvm/lib/kvm_util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index f05ca919cccb..0d6ddee429b9 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -53,8 +53,10 @@ int kvm_check_cap(long cap) int kvm_fd;
kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
if (kvm_fd < 0)
if (kvm_fd < 0) {
print_skip("KVM not available, errno: %d", errno); exit(KSFT_SKIP);
}
This is a wonderful change. I believe this will only be hit if KVM is built as a module and that module has not yet been loaded, so this message could also suggest that the user check if the KVM / KVM arch/vendor specific module has been loaded.
ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap); TEST_ASSERT(ret != -1, "KVM_CHECK_EXTENSION IOCTL failed,\n"
-- 2.31.1.751.gd2f1c929bd-goog
On 19/05/21 23:49, Ben Gardon wrote:
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
Previously, if this check failed, we'd just exit quietly with no output. This can be confusing, so print out a short message indicating why the test is being skipped.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Ben Gardon bgardon@google.com
tools/testing/selftests/kvm/lib/kvm_util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index f05ca919cccb..0d6ddee429b9 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -53,8 +53,10 @@ int kvm_check_cap(long cap) int kvm_fd;
kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
if (kvm_fd < 0)
if (kvm_fd < 0) {
print_skip("KVM not available, errno: %d", errno); exit(KSFT_SKIP);
}
This is a wonderful change. I believe this will only be hit if KVM is built as a module and that module has not yet been loaded, so this message could also suggest that the user check if the KVM / KVM arch/vendor specific module has been loaded.
Let's make it
print_skip("%s not available, is KVM loaded? (errno: %d)", KVM_DEV_PATH, errno);
Paolo
This is a preparatory commit needed before we can use different kinds of backing pages for guest memory.
Previously, we used perf_test_args.host_page_size, which is the host's native page size (commonly 4K). For VM_MEM_SRC_ANONYMOUS this turns out to be okay, but in a follow-up commit we want to allow using different kinds of backing memory.
Take VM_MEM_SRC_ANONYMOUS_HUGETLB for example. Without this change, if we used that backing page type, when we issued a UFFDIO_COPY ioctl we'd only do so with 4K, rather than the full 2M of a backing hugepage. In this case, UFFDIO_COPY returns -EINVAL (__mcopy_atomic_hugetlb checks the size).
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- tools/testing/selftests/kvm/demand_paging_test.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 601a1df24dd2..94cf047358d5 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -40,6 +40,7 @@
static int nr_vcpus = 1; static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE; +static size_t demand_paging_size; static char *guest_data_prototype;
static void *vcpu_worker(void *data) @@ -85,7 +86,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
copy.src = (uint64_t)guest_data_prototype; copy.dst = addr; - copy.len = perf_test_args.host_page_size; + copy.len = demand_paging_size; copy.mode = 0;
clock_gettime(CLOCK_MONOTONIC, &start); @@ -102,7 +103,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr) PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid, timespec_to_ns(ts_diff)); PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n", - perf_test_args.host_page_size, addr, tid); + demand_paging_size, addr, tid);
return 0; } @@ -261,10 +262,12 @@ static void run_test(enum vm_guest_mode mode, void *arg)
perf_test_args.wr_fract = 1;
- guest_data_prototype = malloc(perf_test_args.host_page_size); + demand_paging_size = get_backing_src_pagesz(VM_MEM_SRC_ANONYMOUS); + + guest_data_prototype = malloc(demand_paging_size); TEST_ASSERT(guest_data_prototype, "Failed to allocate buffer for guest data pattern"); - memset(guest_data_prototype, 0xAB, perf_test_args.host_page_size); + memset(guest_data_prototype, 0xAB, demand_paging_size);
vcpu_threads = malloc(nr_vcpus * sizeof(*vcpu_threads)); TEST_ASSERT(vcpu_threads, "Memory allocation failed");
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
This is a preparatory commit needed before we can use different kinds of backing pages for guest memory.
Previously, we used perf_test_args.host_page_size, which is the host's native page size (commonly 4K). For VM_MEM_SRC_ANONYMOUS this turns out to be okay, but in a follow-up commit we want to allow using different kinds of backing memory.
Take VM_MEM_SRC_ANONYMOUS_HUGETLB for example. Without this change, if we used that backing page type, when we issued a UFFDIO_COPY ioctl we'd only do so with 4K, rather than the full 2M of a backing hugepage. In this case, UFFDIO_COPY returns -EINVAL (__mcopy_atomic_hugetlb checks the size).
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Ben Gardon bgardon@google.com
tools/testing/selftests/kvm/demand_paging_test.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 601a1df24dd2..94cf047358d5 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -40,6 +40,7 @@
static int nr_vcpus = 1; static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE; +static size_t demand_paging_size; static char *guest_data_prototype;
static void *vcpu_worker(void *data) @@ -85,7 +86,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
copy.src = (uint64_t)guest_data_prototype; copy.dst = addr;
copy.len = perf_test_args.host_page_size;
copy.len = demand_paging_size; copy.mode = 0; clock_gettime(CLOCK_MONOTONIC, &start);
@@ -102,7 +103,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr) PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid, timespec_to_ns(ts_diff)); PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
perf_test_args.host_page_size, addr, tid);
demand_paging_size, addr, tid); return 0;
} @@ -261,10 +262,12 @@ static void run_test(enum vm_guest_mode mode, void *arg)
perf_test_args.wr_fract = 1;
guest_data_prototype = malloc(perf_test_args.host_page_size);
demand_paging_size = get_backing_src_pagesz(VM_MEM_SRC_ANONYMOUS);
guest_data_prototype = malloc(demand_paging_size); TEST_ASSERT(guest_data_prototype, "Failed to allocate buffer for guest data pattern");
memset(guest_data_prototype, 0xAB, perf_test_args.host_page_size);
memset(guest_data_prototype, 0xAB, demand_paging_size); vcpu_threads = malloc(nr_vcpus * sizeof(*vcpu_threads)); TEST_ASSERT(vcpu_threads, "Memory allocation failed");
-- 2.31.1.751.gd2f1c929bd-goog
Add an argument which lets us specify a different backing memory type for the test. The default is just to use anonymous, matching existing behavior.
This is in preparation for testing UFFD minor faults. For that, we'll need to use a new backing memory type which is setup with MAP_SHARED.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- tools/testing/selftests/kvm/demand_paging_test.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 94cf047358d5..01890a7b0155 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -241,6 +241,7 @@ static void setup_demand_paging(struct kvm_vm *vm, struct test_params { bool use_uffd; useconds_t uffd_delay; + enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access; };
@@ -258,11 +259,11 @@ static void run_test(enum vm_guest_mode mode, void *arg) int r;
vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size, - VM_MEM_SRC_ANONYMOUS); + p->src_type);
perf_test_args.wr_fract = 1;
- demand_paging_size = get_backing_src_pagesz(VM_MEM_SRC_ANONYMOUS); + demand_paging_size = get_backing_src_pagesz(p->src_type);
guest_data_prototype = malloc(demand_paging_size); TEST_ASSERT(guest_data_prototype, @@ -378,7 +379,7 @@ static void help(char *name) { puts(""); printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n" - " [-b memory] [-v vcpus] [-o]\n", name); + " [-b memory] [-t type] [-v vcpus] [-o]\n", name); guest_modes_help(); printf(" -u: use User Fault FD to handle vCPU page\n" " faults.\n"); @@ -388,6 +389,8 @@ static void help(char *name) printf(" -b: specify the size of the memory region which should be\n" " demand paged by each vCPU. e.g. 10M or 3G.\n" " Default: 1G\n"); + printf(" -t: The type of backing memory to use. Default: anonymous\n"); + backing_src_help(); printf(" -v: specify the number of vCPUs to run.\n"); printf(" -o: Overlap guest memory accesses instead of partitioning\n" " them into a separate region of memory for each vCPU.\n"); @@ -399,13 +402,14 @@ int main(int argc, char *argv[]) { int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS); struct test_params p = { + .src_type = VM_MEM_SRC_ANONYMOUS, .partition_vcpu_memory_access = true, }; int opt;
guest_modes_append_default();
- while ((opt = getopt(argc, argv, "hm:ud:b:v:o")) != -1) { + while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); @@ -420,6 +424,9 @@ int main(int argc, char *argv[]) case 'b': guest_percpu_mem_size = parse_size(optarg); break; + case 't': + p.src_type = parse_backing_src_type(optarg); + break; case 'v': nr_vcpus = atoi(optarg); TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen axelrasmussen@google.com wrote:
Add an argument which lets us specify a different backing memory type for the test. The default is just to use anonymous, matching existing behavior.
This is in preparation for testing UFFD minor faults. For that, we'll need to use a new backing memory type which is setup with MAP_SHARED.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Ben Gardon bgardon@google.com
tools/testing/selftests/kvm/demand_paging_test.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 94cf047358d5..01890a7b0155 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -241,6 +241,7 @@ static void setup_demand_paging(struct kvm_vm *vm, struct test_params { bool use_uffd; useconds_t uffd_delay;
enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access;
};
@@ -258,11 +259,11 @@ static void run_test(enum vm_guest_mode mode, void *arg) int r;
vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size,
VM_MEM_SRC_ANONYMOUS);
p->src_type); perf_test_args.wr_fract = 1;
demand_paging_size = get_backing_src_pagesz(VM_MEM_SRC_ANONYMOUS);
demand_paging_size = get_backing_src_pagesz(p->src_type); guest_data_prototype = malloc(demand_paging_size); TEST_ASSERT(guest_data_prototype,
@@ -378,7 +379,7 @@ static void help(char *name) { puts(""); printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
" [-b memory] [-v vcpus] [-o]\n", name);
" [-b memory] [-t type] [-v vcpus] [-o]\n", name); guest_modes_help(); printf(" -u: use User Fault FD to handle vCPU page\n" " faults.\n");
@@ -388,6 +389,8 @@ static void help(char *name) printf(" -b: specify the size of the memory region which should be\n" " demand paged by each vCPU. e.g. 10M or 3G.\n" " Default: 1G\n");
printf(" -t: The type of backing memory to use. Default: anonymous\n");
backing_src_help(); printf(" -v: specify the number of vCPUs to run.\n"); printf(" -o: Overlap guest memory accesses instead of partitioning\n" " them into a separate region of memory for each vCPU.\n");
@@ -399,13 +402,14 @@ int main(int argc, char *argv[]) { int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS); struct test_params p = {
.src_type = VM_MEM_SRC_ANONYMOUS, .partition_vcpu_memory_access = true, }; int opt; guest_modes_append_default();
while ((opt = getopt(argc, argv, "hm:ud:b:v:o")) != -1) {
while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg);
@@ -420,6 +424,9 @@ int main(int argc, char *argv[]) case 'b': guest_percpu_mem_size = parse_size(optarg); break;
case 't':
p.src_type = parse_backing_src_type(optarg);
break; case 'v': nr_vcpus = atoi(optarg); TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
-- 2.31.1.751.gd2f1c929bd-goog
Each struct vm_mem_backing_src_alias has a flags field, which denotes the flags used to mmap() an area of that type. Previously, this field never included MAP_PRIVATE | MAP_ANONYMOUS, because vm_userspace_mem_region_add assumed that *all* types would always use those flags, and so it hardcoded them.
In a follow-up commit, we'll add a new type: shmem. Areas of this type must not have MAP_PRIVATE | MAP_ANONYMOUS, and instead they must have MAP_SHARED.
So, refactor things. Make it so that the flags field of struct vm_mem_backing_src_alias really is a complete set of flags, and don't add in any extras in vm_userspace_mem_region_add. This will let us easily tack on shmem.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- tools/testing/selftests/kvm/lib/kvm_util.c | 5 ++- tools/testing/selftests/kvm/lib/test_util.c | 35 +++++++++++---------- 2 files changed, 21 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 0d6ddee429b9..bc405785ac8b 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -759,9 +759,8 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS - | vm_mem_backing_src_alias(src_type)->flag, - -1, 0); + vm_mem_backing_src_alias(src_type)->flag, + region->fd, 0); TEST_ASSERT(region->mmap_start != MAP_FAILED, "test_malloc failed, mmap_start: %p errno: %i", region->mmap_start, errno); diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 63d2bc7d757b..06ddde068736 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -168,70 +168,73 @@ size_t get_def_hugetlb_pagesz(void)
const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) { + static const int anon_flags = MAP_PRIVATE | MAP_ANONYMOUS; + static const int anon_huge_flags = anon_flags | MAP_HUGETLB; + static const struct vm_mem_backing_src_alias aliases[] = { [VM_MEM_SRC_ANONYMOUS] = { .name = "anonymous", - .flag = 0, + .flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_THP] = { .name = "anonymous_thp", - .flag = 0, + .flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB] = { .name = "anonymous_hugetlb", - .flag = MAP_HUGETLB, + .flag = anon_huge_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16KB] = { .name = "anonymous_hugetlb_16kb", - .flag = MAP_HUGETLB | MAP_HUGE_16KB, + .flag = anon_huge_flags | MAP_HUGE_16KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_64KB] = { .name = "anonymous_hugetlb_64kb", - .flag = MAP_HUGETLB | MAP_HUGE_64KB, + .flag = anon_huge_flags | MAP_HUGE_64KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512KB] = { .name = "anonymous_hugetlb_512kb", - .flag = MAP_HUGETLB | MAP_HUGE_512KB, + .flag = anon_huge_flags | MAP_HUGE_512KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1MB] = { .name = "anonymous_hugetlb_1mb", - .flag = MAP_HUGETLB | MAP_HUGE_1MB, + .flag = anon_huge_flags | MAP_HUGE_1MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB] = { .name = "anonymous_hugetlb_2mb", - .flag = MAP_HUGETLB | MAP_HUGE_2MB, + .flag = anon_huge_flags | MAP_HUGE_2MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_8MB] = { .name = "anonymous_hugetlb_8mb", - .flag = MAP_HUGETLB | MAP_HUGE_8MB, + .flag = anon_huge_flags | MAP_HUGE_8MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16MB] = { .name = "anonymous_hugetlb_16mb", - .flag = MAP_HUGETLB | MAP_HUGE_16MB, + .flag = anon_huge_flags | MAP_HUGE_16MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_32MB] = { .name = "anonymous_hugetlb_32mb", - .flag = MAP_HUGETLB | MAP_HUGE_32MB, + .flag = anon_huge_flags | MAP_HUGE_32MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_256MB] = { .name = "anonymous_hugetlb_256mb", - .flag = MAP_HUGETLB | MAP_HUGE_256MB, + .flag = anon_huge_flags | MAP_HUGE_256MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512MB] = { .name = "anonymous_hugetlb_512mb", - .flag = MAP_HUGETLB | MAP_HUGE_512MB, + .flag = anon_huge_flags | MAP_HUGE_512MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB] = { .name = "anonymous_hugetlb_1gb", - .flag = MAP_HUGETLB | MAP_HUGE_1GB, + .flag = anon_huge_flags | MAP_HUGE_1GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB] = { .name = "anonymous_hugetlb_2gb", - .flag = MAP_HUGETLB | MAP_HUGE_2GB, + .flag = anon_huge_flags | MAP_HUGE_2GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB] = { .name = "anonymous_hugetlb_16gb", - .flag = MAP_HUGETLB | MAP_HUGE_16GB, + .flag = anon_huge_flags | MAP_HUGE_16GB, }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
Each struct vm_mem_backing_src_alias has a flags field, which denotes the flags used to mmap() an area of that type. Previously, this field never included MAP_PRIVATE | MAP_ANONYMOUS, because vm_userspace_mem_region_add assumed that *all* types would always use those flags, and so it hardcoded them.
In a follow-up commit, we'll add a new type: shmem. Areas of this type must not have MAP_PRIVATE | MAP_ANONYMOUS, and instead they must have MAP_SHARED.
So, refactor things. Make it so that the flags field of struct vm_mem_backing_src_alias really is a complete set of flags, and don't add in any extras in vm_userspace_mem_region_add. This will let us easily tack on shmem.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
tools/testing/selftests/kvm/lib/kvm_util.c | 5 ++- tools/testing/selftests/kvm/lib/test_util.c | 35 +++++++++++---------- 2 files changed, 21 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 0d6ddee429b9..bc405785ac8b 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -759,9 +759,8 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS
| vm_mem_backing_src_alias(src_type)->flag,
-1, 0);
vm_mem_backing_src_alias(src_type)->flag,
region->fd, 0);
I don't see the region->fd change mentioned in the patch description or elsewhere in this patch. Is something setting region->fd to -1 or should this be part of another patch in the series?
TEST_ASSERT(region->mmap_start != MAP_FAILED, "test_malloc failed, mmap_start: %p errno: %i", region->mmap_start, errno);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 63d2bc7d757b..06ddde068736 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -168,70 +168,73 @@ size_t get_def_hugetlb_pagesz(void)
const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) {
static const int anon_flags = MAP_PRIVATE | MAP_ANONYMOUS;
static const int anon_huge_flags = anon_flags | MAP_HUGETLB;
static const struct vm_mem_backing_src_alias aliases[] = { [VM_MEM_SRC_ANONYMOUS] = { .name = "anonymous",
.flag = 0,
.flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_THP] = { .name = "anonymous_thp",
.flag = 0,
.flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB] = { .name = "anonymous_hugetlb",
.flag = MAP_HUGETLB,
.flag = anon_huge_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16KB] = { .name = "anonymous_hugetlb_16kb",
.flag = MAP_HUGETLB | MAP_HUGE_16KB,
.flag = anon_huge_flags | MAP_HUGE_16KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_64KB] = { .name = "anonymous_hugetlb_64kb",
.flag = MAP_HUGETLB | MAP_HUGE_64KB,
.flag = anon_huge_flags | MAP_HUGE_64KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512KB] = { .name = "anonymous_hugetlb_512kb",
.flag = MAP_HUGETLB | MAP_HUGE_512KB,
.flag = anon_huge_flags | MAP_HUGE_512KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1MB] = { .name = "anonymous_hugetlb_1mb",
.flag = MAP_HUGETLB | MAP_HUGE_1MB,
.flag = anon_huge_flags | MAP_HUGE_1MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB] = { .name = "anonymous_hugetlb_2mb",
.flag = MAP_HUGETLB | MAP_HUGE_2MB,
.flag = anon_huge_flags | MAP_HUGE_2MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_8MB] = { .name = "anonymous_hugetlb_8mb",
.flag = MAP_HUGETLB | MAP_HUGE_8MB,
.flag = anon_huge_flags | MAP_HUGE_8MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16MB] = { .name = "anonymous_hugetlb_16mb",
.flag = MAP_HUGETLB | MAP_HUGE_16MB,
.flag = anon_huge_flags | MAP_HUGE_16MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_32MB] = { .name = "anonymous_hugetlb_32mb",
.flag = MAP_HUGETLB | MAP_HUGE_32MB,
.flag = anon_huge_flags | MAP_HUGE_32MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_256MB] = { .name = "anonymous_hugetlb_256mb",
.flag = MAP_HUGETLB | MAP_HUGE_256MB,
.flag = anon_huge_flags | MAP_HUGE_256MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512MB] = { .name = "anonymous_hugetlb_512mb",
.flag = MAP_HUGETLB | MAP_HUGE_512MB,
.flag = anon_huge_flags | MAP_HUGE_512MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB] = { .name = "anonymous_hugetlb_1gb",
.flag = MAP_HUGETLB | MAP_HUGE_1GB,
.flag = anon_huge_flags | MAP_HUGE_1GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB] = { .name = "anonymous_hugetlb_2gb",
.flag = MAP_HUGETLB | MAP_HUGE_2GB,
.flag = anon_huge_flags | MAP_HUGE_2GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB] = { .name = "anonymous_hugetlb_16gb",
.flag = MAP_HUGETLB | MAP_HUGE_16GB,
.flag = anon_huge_flags | MAP_HUGE_16GB, }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
-- 2.31.1.751.gd2f1c929bd-goog
On Wed, May 19, 2021 at 3:02 PM Ben Gardon bgardon@google.com wrote:
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
Each struct vm_mem_backing_src_alias has a flags field, which denotes the flags used to mmap() an area of that type. Previously, this field never included MAP_PRIVATE | MAP_ANONYMOUS, because vm_userspace_mem_region_add assumed that *all* types would always use those flags, and so it hardcoded them.
In a follow-up commit, we'll add a new type: shmem. Areas of this type must not have MAP_PRIVATE | MAP_ANONYMOUS, and instead they must have MAP_SHARED.
So, refactor things. Make it so that the flags field of struct vm_mem_backing_src_alias really is a complete set of flags, and don't add in any extras in vm_userspace_mem_region_add. This will let us easily tack on shmem.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
tools/testing/selftests/kvm/lib/kvm_util.c | 5 ++- tools/testing/selftests/kvm/lib/test_util.c | 35 +++++++++++---------- 2 files changed, 21 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 0d6ddee429b9..bc405785ac8b 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -759,9 +759,8 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS
| vm_mem_backing_src_alias(src_type)->flag,
-1, 0);
vm_mem_backing_src_alias(src_type)->flag,
region->fd, 0);
I don't see the region->fd change mentioned in the patch description or elsewhere in this patch. Is something setting region->fd to -1 or should this be part of another patch in the series?
Ah, apologies, this is a mistake from splitting up the commits. When they were all squashed together, we set region->fd = -1 explicitly just above here, but with them separated we can't depend on that. I'll fix this in a v3.
TEST_ASSERT(region->mmap_start != MAP_FAILED, "test_malloc failed, mmap_start: %p errno: %i", region->mmap_start, errno);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 63d2bc7d757b..06ddde068736 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -168,70 +168,73 @@ size_t get_def_hugetlb_pagesz(void)
const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) {
static const int anon_flags = MAP_PRIVATE | MAP_ANONYMOUS;
static const int anon_huge_flags = anon_flags | MAP_HUGETLB;
static const struct vm_mem_backing_src_alias aliases[] = { [VM_MEM_SRC_ANONYMOUS] = { .name = "anonymous",
.flag = 0,
.flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_THP] = { .name = "anonymous_thp",
.flag = 0,
.flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB] = { .name = "anonymous_hugetlb",
.flag = MAP_HUGETLB,
.flag = anon_huge_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16KB] = { .name = "anonymous_hugetlb_16kb",
.flag = MAP_HUGETLB | MAP_HUGE_16KB,
.flag = anon_huge_flags | MAP_HUGE_16KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_64KB] = { .name = "anonymous_hugetlb_64kb",
.flag = MAP_HUGETLB | MAP_HUGE_64KB,
.flag = anon_huge_flags | MAP_HUGE_64KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512KB] = { .name = "anonymous_hugetlb_512kb",
.flag = MAP_HUGETLB | MAP_HUGE_512KB,
.flag = anon_huge_flags | MAP_HUGE_512KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1MB] = { .name = "anonymous_hugetlb_1mb",
.flag = MAP_HUGETLB | MAP_HUGE_1MB,
.flag = anon_huge_flags | MAP_HUGE_1MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB] = { .name = "anonymous_hugetlb_2mb",
.flag = MAP_HUGETLB | MAP_HUGE_2MB,
.flag = anon_huge_flags | MAP_HUGE_2MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_8MB] = { .name = "anonymous_hugetlb_8mb",
.flag = MAP_HUGETLB | MAP_HUGE_8MB,
.flag = anon_huge_flags | MAP_HUGE_8MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16MB] = { .name = "anonymous_hugetlb_16mb",
.flag = MAP_HUGETLB | MAP_HUGE_16MB,
.flag = anon_huge_flags | MAP_HUGE_16MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_32MB] = { .name = "anonymous_hugetlb_32mb",
.flag = MAP_HUGETLB | MAP_HUGE_32MB,
.flag = anon_huge_flags | MAP_HUGE_32MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_256MB] = { .name = "anonymous_hugetlb_256mb",
.flag = MAP_HUGETLB | MAP_HUGE_256MB,
.flag = anon_huge_flags | MAP_HUGE_256MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512MB] = { .name = "anonymous_hugetlb_512mb",
.flag = MAP_HUGETLB | MAP_HUGE_512MB,
.flag = anon_huge_flags | MAP_HUGE_512MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB] = { .name = "anonymous_hugetlb_1gb",
.flag = MAP_HUGETLB | MAP_HUGE_1GB,
.flag = anon_huge_flags | MAP_HUGE_1GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB] = { .name = "anonymous_hugetlb_2gb",
.flag = MAP_HUGETLB | MAP_HUGE_2GB,
.flag = anon_huge_flags | MAP_HUGE_2GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB] = { .name = "anonymous_hugetlb_16gb",
.flag = MAP_HUGETLB | MAP_HUGE_16GB,
.flag = anon_huge_flags | MAP_HUGE_16GB, }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
-- 2.31.1.751.gd2f1c929bd-goog
On Wed, May 19, 2021 at 3:17 PM Axel Rasmussen axelrasmussen@google.com wrote:
On Wed, May 19, 2021 at 3:02 PM Ben Gardon bgardon@google.com wrote:
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
Each struct vm_mem_backing_src_alias has a flags field, which denotes the flags used to mmap() an area of that type. Previously, this field never included MAP_PRIVATE | MAP_ANONYMOUS, because vm_userspace_mem_region_add assumed that *all* types would always use those flags, and so it hardcoded them.
In a follow-up commit, we'll add a new type: shmem. Areas of this type must not have MAP_PRIVATE | MAP_ANONYMOUS, and instead they must have MAP_SHARED.
So, refactor things. Make it so that the flags field of struct vm_mem_backing_src_alias really is a complete set of flags, and don't add in any extras in vm_userspace_mem_region_add. This will let us easily tack on shmem.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
tools/testing/selftests/kvm/lib/kvm_util.c | 5 ++- tools/testing/selftests/kvm/lib/test_util.c | 35 +++++++++++---------- 2 files changed, 21 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 0d6ddee429b9..bc405785ac8b 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -759,9 +759,8 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS
| vm_mem_backing_src_alias(src_type)->flag,
-1, 0);
vm_mem_backing_src_alias(src_type)->flag,
region->fd, 0);
I don't see the region->fd change mentioned in the patch description or elsewhere in this patch. Is something setting region->fd to -1 or should this be part of another patch in the series?
Ah, apologies, this is a mistake from splitting up the commits. When they were all squashed together, we set region->fd = -1 explicitly just above here, but with them separated we can't depend on that. I'll fix this in a v3.
Thanks for fixing that and thanks for splitting up the patches in this series. It made them super easy to review.
TEST_ASSERT(region->mmap_start != MAP_FAILED, "test_malloc failed, mmap_start: %p errno: %i", region->mmap_start, errno);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 63d2bc7d757b..06ddde068736 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -168,70 +168,73 @@ size_t get_def_hugetlb_pagesz(void)
const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) {
static const int anon_flags = MAP_PRIVATE | MAP_ANONYMOUS;
static const int anon_huge_flags = anon_flags | MAP_HUGETLB;
static const struct vm_mem_backing_src_alias aliases[] = { [VM_MEM_SRC_ANONYMOUS] = { .name = "anonymous",
.flag = 0,
.flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_THP] = { .name = "anonymous_thp",
.flag = 0,
.flag = anon_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB] = { .name = "anonymous_hugetlb",
.flag = MAP_HUGETLB,
.flag = anon_huge_flags, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16KB] = { .name = "anonymous_hugetlb_16kb",
.flag = MAP_HUGETLB | MAP_HUGE_16KB,
.flag = anon_huge_flags | MAP_HUGE_16KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_64KB] = { .name = "anonymous_hugetlb_64kb",
.flag = MAP_HUGETLB | MAP_HUGE_64KB,
.flag = anon_huge_flags | MAP_HUGE_64KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512KB] = { .name = "anonymous_hugetlb_512kb",
.flag = MAP_HUGETLB | MAP_HUGE_512KB,
.flag = anon_huge_flags | MAP_HUGE_512KB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1MB] = { .name = "anonymous_hugetlb_1mb",
.flag = MAP_HUGETLB | MAP_HUGE_1MB,
.flag = anon_huge_flags | MAP_HUGE_1MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB] = { .name = "anonymous_hugetlb_2mb",
.flag = MAP_HUGETLB | MAP_HUGE_2MB,
.flag = anon_huge_flags | MAP_HUGE_2MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_8MB] = { .name = "anonymous_hugetlb_8mb",
.flag = MAP_HUGETLB | MAP_HUGE_8MB,
.flag = anon_huge_flags | MAP_HUGE_8MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16MB] = { .name = "anonymous_hugetlb_16mb",
.flag = MAP_HUGETLB | MAP_HUGE_16MB,
.flag = anon_huge_flags | MAP_HUGE_16MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_32MB] = { .name = "anonymous_hugetlb_32mb",
.flag = MAP_HUGETLB | MAP_HUGE_32MB,
.flag = anon_huge_flags | MAP_HUGE_32MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_256MB] = { .name = "anonymous_hugetlb_256mb",
.flag = MAP_HUGETLB | MAP_HUGE_256MB,
.flag = anon_huge_flags | MAP_HUGE_256MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_512MB] = { .name = "anonymous_hugetlb_512mb",
.flag = MAP_HUGETLB | MAP_HUGE_512MB,
.flag = anon_huge_flags | MAP_HUGE_512MB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB] = { .name = "anonymous_hugetlb_1gb",
.flag = MAP_HUGETLB | MAP_HUGE_1GB,
.flag = anon_huge_flags | MAP_HUGE_1GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB] = { .name = "anonymous_hugetlb_2gb",
.flag = MAP_HUGETLB | MAP_HUGE_2GB,
.flag = anon_huge_flags | MAP_HUGE_2GB, }, [VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB] = { .name = "anonymous_hugetlb_16gb",
.flag = MAP_HUGETLB | MAP_HUGE_16GB,
.flag = anon_huge_flags | MAP_HUGE_16GB, }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
-- 2.31.1.751.gd2f1c929bd-goog
This lets us run the demand paging test on top of a shmem-backed area. In follow-up commits, we'll 1) leverage this new capability to create an alias mapping, and then 2) use the alias mapping to exercise UFFD minor faults.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- tools/testing/selftests/kvm/include/test_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 15 +++++++++++++++ tools/testing/selftests/kvm/lib/test_util.c | 5 +++++ 3 files changed, 21 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index fade3130eb01..7377f00469ef 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -84,6 +84,7 @@ enum vm_mem_backing_src_type { VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB, VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB, VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB, + VM_MEM_SRC_SHMEM, NUM_SRC_TYPES, };
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index bc405785ac8b..e4a8d0c43c5e 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -757,6 +757,21 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, if (alignment > 1) region->mmap_size += alignment;
+ region->fd = -1; + if (src_type == VM_MEM_SRC_SHMEM) { + region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC); + TEST_ASSERT(region->fd != -1, + "memfd_create failed, errno: %i", errno); + + ret = ftruncate(region->fd, region->mmap_size); + TEST_ASSERT(ret == 0, "ftruncate failed, errno: %i", errno); + + ret = fallocate(region->fd, + FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, + region->mmap_size); + TEST_ASSERT(ret == 0, "fallocate failed, errno: %i", errno); + } + region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE, vm_mem_backing_src_alias(src_type)->flag, diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 06ddde068736..c7a265da5090 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -236,6 +236,10 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) .name = "anonymous_hugetlb_16gb", .flag = anon_huge_flags | MAP_HUGE_16GB, }, + [VM_MEM_SRC_SHMEM] = { + .name = "shmem", + .flag = MAP_SHARED, + }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES, "Missing new backing src types?"); @@ -253,6 +257,7 @@ size_t get_backing_src_pagesz(uint32_t i)
switch (i) { case VM_MEM_SRC_ANONYMOUS: + case VM_MEM_SRC_SHMEM: return getpagesize(); case VM_MEM_SRC_ANONYMOUS_THP: return get_trans_hugepagesz();
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
This lets us run the demand paging test on top of a shmem-backed area. In follow-up commits, we'll 1) leverage this new capability to create an alias mapping, and then 2) use the alias mapping to exercise UFFD minor faults.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
tools/testing/selftests/kvm/include/test_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 15 +++++++++++++++ tools/testing/selftests/kvm/lib/test_util.c | 5 +++++ 3 files changed, 21 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index fade3130eb01..7377f00469ef 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -84,6 +84,7 @@ enum vm_mem_backing_src_type { VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB, VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB, VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
VM_MEM_SRC_SHMEM, NUM_SRC_TYPES,
};
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index bc405785ac8b..e4a8d0c43c5e 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -757,6 +757,21 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, if (alignment > 1) region->mmap_size += alignment;
region->fd = -1;
Ah I guess this is the corresponding change from the previous patch.
if (src_type == VM_MEM_SRC_SHMEM) {
region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC);
TEST_ASSERT(region->fd != -1,
"memfd_create failed, errno: %i", errno);
ret = ftruncate(region->fd, region->mmap_size);
TEST_ASSERT(ret == 0, "ftruncate failed, errno: %i", errno);
ret = fallocate(region->fd,
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0,
region->mmap_size);
TEST_ASSERT(ret == 0, "fallocate failed, errno: %i", errno);
}
region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE, vm_mem_backing_src_alias(src_type)->flag,
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 06ddde068736..c7a265da5090 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -236,6 +236,10 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) .name = "anonymous_hugetlb_16gb", .flag = anon_huge_flags | MAP_HUGE_16GB, },
[VM_MEM_SRC_SHMEM] = {
.name = "shmem",
.flag = MAP_SHARED,
}, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES, "Missing new backing src types?");
@@ -253,6 +257,7 @@ size_t get_backing_src_pagesz(uint32_t i)
switch (i) { case VM_MEM_SRC_ANONYMOUS:
case VM_MEM_SRC_SHMEM: return getpagesize(); case VM_MEM_SRC_ANONYMOUS_THP: return get_trans_hugepagesz();
-- 2.31.1.751.gd2f1c929bd-goog
When a memory region is added with a src_type specifying that it should use some kind of shared memory, also create an alias mapping to the same underlying physical pages.
And, add an API so tests can get access to these alias addresses. Basically, for a guest physical address, let us look up the analogous host *alias* address.
In a future commit, we'll modify the demand paging test to take advantage of this to exercise UFFD minor faults. The idea is, we pre-fault the underlying pages *via the alias*. When the *guest* faults, it gets a "minor" fault (PTEs don't exist yet, but a page is already in the page cache). Then, the userfaultfd theads can handle the fault: they could potentially modify the underlying memory *via the alias* if they wanted to, and then they install the PTEs and let the guest carry on via a UFFDIO_CONTINUE ioctl.
Reviewed-by: Ben Gardon bgardon@google.com Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- .../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 51 +++++++++++++++++++ .../selftests/kvm/lib/kvm_util_internal.h | 2 + 3 files changed, 54 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index a8f022794ce3..0624f25a6803 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -146,6 +146,7 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa); void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva); vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva); +void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
/* * Address Guest Virtual to Guest Physical diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index e4a8d0c43c5e..0b88d1bbc1e0 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -811,6 +811,19 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
/* Add to linked-list of memory regions. */ list_add(®ion->list, &vm->userspace_mem_regions); + + /* If shared memory, create an alias. */ + if (region->fd >= 0) { + region->mmap_alias = mmap(NULL, region->mmap_size, + PROT_READ | PROT_WRITE, + vm_mem_backing_src_alias(src_type)->flag, + region->fd, 0); + TEST_ASSERT(region->mmap_alias != MAP_FAILED, + "mmap of alias failed, errno: %i", errno); + + /* Align host alias address */ + region->host_alias = align(region->mmap_alias, alignment); + } }
/* @@ -1239,6 +1252,44 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva) return -1; }
+/* + * Address VM physical to Host Virtual *alias*. + * + * Input Args: + * vm - Virtual Machine + * gpa - VM physical address + * + * Output Args: None + * + * Return: + * Equivalent address within the host virtual *alias* area, or NULL + * (without failing the test) if the guest memory is not shared (so + * no alias exists). + * + * When vm_create() and related functions are called with a shared memory + * src_type, we also create a writable, shared alias mapping of the + * underlying guest memory. This allows the host to manipulate guest memory + * without mapping that memory in the guest's address space. And, for + * userfaultfd-based demand paging, we can do so without triggering userfaults. + */ +void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa) +{ + struct userspace_mem_region *region; + + list_for_each_entry(region, &vm->userspace_mem_regions, list) { + if (!region->host_alias) + continue; + + if ((gpa >= region->region.guest_phys_addr) + && (gpa <= (region->region.guest_phys_addr + + region->region.memory_size - 1))) + return (void *) ((uintptr_t) region->host_alias + + (gpa - region->region.guest_phys_addr)); + } + + return NULL; +} + /* * VM Create IRQ Chip * diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h index 91ce1b5d480b..a25af33d4a9c 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h @@ -16,7 +16,9 @@ struct userspace_mem_region { int fd; off_t offset; void *host_mem; + void *host_alias; void *mmap_start; + void *mmap_alias; size_t mmap_size; struct list_head list; };
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
When a memory region is added with a src_type specifying that it should use some kind of shared memory, also create an alias mapping to the same underlying physical pages.
And, add an API so tests can get access to these alias addresses. Basically, for a guest physical address, let us look up the analogous host *alias* address.
In a future commit, we'll modify the demand paging test to take advantage of this to exercise UFFD minor faults. The idea is, we pre-fault the underlying pages *via the alias*. When the *guest* faults, it gets a "minor" fault (PTEs don't exist yet, but a page is already in the page cache). Then, the userfaultfd theads can handle the fault: they could potentially modify the underlying memory *via the alias* if they wanted to, and then they install the PTEs and let the guest carry on via a UFFDIO_CONTINUE ioctl.
Reviewed-by: Ben Gardon bgardon@google.com Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 51 +++++++++++++++++++ .../selftests/kvm/lib/kvm_util_internal.h | 2 + 3 files changed, 54 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index a8f022794ce3..0624f25a6803 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -146,6 +146,7 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa); void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva); vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva); +void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
/*
- Address Guest Virtual to Guest Physical
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index e4a8d0c43c5e..0b88d1bbc1e0 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -811,6 +811,19 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
/* Add to linked-list of memory regions. */ list_add(®ion->list, &vm->userspace_mem_regions);
/* If shared memory, create an alias. */
if (region->fd >= 0) {
region->mmap_alias = mmap(NULL, region->mmap_size,
PROT_READ | PROT_WRITE,
vm_mem_backing_src_alias(src_type)->flag,
region->fd, 0);
TEST_ASSERT(region->mmap_alias != MAP_FAILED,
"mmap of alias failed, errno: %i", errno);
/* Align host alias address */
region->host_alias = align(region->mmap_alias, alignment);
}
}
/* @@ -1239,6 +1252,44 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva) return -1; }
+/*
- Address VM physical to Host Virtual *alias*.
- Input Args:
- vm - Virtual Machine
- gpa - VM physical address
- Output Args: None
- Return:
- Equivalent address within the host virtual *alias* area, or NULL
- (without failing the test) if the guest memory is not shared (so
- no alias exists).
- When vm_create() and related functions are called with a shared memory
- src_type, we also create a writable, shared alias mapping of the
- underlying guest memory. This allows the host to manipulate guest memory
- without mapping that memory in the guest's address space. And, for
- userfaultfd-based demand paging, we can do so without triggering userfaults.
- */
+void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa) +{
struct userspace_mem_region *region;
list_for_each_entry(region, &vm->userspace_mem_regions, list) {
This patch fails to compile on top of with db0670ce3361 ("KVM: selftests: Keep track of memslots more efficiently").
This can be reproduced by checking out kvm/master and running `make -C tools/testing/selftests/kvm`.
The following diff fixes the compilation error but I did not have time to test it yet:
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index c98db1846e1b..28e528c19d28 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1374,19 +1374,17 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva) void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa) { struct userspace_mem_region *region; + uintptr_t offset;
- list_for_each_entry(region, &vm->userspace_mem_regions, list) { - if (!region->host_alias) - continue; + region = userspace_mem_region_find(vm, gpa, gpa); + if (!region) + return NULL;
- if ((gpa >= region->region.guest_phys_addr) - && (gpa <= (region->region.guest_phys_addr - + region->region.memory_size - 1))) - return (void *) ((uintptr_t) region->host_alias - + (gpa - region->region.guest_phys_addr)); - } + if (!region->host_alias) + return NULL;
- return NULL; + offset = gpa - region->region.guest_phys_addr; + return (void *) ((uintptr_t) region->host_alias + offset); }
/*
if (!region->host_alias)
continue;
if ((gpa >= region->region.guest_phys_addr)
&& (gpa <= (region->region.guest_phys_addr
+ region->region.memory_size - 1)))
return (void *) ((uintptr_t) region->host_alias
+ (gpa - region->region.guest_phys_addr));
}
return NULL;
+}
/*
- VM Create IRQ Chip
diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h index 91ce1b5d480b..a25af33d4a9c 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h @@ -16,7 +16,9 @@ struct userspace_mem_region { int fd; off_t offset; void *host_mem;
void *host_alias; void *mmap_start;
void *mmap_alias; size_t mmap_size; struct list_head list;
};
2.31.1.751.gd2f1c929bd-goog
I applied this change on top of kvm/master and tested it, and indeed it compiles and works correctly.
Paolo, feel free to take this with a:
Reviewed-by: Axel Rasmussen axelrasmussen@google.com
Or alternatively if you prefer I'm happy to send it as a real git-send-email patch.
On Tue, May 25, 2021 at 4:50 PM David Matlack dmatlack@google.com wrote:
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
When a memory region is added with a src_type specifying that it should use some kind of shared memory, also create an alias mapping to the same underlying physical pages.
And, add an API so tests can get access to these alias addresses. Basically, for a guest physical address, let us look up the analogous host *alias* address.
In a future commit, we'll modify the demand paging test to take advantage of this to exercise UFFD minor faults. The idea is, we pre-fault the underlying pages *via the alias*. When the *guest* faults, it gets a "minor" fault (PTEs don't exist yet, but a page is already in the page cache). Then, the userfaultfd theads can handle the fault: they could potentially modify the underlying memory *via the alias* if they wanted to, and then they install the PTEs and let the guest carry on via a UFFDIO_CONTINUE ioctl.
Reviewed-by: Ben Gardon bgardon@google.com Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 51 +++++++++++++++++++ .../selftests/kvm/lib/kvm_util_internal.h | 2 + 3 files changed, 54 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index a8f022794ce3..0624f25a6803 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -146,6 +146,7 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa); void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva); vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva); +void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
/*
- Address Guest Virtual to Guest Physical
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index e4a8d0c43c5e..0b88d1bbc1e0 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -811,6 +811,19 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
/* Add to linked-list of memory regions. */ list_add(®ion->list, &vm->userspace_mem_regions);
/* If shared memory, create an alias. */
if (region->fd >= 0) {
region->mmap_alias = mmap(NULL, region->mmap_size,
PROT_READ | PROT_WRITE,
vm_mem_backing_src_alias(src_type)->flag,
region->fd, 0);
TEST_ASSERT(region->mmap_alias != MAP_FAILED,
"mmap of alias failed, errno: %i", errno);
/* Align host alias address */
region->host_alias = align(region->mmap_alias, alignment);
}
}
/* @@ -1239,6 +1252,44 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva) return -1; }
+/*
- Address VM physical to Host Virtual *alias*.
- Input Args:
- vm - Virtual Machine
- gpa - VM physical address
- Output Args: None
- Return:
- Equivalent address within the host virtual *alias* area, or NULL
- (without failing the test) if the guest memory is not shared (so
- no alias exists).
- When vm_create() and related functions are called with a shared memory
- src_type, we also create a writable, shared alias mapping of the
- underlying guest memory. This allows the host to manipulate guest memory
- without mapping that memory in the guest's address space. And, for
- userfaultfd-based demand paging, we can do so without triggering userfaults.
- */
+void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa) +{
struct userspace_mem_region *region;
list_for_each_entry(region, &vm->userspace_mem_regions, list) {
This patch fails to compile on top of with db0670ce3361 ("KVM: selftests: Keep track of memslots more efficiently").
This can be reproduced by checking out kvm/master and running `make -C tools/testing/selftests/kvm`.
The following diff fixes the compilation error but I did not have time to test it yet:
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index c98db1846e1b..28e528c19d28 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1374,19 +1374,17 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva) void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa) { struct userspace_mem_region *region;
uintptr_t offset;
list_for_each_entry(region, &vm->userspace_mem_regions, list) {
if (!region->host_alias)
continue;
region = userspace_mem_region_find(vm, gpa, gpa);
if (!region)
return NULL;
if ((gpa >= region->region.guest_phys_addr)
&& (gpa <= (region->region.guest_phys_addr
+ region->region.memory_size - 1)))
return (void *) ((uintptr_t) region->host_alias
+ (gpa - region->region.guest_phys_addr));
}
if (!region->host_alias)
return NULL;
return NULL;
offset = gpa - region->region.guest_phys_addr;
return (void *) ((uintptr_t) region->host_alias + offset);
}
/*
if (!region->host_alias)
continue;
if ((gpa >= region->region.guest_phys_addr)
&& (gpa <= (region->region.guest_phys_addr
+ region->region.memory_size - 1)))
return (void *) ((uintptr_t) region->host_alias
+ (gpa - region->region.guest_phys_addr));
}
return NULL;
+}
/*
- VM Create IRQ Chip
diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h index 91ce1b5d480b..a25af33d4a9c 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h @@ -16,7 +16,9 @@ struct userspace_mem_region { int fd; off_t offset; void *host_mem;
void *host_alias; void *mmap_start;
void *mmap_alias; size_t mmap_size; struct list_head list;
};
2.31.1.751.gd2f1c929bd-goog
On 26/05/21 19:22, Axel Rasmussen wrote:
I applied this change on top of kvm/master and tested it, and indeed it compiles and works correctly.
Paolo, feel free to take this with a:
Reviewed-by: Axel Rasmussen axelrasmussen@google.com
Or alternatively if you prefer I'm happy to send it as a real git-send-email patch.
Yes, I squashed it.
Paolo
On Tue, May 25, 2021 at 4:50 PM David Matlack dmatlack@google.com wrote:
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
When a memory region is added with a src_type specifying that it should use some kind of shared memory, also create an alias mapping to the same underlying physical pages.
And, add an API so tests can get access to these alias addresses. Basically, for a guest physical address, let us look up the analogous host *alias* address.
In a future commit, we'll modify the demand paging test to take advantage of this to exercise UFFD minor faults. The idea is, we pre-fault the underlying pages *via the alias*. When the *guest* faults, it gets a "minor" fault (PTEs don't exist yet, but a page is already in the page cache). Then, the userfaultfd theads can handle the fault: they could potentially modify the underlying memory *via the alias* if they wanted to, and then they install the PTEs and let the guest carry on via a UFFDIO_CONTINUE ioctl.
Reviewed-by: Ben Gardon bgardon@google.com Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 51 +++++++++++++++++++ .../selftests/kvm/lib/kvm_util_internal.h | 2 + 3 files changed, 54 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index a8f022794ce3..0624f25a6803 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -146,6 +146,7 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa); void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva); vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva); +void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
/*
- Address Guest Virtual to Guest Physical
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index e4a8d0c43c5e..0b88d1bbc1e0 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -811,6 +811,19 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
/* Add to linked-list of memory regions. */ list_add(®ion->list, &vm->userspace_mem_regions);
/* If shared memory, create an alias. */
if (region->fd >= 0) {
region->mmap_alias = mmap(NULL, region->mmap_size,
PROT_READ | PROT_WRITE,
vm_mem_backing_src_alias(src_type)->flag,
region->fd, 0);
TEST_ASSERT(region->mmap_alias != MAP_FAILED,
"mmap of alias failed, errno: %i", errno);
/* Align host alias address */
region->host_alias = align(region->mmap_alias, alignment);
}
}
/*
@@ -1239,6 +1252,44 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva) return -1; }
+/*
- Address VM physical to Host Virtual *alias*.
- Input Args:
- vm - Virtual Machine
- gpa - VM physical address
- Output Args: None
- Return:
- Equivalent address within the host virtual *alias* area, or NULL
- (without failing the test) if the guest memory is not shared (so
- no alias exists).
- When vm_create() and related functions are called with a shared memory
- src_type, we also create a writable, shared alias mapping of the
- underlying guest memory. This allows the host to manipulate guest memory
- without mapping that memory in the guest's address space. And, for
- userfaultfd-based demand paging, we can do so without triggering userfaults.
- */
+void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa) +{
struct userspace_mem_region *region;
list_for_each_entry(region, &vm->userspace_mem_regions, list) {
This patch fails to compile on top of with db0670ce3361 ("KVM: selftests: Keep track of memslots more efficiently").
This can be reproduced by checking out kvm/master and running `make -C tools/testing/selftests/kvm`.
The following diff fixes the compilation error but I did not have time to test it yet:
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index c98db1846e1b..28e528c19d28 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1374,19 +1374,17 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva) void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa) { struct userspace_mem_region *region;
uintptr_t offset;
list_for_each_entry(region, &vm->userspace_mem_regions, list) {
if (!region->host_alias)
continue;
region = userspace_mem_region_find(vm, gpa, gpa);
if (!region)
return NULL;
if ((gpa >= region->region.guest_phys_addr)
&& (gpa <= (region->region.guest_phys_addr
+ region->region.memory_size - 1)))
return (void *) ((uintptr_t) region->host_alias
+ (gpa - region->region.guest_phys_addr));
}
if (!region->host_alias)
return NULL;
return NULL;
offset = gpa - region->region.guest_phys_addr;
return (void *) ((uintptr_t) region->host_alias + offset);
}
/*
if (!region->host_alias)
continue;
if ((gpa >= region->region.guest_phys_addr)
&& (gpa <= (region->region.guest_phys_addr
+ region->region.memory_size - 1)))
return (void *) ((uintptr_t) region->host_alias
+ (gpa - region->region.guest_phys_addr));
}
return NULL;
+}
- /*
- VM Create IRQ Chip
diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h index 91ce1b5d480b..a25af33d4a9c 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h @@ -16,7 +16,9 @@ struct userspace_mem_region { int fd; off_t offset; void *host_mem;
void *host_alias; void *mmap_start;
};void *mmap_alias; size_t mmap_size; struct list_head list;
-- 2.31.1.751.gd2f1c929bd-goog
UFFD handling of MINOR faults is a new feature whose use case is to speed up demand paging (compared to MISSING faults). So, it's interesting to let this selftest exercise this new mode.
Modify the demand paging test to have the option of using UFFD minor faults, as opposed to missing faults. Now, when turning on userfaultfd with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
If we're in minor mode, before registering, prefault via the *alias*. This way, the guest will trigger minor faults, instead of missing faults, and we can UFFDIO_CONTINUE to resolve them.
Modify the page fault handler function to use the right ioctl depending on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- .../selftests/kvm/demand_paging_test.c | 112 ++++++++++++------ 1 file changed, 79 insertions(+), 33 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 01890a7b0155..df7190261923 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -74,33 +74,48 @@ static void *vcpu_worker(void *data) return NULL; }
-static int handle_uffd_page_request(int uffd, uint64_t addr) +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr) { - pid_t tid; + pid_t tid = syscall(__NR_gettid); struct timespec start; struct timespec ts_diff; - struct uffdio_copy copy; int r;
- tid = syscall(__NR_gettid); + clock_gettime(CLOCK_MONOTONIC, &start);
- copy.src = (uint64_t)guest_data_prototype; - copy.dst = addr; - copy.len = demand_paging_size; - copy.mode = 0; + if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) { + struct uffdio_copy copy;
- clock_gettime(CLOCK_MONOTONIC, &start); + copy.src = (uint64_t)guest_data_prototype; + copy.dst = addr; + copy.len = demand_paging_size; + copy.mode = 0; + + r = ioctl(uffd, UFFDIO_COPY, ©); + if (r == -1) { + pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n", + addr, tid, errno); + return r; + } + } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { + struct uffdio_continue cont = {0}; + + cont.range.start = addr; + cont.range.len = demand_paging_size;
- r = ioctl(uffd, UFFDIO_COPY, ©); - if (r == -1) { - pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n", - addr, tid, errno); - return r; + r = ioctl(uffd, UFFDIO_CONTINUE, &cont); + if (r == -1) { + pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n", + addr, tid, errno); + return r; + } + } else { + TEST_FAIL("Invalid uffd mode %d", uffd_mode); }
ts_diff = timespec_elapsed(start);
- PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid, + PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid, timespec_to_ns(ts_diff)); PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n", demand_paging_size, addr, tid); @@ -111,6 +126,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr) bool quit_uffd_thread;
struct uffd_handler_args { + int uffd_mode; int uffd; int pipefd; useconds_t delay; @@ -187,7 +203,7 @@ static void *uffd_handler_thread_fn(void *arg) if (delay) usleep(delay); addr = msg.arg.pagefault.address; - r = handle_uffd_page_request(uffd, addr); + r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr); if (r < 0) return NULL; pages++; @@ -203,13 +219,32 @@ static void *uffd_handler_thread_fn(void *arg)
static void setup_demand_paging(struct kvm_vm *vm, pthread_t *uffd_handler_thread, int pipefd, - useconds_t uffd_delay, + int uffd_mode, useconds_t uffd_delay, struct uffd_handler_args *uffd_args, - void *hva, uint64_t len) + void *hva, void *alias, uint64_t len) { + bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); int uffd; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register; + uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY; + + PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n", + is_minor ? "MINOR" : "MISSING", + is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY"); + + /* In order to get minor faults, prefault via the alias. */ + if (is_minor) { + size_t p; + + expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE; + + TEST_ASSERT(alias != NULL, "Alias required for minor faults"); + for (p = 0; p < (len / demand_paging_size); ++p) { + memcpy(alias + (p * demand_paging_size), + guest_data_prototype, demand_paging_size); + } + }
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno); @@ -222,12 +257,13 @@ static void setup_demand_paging(struct kvm_vm *vm,
uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len; - uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; + uffdio_register.mode = uffd_mode; TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1, "ioctl UFFDIO_REGISTER failed"); - TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) == - UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set"); + TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) == + expected_ioctls, "missing userfaultfd ioctls");
+ uffd_args->uffd_mode = uffd_mode; uffd_args->uffd = uffd; uffd_args->pipefd = pipefd; uffd_args->delay = uffd_delay; @@ -239,7 +275,7 @@ static void setup_demand_paging(struct kvm_vm *vm, }
struct test_params { - bool use_uffd; + int uffd_mode; useconds_t uffd_delay; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access; @@ -276,7 +312,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size, p->partition_vcpu_memory_access);
- if (p->use_uffd) { + if (p->uffd_mode) { uffd_handler_threads = malloc(nr_vcpus * sizeof(*uffd_handler_threads)); TEST_ASSERT(uffd_handler_threads, "Memory allocation failed"); @@ -290,6 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) { vm_paddr_t vcpu_gpa; void *vcpu_hva; + void *vcpu_alias; uint64_t vcpu_mem_size;
@@ -304,8 +341,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n", vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
- /* Cache the HVA pointer of the region */ + /* Cache the host addresses of the region */ vcpu_hva = addr_gpa2hva(vm, vcpu_gpa); + vcpu_alias = addr_gpa2alias(vm, vcpu_gpa);
/* * Set up user fault fd to handle demand paging @@ -316,8 +354,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) TEST_ASSERT(!r, "Failed to set up pipefd");
setup_demand_paging(vm, &uffd_handler_threads[vcpu_id], - pipefds[vcpu_id * 2], p->uffd_delay, - &uffd_args[vcpu_id], vcpu_hva, + pipefds[vcpu_id * 2], p->uffd_mode, + p->uffd_delay, &uffd_args[vcpu_id], + vcpu_hva, vcpu_alias, vcpu_mem_size); } } @@ -346,7 +385,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
pr_info("All vCPU threads joined\n");
- if (p->use_uffd) { + if (p->uffd_mode) { char c;
/* Tell the user fault fd handler threads to quit */ @@ -368,7 +407,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
free(guest_data_prototype); free(vcpu_threads); - if (p->use_uffd) { + if (p->uffd_mode) { free(uffd_handler_threads); free(uffd_args); free(pipefds); @@ -378,11 +417,11 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts(""); - printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n" + printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n" " [-b memory] [-t type] [-v vcpus] [-o]\n", name); guest_modes_help(); - printf(" -u: use User Fault FD to handle vCPU page\n" - " faults.\n"); + printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n" + " UFFD registration mode: 'MISSING' or 'MINOR'.\n"); printf(" -d: add a delay in usec to the User Fault\n" " FD handler to simulate demand paging\n" " overheads. Ignored without -u.\n"); @@ -409,13 +448,17 @@ int main(int argc, char *argv[])
guest_modes_append_default();
- while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) { + while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); break; case 'u': - p.use_uffd = true; + if (!strcmp("MISSING", optarg)) + p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING; + else if (!strcmp("MINOR", optarg)) + p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR; + TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; case 'd': p.uffd_delay = strtoul(optarg, NULL, 0); @@ -442,6 +485,9 @@ int main(int argc, char *argv[]) } }
+ TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM, + "userfaultfd MINOR mode requires shared memory; pick a different -t"); + for_each_guest_mode(run_test, &p);
return 0;
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
UFFD handling of MINOR faults is a new feature whose use case is to speed up demand paging (compared to MISSING faults). So, it's interesting to let this selftest exercise this new mode.
Modify the demand paging test to have the option of using UFFD minor faults, as opposed to missing faults. Now, when turning on userfaultfd with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
If we're in minor mode, before registering, prefault via the *alias*. This way, the guest will trigger minor faults, instead of missing faults, and we can UFFDIO_CONTINUE to resolve them.
Modify the page fault handler function to use the right ioctl depending on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../selftests/kvm/demand_paging_test.c | 112 ++++++++++++------ 1 file changed, 79 insertions(+), 33 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 01890a7b0155..df7190261923 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -74,33 +74,48 @@ static void *vcpu_worker(void *data) return NULL; }
-static int handle_uffd_page_request(int uffd, uint64_t addr) +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr) {
pid_t tid;
pid_t tid = syscall(__NR_gettid); struct timespec start; struct timespec ts_diff;
struct uffdio_copy copy; int r;
tid = syscall(__NR_gettid);
clock_gettime(CLOCK_MONOTONIC, &start);
copy.src = (uint64_t)guest_data_prototype;
copy.dst = addr;
copy.len = demand_paging_size;
copy.mode = 0;
if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
struct uffdio_copy copy;
clock_gettime(CLOCK_MONOTONIC, &start);
copy.src = (uint64_t)guest_data_prototype;
copy.dst = addr;
copy.len = demand_paging_size;
copy.mode = 0;
r = ioctl(uffd, UFFDIO_COPY, ©);
if (r == -1) {
pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
addr, tid, errno);
return r;
}
} else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
struct uffdio_continue cont = {0};
cont.range.start = addr;
cont.range.len = demand_paging_size;
r = ioctl(uffd, UFFDIO_COPY, ©);
if (r == -1) {
pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
addr, tid, errno);
return r;
r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
if (r == -1) {
pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
addr, tid, errno);
return r;
}
} else {
TEST_FAIL("Invalid uffd mode %d", uffd_mode); } ts_diff = timespec_elapsed(start);
PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid, timespec_to_ns(ts_diff)); PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n", demand_paging_size, addr, tid);
@@ -111,6 +126,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr) bool quit_uffd_thread;
struct uffd_handler_args {
int uffd_mode; int uffd; int pipefd; useconds_t delay;
@@ -187,7 +203,7 @@ static void *uffd_handler_thread_fn(void *arg) if (delay) usleep(delay); addr = msg.arg.pagefault.address;
r = handle_uffd_page_request(uffd, addr);
r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr); if (r < 0) return NULL; pages++;
@@ -203,13 +219,32 @@ static void *uffd_handler_thread_fn(void *arg)
static void setup_demand_paging(struct kvm_vm *vm, pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
int uffd_mode, useconds_t uffd_delay, struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
void *hva, void *alias, uint64_t len)
{
bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); int uffd; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register;
uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n",
is_minor ? "MINOR" : "MISSING",
is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY");
/* In order to get minor faults, prefault via the alias. */
if (is_minor) {
size_t p;
expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
TEST_ASSERT(alias != NULL, "Alias required for minor faults");
for (p = 0; p < (len / demand_paging_size); ++p) {
memcpy(alias + (p * demand_paging_size),
guest_data_prototype, demand_paging_size);
}
}
Would it be worth timing this operation? I think we'd need to know how long we spent prefaulting the memory to really be able to compare UDDF modes using this test.
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno);
@@ -222,12 +257,13 @@ static void setup_demand_paging(struct kvm_vm *vm,
uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
uffdio_register.mode = uffd_mode; TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1, "ioctl UFFDIO_REGISTER failed");
TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) ==
UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set");
TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) ==
expected_ioctls, "missing userfaultfd ioctls");
uffd_args->uffd_mode = uffd_mode; uffd_args->uffd = uffd; uffd_args->pipefd = pipefd; uffd_args->delay = uffd_delay;
@@ -239,7 +275,7 @@ static void setup_demand_paging(struct kvm_vm *vm, }
struct test_params {
bool use_uffd;
int uffd_mode; useconds_t uffd_delay; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access;
@@ -276,7 +312,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size, p->partition_vcpu_memory_access);
if (p->use_uffd) {
if (p->uffd_mode) { uffd_handler_threads = malloc(nr_vcpus * sizeof(*uffd_handler_threads)); TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
@@ -290,6 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) { vm_paddr_t vcpu_gpa; void *vcpu_hva;
void *vcpu_alias; uint64_t vcpu_mem_size;
@@ -304,8 +341,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n", vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
/* Cache the HVA pointer of the region */
/* Cache the host addresses of the region */ vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
vcpu_alias = addr_gpa2alias(vm, vcpu_gpa); /* * Set up user fault fd to handle demand paging
@@ -316,8 +354,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) TEST_ASSERT(!r, "Failed to set up pipefd");
setup_demand_paging(vm, &uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2], p->uffd_delay,
&uffd_args[vcpu_id], vcpu_hva,
pipefds[vcpu_id * 2], p->uffd_mode,
p->uffd_delay, &uffd_args[vcpu_id],
vcpu_hva, vcpu_alias, vcpu_mem_size); } }
@@ -346,7 +385,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
pr_info("All vCPU threads joined\n");
if (p->use_uffd) {
if (p->uffd_mode) { char c; /* Tell the user fault fd handler threads to quit */
@@ -368,7 +407,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
free(guest_data_prototype); free(vcpu_threads);
if (p->use_uffd) {
if (p->uffd_mode) { free(uffd_handler_threads); free(uffd_args); free(pipefds);
@@ -378,11 +417,11 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts("");
printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
NIT: maybe use uffd_mode or some word other than mode here to disambiguate with -m
" [-b memory] [-t type] [-v vcpus] [-o]\n", name); guest_modes_help();
printf(" -u: use User Fault FD to handle vCPU page\n"
" faults.\n");
printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
" UFFD registration mode: 'MISSING' or 'MINOR'.\n"); printf(" -d: add a delay in usec to the User Fault\n" " FD handler to simulate demand paging\n" " overheads. Ignored without -u.\n");
@@ -409,13 +448,17 @@ int main(int argc, char *argv[])
guest_modes_append_default();
while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); break; case 'u':
p.use_uffd = true;
if (!strcmp("MISSING", optarg))
p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
else if (!strcmp("MINOR", optarg))
p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; case 'd': p.uffd_delay = strtoul(optarg, NULL, 0);
@@ -442,6 +485,9 @@ int main(int argc, char *argv[]) } }
TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
"userfaultfd MINOR mode requires shared memory; pick a different -t");
for_each_guest_mode(run_test, &p); return 0;
-- 2.31.1.751.gd2f1c929bd-goog
On Wed, May 19, 2021 at 3:21 PM Ben Gardon bgardon@google.com wrote:
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
UFFD handling of MINOR faults is a new feature whose use case is to speed up demand paging (compared to MISSING faults). So, it's interesting to let this selftest exercise this new mode.
Modify the demand paging test to have the option of using UFFD minor faults, as opposed to missing faults. Now, when turning on userfaultfd with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
If we're in minor mode, before registering, prefault via the *alias*. This way, the guest will trigger minor faults, instead of missing faults, and we can UFFDIO_CONTINUE to resolve them.
Modify the page fault handler function to use the right ioctl depending on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
.../selftests/kvm/demand_paging_test.c | 112 ++++++++++++------ 1 file changed, 79 insertions(+), 33 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 01890a7b0155..df7190261923 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -74,33 +74,48 @@ static void *vcpu_worker(void *data) return NULL; }
-static int handle_uffd_page_request(int uffd, uint64_t addr) +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr) {
pid_t tid;
pid_t tid = syscall(__NR_gettid); struct timespec start; struct timespec ts_diff;
struct uffdio_copy copy; int r;
tid = syscall(__NR_gettid);
clock_gettime(CLOCK_MONOTONIC, &start);
copy.src = (uint64_t)guest_data_prototype;
copy.dst = addr;
copy.len = demand_paging_size;
copy.mode = 0;
if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
struct uffdio_copy copy;
clock_gettime(CLOCK_MONOTONIC, &start);
copy.src = (uint64_t)guest_data_prototype;
copy.dst = addr;
copy.len = demand_paging_size;
copy.mode = 0;
r = ioctl(uffd, UFFDIO_COPY, ©);
if (r == -1) {
pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
addr, tid, errno);
return r;
}
} else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
struct uffdio_continue cont = {0};
cont.range.start = addr;
cont.range.len = demand_paging_size;
r = ioctl(uffd, UFFDIO_COPY, ©);
if (r == -1) {
pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
addr, tid, errno);
return r;
r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
if (r == -1) {
pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
addr, tid, errno);
return r;
}
} else {
TEST_FAIL("Invalid uffd mode %d", uffd_mode); } ts_diff = timespec_elapsed(start);
PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid, timespec_to_ns(ts_diff)); PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n", demand_paging_size, addr, tid);
@@ -111,6 +126,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr) bool quit_uffd_thread;
struct uffd_handler_args {
int uffd_mode; int uffd; int pipefd; useconds_t delay;
@@ -187,7 +203,7 @@ static void *uffd_handler_thread_fn(void *arg) if (delay) usleep(delay); addr = msg.arg.pagefault.address;
r = handle_uffd_page_request(uffd, addr);
r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr); if (r < 0) return NULL; pages++;
@@ -203,13 +219,32 @@ static void *uffd_handler_thread_fn(void *arg)
static void setup_demand_paging(struct kvm_vm *vm, pthread_t *uffd_handler_thread, int pipefd,
useconds_t uffd_delay,
int uffd_mode, useconds_t uffd_delay, struct uffd_handler_args *uffd_args,
void *hva, uint64_t len)
void *hva, void *alias, uint64_t len)
{
bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); int uffd; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register;
uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n",
is_minor ? "MINOR" : "MISSING",
is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY");
/* In order to get minor faults, prefault via the alias. */
if (is_minor) {
size_t p;
expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
TEST_ASSERT(alias != NULL, "Alias required for minor faults");
for (p = 0; p < (len / demand_paging_size); ++p) {
memcpy(alias + (p * demand_paging_size),
guest_data_prototype, demand_paging_size);
}
}
Would it be worth timing this operation? I think we'd need to know how long we spent prefaulting the memory to really be able to compare UDDF modes using this test.
It's easy to time it and print out a value, so I'm happy to add it.
As for how useful it is, I'm not so sure. In general the way I think of it is, the prefaulting would happen during the precopy phase of live migration. During this phase, the VM is still running on the source machine, so the VM owner doesn't notice any performance degradation or slowness in this phase.
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno);
@@ -222,12 +257,13 @@ static void setup_demand_paging(struct kvm_vm *vm,
uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
uffdio_register.mode = uffd_mode; TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1, "ioctl UFFDIO_REGISTER failed");
TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) ==
UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set");
TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) ==
expected_ioctls, "missing userfaultfd ioctls");
uffd_args->uffd_mode = uffd_mode; uffd_args->uffd = uffd; uffd_args->pipefd = pipefd; uffd_args->delay = uffd_delay;
@@ -239,7 +275,7 @@ static void setup_demand_paging(struct kvm_vm *vm, }
struct test_params {
bool use_uffd;
int uffd_mode; useconds_t uffd_delay; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access;
@@ -276,7 +312,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size, p->partition_vcpu_memory_access);
if (p->use_uffd) {
if (p->uffd_mode) { uffd_handler_threads = malloc(nr_vcpus * sizeof(*uffd_handler_threads)); TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
@@ -290,6 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) { vm_paddr_t vcpu_gpa; void *vcpu_hva;
void *vcpu_alias; uint64_t vcpu_mem_size;
@@ -304,8 +341,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n", vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
/* Cache the HVA pointer of the region */
/* Cache the host addresses of the region */ vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
vcpu_alias = addr_gpa2alias(vm, vcpu_gpa); /* * Set up user fault fd to handle demand paging
@@ -316,8 +354,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) TEST_ASSERT(!r, "Failed to set up pipefd");
setup_demand_paging(vm, &uffd_handler_threads[vcpu_id],
pipefds[vcpu_id * 2], p->uffd_delay,
&uffd_args[vcpu_id], vcpu_hva,
pipefds[vcpu_id * 2], p->uffd_mode,
p->uffd_delay, &uffd_args[vcpu_id],
vcpu_hva, vcpu_alias, vcpu_mem_size); } }
@@ -346,7 +385,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
pr_info("All vCPU threads joined\n");
if (p->use_uffd) {
if (p->uffd_mode) { char c; /* Tell the user fault fd handler threads to quit */
@@ -368,7 +407,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
free(guest_data_prototype); free(vcpu_threads);
if (p->use_uffd) {
if (p->uffd_mode) { free(uffd_handler_threads); free(uffd_args); free(pipefds);
@@ -378,11 +417,11 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts("");
printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
NIT: maybe use uffd_mode or some word other than mode here to disambiguate with -m
" [-b memory] [-t type] [-v vcpus] [-o]\n", name); guest_modes_help();
printf(" -u: use User Fault FD to handle vCPU page\n"
" faults.\n");
printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
" UFFD registration mode: 'MISSING' or 'MINOR'.\n"); printf(" -d: add a delay in usec to the User Fault\n" " FD handler to simulate demand paging\n" " overheads. Ignored without -u.\n");
@@ -409,13 +448,17 @@ int main(int argc, char *argv[])
guest_modes_append_default();
while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); break; case 'u':
p.use_uffd = true;
if (!strcmp("MISSING", optarg))
p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
else if (!strcmp("MINOR", optarg))
p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; case 'd': p.uffd_delay = strtoul(optarg, NULL, 0);
@@ -442,6 +485,9 @@ int main(int argc, char *argv[]) } }
TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
"userfaultfd MINOR mode requires shared memory; pick a different -t");
for_each_guest_mode(run_test, &p); return 0;
-- 2.31.1.751.gd2f1c929bd-goog
On 20/05/21 00:20, Ben Gardon wrote:
printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
NIT: maybe use uffd_mode or some word other than mode here to disambiguate with -m
Changed to "[-m vm_mode] [-u uffd_mode]".
Paolo
This lets us run the demand paging test on top of a shared hugetlbfs-backed area. The "shared" is key, as this allows us to exercise userfaultfd minor faults on hugetlbfs.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com --- tools/testing/selftests/kvm/demand_paging_test.c | 6 ++++-- tools/testing/selftests/kvm/include/test_util.h | 11 +++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 9 +++++++-- tools/testing/selftests/kvm/lib/test_util.c | 11 +++++++++++ 4 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index df7190261923..60d9b5223b9d 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -485,8 +485,10 @@ int main(int argc, char *argv[]) } }
- TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM, - "userfaultfd MINOR mode requires shared memory; pick a different -t"); + if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR && + !backing_src_is_shared(p.src_type)) { + TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -t"); + }
for_each_guest_mode(run_test, &p);
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index 7377f00469ef..d79be15dd3d2 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -17,6 +17,7 @@ #include <errno.h> #include <unistd.h> #include <fcntl.h> +#include <sys/mman.h> #include "kselftest.h"
static inline int _no_printf(const char *format, ...) { return 0; } @@ -85,6 +86,7 @@ enum vm_mem_backing_src_type { VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB, VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB, VM_MEM_SRC_SHMEM, + VM_MEM_SRC_SHARED_HUGETLB, NUM_SRC_TYPES, };
@@ -101,4 +103,13 @@ size_t get_backing_src_pagesz(uint32_t i); void backing_src_help(void); enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
+/* + * Whether or not the given source type is shared memory (as opposed to + * anonymous). + */ +static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t) +{ + return vm_mem_backing_src_alias(t)->flag & MAP_SHARED; +} + #endif /* SELFTEST_KVM_TEST_UTIL_H */ diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 0b88d1bbc1e0..8373aec1fb02 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -758,8 +758,13 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, region->mmap_size += alignment;
region->fd = -1; - if (src_type == VM_MEM_SRC_SHMEM) { - region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC); + if (backing_src_is_shared(src_type)) { + int memfd_flags = MFD_CLOEXEC; + + if (src_type == VM_MEM_SRC_SHARED_HUGETLB) + memfd_flags |= MFD_HUGETLB; + + region->fd = memfd_create("kvm_selftest", memfd_flags); TEST_ASSERT(region->fd != -1, "memfd_create failed, errno: %i", errno);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index c7a265da5090..6ad6c8276b2e 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -240,6 +240,16 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) .name = "shmem", .flag = MAP_SHARED, }, + [VM_MEM_SRC_SHARED_HUGETLB] = { + .name = "shared_hugetlb", + /* + * No MAP_HUGETLB, we use MFD_HUGETLB instead. Since + * we're using "file backed" memory, we need to specify + * this when the FD is created, not when the area is + * mapped. + */ + .flag = MAP_SHARED, + }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES, "Missing new backing src types?"); @@ -262,6 +272,7 @@ size_t get_backing_src_pagesz(uint32_t i) case VM_MEM_SRC_ANONYMOUS_THP: return get_trans_hugepagesz(); case VM_MEM_SRC_ANONYMOUS_HUGETLB: + case VM_MEM_SRC_SHARED_HUGETLB: return get_def_hugetlb_pagesz(); default: return MAP_HUGE_PAGE_SIZE(flag);
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen axelrasmussen@google.com wrote:
This lets us run the demand paging test on top of a shared hugetlbfs-backed area. The "shared" is key, as this allows us to exercise userfaultfd minor faults on hugetlbfs.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Ben Gardon bgardon@google.com
tools/testing/selftests/kvm/demand_paging_test.c | 6 ++++-- tools/testing/selftests/kvm/include/test_util.h | 11 +++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 9 +++++++-- tools/testing/selftests/kvm/lib/test_util.c | 11 +++++++++++ 4 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index df7190261923..60d9b5223b9d 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -485,8 +485,10 @@ int main(int argc, char *argv[]) } }
TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
"userfaultfd MINOR mode requires shared memory; pick a different -t");
if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR &&
!backing_src_is_shared(p.src_type)) {
TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -t");
} for_each_guest_mode(run_test, &p);
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index 7377f00469ef..d79be15dd3d2 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -17,6 +17,7 @@ #include <errno.h> #include <unistd.h> #include <fcntl.h> +#include <sys/mman.h> #include "kselftest.h"
static inline int _no_printf(const char *format, ...) { return 0; } @@ -85,6 +86,7 @@ enum vm_mem_backing_src_type { VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB, VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB, VM_MEM_SRC_SHMEM,
VM_MEM_SRC_SHARED_HUGETLB, NUM_SRC_TYPES,
};
@@ -101,4 +103,13 @@ size_t get_backing_src_pagesz(uint32_t i); void backing_src_help(void); enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
+/*
- Whether or not the given source type is shared memory (as opposed to
- anonymous).
- */
+static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t) +{
return vm_mem_backing_src_alias(t)->flag & MAP_SHARED;
+}
#endif /* SELFTEST_KVM_TEST_UTIL_H */ diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 0b88d1bbc1e0..8373aec1fb02 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -758,8 +758,13 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, region->mmap_size += alignment;
region->fd = -1;
if (src_type == VM_MEM_SRC_SHMEM) {
region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC);
if (backing_src_is_shared(src_type)) {
int memfd_flags = MFD_CLOEXEC;
if (src_type == VM_MEM_SRC_SHARED_HUGETLB)
memfd_flags |= MFD_HUGETLB;
region->fd = memfd_create("kvm_selftest", memfd_flags); TEST_ASSERT(region->fd != -1, "memfd_create failed, errno: %i", errno);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index c7a265da5090..6ad6c8276b2e 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -240,6 +240,16 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) .name = "shmem", .flag = MAP_SHARED, },
[VM_MEM_SRC_SHARED_HUGETLB] = {
.name = "shared_hugetlb",
/*
* No MAP_HUGETLB, we use MFD_HUGETLB instead. Since
* we're using "file backed" memory, we need to specify
* this when the FD is created, not when the area is
* mapped.
*/
.flag = MAP_SHARED,
}, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES, "Missing new backing src types?");
@@ -262,6 +272,7 @@ size_t get_backing_src_pagesz(uint32_t i) case VM_MEM_SRC_ANONYMOUS_THP: return get_trans_hugepagesz(); case VM_MEM_SRC_ANONYMOUS_HUGETLB:
case VM_MEM_SRC_SHARED_HUGETLB: return get_def_hugetlb_pagesz(); default: return MAP_HUGE_PAGE_SIZE(flag);
-- 2.31.1.751.gd2f1c929bd-goog
On 19/05/21 22:03, Axel Rasmussen wrote:
Base
These patches are based upon Andrew Morton's v5.13-rc1-mmots-2021-05-13-17-23 tag. This is because this series depends on:
- UFFD minor fault support for hugetlbfs (in v5.13-rc1) [1]
- UFFD minor fault support for shmem (in Andrew's tree) [2]
[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@... [2] https://lore.kernel.org/patchwork/cover/1420967/
Changelog
v1->v2:
- Picked up Reviewed-by's.
- Change backing_src_is_shared() to check the flags, instead of the type. This makes it robust to adding new backing source types in the future.
- Add another commit which refactors setup_demand_paging() error handling.
- Print UFFD ioctl type once in setup_demand_paging, instead of on every page-in operation.
- Expand comment on why we use MFD_HUGETLB instead of MAP_HUGETLB.
- Reworded comment on addr_gpa2alias.
- Moved demand_paging_test.c timing calls outside of the if (), deduplicating them.
- Split trivial comment / logging fixups into a separate commit.
- Add another commit which prints a clarifying message on test skip.
- Split the commit allowing backing src_type to be modified in two.
- Split the commit adding the shmem backing type in two.
- Rebased onto v5.13-rc1-mmots-2021-05-13-17-23.
Overview
Minor fault handling is a new userfaultfd feature whose goal is generally to improve performance. In particular, it is intended for use with demand paging. There are more details in the cover letters for this new feature (linked above), but at a high level the idea is that we think of these three phases of live migration of a VM:
- Precopy, where we copy "some" pages from the source to the target, while the VM is still running on the source machine.
- Blackout, where execution stops on the source, and begins on the target.
- Postcopy, where the VM is running on the target, some pages are already up to date, and others are not (because they weren't copied, or were modified after being copied).
During postcopy, the first time the guest touches memory, we intercept a minor fault. Userspace checks whether or not the page is already up to date. If needed, we copy the final version of the page from the soure machine. This could be done with RDMA for example, to do it truly in place / with no copying. At this point, all that's left is to setup PTEs for the guest: so we issue UFFDIO_CONTINUE. No copying or page allocation needed.
Because of this use case, it's useful to exercise this as part of the demand paging test. It lets us ensure the use case works correctly end-to-end, and also gives us an in-tree way to profile the end-to-end flow for future performance improvements.
Axel Rasmussen (10): KVM: selftests: trivial comment/logging fixes KVM: selftests: simplify setup_demand_paging error handling KVM: selftests: print a message when skipping KVM tests KVM: selftests: compute correct demand paging size KVM: selftests: allow different backing source types KVM: selftests: refactor vm_mem_backing_src_type flags KVM: selftests: add shmem backing source type KVM: selftests: create alias mappings when using shared memory KVM: selftests: allow using UFFD minor faults for demand paging KVM: selftests: add shared hugetlbfs backing source type
.../selftests/kvm/demand_paging_test.c | 175 +++++++++++------- .../testing/selftests/kvm/include/kvm_util.h | 1 + .../testing/selftests/kvm/include/test_util.h | 12 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 84 ++++++++- .../selftests/kvm/lib/kvm_util_internal.h | 2 + tools/testing/selftests/kvm/lib/test_util.c | 51 +++-- 6 files changed, 238 insertions(+), 87 deletions(-)
-- 2.31.1.751.gd2f1c929bd-goog
Queued, thanks (with region->fd moved to the right patch).
Paolo
linux-kselftest-mirror@lists.linaro.org