When handling page faults for many vCPUs during demand paging, KVM's MMU
lock becomes highly contended. This series creates a test with a naive
userfaultfd-based demand paging implementation to demonstrate that
contention. This test serves both as a functional test of userfaultfd
and a microbenchmark of demand paging performance with a variable number
of vCPUs and memory per vCPU.
The test creates N userfaultfd threads, N vCPUs, and a region of memory
with M pages per vCPU. The N userfaultfd polling threads are each set up
to serve faults on a region of memory corresponding to one of the vCPUs.
Each vCPU is then started and sequentially touches each page of its
disjoint memory region. In response to faults, the userfaultfd
threads copy a static buffer into the guest's memory. This creates a
worst case for MMU lock contention: most of the contention between
the userfaultfd threads has been removed, and no time is spent
fetching the contents of guest memory.
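As a rough sketch of that fault-serving loop (with hypothetical names
like uffd_args and uffd_poll_loop; the test's actual structures and
helpers differ), each userfaultfd thread reads fault events and
resolves them with UFFDIO_COPY from the static buffer:

#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Hypothetical per-thread state; illustrative only. */
struct uffd_args {
	int uffd;		/* userfaultfd for one vCPU's region */
	void *src_buf;		/* static buffer copied into guest memory */
	unsigned long page_size;
};

static void *uffd_poll_loop(void *arg)
{
	struct uffd_args *args = arg;
	struct pollfd pfd = { .fd = args->uffd, .events = POLLIN };

	for (;;) {
		struct uffd_msg msg;
		struct uffdio_copy copy;

		if (poll(&pfd, 1, -1) <= 0)
			break;
		if (read(args->uffd, &msg, sizeof(msg)) != sizeof(msg))
			continue;
		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		/* Resolve the fault by copying the static buffer into
		 * the faulting page. No guest data is fetched, so the
		 * remaining cost is page installation under the MMU
		 * lock.
		 */
		copy.src = (unsigned long)args->src_buf;
		copy.dst = msg.arg.pagefault.address &
			   ~(args->page_size - 1);
		copy.len = args->page_size;
		copy.mode = 0;
		ioctl(args->uffd, UFFDIO_COPY, &copy);
	}
	return NULL;
}

Since each thread serves a disjoint region, the threads rarely contend
with each other; what remains is contention on KVM's MMU lock.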
This test was run successfully on Intel Haswell, Broadwell, and
Cascade Lake hosts with a variety of vCPU counts and memory sizes.
This test was adapted from the dirty_log_test.
The series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
(Thanks to Dmitry Vyukov <dvyukov(a)google.com> for setting up the Gerrit
instance)
v4 (Responding to feedback from Andrew Jones, Peter Xu, and Peter Shier):
- Tested this revision by running
demand_paging_test
at each commit in the series on an Intel Haswell machine. Ran
demand_paging_test -u -v 8 -b 8M -d 10
on the same machine at the last commit in the series.
- Re-added partial aarch64 support, though aarch64 and s390 remain
  untested
- Implemented pipefd polling to reduce UFFD thread exit latency
  (sketched below, after this list)
- Added variable unit input for memory size so users can pass command
  line arguments of the form -b 24M instead of the raw number of bytes
- Moved a break statement that was missing from an earlier patch in
  the series out of the later patch that added it and into the earlier
  one
- Moved to syncing per-vCPU global variables to the guest and looking
  up per-vCPU arguments based on a single CPU ID passed to each guest
  vCPU. This allows future patches to pass more arguments to each
  vCPU than the architecture natively supports.
- Implemented vcpu_args_set for s390 and aarch64 [UNTESTED]
- Changed vm_create to always allocate memslot 0 at 4G instead of only
when the number of pages required is large.
- Changed vcpu_wss to vcpu_memory_size for clarity.
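The pipefd polling mentioned above can be sketched like this (an
illustrative pattern with made-up names, not the series' exact code):
each UFFD thread polls the read end of a pipe alongside the
userfaultfd, so the main thread can wake it immediately by writing a
byte, rather than the thread waiting out a poll timeout before
noticing it should exit:

#include <poll.h>

/* Hypothetical: handle_fault() stands in for the fault-serving logic
 * sketched earlier; it is defined elsewhere.
 */
void handle_fault(int uffd);

/* uffd is the userfaultfd; pipe_rd is the read end of a pipe the main
 * thread writes to when the test is done.
 */
static void uffd_poll_until_done(int uffd, int pipe_rd)
{
	struct pollfd pfds[2] = {
		{ .fd = uffd,    .events = POLLIN },	/* fault events */
		{ .fd = pipe_rd, .events = POLLIN },	/* exit signal  */
	};

	for (;;) {
		if (poll(pfds, 2, -1) <= 0)
			break;
		if (pfds[1].revents & POLLIN)
			break;		/* told to exit: return right away */
		if (pfds[0].revents & POLLIN)
			handle_fault(uffd);
	}
}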
Ben Gardon (10):
KVM: selftests: Create a demand paging test
KVM: selftests: Add demand paging content to the demand paging test
KVM: selftests: Add configurable demand paging delay
KVM: selftests: Add memory size parameter to the demand paging test
KVM: selftests: Pass args to vCPU in global vCPU args struct
KVM: selftests: Add support for vcpu_args_set to aarch64 and s390x
KVM: selftests: Support multiple vCPUs in demand paging test
KVM: selftests: Time guest demand paging
KVM: selftests: Stop memslot creation in KVM internal memslot region
KVM: selftests: Move memslot 0 above KVM internal memslots
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 5 +-
.../selftests/kvm/demand_paging_test.c | 680 ++++++++++++++++++
.../testing/selftests/kvm/include/test_util.h | 2 +
.../selftests/kvm/lib/aarch64/processor.c | 33 +
tools/testing/selftests/kvm/lib/kvm_util.c | 27 +-
.../selftests/kvm/lib/s390x/processor.c | 35 +
tools/testing/selftests/kvm/lib/test_util.c | 61 ++
8 files changed, 839 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
create mode 100644 tools/testing/selftests/kvm/lib/test_util.c
--
2.25.0.341.g760bfbb309-goog
The current stable LLVM BPF backend fails to compile the BPF selftests
due to a compiler bug. The bug has been fixed in trunk, but that fix
hasn't landed in the binary packages I'm using yet (Fedora arm64).
Without this workaround the tests don't compile for me.
This patch triggers a preprocessor warning on LLVM versions that
definitely have the bug. The check may be conservative (i.e., I'm not
sure whether 9.1 will have the fix), but it should at least make the
current set of stable releases work together.
See https://reviews.llvm.org/D69438 for more information on the fix. I
obtained the workaround from
https://lore.kernel.org/linux-kselftest/aed8eda7-df20-069b-ea14-f0662898456…
Fixes: 20a9ad2e7136 ("selftests/bpf: add CO-RE relocs array tests")
Signed-off-by: Palmer Dabbelt <palmerdabbelt(a)google.com>
---
.../testing/selftests/bpf/progs/test_core_reloc_arrays.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c b/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
index 89951b684282..e5eafdab80a4 100644
--- a/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
@@ -43,15 +43,23 @@ int test_core_arrays(void *ctx)
/* in->a[2] */
if (CORE_READ(&out->a2, &in->a[2]))
return 1;
+#if defined(__clang__) && (__clang_major__ < 10) && (__clang_minor__ < 1)
+# warning "clang 9.0 SEGVs on multidimensional arrays, see https://reviews.llvm.org/D69438"
+#else
/* in->b[1][2][3] */
if (CORE_READ(&out->b123, &in->b[1][2][3]))
return 1;
+#endif
/* in->c[1].c */
if (CORE_READ(&out->c1c, &in->c[1].c))
return 1;
+#if defined(__clang__) && (__clang_major__ < 10) && (__clang_minor__ < 1)
+# warning "clang 9.0 SEGVs on multidimensional arrays, see https://reviews.llvm.org/D69438"
+#else
/* in->d[0][0].d */
if (CORE_READ(&out->d00d, &in->d[0][0].d))
return 1;
+#endif
return 0;
}
--
2.25.0.341.g760bfbb309-goog
A few of the lists used in the linked-list KUnit tests (the
for_each_entry{,_reverse} tests) are declared 'static', and so are
not reinitialised if the test runs multiple times. This was not a
problem when KUnit tests were run once at startup, but now that tests
can be run manually (e.g., from debugfs[1]), it is no longer safe to
assume a test runs only once.
Making these lists no longer 'static' causes the lists to be
reinitialised, and the test passes each time it is run. While there may
be some value in testing that initialising static lists works, the
for_each_entry_* tests are unlikely to be the right place for it.
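As a minimal illustration of the failure mode (not code from the
patch), a static LIST_HEAD inside a function is initialised exactly
once, so list state leaks between invocations:

#include <linux/list.h>

struct item {
	struct list_head node;
};

static void run_once(struct item *entries, int n)
{
	static LIST_HEAD(list);	/* initialised once, at load time */
	int i;

	/* On the first call the list starts empty. On a second call,
	 * the nodes added previously are still linked in, and since
	 * they lived on the earlier caller's stack, those links now
	 * dangle. Dropping 'static' makes the list a stack variable
	 * that is reinitialised on every call.
	 */
	for (i = 0; i < n; i++)
		list_add_tail(&entries[i].node, &list);
}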
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/list-test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/list-test.c b/lib/list-test.c
index 76babb1df889..ee09505df16f 100644
--- a/lib/list-test.c
+++ b/lib/list-test.c
@@ -659,7 +659,7 @@ static void list_test_list_for_each_prev_safe(struct kunit *test)
static void list_test_list_for_each_entry(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
@@ -680,7 +680,7 @@ static void list_test_list_for_each_entry(struct kunit *test)
static void list_test_list_for_each_entry_reverse(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
--
2.25.0.341.g760bfbb309-goog
Currently, there are a lot of false positives if a single reuseport
test fails. This is because expected_results and the result map are
not cleared between runs. Zero both after each individual test run,
which fixes these false positives.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
---
.../selftests/bpf/prog_tests/select_reuseport.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index e7e56929751c..098bcae5f827 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -33,7 +33,7 @@
#define REUSEPORT_ARRAY_SIZE 32
static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
-static enum result expected_results[NR_RESULTS];
+static __u32 expected_results[NR_RESULTS];
static int sk_fds[REUSEPORT_ARRAY_SIZE];
static int reuseport_array = -1, outer_map = -1;
static int select_by_skb_data_prog;
@@ -697,7 +697,19 @@ static void setup_per_test(int type, sa_family_t family, bool inany,
static void cleanup_per_test(bool no_inner_map)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1
The reuseport tests currently suffer from a race condition: FIN
packets count towards DROP_ERR_SKB_DATA, since they don't contain
a valid struct cmd. Tests will spuriously fail depending on whether
check_results is called before or after the FIN is processed.
Exit the BPF program early if FIN is set.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
---
.../selftests/bpf/progs/test_select_reuseport_kern.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
index d69a1f2bbbfd..26e77dcc7e91 100644
--- a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
@@ -113,6 +113,12 @@ int _select_by_skb_data(struct sk_reuseport_md *reuse_md)
data_check.skb_ports[0] = th->source;
data_check.skb_ports[1] = th->dest;
+ if (th->fin)
+ /* The connection is being torn down at the end of a
+ * test. It can't contain a cmd, so return early.
+ */
+ return SK_PASS;
+
if ((th->doff << 2) + sizeof(*cmd) > data_check.len)
GOTO_DONE(DROP_ERR_SKB_DATA);
if (bpf_skb_load_bytes(reuse_md, th->doff << 2, &cmd_copy,
--
2.20.1