Currently, anonymous PTE-mapped THPs cannot be collapsed in-place:
collapsing (e.g., via MADV_COLLAPSE) implies allocating a fresh THP and
mapping that new THP via a PMD: as it's a fresh anon THP, it will get the
exclusive flag set on the head page and everybody is happy.
However, if the kernel ever supported in-place collapse of anonymous
THPs (replacing a page table that maps each sub-page of a THP via PTEs with
a single PMD mapping the complete THP), the exclusivity information stored
for each sub-page would have to be collapsed accordingly:
(1) All PTEs map !exclusive anon sub-pages: the in-place collapsed THP
must not have the exclusive flag set on the head page mapped by
the PMD. This is the easiest case to handle ("simply don't set any
exclusive flags").
(2) All PTEs map exclusive anon sub-pages: when collapsing, we have to
clear the exclusive flag from all tail pages and only leave the
exclusive flag set for the head page. Otherwise, fork() after
collapse would not clear the exclusive flags from the tail pages
and we'd be in trouble once PTE-mapping the shared THP when writing
to shared tail pages that still have the exclusive flag set. This
would effectively revert what the PTE-mapping code does when
propagating the exclusive flag to all sub-pages.
(3) PTEs map a mixture of exclusive and !exclusive anon sub-pages (this
can happen, e.g., due to MADV_DONTFORK before fork()). We must not
collapse the THP in-place; otherwise, the exclusive flags of the
sub-pages would get ignored and the exclusive flag of the head page
would get used instead.
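For illustration only, the three cases can be modeled in user space as below. This is not kernel code: fold_exclusive_flags() and enum collapse_decision are hypothetical names, and a plain bool array stands in for PG_anon_exclusive on each sub-page, with index 0 being the head page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of trying to collapse a PTE-mapped anon THP in place (model). */
enum collapse_decision {
	COLLAPSE_NOT_EXCLUSIVE,	/* (1) collapse, set no exclusive flag */
	COLLAPSE_EXCLUSIVE,	/* (2) collapse, flag only on the head page */
	COLLAPSE_REFUSE,	/* (3) mixture: must not collapse in place */
};

/*
 * excl[i] models PG_anon_exclusive of sub-page i; excl[0] is the head page.
 * In case (2) the tail flags are cleared so that only the head keeps it.
 */
static enum collapse_decision fold_exclusive_flags(bool *excl, size_t nr)
{
	size_t nr_excl = 0;
	size_t i;

	assert(excl || nr == 0);
	for (i = 0; i < nr; i++)
		nr_excl += excl[i];

	if (nr_excl == 0)
		return COLLAPSE_NOT_EXCLUSIVE;
	if (nr_excl != nr)
		return COLLAPSE_REFUSE;	/* leave all flags untouched */

	for (i = 1; i < nr; i++)
		excl[i] = false;
	return COLLAPSE_EXCLUSIVE;
}
```

Refusing in case (3) rather than picking one flag is the point: any single head-page flag would misrepresent some of the sub-pages.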
Now that we have MADV_COLLAPSE in place to trigger collapsing a THP,
let's add some test cases that would bail out early if we ever
voluntarily or accidentally unlocked in-place collapse for anon THPs and
forgot to take proper care of the exclusive flags.
Running the test on a kernel with MADV_COLLAPSE support:
# [INFO] Anonymous THP tests
# [RUN] Basic COW after fork() when collapsing before fork()
ok 169 No leak from parent into child
# [RUN] Basic COW after fork() when collapsing after fork() (fully shared)
ok 170 # SKIP MADV_COLLAPSE failed: Invalid argument
# [RUN] Basic COW after fork() when collapsing after fork() (lower shared)
ok 171 No leak from parent into child
# [RUN] Basic COW after fork() when collapsing after fork() (upper shared)
ok 172 No leak from parent into child
For now, MADV_COLLAPSE always seems to fail if all PTEs map shared
sub-pages.
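A minimal sketch of triggering MADV_COLLAPSE on an anonymous range, assuming a 2 MiB THP size (x86-64 with 4 KiB base pages) and reusing the patch's fallback define for MADV_COLLAPSE. The helper try_collapse() is hypothetical; like the selftest, it treats a failing MADV_COLLAPSE as a reason to skip rather than as a failure. It does not reproduce the PTE-mapping or fork() steps of the real test.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* same fallback as in the patch */
#endif

/*
 * Hypothetical helper: returns 0 if MADV_COLLAPSE succeeded and the data is
 * intact, a positive errno if MADV_COLLAPSE failed (the selftest would SKIP),
 * or -1 if the mapping failed or the data changed.
 */
static int try_collapse(size_t thpsize)
{
	size_t mmap_size = 2 * thpsize;
	char *mmap_mem, *mem;
	int ret = 0;
	size_t i;

	assert(thpsize && !(thpsize & (thpsize - 1)));	/* power of two */

	mmap_mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mmap_mem == MAP_FAILED)
		return -1;

	/* Align to thpsize so that a single PMD could map the whole range. */
	mem = (char *)(((uintptr_t)mmap_mem + thpsize) &
		       ~(uintptr_t)(thpsize - 1));

	/* Best effort; fails with EINVAL if THP is not available. */
	madvise(mem, thpsize, MADV_HUGEPAGE);
	memset(mem, 0x5a, thpsize);

	if (madvise(mem, thpsize, MADV_COLLAPSE)) {
		ret = errno;
	} else {
		for (i = 0; i < thpsize; i++) {
			if (mem[i] != 0x5a) {
				ret = -1;	/* collapse must preserve contents */
				break;
			}
		}
	}

	munmap(mmap_mem, mmap_size);
	return ret;
}
```

For example, a return value of EINVAL from try_collapse(2UL << 20) corresponds to the "# SKIP MADV_COLLAPSE failed: Invalid argument" line above.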
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Zach O'Keefe <zokeefe(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
A patch from Hugh made me explore the wonderful world of in-place collapse
of THP, and I was briefly concerned that it would apply to anon THP as
well. After thinking about it a bit, I decided to add test cases, to be
on the safe side, and to document how PG_anon_exclusive has to be
handled in that case.
---
tools/testing/selftests/vm/cow.c | 228 +++++++++++++++++++++++++++++++
1 file changed, 228 insertions(+)
diff --git a/tools/testing/selftests/vm/cow.c b/tools/testing/selftests/vm/cow.c
index 26f6ea3079e2..16216d893d96 100644
--- a/tools/testing/selftests/vm/cow.c
+++ b/tools/testing/selftests/vm/cow.c
@@ -30,6 +30,10 @@
#include "../kselftest.h"
#include "vm_util.h"
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
+
static size_t pagesize;
static int pagemap_fd;
static size_t thpsize;
@@ -1178,6 +1182,228 @@ static int tests_per_anon_test_case(void)
return tests;
}
+enum anon_thp_collapse_test {
+ ANON_THP_COLLAPSE_UNSHARED,
+ ANON_THP_COLLAPSE_FULLY_SHARED,
+ ANON_THP_COLLAPSE_LOWER_SHARED,
+ ANON_THP_COLLAPSE_UPPER_SHARED,
+};
+
+static void do_test_anon_thp_collapse(char *mem, size_t size,
+ enum anon_thp_collapse_test test)
+{
+ struct comm_pipes comm_pipes;
+ char buf;
+ int ret;
+
+ ret = setup_comm_pipes(&comm_pipes);
+ if (ret) {
+ ksft_test_result_fail("pipe() failed\n");
+ return;
+ }
+
+ /*
+ * Trigger PTE-mapping the THP by temporarily mapping a single subpage
+ * R/O, such that we can try collapsing it later.
+ */
+ ret = mprotect(mem + pagesize, pagesize, PROT_READ);
+ if (ret) {
+ ksft_test_result_fail("mprotect() failed\n");
+ goto close_comm_pipes;
+ }
+ ret = mprotect(mem + pagesize, pagesize, PROT_READ | PROT_WRITE);
+ if (ret) {
+ ksft_test_result_fail("mprotect() failed\n");
+ goto close_comm_pipes;
+ }
+
+ switch (test) {
+ case ANON_THP_COLLAPSE_UNSHARED:
+ /* Collapse before actually COW-sharing the page. */
+ ret = madvise(mem, size, MADV_COLLAPSE);
+ if (ret) {
+ ksft_test_result_skip("MADV_COLLAPSE failed: %s\n",
+ strerror(errno));
+ goto close_comm_pipes;
+ }
+ break;
+ case ANON_THP_COLLAPSE_FULLY_SHARED:
+ /* COW-share the full PTE-mapped THP. */
+ break;
+ case ANON_THP_COLLAPSE_LOWER_SHARED:
+ /* Don't COW-share the upper part of the THP. */
+ ret = madvise(mem + size / 2, size / 2, MADV_DONTFORK);
+ if (ret) {
+ ksft_test_result_fail("MADV_DONTFORK failed\n");
+ goto close_comm_pipes;
+ }
+ break;
+ case ANON_THP_COLLAPSE_UPPER_SHARED:
+ /* Don't COW-share the lower part of the THP. */
+ ret = madvise(mem, size / 2, MADV_DONTFORK);
+ if (ret) {
+ ksft_test_result_fail("MADV_DONTFORK failed\n");
+ goto close_comm_pipes;
+ }
+ break;
+ default:
+ assert(false);
+ }
+
+ ret = fork();
+ if (ret < 0) {
+ ksft_test_result_fail("fork() failed\n");
+ goto close_comm_pipes;
+ } else if (!ret) {
+ switch (test) {
+ case ANON_THP_COLLAPSE_UNSHARED:
+ case ANON_THP_COLLAPSE_FULLY_SHARED:
+ exit(child_memcmp_fn(mem, size, &comm_pipes));
+ break;
+ case ANON_THP_COLLAPSE_LOWER_SHARED:
+ exit(child_memcmp_fn(mem, size / 2, &comm_pipes));
+ break;
+ case ANON_THP_COLLAPSE_UPPER_SHARED:
+ exit(child_memcmp_fn(mem + size / 2, size / 2,
+ &comm_pipes));
+ break;
+ default:
+ assert(false);
+ }
+ }
+
+ while (read(comm_pipes.child_ready[0], &buf, 1) != 1)
+ ;
+
+ switch (test) {
+ case ANON_THP_COLLAPSE_UNSHARED:
+ break;
+ case ANON_THP_COLLAPSE_UPPER_SHARED:
+ case ANON_THP_COLLAPSE_LOWER_SHARED:
+ /*
+ * Revert MADV_DONTFORK such that we merge the VMAs and are
+ * able to actually collapse.
+ */
+ ret = madvise(mem, size, MADV_DOFORK);
+ if (ret) {
+ ksft_test_result_fail("MADV_DOFORK failed\n");
+ write(comm_pipes.parent_ready[1], "0", 1);
+ wait(&ret);
+ goto close_comm_pipes;
+ }
+ /* FALLTHROUGH */
+ case ANON_THP_COLLAPSE_FULLY_SHARED:
+ /* Collapse before anyone modified the COW-shared page. */
+ ret = madvise(mem, size, MADV_COLLAPSE);
+ if (ret) {
+ ksft_test_result_skip("MADV_COLLAPSE failed: %s\n",
+ strerror(errno));
+ write(comm_pipes.parent_ready[1], "0", 1);
+ wait(&ret);
+ goto close_comm_pipes;
+ }
+ break;
+ default:
+ assert(false);
+ }
+
+ /* Modify the page. */
+ memset(mem, 0xff, size);
+ write(comm_pipes.parent_ready[1], "0", 1);
+
+ wait(&ret);
+ if (WIFEXITED(ret))
+ ret = WEXITSTATUS(ret);
+ else
+ ret = -EINVAL;
+
+ ksft_test_result(!ret, "No leak from parent into child\n");
+close_comm_pipes:
+ close_comm_pipes(&comm_pipes);
+}
+
+static void test_anon_thp_collapse_unshared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_UNSHARED);
+}
+
+static void test_anon_thp_collapse_fully_shared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_FULLY_SHARED);
+}
+
+static void test_anon_thp_collapse_lower_shared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_LOWER_SHARED);
+}
+
+static void test_anon_thp_collapse_upper_shared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_UPPER_SHARED);
+}
+
+/*
+ * Test cases that are specific to anonymous THP: pages in private mappings
+ * that may get shared via COW during fork().
+ */
+static const struct test_case anon_thp_test_cases[] = {
+ /*
+ * Basic COW test for fork() without any GUP when collapsing a THP
+ * before fork().
+ *
+ * Re-mapping a PTE-mapped anon THP using a single PMD ("in-place
+ * collapse") might easily get COW handling wrong when not collapsing
+ * exclusivity information properly.
+ */
+ {
+ "Basic COW after fork() when collapsing before fork()",
+ test_anon_thp_collapse_unshared,
+ },
+ /* Basic COW test, but collapse after COW-sharing a full THP. */
+ {
+ "Basic COW after fork() when collapsing after fork() (fully shared)",
+ test_anon_thp_collapse_fully_shared,
+ },
+ /*
+ * Basic COW test, but collapse after COW-sharing the lower half of a
+ * THP.
+ */
+ {
+ "Basic COW after fork() when collapsing after fork() (lower shared)",
+ test_anon_thp_collapse_lower_shared,
+ },
+ /*
+ * Basic COW test, but collapse after COW-sharing the upper half of a
+ * THP.
+ */
+ {
+ "Basic COW after fork() when collapsing after fork() (upper shared)",
+ test_anon_thp_collapse_upper_shared,
+ },
+};
+
+static void run_anon_thp_test_cases(void)
+{
+ int i;
+
+ if (!thpsize)
+ return;
+
+ ksft_print_msg("[INFO] Anonymous THP tests\n");
+
+ for (i = 0; i < ARRAY_SIZE(anon_thp_test_cases); i++) {
+ struct test_case const *test_case = &anon_thp_test_cases[i];
+
+ ksft_print_msg("[RUN] %s\n", test_case->desc);
+ do_run_with_thp(test_case->fn, THP_RUN_PMD);
+ }
+}
+
+static int tests_per_anon_thp_test_case(void)
+{
+ return thpsize ? 1 : 0;
+}
+
typedef void (*non_anon_test_fn)(char *mem, const char *smem, size_t size);
static void test_cow(char *mem, const char *smem, size_t size)
@@ -1518,6 +1744,7 @@ int main(int argc, char **argv)
ksft_print_header();
ksft_set_plan(ARRAY_SIZE(anon_test_cases) * tests_per_anon_test_case() +
+ ARRAY_SIZE(anon_thp_test_cases) * tests_per_anon_thp_test_case() +
ARRAY_SIZE(non_anon_test_cases) * tests_per_non_anon_test_case());
gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
@@ -1526,6 +1753,7 @@ int main(int argc, char **argv)
ksft_exit_fail_msg("opening pagemap failed\n");
run_anon_test_cases();
+ run_anon_thp_test_cases();
run_non_anon_test_cases();
err = ksft_get_fail_cnt();
--
2.39.0
This series implements selftests targeting the feature proposed by Chao in:
https://lore.kernel.org/lkml/20221202061347.1070246-10-chao.p.peng@linux.in…
The changes below aim to test the fd-based approach for guest private memory
in the context of normal (non-confidential) VMs executing on non-confidential
platforms.
The private_mem_test.c file adds a selftest that accesses private memory from
the guest via private/shared accesses and checks whether the contents can be
leaked to, or accessed by, the VMM via the shared memory view before/after
conversions.
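The property being checked can be pictured with a toy model, assuming (as a simplification) that each guest page has two independent backings and that the current private/shared attribute decides which one a guest access reaches. struct toy_gpa_page, guest_write(), convert() and vmm_sees_pattern() are hypothetical and are not KVM or restricted memfd APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64	/* hypothetical, kept tiny for illustration */

/* One guest page with separate private and shared backing (toy model). */
struct toy_gpa_page {
	bool is_private;
	unsigned char private_backing[TOY_PAGE_SIZE];
	unsigned char shared_backing[TOY_PAGE_SIZE];	/* VMM-visible */
};

/* Guest accesses go to whichever backing the current attribute selects. */
static void guest_write(struct toy_gpa_page *p, unsigned char pattern)
{
	memset(p->is_private ? p->private_backing : p->shared_backing,
	       pattern, TOY_PAGE_SIZE);
}

/* In this model, conversion only flips the attribute; nothing is copied. */
static void convert(struct toy_gpa_page *p, bool to_private)
{
	p->is_private = to_private;
}

/* The VMM can only inspect the shared view. */
static bool vmm_sees_pattern(const struct toy_gpa_page *p,
			     unsigned char pattern)
{
	size_t i;

	assert(p);
	for (i = 0; i < TOY_PAGE_SIZE; i++)
		if (p->shared_backing[i] != pattern)
			return false;
	return true;
}
```

A test in this spirit writes a pattern while the page is private, checks that the VMM cannot see it before or after converting to shared, and then checks that shared writes are visible.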
Updates in V2:
1) Simplified vcpu run loop implementation API
2) Removed VM creation logic from private mem library
Updates in V1 (Compared to RFC v3 patches):
1) Incorporated suggestions from Sean around simplifying KVM changes
2) Addressed comments from Sean
3) Added private mem test with shared memory backed by 2MB hugepages.
V1 series:
https://lore.kernel.org/lkml/20221111014244.1714148-1-vannapurve@google.com…
This series depends on the following patches:
1) V10 series patches from Chao mentioned above.
GitHub link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_v2
Vishal Annapurve (6):
KVM: x86: Add support for testing private memory
KVM: Selftests: Add support for private memory
KVM: selftests: x86: Add IS_ALIGNED/IS_PAGE_ALIGNED helpers
KVM: selftests: x86: Add helpers to execute VMs with private memory
KVM: selftests: Add get_free_huge_2m_pages
KVM: selftests: x86: Add selftest for private memory
arch/x86/kvm/mmu/mmu_internal.h | 6 +-
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 2 +
.../selftests/kvm/include/kvm_util_base.h | 15 +-
.../testing/selftests/kvm/include/test_util.h | 5 +
.../kvm/include/x86_64/private_mem.h | 24 ++
.../selftests/kvm/include/x86_64/processor.h | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 58 ++++-
tools/testing/selftests/kvm/lib/test_util.c | 29 +++
.../selftests/kvm/lib/x86_64/private_mem.c | 139 ++++++++++++
.../selftests/kvm/x86_64/private_mem_test.c | 212 ++++++++++++++++++
virt/kvm/Kconfig | 4 +
virt/kvm/kvm_main.c | 3 +-
13 files changed, 490 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/private_mem.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/private_mem.c
create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_test.c
--
2.39.0.rc0.267.gcb52ba06e7-goog
From: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
This is v2 of the RFC patchset.
Xin Li reported that the sysret_rip test fails at:
assert(ctx->uc_mcontext.gregs[REG_EFL] ==
ctx->uc_mcontext.gregs[REG_R11]);
on a FRED system. Handle the FRED system scenario too. There are two
patches in this series. Comments welcome...
Note: This patchset has only been tested for the case where 'syscall' sets
%rcx=%rip and %r11=%rflags. I don't have a FRED system to test it on.
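A hedged sketch of a check that tolerates both behaviors, assuming that with FRED the 'syscall' instruction leaves %rcx and %r11 untouched, while the legacy path leaves %rcx = return %rip and %r11 = %rflags. probe_syscall_regs() and the sentinel values are hypothetical and are not taken from these patches; the sketch only classifies the outcome instead of asserting one behavior. It needs GCC or Clang extended inline asm and only probes on x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>

enum syscall_reg_behavior {
	REGS_LEGACY,		/* %rcx = return %rip, %r11 = %rflags */
	REGS_PRESERVED,		/* %rcx/%r11 untouched (assumed FRED behavior) */
	REGS_UNEXPECTED,
	REGS_UNSUPPORTED_ARCH,
};

/* Hypothetical sentinels; bit 1 of R11_SENTINEL is clear, RFLAGS' never is. */
#define RCX_SENTINEL 0x5555555555555550ULL
#define R11_SENTINEL 0xaaaaaaaaaaaaaaa0ULL

static enum syscall_reg_behavior probe_syscall_regs(long *ret_out)
{
	assert(ret_out);
#if defined(__x86_64__)
	register uint64_t r11 __asm__("r11") = R11_SENTINEL;
	uint64_t rcx = RCX_SENTINEL;
	uint64_t expected_rip;
	long rax = SYS_getppid;

	/* Record the address right after 'syscall', then issue getppid(). */
	__asm__ volatile("leaq 1f(%%rip), %[exp]\n\t"
			 "syscall\n"
			 "1:"
			 : "+a"(rax), "+c"(rcx), "+r"(r11),
			   [exp] "=&r"(expected_rip)
			 :
			 : "memory");
	*ret_out = rax;

	if (rcx == RCX_SENTINEL && r11 == R11_SENTINEL)
		return REGS_PRESERVED;
	/* RFLAGS bit 1 is reserved and always reads as 1. */
	if (rcx == expected_rip && (r11 & 0x2))
		return REGS_LEGACY;
	return REGS_UNEXPECTED;
#else
	*ret_out = -1;
	return REGS_UNSUPPORTED_ARCH;
#endif
}
```

The asm touches no stack memory, so it sidesteps the red-zone concern mentioned in the changelog below.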
How to test this:
$ make -C tools/testing/selftests/x86
$ tools/testing/selftests/x86/sysret_rip_64
Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
## Changelog v2:
- Use "+r"(rsp) as the right way to avoid redzone problems
per Andrew's comment (hpa).
(Ref: https://lore.kernel.org/lkml/8f5c24df-514d-5d89-f58f-ec8c3eb1e049@zytor.com )
---
Ammar Faizi (2):
selftests/x86: sysret_rip: Handle syscall in a FRED system
selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
tools/testing/selftests/x86/sysret_rip.c | 105 ++++++++++++++++++++++-
1 file changed, 104 insertions(+), 1 deletion(-)
base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
--
Ammar Faizi
Hello,
This is v3 of the patch series for TDX selftests.
It has been updated for Intel’s V10 of the TDX host patches, which were
proposed in https://lkml.org/lkml/2022/8/8/877
The tree can be found at
https://github.com/googleprodkernel/linux-cc/tree/tdx-selftests-rfc-v3/
Changes from RFC v2:
Selftest setup now builds upon the KVM selftest framework when setting
up the guest for testing. We now use the KVM selftest framework to
build the guest page tables and load the ELF binary into guest memory.
Inlining of the entire guest image is no longer required, which
allows us to cleanly separate code into different compilation units
and to use proper assembly instead of inline assembly
(addresses Sean’s comment).
To achieve this, we take a dependency on the SEV VM tests:
https://lore.kernel.org/lkml/20221018205845.770121-1-pgonda@google.com/T/. Those
patches provide functions for the host to allocate and track protected
memory in the guest.
In RFCv3, TDX selftest code is organized into:
+ headers in tools/testing/selftests/kvm/include/x86_64/tdx/
+ common code in tools/testing/selftests/kvm/lib/x86_64/tdx/
+ selftests in tools/testing/selftests/kvm/x86_64/tdx_*
RFCv3 also adds selftests for UPM.
Dependencies
+ Peter’s patches, which provide functions for the host to allocate
and track protected memory in the
guest. https://lore.kernel.org/lkml/20221018205845.770121-1-pgonda@google.com/T/
+ Peter’s patches depend on Sean’s patches:
+ https://lore.kernel.org/linux-arm-kernel/20220825232522.3997340-1-seanjc@go…
+ https://lore.kernel.org/lkml/20221006004512.666529-1-seanjc@google.com/T/
+ Proposed fixes for these issues mentioned on the mailing list
+ https://lore.kernel.org/lkml/36cde6d6-128d-884e-1447-09b08bb5de3d@intel.com/
+ https://lore.kernel.org/lkml/diqzedtubs0d.fsf@google.com/
+ https://lore.kernel.org/lkml/67b782ee-c95c-d6bc-3cca-cdfe75f4bf6a@intel.com/
+ https://lore.kernel.org/lkml/diqzcz7cd983.fsf@ackerleytng-cloudtop-sg.c.goo…
+ https://lore.kernel.org/linux-mm/20221116205025.1510291-1-ackerleytng@googl…
Further work for this patch series/TODOs
+ Sean’s comments for the non-confidential UPM selftests patch series
at https://lore.kernel.org/lkml/Y8dC8WDwEmYixJqt@google.com/T/#u apply
here as well
+ Add ucall support for TDX selftests
I would also like to acknowledge the following people, who helped
review or test patches in RFCv1 and RFCv2:
+ Sean Christopherson <seanjc(a)google.com>
+ Zhenzhong Duan <zhenzhong.duan(a)intel.com>
+ Peter Gonda <pgonda(a)google.com>
+ Andrew Jones <drjones(a)redhat.com>
+ Maxim Levitsky <mlevitsk(a)redhat.com>
+ Xiaoyao Li <xiaoyao.li(a)intel.com>
+ David Matlack <dmatlack(a)google.com>
+ Marc Orr <marcorr(a)google.com>
+ Isaku Yamahata <isaku.yamahata(a)gmail.com>
Links to earlier patch series
+ RFC v1: https://lore.kernel.org/lkml/20210726183816.1343022-1-erdemaktas@google.com…
+ RFC v2: https://lore.kernel.org/lkml/20220830222000.709028-1-sagis@google.com/T/#u
Ackerley Tng (14):
KVM: selftests: Add function to allow one-to-one GVA to GPA mappings
KVM: selftests: Expose function that sets up sregs based on VM's mode
KVM: selftests: Store initial stack address in struct kvm_vcpu
KVM: selftests: Refactor steps in vCPU descriptor table initialization
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
KVM: selftests: Require GCC to realign stacks on function entry
KVM: selftests: Add functions to allow mapping as shared
KVM: selftests: Add support for restricted memory
KVM: selftests: TDX: Update load_td_memory_region for VM memory backed
by restricted memfd
KVM: selftests: Expose _vm_vaddr_alloc
KVM: selftests: TDX: Add support for TDG.MEM.PAGE.ACCEPT
KVM: selftests: TDX: Add support for TDG.VP.VEINFO.GET
KVM: selftests: TDX: Add TDX UPM selftest
KVM: selftests: TDX: Add TDX UPM selftests for implicit conversion
Erdem Aktas (4):
KVM: selftests: Add support for creating non-default type VMs
KVM: selftests: Add helper functions to create TDX VMs
KVM: selftests: TDX: Add TDX lifecycle test
KVM: selftests: TDX: Adding test case for TDX port IO
Roger Wang (1):
KVM: selftests: TDX: Add TDG.VP.INFO test
Ryan Afranji (2):
KVM: selftests: TDX: Verify the behavior when host consumes a TD
private memory
KVM: selftests: TDX: Add shared memory test
Sagi Shahar (10):
KVM: selftests: TDX: Add report_fatal_error test
KVM: selftests: TDX: Add basic TDX CPUID test
KVM: selftests: TDX: Add basic get_td_vmcall_info test
KVM: selftests: TDX: Add TDX IO writes test
KVM: selftests: TDX: Add TDX IO reads test
KVM: selftests: TDX: Add TDX MSR read/write tests
KVM: selftests: TDX: Add TDX HLT exit test
KVM: selftests: TDX: Add TDX MMIO reads test
KVM: selftests: TDX: Add TDX MMIO writes test
KVM: selftests: TDX: Add TDX CPUID TDVMCALL test
tools/testing/selftests/kvm/.gitignore | 3 +
tools/testing/selftests/kvm/Makefile | 10 +-
.../selftests/kvm/include/kvm_util_base.h | 43 +-
.../testing/selftests/kvm/include/test_util.h | 2 +
.../selftests/kvm/include/x86_64/processor.h | 4 +
.../kvm/include/x86_64/tdx/td_boot.h | 82 +
.../kvm/include/x86_64/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86_64/tdx/tdcall.h | 59 +
.../selftests/kvm/include/x86_64/tdx/tdx.h | 65 +
.../kvm/include/x86_64/tdx/tdx_util.h | 19 +
.../kvm/include/x86_64/tdx/test_util.h | 164 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 123 +-
tools/testing/selftests/kvm/lib/test_util.c | 7 +
.../selftests/kvm/lib/x86_64/processor.c | 77 +-
tools/testing/selftests/kvm/lib/x86_64/sev.c | 2 +-
.../selftests/kvm/lib/x86_64/tdx/td_boot.S | 101 ++
.../selftests/kvm/lib/x86_64/tdx/tdcall.S | 158 ++
.../selftests/kvm/lib/x86_64/tdx/tdx.c | 231 +++
.../selftests/kvm/lib/x86_64/tdx/tdx_util.c | 562 +++++++
.../selftests/kvm/lib/x86_64/tdx/test_util.c | 101 ++
.../kvm/x86_64/tdx_shared_mem_test.c | 137 ++
.../selftests/kvm/x86_64/tdx_upm_test.c | 460 ++++++
.../selftests/kvm/x86_64/tdx_vm_tests.c | 1329 +++++++++++++++++
23 files changed, 3709 insertions(+), 46 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/test_util.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_shared_mem_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_upm_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_vm_tests.c
--
2.39.0.246.g2a6d74b583-goog
The Felix VSC9959 switch in NXP LS1028A supports the tc-gate action,
which enforces time-based access control per stream. A stream as seen by
this switch is identified by {MAC DA, VID}.
We use the standard forwarding selftest topology with 2 host interfaces
and 2 switch interfaces. For isochron to work, the host ports must support
timestamping non-IP packets and tc-etf offload. The
packets and supporting tc-etf offload, for isochron to work. The
isochron program monitors network sync status (ptp4l, phc2sys) and
deterministically transmits packets to the switch such that the tc-gate
action either (a) always accepts them based on its schedule, or
(b) always drops them.
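An idealized model of the two tests, ignoring network sync error and path delay: with the psfp.sh tunables (10 ms cycle, 5 ms OPEN followed by 5 ms CLOSE, a 2.5 ms isochron shift), packets sent with base time 0 land 2.5 ms into each cycle (gate OPEN), while base time 0.005 s puts them 7.5 ms in (gate CLOSE). gate_is_open() and count_accepted() are hypothetical helpers, not part of tc or isochron.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tunables copied from psfp.sh. */
#define CYCLE_TIME_NS		10000000ULL
#define GATE_DURATION_NS	(CYCLE_TIME_NS / 2)
#define SHIFT_TIME_NS		(GATE_DURATION_NS / 2)

/*
 * Idealized two-entry gate schedule starting at gate_base_ns: OPEN for
 * GATE_DURATION_NS, then CLOSE for the rest of the cycle.
 */
static bool gate_is_open(uint64_t t_ns, uint64_t gate_base_ns)
{
	assert(t_ns >= gate_base_ns);
	return (t_ns - gate_base_ns) % CYCLE_TIME_NS < GATE_DURATION_NS;
}

/* Packet k is sent at pkt_base + shift + k * cycle; count how many pass. */
static unsigned int count_accepted(uint64_t pkt_base_ns, unsigned int num_pkts)
{
	unsigned int accepted = 0;
	unsigned int k;

	for (k = 0; k < num_pkts; k++) {
		uint64_t t = pkt_base_ns + SHIFT_TIME_NS +
			     (uint64_t)k * CYCLE_TIME_NS;

		accepted += gate_is_open(t, 0);	/* gate base-time 0 */
	}
	return accepted;
}
```

In this model, count_accepted(0, 1000) gives 1000 ("In band") and count_accepted(5000000, 1000) gives 0 ("Out of band"); centering packets in the slot leaves 2.5 ms of margin for sync error on either side.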
I tried to keep as much as possible of the logic that isn't specific to
the NXP LS1028A in a new tsn_lib.sh, for future reuse. This covers
synchronization using ptp4l and phc2sys, and isochron.
The cycle-time chosen for this selftest isn't particularly impressive
(and the focus is the functionality of the switch), but I didn't really
know what to do better, considering that it will mostly be run during
debugging sessions, where various kernel bloatware like lockdep, KASAN,
etc. would be enabled, and we certainly can't run any races with those on.
I looked through the kselftest framework for other real-time
applications and didn't really find any, so I'm not sure how to better
prepare the environment in case we want to go for a lower cycle time.
At the moment, the only thing the selftest ensures is that dynamic
frequency scaling is disabled on the CPU that isochron runs on. It would
probably be useful to have a blacklist of kernel config options (checked
through zcat /proc/config.gz) and some cyclictest scripts to run
beforehand, but I saw none of those.
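A minimal sketch of the suggested config check, assuming the caller has already decompressed /proc/config.gz (e.g. via zcat) into a string. The blacklist contents are only an example; the patch does not define such a list, and config_enables() and count_blacklisted() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Example blacklist only: lockdep (CONFIG_PROVE_LOCKING) and KASAN. */
static const char *const rt_unfriendly_options[] = {
	"CONFIG_PROVE_LOCKING",
	"CONFIG_KASAN",
};

/* True if some line of `config` reads exactly "<option>=y". */
static bool config_enables(const char *config, const char *option)
{
	const char *line = config;
	size_t len;

	assert(config && option);
	len = strlen(option);
	while (line && *line) {
		if (!strncmp(line, option, len) && line[len] == '=' &&
		    line[len + 1] == 'y' &&
		    (line[len + 2] == '\n' || line[len + 2] == '\0'))
			return true;
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return false;
}

/* Number of blacklisted options that are built in. */
static unsigned int count_blacklisted(const char *config)
{
	unsigned int n = 0;
	size_t i;

	for (i = 0; i < sizeof(rt_unfriendly_options) /
			sizeof(rt_unfriendly_options[0]); i++)
		n += config_enables(config, rt_unfriendly_options[i]);
	return n;
}
```

Matching the full "NAME=y" line avoids false positives from "# CONFIG_KASAN is not set" or longer names such as CONFIG_KASAN_GENERIC.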
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
v1->v2:
- fix an off-by-one bug introduced at the last minute regarding which
tc-mqprio queue was used for tc-etf and SO_TXTIME
- introduce debugging for packets incorrectly received / incorrectly
dropped based on "isochron report"
- make the tsn_lib.sh dependency on isochron and linuxptp optional via
REQUIRE_ISOCHRON and REQUIRE_LINUXPTP
- avoid errors when CONFIG_CPU_FREQ is disabled
- consistently use SCHED_FIFO instead of SCHED_RR for the isochron
receiver
.../selftests/drivers/net/ocelot/psfp.sh | 327 ++++++++++++++++++
.../selftests/net/forwarding/tsn_lib.sh | 235 +++++++++++++
2 files changed, 562 insertions(+)
create mode 100755 tools/testing/selftests/drivers/net/ocelot/psfp.sh
create mode 100644 tools/testing/selftests/net/forwarding/tsn_lib.sh
diff --git a/tools/testing/selftests/drivers/net/ocelot/psfp.sh b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
new file mode 100755
index 000000000000..5a5cee92c665
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
@@ -0,0 +1,327 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+# Note: On LS1028A, in lack of enough user ports, this setup requires patching
+# the device tree to use the second CPU port as a user port
+
+WAIT_TIME=1
+NUM_NETIFS=4
+STABLE_MAC_ADDRS=yes
+NETIF_CREATE=no
+lib_dir=$(dirname $0)/../../../net/forwarding
+source $lib_dir/tc_common.sh
+source $lib_dir/lib.sh
+source $lib_dir/tsn_lib.sh
+
+UDS_ADDRESS_H1="/var/run/ptp4l_h1"
+UDS_ADDRESS_SWP1="/var/run/ptp4l_swp1"
+
+# Tunables
+NUM_PKTS=1000
+STREAM_VID=100
+STREAM_PRIO=6
+# Use a conservative cycle of 10 ms to allow the test to still pass when the
+# kernel has some extra overhead like lockdep etc
+CYCLE_TIME_NS=10000000
+# Create two Gate Control List entries, one OPEN and one CLOSE, of equal
+# durations
+GATE_DURATION_NS=$((${CYCLE_TIME_NS} / 2))
+# Give 2/3 of the cycle time to user space and 1/3 to the kernel
+FUDGE_FACTOR=$((${CYCLE_TIME_NS} / 3))
+# Shift the isochron base time by half the gate time, so that packets are
+# always received by swp1 close to the middle of the time slot, to minimize
+# inaccuracies due to network sync
+SHIFT_TIME_NS=$((${GATE_DURATION_NS} / 2))
+
+h1=${NETIFS[p1]}
+swp1=${NETIFS[p2]}
+swp2=${NETIFS[p3]}
+h2=${NETIFS[p4]}
+
+H1_IPV4="192.0.2.1"
+H2_IPV4="192.0.2.2"
+H1_IPV6="2001:db8:1::1"
+H2_IPV6="2001:db8:1::2"
+
+# Chain number exported by the ocelot driver for
+# Per-Stream Filtering and Policing filters
+PSFP()
+{
+ echo 30000
+}
+
+psfp_chain_create()
+{
+ local if_name=$1
+
+ tc qdisc add dev $if_name clsact
+
+ tc filter add dev $if_name ingress chain 0 pref 49152 flower \
+ skip_sw action goto chain $(PSFP)
+}
+
+psfp_chain_destroy()
+{
+ local if_name=$1
+
+ tc qdisc del dev $if_name clsact
+}
+
+psfp_filter_check()
+{
+ local expected=$1
+ local packets=""
+ local drops=""
+ local stats=""
+
+ stats=$(tc -j -s filter show dev ${swp1} ingress chain $(PSFP) pref 1)
+ packets=$(echo ${stats} | jq ".[1].options.actions[].stats.packets")
+ drops=$(echo ${stats} | jq ".[1].options.actions[].stats.drops")
+
+ if ! [ "${packets}" = "${expected}" ]; then
+ printf "Expected filter to match on %d packets but matched on %d instead\n" \
+ "${expected}" "${packets}"
+ fi
+
+ echo "Hardware filter reports ${drops} drops"
+}
+
+h1_create()
+{
+ simple_if_init $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h1_destroy()
+{
+ simple_if_fini $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h2_create()
+{
+ simple_if_init $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+h2_destroy()
+{
+ simple_if_fini $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+switch_create()
+{
+ local h2_mac_addr=$(mac_get $h2)
+
+ ip link set ${swp1} up
+ ip link set ${swp2} up
+
+ ip link add br0 type bridge vlan_filtering 1
+ ip link set ${swp1} master br0
+ ip link set ${swp2} master br0
+ ip link set br0 up
+
+ bridge vlan add dev ${swp2} vid ${STREAM_VID}
+ bridge vlan add dev ${swp1} vid ${STREAM_VID}
+ # PSFP on Ocelot requires the filter to also be added to the bridge
+ # FDB, and not be removed
+ bridge fdb add dev ${swp2} \
+ ${h2_mac_addr} vlan ${STREAM_VID} static master
+
+ psfp_chain_create ${swp1}
+
+ tc filter add dev ${swp1} ingress chain $(PSFP) pref 1 \
+ protocol 802.1Q flower skip_sw \
+ dst_mac ${h2_mac_addr} vlan_id ${STREAM_VID} \
+ action gate base-time 0.000000000 \
+ sched-entry OPEN ${GATE_DURATION_NS} -1 -1 \
+ sched-entry CLOSE ${GATE_DURATION_NS} -1 -1
+}
+
+switch_destroy()
+{
+ psfp_chain_destroy ${swp1}
+ ip link del br0
+}
+
+txtime_setup()
+{
+ local if_name=$1
+
+ tc qdisc add dev ${if_name} clsact
+ # Classify PTP on TC 7 and isochron on TC 6
+ tc filter add dev ${if_name} egress protocol 0x88f7 \
+ flower action skbedit priority 7
+ tc filter add dev ${if_name} egress protocol 802.1Q \
+ flower vlan_ethtype 0xdead action skbedit priority 6
+ tc qdisc add dev ${if_name} handle 100: parent root mqprio num_tc 8 \
+ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
+ map 0 1 2 3 4 5 6 7 \
+ hw 1
+ # Set up TC 6 for SO_TXTIME. tc-mqprio queues count from 1.
+ tc qdisc replace dev ${if_name} parent 100:$((${STREAM_PRIO} + 1)) etf \
+ clockid CLOCK_TAI offload delta ${FUDGE_FACTOR}
+}
+
+txtime_cleanup()
+{
+ local if_name=$1
+
+ tc qdisc del dev ${if_name} root
+ tc qdisc del dev ${if_name} clsact
+}
+
+setup_prepare()
+{
+ vrf_prepare
+
+ h1_create
+ h2_create
+ switch_create
+
+ txtime_setup ${h1}
+
+ # Set up swp1 as a master PHC for h1, synchronized to the local
+ # CLOCK_REALTIME.
+ phc2sys_start ${swp1} ${UDS_ADDRESS_SWP1}
+
+ # Assumption true for LS1028A: h1 and h2 use the same PHC. So by
+ # synchronizing h1 to swp1 via PTP, h2 is also implicitly synchronized
+ # to swp1 (and both to CLOCK_REALTIME).
+ ptp4l_start ${h1} true ${UDS_ADDRESS_H1}
+ ptp4l_start ${swp1} false ${UDS_ADDRESS_SWP1}
+
+ # Make sure there are no filter matches at the beginning of the test
+ psfp_filter_check 0
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ptp4l_stop ${swp1}
+ ptp4l_stop ${h1}
+ phc2sys_stop
+ isochron_recv_stop
+
+ txtime_cleanup ${h1}
+
+ h2_destroy
+ h1_destroy
+ switch_destroy
+
+ vrf_cleanup
+}
+
+debug_incorrectly_dropped_packets()
+{
+ local isochron_dat=$1
+ local dropped_seqids
+ local seqid
+
+ echo "Packets incorrectly dropped:"
+
+ dropped_seqids=$(isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "%u RX hw %T\n" \
+ --printf-args "qR" | \
+ grep 'RX hw 0.000000000' | \
+ awk '{print $1}')
+
+ for seqid in ${dropped_seqids}; do
+ isochron report \
+ --input-file "${isochron_dat}" \
+ --start ${seqid} --stop ${seqid} \
+ --printf-format "seqid %u scheduled for %T, HW TX timestamp %T\n" \
+ --printf-args "qST"
+ done
+}
+
+debug_incorrectly_received_packets()
+{
+ local isochron_dat=$1
+
+ echo "Packets incorrectly received:"
+
+ isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "seqid %u scheduled for %T, HW TX timestamp %T, HW RX timestamp %T\n" \
+ --printf-args "qSTR" |
+ grep -v 'HW RX timestamp 0.000000000'
+}
+
+run_test()
+{
+ local base_time=$1
+ local expected=$2
+ local test_name=$3
+ local debug=$4
+ local isochron_dat="$(mktemp)"
+ local extra_args=""
+ local received
+
+ isochron_do \
+ "${h1}" \
+ "${h2}" \
+ "${UDS_ADDRESS_H1}" \
+ "" \
+ "${base_time}" \
+ "${CYCLE_TIME_NS}" \
+ "${SHIFT_TIME_NS}" \
+ "${NUM_PKTS}" \
+ "${STREAM_VID}" \
+ "${STREAM_PRIO}" \
+ "" \
+ "${isochron_dat}"
+
+ # Count all received packets by looking at the non-zero RX timestamps
+ received=$(isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "%u\n" --printf-args "R" | \
+ grep -w -v '0' | wc -l)
+
+ if [ "${received}" = "${expected}" ]; then
+ RET=0
+ else
+ RET=1
+ echo "Expected isochron to receive ${expected} packets but received ${received}"
+ fi
+
+ log_test "${test_name}"
+
+ if [ "$RET" = "1" ]; then
+ ${debug} "${isochron_dat}"
+ fi
+
+ rm ${isochron_dat} 2> /dev/null
+}
+
+test_gate_in_band()
+{
+ # Send packets in-band with the OPEN gate entry
+ run_test 0.000000000 ${NUM_PKTS} "In band" \
+ debug_incorrectly_dropped_packets
+
+ psfp_filter_check ${NUM_PKTS}
+}
+
+test_gate_out_of_band()
+{
+ # Send packets in-band with the CLOSE gate entry
+ run_test 0.005000000 0 "Out of band" \
+ debug_incorrectly_received_packets
+
+ psfp_filter_check $((2 * ${NUM_PKTS}))
+}
+
+trap cleanup EXIT
+
+ALL_TESTS="
+ test_gate_in_band
+ test_gate_out_of_band
+"
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tsn_lib.sh b/tools/testing/selftests/net/forwarding/tsn_lib.sh
new file mode 100644
index 000000000000..60a1423e8116
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/tsn_lib.sh
@@ -0,0 +1,235 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+REQUIRE_ISOCHRON=${REQUIRE_ISOCHRON:=yes}
+REQUIRE_LINUXPTP=${REQUIRE_LINUXPTP:=yes}
+
+# Tunables
+UTC_TAI_OFFSET=37
+ISOCHRON_CPU=1
+
+if [[ "$REQUIRE_ISOCHRON" = "yes" ]]; then
+ # https://github.com/vladimiroltean/tsn-scripts
+ # WARNING: isochron versions pre-1.0 are unstable,
+ # always use the latest version
+ require_command isochron
+fi
+if [[ "$REQUIRE_LINUXPTP" = "yes" ]]; then
+ require_command phc2sys
+ require_command ptp4l
+fi
+
+phc2sys_start()
+{
+ local if_name=$1
+ local uds_address=$2
+ local extra_args=""
+
+ if ! [ -z "${uds_address}" ]; then
+ extra_args="${extra_args} -z ${uds_address}"
+ fi
+
+ phc2sys_log="$(mktemp)"
+
+ chrt -f 10 phc2sys -m \
+ -c ${if_name} \
+ -s CLOCK_REALTIME \
+ -O ${UTC_TAI_OFFSET} \
+ --step_threshold 0.00002 \
+ --first_step_threshold 0.00002 \
+ ${extra_args} \
+ > "${phc2sys_log}" 2>&1 &
+ phc2sys_pid=$!
+
+ echo "phc2sys logs to ${phc2sys_log} and has pid ${phc2sys_pid}"
+
+ sleep 1
+}
+
+phc2sys_stop()
+{
+ { kill ${phc2sys_pid} && wait ${phc2sys_pid}; } 2> /dev/null
+ rm "${phc2sys_log}" 2> /dev/null
+}
+
+ptp4l_start()
+{
+ local if_name=$1
+ local slave_only=$2
+ local uds_address=$3
+ local log="ptp4l_log_${if_name}"
+ local pid="ptp4l_pid_${if_name}"
+ local extra_args=""
+
+ if [ "${slave_only}" = true ]; then
+ extra_args="${extra_args} -s"
+ fi
+
+ # declare dynamic variables ptp4l_log_${if_name} and ptp4l_pid_${if_name}
+ # as global, so that they can be referenced later
+ declare -g "${log}=$(mktemp)"
+
+ chrt -f 10 ptp4l -m -2 -P \
+ -i ${if_name} \
+ --step_threshold 0.00002 \
+ --first_step_threshold 0.00002 \
+ --tx_timestamp_timeout 100 \
+ --uds_address="${uds_address}" \
+ ${extra_args} \
+ > "${!log}" 2>&1 &
+ declare -g "${pid}=$!"
+
+ echo "ptp4l for interface ${if_name} logs to ${!log} and has pid ${!pid}"
+
+ sleep 1
+}
+
+ptp4l_stop()
+{
+ local if_name=$1
+ local log="ptp4l_log_${if_name}"
+ local pid="ptp4l_pid_${if_name}"
+
+ { kill ${!pid} && wait ${!pid}; } 2> /dev/null
+ rm "${!log}" 2> /dev/null
+}
+
+cpufreq_max()
+{
+ local cpu=$1
+ local freq="cpu${cpu}_freq"
+ local governor="cpu${cpu}_governor"
+
+ # Kernel may be compiled with CONFIG_CPU_FREQ disabled
+ if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+ return
+ fi
+
+ # declare dynamic variables cpu${cpu}_freq and cpu${cpu}_governor as
+ # global, so they can be referenced later
+ declare -g "${freq}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq)"
+ declare -g "${governor}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor)"
+
+ cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_max_freq > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+ echo -n "performance" > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+cpufreq_restore()
+{
+ local cpu=$1
+ local freq="cpu${cpu}_freq"
+ local governor="cpu${cpu}_governor"
+
+ if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+ return
+ fi
+
+ echo "${!freq}" > /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+ echo -n "${!governor}" > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+isochron_recv_start()
+{
+ local if_name=$1
+ local uds=$2
+ local extra_args=$3
+
+ if ! [ -z "${uds}" ]; then
+ extra_args="${extra_args} --unix-domain-socket ${uds}"
+ fi
+
+ isochron rcv \
+ --interface ${if_name} \
+ --sched-priority 98 \
+ --sched-fifo \
+ --utc-tai-offset ${UTC_TAI_OFFSET} \
+ --quiet \
+ ${extra_args} & \
+ isochron_pid=$!
+
+ sleep 1
+}
+
+isochron_recv_stop()
+{
+ { kill ${isochron_pid} && wait ${isochron_pid}; } 2> /dev/null
+}
+
+isochron_do()
+{
+ local sender_if_name=$1; shift
+ local receiver_if_name=$1; shift
+ local sender_uds=$1; shift
+ local receiver_uds=$1; shift
+ local base_time=$1; shift
+ local cycle_time=$1; shift
+ local shift_time=$1; shift
+ local num_pkts=$1; shift
+ local vid=$1; shift
+ local priority=$1; shift
+ local dst_ip=$1; shift
+ local isochron_dat=$1; shift
+ local extra_args=""
+ local receiver_extra_args=""
+ local vrf="$(master_name_get ${sender_if_name})"
+ local use_l2="true"
+
+ if ! [ -z "${dst_ip}" ]; then
+ use_l2="false"
+ fi
+
+ if ! [ -z "${vrf}" ]; then
+ dst_ip="${dst_ip}%${vrf}"
+ fi
+
+ if ! [ -z "${vid}" ]; then
+ vid="--vid=${vid}"
+ fi
+
+ if [ -z "${receiver_uds}" ]; then
+ extra_args="${extra_args} --omit-remote-sync"
+ fi
+
+ if ! [ -z "${shift_time}" ]; then
+ extra_args="${extra_args} --shift-time=${shift_time}"
+ fi
+
+ if [ "${use_l2}" = "true" ]; then
+ extra_args="${extra_args} --l2 --etype=0xdead ${vid}"
+ receiver_extra_args="--l2 --etype=0xdead"
+ else
+ extra_args="${extra_args} --l4 --ip-destination=${dst_ip}"
+ receiver_extra_args="--l4"
+ fi
+
+ cpufreq_max ${ISOCHRON_CPU}
+
+ isochron_recv_start "${receiver_if_name}" "${receiver_uds}" "${receiver_extra_args}"
+
+ isochron send \
+ --interface ${sender_if_name} \
+ --unix-domain-socket ${sender_uds} \
+ --priority ${priority} \
+ --base-time ${base_time} \
+ --cycle-time ${cycle_time} \
+ --num-frames ${num_pkts} \
+ --frame-size 64 \
+ --txtime \
+ --utc-tai-offset ${UTC_TAI_OFFSET} \
+ --cpu-mask $((1 << ${ISOCHRON_CPU})) \
+ --sched-fifo \
+ --sched-priority 98 \
+ --client 127.0.0.1 \
+ --sync-threshold 5000 \
+ --output-file ${isochron_dat} \
+ ${extra_args} \
+ --quiet
+
+ isochron_recv_stop
+
+ cpufreq_restore ${ISOCHRON_CPU}
+}
--
2.25.1
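Several helpers in tsn_lib.sh above, namely ptp4l_start()/ptp4l_stop() and cpufreq_max()/cpufreq_restore(), keep per-interface or per-CPU state in global variables whose names are built at runtime. Below is a minimal standalone sketch of that pattern. demo_start, demo_stop and the /tmp paths are hypothetical and only show how `declare -g` and `${!name}` indirection work together. The sketch assumes bash 4.2 or later, which is where `declare -g` first appeared.

```shell
#!/bin/bash
# Sketch of the per-instance global-variable pattern used by ptp4l_start()
# and ptp4l_stop(). demo_start/demo_stop and the paths are hypothetical.

demo_start()
{
	local if_name=$1
	local log="demo_log_${if_name}"

	# Inside a function, plain "declare" would create a local variable.
	# -g makes the dynamically named variable global, so that it can be
	# referenced later from demo_stop().
	declare -g "${log}=/tmp/demo-${if_name}.log"
}

demo_stop()
{
	local if_name=$1
	local log="demo_log_${if_name}"

	# ${!log} expands to the value of the variable whose name is in $log
	echo "${!log}"
}

demo_start swp1
demo_start swp2
demo_stop swp1	# prints /tmp/demo-swp1.log
demo_stop swp2	# prints /tmp/demo-swp2.log
```

The generated names are only valid identifiers when the interface name contains nothing but letters, digits and underscores. For a name such as "eth0.100", declare would reject the variable name.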
Add support for sockmap to vsock.
We're testing usage of vsock as a way to redirect guest-local UDS requests to
the host and this patch series greatly improves the performance of such a
setup.
Compared to copying packets via userspace, this improves throughput by 121% in
basic testing.
Tested as follows.
Setup: guest unix dgram sender -> guest vsock redirector -> host vsock server
Threads: 1
Payload: 64k
No sockmap:
- 76.3 MB/s
- The guest vsock redirector was
"socat VSOCK-CONNECT:2:1234 UNIX-RECV:/path/to/sock"
Using sockmap (this patch):
- 168.8 MB/s (+121%)
- The guest redirector was a simple sockmap echo server,
redirecting unix ingress to vsock 2:1234 egress.
- Same sender and server programs
*Note: these numbers are from RFC v1
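As a quick arithmetic check of the headline figure, the relative gain can be computed from the two MB/s measurements above. This is only a convenience one-liner and assumes a POSIX awk is available:

```shell
# Relative throughput gain of the sockmap redirector over the socat baseline,
# using the 168.8 MB/s and 76.3 MB/s figures quoted above.
awk 'BEGIN { printf "%.1f%%\n", (168.8 / 76.3 - 1) * 100 }'
# prints 121.2%
```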
Only the virtio transport has been tested. The loopback transport was used in
writing bpf/selftests, but not thoroughly tested otherwise.
This series requires the skb patch.
Changes in v2:
- vsock/bpf: rename vsock_dgram_* -> vsock_*
- vsock/bpf: change sk_psock_{get,put} and {lock,release}_sock() order to
minimize slock hold time
- vsock/bpf: use "new style" wait
- vsock/bpf: fix bug in wait log
- vsock/bpf: add check that recvmsg sk_type is one of dgram, seqpacket, or stream.
Return error if not one of the three.
- virtio/vsock: comment __skb_recv_datagram() usage
- virtio/vsock: do not init copied in read_skb()
- vsock/bpf: add ifdef guard around struct proto in dgram_recvmsg()
- selftests/bpf: add vsock loopback config for aarch64
- selftests/bpf: add vsock loopback config for s390x
- selftests/bpf: remove vsock device from vmtest.sh qemu machine
- selftests/bpf: remove CONFIG_VIRTIO_VSOCKETS=y from config.x86_64
- vsock/bpf: move transport-related (e.g., if (!vsk->transport)) checks out of
fast path
Signed-off-by: Bobby Eshleman <bobby.eshleman(a)bytedance.com>
---
Bobby Eshleman (3):
vsock: support sockmap
selftests/bpf: add vsock to vmtest.sh
selftests/bpf: Add a test case for vsock sockmap
drivers/vhost/vsock.c | 1 +
include/linux/virtio_vsock.h | 1 +
include/net/af_vsock.h | 17 ++
net/vmw_vsock/Makefile | 1 +
net/vmw_vsock/af_vsock.c | 55 ++++++-
net/vmw_vsock/virtio_transport.c | 2 +
net/vmw_vsock/virtio_transport_common.c | 24 +++
net/vmw_vsock/vsock_bpf.c | 175 +++++++++++++++++++++
net/vmw_vsock/vsock_loopback.c | 2 +
tools/testing/selftests/bpf/config.aarch64 | 2 +
tools/testing/selftests/bpf/config.s390x | 3 +
tools/testing/selftests/bpf/config.x86_64 | 3 +
.../selftests/bpf/prog_tests/sockmap_listen.c | 163 +++++++++++++++++++
13 files changed, 443 insertions(+), 6 deletions(-)
---
base-commit: d83115ce337a632f996e44c9f9e18cadfcf5a094
change-id: 20230118-support-vsock-sockmap-connectible-2e1297d2111a
Best regards,
--
Bobby Eshleman <bobby.eshleman(a)bytedance.com>
From: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
On Mon, 23 Jan 2023 15:58:12 -0800, "H. Peter Anvin" wrote:
> On 1/23/23 15:43, Ammar Faizi wrote:
> >
> > Align them to spot differences:
> >
> > 0x200893 = 0b1000000000100010010011
> > 0x200a93 = 0b1000000000101010010011
> >                          ^
> >
> > Or just xor them to find the differences:
> >
> > (gdb) p/x 0x200893 ^ 0x200a93
> > $3 = 0x200
> >
> > ** Checks my Intel SDM cheat sheets. **
> >
> > Then, I was like "Oh, that's (1 << 9) a.k.a. IF. Of course we can't
> > change rflags[IF] from userspace!!!".
> >
> > In short, we can't use 0x200893 as the rflags_sentinel value because it
> > clears the interrupt flag.
> >
>
> Right, my mistake.
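The XOR check from the quoted exchange can also be reproduced without gdb, using bash arithmetic expansion. This is a convenience sketch only. It relies on the point made above that IF is bit 9 of RFLAGS.

```shell
# Find the bits that differ between the two rflags_sentinel candidates.
diff=$(( 0x200893 ^ 0x200a93 ))
printf '0x%x\n' "${diff}"	# prints 0x200

# 0x200 is (1 << 9), the position of IF (interrupt enable flag) in RFLAGS,
# which userspace cannot change.
if (( diff == (1 << 9) )); then
	echo "only IF (bit 9) differs"
fi
```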
I changed it to 0x200a93. The test passed on my machine. But I don't
have a FRED system to test the special case.
I didn't manage to apply Andrew's feedback on how to handle the
redzone properly, though.
Something like this...
----------
This is just an RFC patchset.
Xin Li reported that the sysret_rip test fails at:
assert(ctx->uc_mcontext.gregs[REG_EFL] ==
ctx->uc_mcontext.gregs[REG_R11]);
on a FRED system. Handle the FRED system scenario too. There are two
patches in this series. Comments welcome...
Note: only the case where 'syscall' sets %rcx=%rip and %r11=%rflags has
been tested. I don't have a FRED system to test it.
How to test this:
$ make -C tools/testing/selftests/x86
$ tools/testing/selftests/x86/sysret_rip_64
Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
Ammar Faizi (2):
selftests/x86: sysret_rip: Handle syscall in a FRED system
selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
tools/testing/selftests/x86/sysret_rip.c | 105 ++++++++++++++++++++++-
1 file changed, 104 insertions(+), 1 deletion(-)
base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
--
Ammar Faizi