November 2025 - Linux-kselftest-mirror

[PATCH bpf-next v11 0/8] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps

by Leon Hwang

This patch set introduces the BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu
maps. The need for a BPF_F_ALL_CPUS flag for percpu_array maps was discussed
in the thread "[PATCH bpf-next v3 0/4] bpf: Introduce global percpu data"[1].

The goal of the BPF_F_ALL_CPUS flag is to reduce data caching overhead in
light skeletons by allowing a single value to be reused to update values
across all CPUs. This avoids the M:N problem, where M cached values are used
to update a map on a kernel with N CPUs.

The BPF_F_CPU flag is accompanied by cpu info embedded in *flags*, which
specifies the target CPU for the operation:

* For lookup operations: the flag field together with the cpu info enables
  querying a value on the specified CPU.
* For update operations: the flag field together with the cpu info enables
  updating the value for the specified CPU.

Links:
[1] https://lore.kernel.org/bpf/20250526162146.24429-1-leon.hwang@linux.dev/

Changes:
v10 -> v11:
* Support the combination of BPF_EXIST and BPF_F_CPU/BPF_F_ALL_CPUS for
  update operations.
* Fix unstable lru_percpu_hash map test using the combination of
  BPF_EXIST and BPF_F_CPU/BPF_F_ALL_CPUS to avoid LRU eviction
  (reported by Alexei).

v9 -> v10:
* Add tests to verify array and hash maps do not support BPF_F_CPU and
  BPF_F_ALL_CPUS flags.
* Address comment from Andrii:
  * Copy map value using copy_map_value_long for percpu_cgroup_storage
    maps in a separate patch.

v8 -> v9:
* Change value type from u64 to u32 in selftests.
* Address comments from Andrii:
  * Keep value_size unaligned and update everywhere for consistency when
    cpu flags are specified.
  * Update value by getting pointer for percpu hash and percpu
    cgroup_storage maps.

v7 -> v8:
* Address comments from Andrii:
  * Check BPF_F_LOCK when updating percpu_array, percpu_hash and
    lru_percpu_hash maps.
  * Refactor flags check in __htab_map_lookup_and_delete_batch().
  * Keep value_size unaligned and copy value using copy_map_value() in
    __htab_map_lookup_and_delete_batch() when BPF_F_CPU is specified.
  * Update warn message in libbpf's validate_map_op().
  * Update comment of libbpf's bpf_map__lookup_elem().

v6 -> v7:
* Get correct value size for percpu_hash and lru_percpu_hash in
  update_batch API.
* Set 'count' as 'max_entries' in test cases for lookup_batch API.
* Address comment from Alexei:
  * Move cpu flags check into bpf_map_check_op_flags().

v5 -> v6:
* Move bpf_map_check_op_flags() from 'bpf.h' to 'syscall.c'.
* Address comments from Alexei:
  * Drop the refactoring code of data copying logic for percpu maps.
  * Drop bpf_map_check_op_flags() wrappers.

v4 -> v5:
* Address comments from Andrii:
  * Refactor data copying logic for all percpu maps.
  * Drop this_cpu_ptr() micro-optimization.
  * Drop cpu check in libbpf's validate_map_op().
  * Enhance bpf_map_check_op_flags() using *allowed flags* instead of
    'extra_flags_mask'.

v3 -> v4:
* Address comments from Andrii:
  * Remove unnecessary map_type check in bpf_map_value_size().
  * Reduce code churn.
  * Remove unnecessary do_delete check in
    __htab_map_lookup_and_delete_batch().
  * Introduce bpf_percpu_copy_to_user() and bpf_percpu_copy_from_user().
  * Rename check_map_flags() to bpf_map_check_op_flags() with
    extra_flags_mask.
  * Add human-readable pr_warn() explanations in validate_map_op().
  * Use flags in bpf_map__delete_elem() and
    bpf_map__lookup_and_delete_elem().
  * Drop "for alignment reasons".

v3 link: https://lore.kernel.org/bpf/20250821160817.70285-1-leon.hwang@linux.dev/

v2 -> v3:
* Address comments from Alexei:
  * Use BPF_F_ALL_CPUS instead of BPF_ALL_CPUS magic.
  * Introduce these two cpu flags for all percpu maps.
* Address comments from Jiri:
  * Reduce some unnecessary u32 cast.
  * Refactor more generic map flags check function.
  * A code style issue.

v2 link: https://lore.kernel.org/bpf/20250805163017.17015-1-leon.hwang@linux.dev/

v1 -> v2:
* Address comments from Andrii:
  * Embed cpu info as high 32 bits of *flags* totally.
  * Use ERANGE instead of E2BIG.
  * Few format issues.

Leon Hwang (8):
  bpf: Introduce internal bpf_map_check_op_flags helper function
  bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  bpf: Copy map value using copy_map_value_long for percpu_cgroup_storage maps
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps
  libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps
  selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags

 include/linux/bpf-cgroup.h                   |   4 +-
 include/linux/bpf.h                          |  44 ++-
 include/uapi/linux/bpf.h                     |   2 +
 kernel/bpf/arraymap.c                        |  32 +-
 kernel/bpf/hashtab.c                         |  96 +++--
 kernel/bpf/local_storage.c                   |  27 +-
 kernel/bpf/syscall.c                         |  68 ++--
 tools/include/uapi/linux/bpf.h               |   2 +
 tools/lib/bpf/bpf.h                          |   8 +
 tools/lib/bpf/libbpf.c                       |  26 +-
 tools/lib/bpf/libbpf.h                       |  21 +-
 .../selftests/bpf/prog_tests/percpu_alloc.c  | 335 ++++++++++++++++++
 .../selftests/bpf/progs/percpu_alloc_array.c |  32 ++
 13 files changed, 590 insertions(+), 107 deletions(-)

--
2.51.2
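
Illustration (not part of the series): the v1 -> v2 changelog says the cpu info is embedded in the high 32 bits of *flags*. A minimal Python sketch of that layout follows. The numeric values used for BPF_F_CPU and BPF_F_ALL_CPUS are assumptions made for the example; the authoritative definitions are in include/uapi/linux/bpf.h as modified by the series. The helper names encode_cpu_flags() and decode_cpu() are hypothetical.

```python
# Illustrative model of a 64-bit flags word that carries a CPU number in its
# high 32 bits and ordinary flag bits in its low 32 bits.
BPF_F_CPU = 1 << 3        # assumed value, check include/uapi/linux/bpf.h
BPF_F_ALL_CPUS = 1 << 4   # assumed value, check include/uapi/linux/bpf.h


def encode_cpu_flags(cpu: int, extra_flags: int = 0) -> int:
    """Build a flags word that targets a single CPU (hypothetical helper)."""
    if not 0 <= cpu < (1 << 32):
        raise ValueError("cpu must fit in 32 bits")
    return (cpu << 32) | BPF_F_CPU | extra_flags


def decode_cpu(flags: int) -> int:
    """Extract the CPU number from the high 32 bits of a flags word."""
    return flags >> 32


flags = encode_cpu_flags(3)
print(hex(flags), decode_cpu(flags))
```

With BPF_F_ALL_CPUS no cpu number would be needed, since a single value is applied to every CPU; the sketch only covers the per-CPU case.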

6 days, 7 hours

[bpf-next] selftests/bpf: propagate LLVM toolchain into runqslower sub-make

by Hoyeon Lee

The runqslower build invokes a nested make, but the selected LLVM
toolchain (via LLVM=-<version>) is not propagated. This causes the
sub-make to call the system-default 'clang' and 'llvm-strip' even when a
specific LLVM version is intended.

  # LLVM=-20 V=1 make -C tools/testing/selftests/bpf
  ...
  make -C tools/bpf/runqslower
  ...
  clang -g -O2 --target=bpfel -I... -c runqslower.bpf.c -o runqslower.bpf.o && \
  llvm-strip -g runqslower.bpf.o
  /bin/sh: 1: clang: not found
  (expected: clang-20 and llvm-strip-20)

Propagate CLANG and LLVM_STRIP to the sub-make to ensure LLVM version
consistency across all builds.

Signed-off-by: Hoyeon Lee <hoyeon.lee(a)suse.com>
---
 tools/testing/selftests/bpf/Makefile | 1 +
 tools/testing/selftests/lib.mk       | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 34ea23c63bd5..79ab69920dca 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -306,6 +306,7 @@ endif
 
 $(OUTPUT)/runqslower: $(BPFOBJ) | $(DEFAULT_BPFTOOL) $(RUNQSLOWER_OUTPUT)
 	$(Q)$(MAKE) $(submake_extras) -C $(TOOLSDIR)/bpf/runqslower \
+		CLANG=$(CLANG) LLVM_STRIP=$(LLVM_STRIP) \
 		OUTPUT=$(RUNQSLOWER_OUTPUT) VMLINUX_BTF=$(VMLINUX_BTF) \
 		BPFTOOL_OUTPUT=$(HOST_BUILD_DIR)/bpftool/ \
 		BPFOBJ_OUTPUT=$(BUILD_DIR)/libbpf/ \
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a448fae57831..f14255b2afbd 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -8,6 +8,7 @@ LLVM_SUFFIX := $(LLVM)
 endif
 
 CLANG := $(LLVM_PREFIX)clang$(LLVM_SUFFIX)
+LLVM_STRIP := $(LLVM_PREFIX)llvm-strip$(LLVM_SUFFIX)
 
 CLANG_TARGET_FLAGS_arm := arm-linux-gnueabi
 CLANG_TARGET_FLAGS_arm64 := aarch64-linux-gnu
--
2.51.1
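
Illustration (not part of the patch): the naming rule behind "expected: clang-20 and llvm-strip-20" can be modelled in a few lines of Python. Only the suffix assignment and the CLANG/LLVM_STRIP definitions are visible in the quoted hunk; the trailing-slash prefix branch below is an assumption based on the usual kernel LLVM= convention. The function name llvm_tool_names() is hypothetical.

```python
def llvm_tool_names(llvm: str) -> dict:
    """Derive tool names from an LLVM= setting (sketch of the lib.mk logic)."""
    prefix = ""
    suffix = ""
    if llvm.endswith("/"):
        # Assumed branch: LLVM=/path/to/bin/ selects a toolchain directory.
        prefix = llvm
    elif llvm.startswith("-"):
        # Visible in the hunk context: LLVM=-20 becomes a version suffix.
        suffix = llvm
    return {
        "CLANG": f"{prefix}clang{suffix}",
        "LLVM_STRIP": f"{prefix}llvm-strip{suffix}",
    }


print(llvm_tool_names("-20"))
```

Without the patch, the sub-make never sees these derived names and falls back to its own defaults, which is the failure shown in the log above.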

6 days, 10 hours

[PATCH bpf-next] selftests/bpf: call bpf_get_numa_node_id() in trigger_count()

by Menglong Dong

The bench test "trig-kernel-count" can be used as a baseline comparison
for fentry and other benchmarks, and the call to bpf_get_numa_node_id()
should be considered part of that baseline. So, let's call it in
trigger_count().

Meanwhile, rename trigger_count() to trigger_kernel_count() to make it
easier to understand.

Signed-off-by: Menglong Dong <dongml2(a)chinatelecom.cn>
---
 tools/testing/selftests/bpf/benchs/bench_trigger.c | 4 ++--
 tools/testing/selftests/bpf/progs/trigger_bench.c  | 6 ++++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 1e2aff007c2a..34018fc3927f 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -180,10 +180,10 @@ static void trigger_kernel_count_setup(void)
 {
 	setup_ctx();
 	bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
-	bpf_program__set_autoload(ctx.skel->progs.trigger_count, true);
+	bpf_program__set_autoload(ctx.skel->progs.trigger_kernel_count, true);
 	load_ctx();
 	/* override driver program */
-	ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_count);
+	ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_kernel_count);
 }
 
 static void trigger_kprobe_setup(void)
diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c
index 3d5f30c29ae3..2898b3749d07 100644
--- a/tools/testing/selftests/bpf/progs/trigger_bench.c
+++ b/tools/testing/selftests/bpf/progs/trigger_bench.c
@@ -42,12 +42,14 @@ int bench_trigger_uprobe_multi(void *ctx)
 const volatile int batch_iters = 0;
 
 SEC("?raw_tp")
-int trigger_count(void *ctx)
+int trigger_kernel_count(void *ctx)
 {
 	int i;
 
-	for (i = 0; i < batch_iters; i++)
+	for (i = 0; i < batch_iters; i++) {
 		inc_counter();
+		bpf_get_numa_node_id();
+	}
 
 	return 0;
 }
--
2.51.2

6 days, 10 hours

[PATCH v3 00/18] vfio: selftests: Support for multi-device tests

by David Matlack

This series adds support for tests that use multiple devices, and adds
one new test, vfio_pci_device_init_perf_test, which measures parallel
device initialization time to demonstrate the improvement from commit
e908f58b6beb ("vfio/pci: Separate SR-IOV VF dev_set").

This series also breaks apart the monolithic vfio_util.h and
vfio_pci_device.c into separate files, to account for all the new code.
This required quite a bit of code motion so the diffstat looks large.
The final layout is more granular and provides a better separation of
the IOMMU code from the device code.

Final layout:

  C files:
    - tools/testing/selftests/vfio/lib/libvfio.c
    - tools/testing/selftests/vfio/lib/iommu.c
    - tools/testing/selftests/vfio/lib/iova_allocator.c
    - tools/testing/selftests/vfio/lib/vfio_pci_device.c
    - tools/testing/selftests/vfio/lib/vfio_pci_driver.c

  H files:
    - tools/testing/selftests/vfio/lib/include/libvfio.h
    - tools/testing/selftests/vfio/lib/include/libvfio/assert.h
    - tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
    - tools/testing/selftests/vfio/lib/include/libvfio/iova_allocator.h
    - tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
    - tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_driver.h

Notably, vfio_util.h is now gone and replaced with libvfio.h.

This series is based on vfio/next plus Alex Mastro's series to add the
IOVA allocator [1]. It should apply cleanly to vfio/next once Alex's
series is merged by Linus into the next 6.18 rc and then merged into
vfio/next.

This series can be found on GitHub:

  https://github.com/dmatlack/linux/tree/vfio/selftests/init_perf_test/v3

[1] https://lore.kernel.org/kvm/20251111-iova-ranges-v3-0-7960244642c5@fb.com/

Cc: Alex Mastro <amastro(a)fb.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Josh Hilke <jrhilke(a)google.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Vipin Sharma <vipinsh(a)google.com>

v3:
 - Replace literal with NSEC_PER_SEC (Alex Mastro)
 - Fix Makefile accumulate vs. assignment (Alex Mastro)

v2: https://lore.kernel.org/kvm/20251112192232.442761-1-dmatlack@google.com/

v1: https://lore.kernel.org/kvm/20251008232531.1152035-1-dmatlack@google.com/

David Matlack (18):
  vfio: selftests: Move run.sh into scripts directory
  vfio: selftests: Split run.sh into separate scripts
  vfio: selftests: Allow passing multiple BDFs on the command line
  vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
  vfio: selftests: Introduce struct iommu
  vfio: selftests: Support multiple devices in the same container/iommufd
  vfio: selftests: Eliminate overly chatty logging
  vfio: selftests: Prefix logs with device BDF where relevant
  vfio: selftests: Upgrade driver logging to dev_err()
  vfio: selftests: Rename struct vfio_dma_region to dma_region
  vfio: selftests: Move IOMMU library code into iommu.c
  vfio: selftests: Move IOVA allocator into iova_allocator.c
  vfio: selftests: Stop passing device for IOMMU operations
  vfio: selftests: Rename vfio_util.h to libvfio.h
  vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
  vfio: selftests: Split libvfio.h into separate header files
  vfio: selftests: Eliminate INVALID_IOVA
  vfio: selftests: Add vfio_pci_device_init_perf_test

 tools/testing/selftests/vfio/Makefile         |  10 +-
 .../selftests/vfio/lib/drivers/dsa/dsa.c      |  36 +-
 .../selftests/vfio/lib/drivers/ioat/ioat.c    |  18 +-
 .../selftests/vfio/lib/include/libvfio.h      |  26 +
 .../vfio/lib/include/libvfio/assert.h         |  54 ++
 .../vfio/lib/include/libvfio/iommu.h          |  76 +++
 .../vfio/lib/include/libvfio/iova_allocator.h |  23 +
 .../lib/include/libvfio/vfio_pci_device.h     | 125 ++++
 .../lib/include/libvfio/vfio_pci_driver.h     |  97 +++
 .../selftests/vfio/lib/include/vfio_util.h    | 331 -----------
 tools/testing/selftests/vfio/lib/iommu.c      | 465 +++++++++++++++
 .../selftests/vfio/lib/iova_allocator.c       |  94 +++
 tools/testing/selftests/vfio/lib/libvfio.c    |  78 +++
 tools/testing/selftests/vfio/lib/libvfio.mk   |   5 +-
 .../selftests/vfio/lib/vfio_pci_device.c      | 555 +-----------------
 .../selftests/vfio/lib/vfio_pci_driver.c      |  16 +-
 tools/testing/selftests/vfio/run.sh           | 109 ----
 .../testing/selftests/vfio/scripts/cleanup.sh |  41 ++
 tools/testing/selftests/vfio/scripts/lib.sh   |  42 ++
 tools/testing/selftests/vfio/scripts/run.sh   |  16 +
 tools/testing/selftests/vfio/scripts/setup.sh |  48 ++
 .../selftests/vfio/vfio_dma_mapping_test.c    |  46 +-
 .../selftests/vfio/vfio_iommufd_setup_test.c  |   2 +-
 .../vfio/vfio_pci_device_init_perf_test.c     | 168 ++++++
 .../selftests/vfio/vfio_pci_device_test.c     |  12 +-
 .../selftests/vfio/vfio_pci_driver_test.c     |  51 +-
 26 files changed, 1481 insertions(+), 1063 deletions(-)
 create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio.h
 create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/assert.h
 create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
 create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/iova_allocator.h
 create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
 create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_driver.h
 delete mode 100644 tools/testing/selftests/vfio/lib/include/vfio_util.h
 create mode 100644 tools/testing/selftests/vfio/lib/iommu.c
 create mode 100644 tools/testing/selftests/vfio/lib/iova_allocator.c
 create mode 100644 tools/testing/selftests/vfio/lib/libvfio.c
 delete mode 100755 tools/testing/selftests/vfio/run.sh
 create mode 100755 tools/testing/selftests/vfio/scripts/cleanup.sh
 create mode 100755 tools/testing/selftests/vfio/scripts/lib.sh
 create mode 100755 tools/testing/selftests/vfio/scripts/run.sh
 create mode 100755 tools/testing/selftests/vfio/scripts/setup.sh
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_device_init_perf_test.c

base-commit: fa804aa4ac1b091ef2ec2981f08a1c28aaeba8e7
prerequisite-patch-id: dcf23dcc1198960bda3102eefaa21df60b2e4c54
prerequisite-patch-id: e32e56d5bf7b6c7dd40d737aa3521560407e00f5
prerequisite-patch-id: 4f79a41bf10a4c025ba5f433551b46035aa15878
prerequisite-patch-id: f903a45f0c32319138cd93a007646ab89132b18c
--
2.52.0.rc2.455.g230fcf2819-goog

6 days, 15 hours

Re: [PATCH v3] selftests/seccomp: Fix indentation and rebase error logging patch

by Sameeksha Sankpal

Hi Shuah,

Thanks for pointing that out. Apologies for missing the mailing lists
earlier. Resending this follow-up with the correct CC list and in plain
text format.

Please let me know if there’s anything else I should improve in this
patch. I’m happy to resend it as v4 if needed.

Thanks,
Sameeksha

On Mon, 24 Nov 2025 at 23:59, Shuah Khan <skhan(a)linuxfoundation.org> wrote:
>
> On 11/21/25 23:21, Sameeksha Sankpal wrote:
> > Hi,
> > Just following up on this patch.
> > It’s been a few months, so I wanted to check if there is anything else I
> > should address or improve to move it forward.
>
> I see that you didn't cc any mailing list on this email? Please keep
> everybody in the loop when you send responses.
>
> >
> > Thanks,
> > Sameeksha Sankpal
> >
> > On Fri, 30 May 2025 at 04:25, Sameeksha Sankpal <sameekshasankpal(a)gmail.com>
> > wrote:
> >
> >> Rebase the error logging enhancement for get_proc_stat() against the
> >> upstream seccomp tree with proper indentation formatting.
> >>
> >> Suggested-by: Kees Cook <kees(a)kernel.org>
> >> Signed-off-by: Sameeksha Sankpal <sameekshasankpal(a)gmail.com>
> >> ---
> >> v1 -> v2:
> >> - Used TH_LOG instead of printf for error logging
> >> - Moved variable declaration to the top of the function
> >> - Applied review suggestion by Kees Cook
> >>
> >> v2 -> v3:
> >> - Rebased against upstream seccomp tree (was previously against v1)
> >> - Fixed indentation to use tabs instead of spaces
> >> - Used scripts/checkpatch.pl to check the patch for common errors
> >> - Removed the blank line beforeS S-o-b added in v2
> >>
> >> tools/testing/selftests/seccomp/seccomp_bpf.c | 5 +++++
> >> 1 file changed, 5 insertions(+)
> >>
> >> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> b/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> index 61acbd45ffaa..dbd7e705a2af 100644
> >> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> @@ -4508,9 +4508,14 @@ static char get_proc_stat(struct __test_metadata
> >> *_metadata, pid_t pid)
> >>  	char proc_path[100] = {0};
> >>  	char status;
> >>  	char *line;
> >> +	int rc;
> >>
> >>  	snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
> >>  	ASSERT_EQ(get_nth(_metadata, proc_path, 3, &line), 1);
> >> +	rc = get_nth(_metadata, proc_path, 3, &line);
> >> +	ASSERT_EQ(rc, 1) {
> >> +		TH_LOG("user_notification_fifo: failed to read stat for
> >> PID %d (rc=%d)", pid, rc);
> >> +	}
> >>
> >>  	status = *line;
> >>  	free(line);
> >> --
> >> 2.43.0
> >>
> >>
> >
> thanks,
> -- Shuah

6 days, 15 hours

[PATCH] selftests/net: initialize char variable to null

by Ankit Khushwaha

The char pointer variables in 'so_txtime.c' and 'txtimestamp.c' are left
uninitialized when the switch default case is taken, which raises the
following warnings:

txtimestamp.c:240:2: warning: variable 'tsname' is used uninitialized
whenever switch default is taken [-Wsometimes-uninitialized]

so_txtime.c:210:3: warning: variable 'reason' is used uninitialized
whenever switch default is taken [-Wsometimes-uninitialized]

Initialize these variables to NULL to fix this.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/net/so_txtime.c   | 2 +-
 tools/testing/selftests/net/txtimestamp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/so_txtime.c b/tools/testing/selftests/net/so_txtime.c
index 8457b7ccbc09..b76df1efc2ef 100644
--- a/tools/testing/selftests/net/so_txtime.c
+++ b/tools/testing/selftests/net/so_txtime.c
@@ -174,7 +174,7 @@ static int do_recv_errqueue_timeout(int fdt)
 	msg.msg_controllen = sizeof(control);
 
 	while (1) {
-		const char *reason;
+		const char *reason = NULL;
 
 		ret = recvmsg(fdt, &msg, MSG_ERRQUEUE);
 		if (ret == -1 && errno == EAGAIN)
diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c
index dae91eb97d69..bcc14688661d 100644
--- a/tools/testing/selftests/net/txtimestamp.c
+++ b/tools/testing/selftests/net/txtimestamp.c
@@ -217,7 +217,7 @@ static void print_timestamp_usr(void)
 static void print_timestamp(struct scm_timestamping *tss, int tstype,
 			    int tskey, int payload_len)
 {
-	const char *tsname;
+	const char *tsname = NULL;
 
 	validate_key(tskey, tstype);
 
--
2.52.0

6 days, 16 hours

[PATCH] selftests: tpm2: Fix ill defined assertions

by Maurice Hieronymus

Remove parentheses around assert statements in Python. With parentheses,
assert always evaluates to True, making the checks ineffective.

Signed-off-by: Maurice Hieronymus <mhi(a)mailbox.org>
---
 tools/testing/selftests/tpm2/tpm2.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/tpm2/tpm2.py b/tools/testing/selftests/tpm2/tpm2.py
index bba8cb54548e..3d130c30bc7c 100644
--- a/tools/testing/selftests/tpm2/tpm2.py
+++ b/tools/testing/selftests/tpm2/tpm2.py
@@ -437,7 +437,7 @@ class Client:
 
     def extend_pcr(self, i, dig, bank_alg = TPM2_ALG_SHA1):
         ds = get_digest_size(bank_alg)
-        assert(ds == len(dig))
+        assert ds == len(dig)
 
         auth_cmd = AuthCommand()
 
@@ -589,7 +589,7 @@ class Client:
     def seal(self, parent_key, data, auth_value, policy_dig,
              name_alg = TPM2_ALG_SHA1):
         ds = get_digest_size(name_alg)
-        assert(not policy_dig or ds == len(policy_dig))
+        assert not policy_dig or ds == len(policy_dig)
 
         attributes = 0
         if not policy_dig:

base-commit: 821e6e2a328bb907d40f8d1141d8b6c079aa7340
--
2.50.1
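
Reader's note (not part of the patch): in CPython, parentheses around a single expression only group it, so `assert(x)` is evaluated like `assert x`. The always-true pitfall arises when the parentheses form a tuple, as in `assert (x, "msg")`, because a non-empty tuple is truthy. The sketch below contrasts the two forms; the function names are hypothetical, and it assumes Python is run without -O (which strips assert statements entirely).

```python
import warnings


def parenthesized_assert_fails() -> bool:
    """Return True if assert(<false expression>) raises AssertionError."""
    try:
        assert(1 == 2)  # parentheses only group a single expression
    except AssertionError:
        return True
    return False


def tuple_assert_fails() -> bool:
    """Return True if assert (<false expression>, "msg") raises."""
    src = 'assert (1 == 2, "never shown")'
    with warnings.catch_warnings():
        # CPython emits a SyntaxWarning for this form; silence it for the demo.
        warnings.simplefilter("ignore", SyntaxWarning)
        code = compile(src, "<demo>", "exec")
    try:
        exec(code)
    except AssertionError:
        return True
    return False


print(parenthesized_assert_fails(), tuple_assert_fails())
```

Under those assumptions the first function is expected to return True and the second False, so the two assertions touched by the patch were already effective; dropping the parentheses is still the idiomatic spelling and guards against the tuple form creeping in later.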

6 days, 17 hours

[PATCH net-next 12/12] selftests: net: selftest for ipvlan-macnat mode

by Dmitry Skorodumov

Implemented a self-test for ipvlan in l2macnat mode. The test verifies: 1) It's not possible to configure an ip in l2macnat mode on ipvtap 2) It creates several net namespaces - Default namespace emulates host, - ipvlan-tst-phy emulates some host in remote network - ipvlan-tst-0/1 emulate VMs on host. Test verifies, that MAC addresses are as expected in ARP/NEIGH tables: all MACs in 'tst-phy' points to "host" mac-address all MACs in Default and tst are real ones 3) The l2macnat mode has limited number of addresses remembered on port. Test verifies, that this limit really works. Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry(a)huawei.com> --- tools/testing/selftests/net/Makefile | 2 + .../selftests/net/ipvtap_macnat_bridge.py | 168 +++++++++ .../selftests/net/ipvtap_macnat_test.sh | 333 ++++++++++++++++++ 3 files changed, 503 insertions(+) create mode 100755 tools/testing/selftests/net/ipvtap_macnat_bridge.py create mode 100755 tools/testing/selftests/net/ipvtap_macnat_test.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index b5127e968108..050d864f0bd9 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -49,6 +49,7 @@ TEST_PROGS := \ ipv6_flowlabel.sh \ ipv6_force_forwarding.sh \ ipv6_route_update_soft_lockup.sh \ + ipvtap_macnat_test.sh \ l2_tos_ttl_inherit.sh \ l2tp.sh \ link_netns.py \ @@ -191,6 +192,7 @@ TEST_GEN_PROGS := \ TEST_FILES := \ fcnal-test.sh \ in_netns.sh \ + ipvtap_macnat_bridge.py \ lib.sh \ settings \ setup_loopback.sh \ diff --git a/tools/testing/selftests/net/ipvtap_macnat_bridge.py b/tools/testing/selftests/net/ipvtap_macnat_bridge.py new file mode 100755 index 000000000000..7dc4a626e5bb --- /dev/null +++ b/tools/testing/selftests/net/ipvtap_macnat_bridge.py @@ -0,0 +1,168 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +""" +Script to bridge ipvtap and tap, +needed to simulate behaviour of virtual machine using ipvtap. 
+ +ipvtap in macnat mode cannot have IP address. +Due to limitations of ipvtap, it also cannot be plugged +into bridge. +Use this script to connect ipvtap and tap and assing IP to tap. +""" + +import socket +import os +import select +import sys +import signal +import fcntl +import struct +import subprocess + +# Linux TUN/TAP constants +TUNSETIFF = 0x400454ca +IFF_TUN = 0x0001 +IFF_TAP = 0x0002 +IFF_NO_PI = 0x1000 + +ns_name = "non-initialized" + +class TapBridge: + """Simple class to bridge ipvtap and tap interfaces""" + def __init__(self, tap, ipvtap, buffer_size=65536): + self.tap_name = tap + self.ipvtap_name = ipvtap + self.buffer_size = buffer_size + self.running = False + + def open_ipvtap_sock(self, tap_name): + """Open a IPVTAP interface using raw socket""" + try: + sock = socket.socket(socket.AF_PACKET, + socket.SOCK_RAW, + socket.ntohs(0x0003)) + sock.bind((tap_name, 0)) + sock.setblocking(False) + print(f"Connected to IPVTAP interface: {tap_name}") + return sock + + except (OSError, IOError) as e: + print(f"Error opening IPVTAP interface {tap_name}: {e}") + return None + + def create_tap_interface(self, tap_name): + """Create and configure a TAP interface using /dev/net/tun""" + try: + # Open the tun device + tun_fd = os.open('/dev/net/tun', os.O_RDWR) + if tun_fd < 0: + raise OSError("Failed to open /dev/net/tun (err: {os.errno})") + + # Prepare the ifr structure + tap_name_bytes = tap_name.encode('utf-8') + ifr = struct.pack('16sH', tap_name_bytes, IFF_TAP | IFF_NO_PI) + + # Set the interface name and flags + result = fcntl.ioctl(tun_fd, TUNSETIFF, ifr) + + # Get the actual interface name that was set + unpacked = struct.unpack('16sH', result) + actual_name = unpacked[0].split(b'\x00')[0].decode() + print(f"Created TAP interface: {actual_name}") + + return tun_fd + + except (OSError, IOError) as e: + print(f"Error creating TAP interface {tap_name}: {e}") + return None + + def forward_data(self, from_fd, to_fd, description): + """Forward data from one 
file descriptor to another""" + try: + data = os.read(from_fd, self.buffer_size) + if data: + os.write(to_fd, data) + return True + return False + + except BlockingIOError: + return True + except (OSError, IOError) as e: + print(f"Error forwarding data {description}: {e}") + return False + + def run(self): + """Main bridge loop""" + # Create TAP interfaces + tap1_fd = self.create_tap_interface(self.tap_name) + + sock = self.open_ipvtap_sock(self.ipvtap_name) + tap2_fd = sock.fileno() + + if tap1_fd is None or tap2_fd is None: + print("Failed to create TAP interfaces") + return + + print("Press Ctrl+C to stop\n") + + self.running = True + stats = {'tap1_to_tap2': 0, 'tap2_to_tap1': 0} + while self.running: + try: + # Use select to monitor both file descriptors + readable, _, _ = select.select([tap1_fd, tap2_fd], [], [], 1.0) + + for fd in readable: + if fd == tap1_fd: + descr = f"from {self.tap_name} to {self.ipvtap_name}" + if self.forward_data(tap1_fd, tap2_fd, descr): + stats['tap1_to_tap2'] += 1 + else: + self.running = False + elif fd == tap2_fd: + descr = f"from {self.ipvtap_name} to {self.tap_name}" + if self.forward_data(tap2_fd, tap1_fd, descr): + stats['tap2_to_tap1'] += 1 + else: + self.running = False + + except KeyboardInterrupt: + print("\nShutting down...") + self.running = False + except (OSError, IOError) as e: + print(f"Error in main loop: {e}") + self.running = False + + # Cleanup + os.close(tap1_fd) + os.close(tap2_fd) + print(f"Bridge stopped in {ns_name}. 
Stats: {stats}") + + +def signal_handler(_sig, _frame): + """SIGINT handler for macnat bridge""" + print(f'\nReceived interrupt signal, shutting down bridge in {ns_name}') + sys.exit(0) + + +if __name__ == "__main__": + ns_name = subprocess.getoutput("ip netns identify") or "default" + + signal.signal(signal.SIGINT, signal_handler) + + # Check if running as root + if os.geteuid() != 0: + print("ERROR: This script must be run as root!") + sys.exit(1) + + if len(sys.argv) != 3: + print("Usage: tap_bridge.py tap_name ipvtap_name") + sys.exit(1) + + TAP = sys.argv[1] + IPVTAP = sys.argv[2] + + print(f"Starting TAP bridge between {TAP} and {IPVTAP} in {ns_name}") + bridge = TapBridge(TAP, IPVTAP) + bridge.run() diff --git a/tools/testing/selftests/net/ipvtap_macnat_test.sh b/tools/testing/selftests/net/ipvtap_macnat_test.sh new file mode 100755 index 000000000000..927d75af776b --- /dev/null +++ b/tools/testing/selftests/net/ipvtap_macnat_test.sh @@ -0,0 +1,333 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Tests for ipvtap in macnat mode + +NS_TST0=ipvlan-tst-0 +NS_TST1=ipvlan-tst-1 +NS_PHY=ipvlan-tst-phy + +IP_HOST=172.25.0.1 +IP_PHY=172.25.0.2 +IP_TST0=172.25.0.10 +IP_TST1=172.25.0.30 + +IP_OK0=("172.25.0.10" "172.25.0.11" "172.25.0.12" "172.25.0.13") +IP6_OK0=("fc00::10" "fc00::11" "fc00::12" "fc00::13" ) + +IP_OVFL0="172.25.0.14" +IP6_OVFL0="fc00::14" + +IP6_HOST=fc00::1 +IP6_PHY=fc00::2 +IP6_TST0=fc00::10 +IP6_TST1=fc00::30 + +MAC_HOST="92:3a:00:00:00:01" +MAC_PHY="92:3a:00:00:00:02" +MAC_TST0="92:3a:00:00:00:10" +MAC_TST1="92:3a:00:00:00:30" + +VETH_HOST=vethtst +VETH_PHY=vethtst.p + +# +# The testing environment looks this way: +# +# |------HOST------| |------PHY-------| +# | veth<----------------->veth | +# |------|--|------| |----------------| +# | | +# | | |-----TST0-------| +# | |------------|----ipvtap | +# | |----------------| +# | +# | |-----TST1-------| +# |---------------|----ivtap | +# |----------------| +# +# The macnat mode is for virtual 
machines, so ipvtap-interface is supposed
+# to be used only for traffic monitoring and doesn't have ip-address.
+#
+# To simulate a virtual machine on ipvtap, we create TAP-interfaces
+# in TST environments and assign IP-addresses to them.
+# TAP and IPVTAP are connected with simple python script.
+#
+
+ns_run() {
+	ns=$1
+	shift
+	if [[ "$ns" == "default" ]]; then
+		"$@" >/dev/null
+	else
+		ip netns exec "$ns" "$@" >/dev/null
+	fi
+}
+
+configure_ns() {
+	local ns=$1
+	local n=$2
+	local ip=$3
+	local ip6=$4
+	local mac=$5
+
+	ns_run "$ns" ip link set lo up
+
+	if ! ip link add netns "$ns" name "ipvtap0.$n" link $VETH_HOST \
+		type ipvtap mode l2macnat bridge; then
+		exit_error "FAIL: Failed to configure ipvtap link."
+	fi
+	ns_run "$ns" ip link set "ipvtap0.$n" up
+
+	ns_run "$ns" ip tuntap add mode tap "tap0.$n"
+	ns_run "$ns" ip link set dev "tap0.$n" address "$mac"
+	# disable dad
+	ns_run "$ns" sysctl -w "net/ipv6/conf/tap0.$n/accept_dad"=0
+	ns_run "$ns" ip link set "tap0.$n" up
+	ns_run "$ns" ip a a "$ip/24" dev "tap0.$n"
+	ns_run "$ns" ip a a "$ip6/64" dev "tap0.$n"
+}
+
+start_macnat_bridge() {
+	local ns=$1
+	local n=$2
+	ip netns exec "$ns" python3 ipvtap_macnat_bridge.py \
+		"tap0.$n" "ipvtap0.$n" &
+}
+
+configure_veth() {
+	local ns=$1
+	local veth=$2
+	local ip=$3
+	local ip6=$4
+	local mac=$5
+
+	ns_run "$ns" ip link set lo up
+	ns_run "$ns" ethtool -K "$veth" tx off rx off
+	ns_run "$ns" ip link set dev "$veth" address "$mac"
+	ns_run "$ns" ip link set "$veth" up
+	ns_run "$ns" ip a a "$ip/24" dev "$veth"
+	ns_run "$ns" ip a a "$ip6/64" dev "$veth"
+}
+
+setup_env() {
+	ip netns add $NS_TST0
+	ip netns add $NS_TST1
+	ip netns add $NS_PHY
+
+	# setup simulated other-host (phy) and host itself
+	ip link add $VETH_HOST type veth peer name $VETH_PHY \
+		netns $NS_PHY >/dev/null
+
+	# host config
+	configure_veth default $VETH_HOST $IP_HOST $IP6_HOST $MAC_HOST
+	configure_veth $NS_PHY $VETH_PHY $IP_PHY $IP6_PHY $MAC_PHY
+
+	# TST namespaces config
+	configure_ns $NS_TST0 0 $IP_TST0 $IP6_TST0 $MAC_TST0
+	configure_ns $NS_TST1 1 $IP_TST1 $IP6_TST1 $MAC_TST1
+}
+
+ping_all() {
+	# This will learn MAC/IP addresses on ipvtap
+	local ns=$1
+
+	ns_run "$ns" ping -c 1 $IP_TST0
+	ns_run "$ns" ping -c 1 $IP6_TST0
+
+	ns_run "$ns" ping -c 1 $IP_TST1
+	ns_run "$ns" ping -c 1 $IP6_TST1
+
+	ns_run "$ns" ping -c 1 $IP_HOST
+	ns_run "$ns" ping -c 1 $IP6_HOST
+
+	ns_run "$ns" ping -c 1 $IP_PHY
+	ns_run "$ns" ping -c 1 $IP6_PHY
+}
+
+check_mac_eq() {
+	# Ensure IP corresponds to MAC.
+	local ns=$1
+	local ip=$2
+	local mac=$3
+	local dev=$4
+
+	if [[ "$ns" == "default" ]]; then
+		out=$(
+			ip neigh show "$ip" dev "$dev" \
+			| grep "$ip" \
+			| grep "$mac"
+		)
+	else
+		out=$(
+			ip netns exec "$ns" \
+			ip neigh show "$ip" dev "$dev" \
+			| grep "$ip" \
+			| grep "$mac"
+		)
+	fi
+
+	if [[ $out'X' == "X" ]]; then
+		exit_error "FAIL: '$ip' is not '$mac'"
+	fi
+}
+
+cleanup_env() {
+	ip link del $VETH_HOST
+	ip netns del $NS_TST0
+	ip netns del $NS_TST1
+	ip netns del $NS_PHY
+}
+
+exit_error() {
+	echo "$1"
+	exit 1
+}
+
+test_check_mac() {
+	# All IPs in NS_PHY should have MAC of the host
+	check_mac_eq $NS_PHY $IP_TST0 $MAC_HOST $VETH_PHY
+	check_mac_eq $NS_PHY $IP6_TST0 $MAC_HOST $VETH_PHY
+	check_mac_eq $NS_PHY $IP_TST1 $MAC_HOST $VETH_PHY
+	check_mac_eq $NS_PHY $IP6_TST1 $MAC_HOST $VETH_PHY
+	check_mac_eq $NS_PHY $IP_HOST $MAC_HOST $VETH_PHY
+	check_mac_eq $NS_PHY $IP6_HOST $MAC_HOST $VETH_PHY
+
+	# All IPs in TST0 should have corresponding MAC
+	check_mac_eq $NS_TST0 $IP_HOST $MAC_HOST tap0.0
+	check_mac_eq $NS_TST0 $IP6_HOST $MAC_HOST tap0.0
+	check_mac_eq $NS_TST0 $IP_TST1 $MAC_TST1 tap0.0
+	check_mac_eq $NS_TST0 $IP6_TST1 $MAC_TST1 tap0.0
+	check_mac_eq $NS_TST0 $IP_PHY $MAC_PHY tap0.0
+	check_mac_eq $NS_TST0 $IP6_PHY $MAC_PHY tap0.0
+
+	# All IPs in host should have corresponding MAC
+	check_mac_eq default $IP_TST0 $MAC_TST0 $VETH_HOST
+	check_mac_eq default $IP6_TST0 $MAC_TST0 $VETH_HOST
+	check_mac_eq default $IP_TST1 $MAC_TST1 $VETH_HOST
+	check_mac_eq default $IP6_TST1 $MAC_TST1 $VETH_HOST
+	check_mac_eq default $IP_PHY $MAC_PHY $VETH_HOST
+	check_mac_eq default $IP6_PHY $MAC_PHY $VETH_HOST
+}
+
+test_ip_add() {
+	# adding IPs to ipvtap should be forbidden and should fail
+	if ns_run $NS_TST0 ip a a 172.26.0.1/24 dev ipvtap0.0; then
+		exit_error "FAIL: Module allowed to add ip to ipvtap."
+	fi
+
+	if ns_run $NS_TST0 ip a a fc01::1/64 dev ipvtap0.0; then
+		exit_error "FAIL: Module allowed to add ip6 to ipvtap."
+	fi
+}
+
+test_ip_overflow() {
+	# The ipvtap remembers limited number of addresses on interface.
+	# Let's overflow it and check that oldest one doesn't work.
+
+	ns_run $NS_TST0 ip addr flush dev tap0.0
+
+	# Add exactly 4 ip addresses
+	for ip in "${IP_OK0[@]}"; do
+		ns_run $NS_TST0 ip a a "$ip/24" dev tap0.0
+		ns_run $NS_TST0 ping -c 1 $IP_HOST -I "$ip"
+	done
+
+	# Initial check that ping works
+	if ! ping -c 2 $IP_TST0; then
+		exit_error "FAIL: Failed to ping tst0"
+	fi
+
+	# Add 1 more ip address
+	ns_run "$NS_TST0" ip a a $IP_OVFL0/24 dev tap0.0
+	ns_run $NS_TST0 ping -c 1 $IP_HOST -I $IP_OVFL0
+	# check that ping to oldest one from host fails.
+	echo "the next ping should fail:"
+	if ping -c 2 $IP_TST0; then
+		exit_error "FAIL: IP-0 still exists on interface"
+	fi
+
+	# ping host using address-0 and force relearn of IP0.
+	# Host should be able to ping after that
+	ns_run $NS_TST0 ping -c 1 $IP_HOST -I $IP_TST0
+
+	if ! ping -c 2 $IP_TST0; then
+		exit_error "FAIL: Failed to ping tst0 at stage 3"
+	fi
+}
+
+test_ip6_overflow() {
+	# The ipvtap stores limited number of addresses on interface.
+	# Let's overflow it and check that oldest one doesn't work.
+
+	ns_run $NS_TST0 ip addr flush dev tap0.0
+
+	# Add exactly 4 ip addresses
+	for ip6 in "${IP6_OK0[@]}"; do
+		ns_run $NS_TST0 ip a a "$ip6/64" dev tap0.0
+		ns_run $NS_TST0 ping -c 1 $IP6_HOST -I "$ip6"
+	done
+
+	# Initial check that ping6 works
+	if ! ping -c 2 $IP6_TST0; then
+		exit_error "FAIL: Failed to ping6 tst0"
+	fi
+
+	# Add 1 more ip6 address
+	ns_run $NS_TST0 ip a a $IP6_OVFL0/64 dev tap0.0
+	ns_run $NS_TST0 ping -c 1 $IP6_HOST -I $IP6_OVFL0
+	# check that ping to oldest one from host fails.
+	echo "the next ping should fail:"
+	if ping -c 2 $IP6_TST0; then
+		exit_error "FAIL: IP6-0 still exists on interface"
+	fi
+
+	# ping host using address-0 and force relearn of IP0.
+	# Host should be able to ping after that
+	ns_run $NS_TST0 ping -c 1 $IP6_HOST -I $IP6_TST0
+	if ! ping -c 2 $IP6_TST0; then
+		exit_error "FAIL: Failed to ping6 tst0 at stage 3"
+	fi
+}
+
+exec_test() {
+	echo "TEST: $2"
+	$1
+	echo "PASSED: $2"
+}
+
+trap cleanup_env EXIT
+
+echo "ipvlan macnat tests"
+echo "==================="
+
+modprobe -q tap
+modprobe -q ipvlan
+modprobe -q ipvtap
+
+setup_env
+
+exec_test test_ip_add "ip add not allowed"
+
+start_macnat_bridge $NS_TST0 0
+mb_pid1=$!
+start_macnat_bridge $NS_TST1 1
+mb_pid2=$!
+
+echo "<<< Preparation: pinging all...."
+ping_all default
+ping_all $NS_TST0
+ping_all $NS_TST1
+ping_all $NS_PHY
+echo "Finished preparational pinging all. >>>"
+
+exec_test test_check_mac "mac correctness"
+exec_test test_ip_overflow "ip learn capacity overflow"
+exec_test test_ip6_overflow "ip6 learn capacity overflow"
+
+kill -INT $mb_pid1
+kill -INT $mb_pid2
+wait $mb_pid1
+wait $mb_pid2
+
+echo "All tests passed"
-- 
2.25.1
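The `exec_test` wrapper in the script above prints "PASSED" unconditionally and relies on each test calling `exit_error` (which exits the script) on failure. As a rough sketch only, and not part of the posted patch, a wrapper could instead inspect the return status of the test function. The names `exec_test_checked`, `sample_ok` and `sample_bad` are hypothetical, and the sketch assumes test functions return non-zero on failure rather than exiting.

```shell
#!/bin/sh
# Hypothetical variant of exec_test: report PASSED or FAILED based on
# the return status of the test function passed as the first argument.
exec_test_checked() {
	fn=$1
	desc=$2
	echo "TEST: $desc"
	if "$fn"; then
		echo "PASSED: $desc"
		return 0
	fi
	echo "FAILED: $desc"
	return 1
}

# Stand-in test functions, used only for illustration.
sample_ok() { return 0; }
sample_bad() { return 1; }

exec_test_checked sample_ok "always passes"
exec_test_checked sample_bad "always fails" || echo "a caller could stop here"
```

With this shape the caller decides whether a failure aborts the run, which the original wrapper leaves to `exit_error`.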

[PATCH 0/5] mm, kvm: add guest_memfd support for uffd minor faults

by Mike Rapoport

From: "Mike Rapoport (Microsoft)" <rppt(a)kernel.org>

Hi,

These patches allow guest_memfd to notify userspace about minor page faults using userfaultfd and let userspace resolve these page faults using UFFDIO_CONTINUE.

To allow UFFDIO_CONTINUE outside of the core mm I added a get_shmem_folio() callback to vm_ops that allows an address space backing a VMA to return a folio that exists in its page cache (patch 2).

In order for guest_memfd to notify userspace about page faults, there is a new VM_FAULT_UFFD_MINOR that a ->fault() handler can return to inform the page fault handler that it needs to call handle_userfault() to complete the fault (patch 3).

Patch 4 plumbs these new goodies into guest_memfd.

This series is the minimal change I've been able to come up with to allow integration of guest_memfd with uffd. While refactoring uffd and making the mfill_atomic() flow more linear would have been a nice improvement, it's way out of the scope of enabling uffd with guest_memfd.

v2 changes:
* Introduce VM_FAULT_UFFD_MINOR to avoid exporting handle_userfault()
* Simplify vma_can_mfill_atomic()
* Rename get_pagecache_folio() to get_shared_folio() and use inode instead of vma as its argument

v1: https://lore.kernel.org/all/20251117114631.2029447-1-rppt@kernel.org

Mike Rapoport (Microsoft) (4):
  userfaultfd: move vma_can_userfault out of line
  userfaultfd, shmem: use a VMA callback to handle UFFDIO_CONTINUE
  mm: introduce VM_FAULT_UFFD_MINOR fault reason
  guest_memfd: add support for userfaultfd minor mode

Nikita Kalyazin (1):
  KVM: selftests: test userfaultfd minor for guest_memfd

 include/linux/mm.h | 9 ++
 include/linux/mm_types.h | 3 +
 include/linux/userfaultfd_k.h | 36 +-----
 mm/memory.c | 2 +
 mm/shmem.c | 21 +++-
 mm/userfaultfd.c | 80 +++++++++++---
 .../testing/selftests/kvm/guest_memfd_test.c | 103 ++++++++++++++++++
 virt/kvm/guest_memfd.c | 29 +++++
 8 files changed, 232 insertions(+), 51 deletions(-)

base-commit: 6a23ae0a96a600d1d12557add110e0bb6e32730c
-- 
2.50.1

[PATCH] selftests/run_kselftest.sh: Add `--skip` argument option

by Ricardo B. Marlière

Currently the only way of excluding certain tests from a collection is by passing all the other tests explicitly via `--test`. Therefore, if the user wants to skip a single test the resulting command line might be too big, depending on the collection. Add an option `--skip` that takes care of that.

Signed-off-by: Ricardo B. Marlière <rbm(a)suse.com>
---
 tools/testing/selftests/run_kselftest.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index d4be97498b32..84d45254675c 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -30,6 +30,7 @@ Usage: $0 [OPTIONS]
   -s | --summary		Print summary with detailed log in output.log (conflict with -p)
   -p | --per-test-log		Print test log in /tmp with each test name (conflict with -s)
   -t | --test COLLECTION:TEST	Run TEST from COLLECTION
+  -S | --skip COLLECTION:TEST	Skip TEST from COLLECTION
   -c | --collection COLLECTION	Run all tests from COLLECTION
   -l | --list			List the available collection:test entries
   -d | --dry-run		Don't actually run any tests
@@ -43,6 +44,7 @@ EOF
 
 COLLECTIONS=""
 TESTS=""
+SKIP=""
 dryrun=""
 kselftest_override_timeout=""
 ERROR_ON_FAIL=true
@@ -58,6 +60,9 @@ while true; do
 		-t | --test)
 			TESTS="$TESTS $2"
 			shift 2 ;;
+		-S | --skip)
+			SKIP="$SKIP $2"
+			shift 2 ;;
 		-c | --collection)
 			COLLECTIONS="$COLLECTIONS $2"
 			shift 2 ;;
@@ -109,6 +114,12 @@ if [ -n "$TESTS" ]; then
 	done
 	available="$(echo "$valid" | sed -e 's/ /\n/g')"
 fi
+# Remove tests to be skipped from available list
+if [ -n "$SKIP" ]; then
+	for skipped in $SKIP ; do
+		available="$(echo "$available" | grep -v "^${skipped}$")"
+	done
+fi
 
 kselftest_failures_file="$(mktemp --tmpdir kselftest-failures-XXXXXX)"
 export kselftest_failures_file
---
base-commit: a2f7990d330937a204b86b9cafbfef82f87a8693
change-id: 20251125-selftests-add_skip_opt-0f3fd24d7afa

Best regards,
-- 
Ricardo B. Marlière <rbm(a)suse.com>
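The filtering step this patch adds can be tried in isolation. The snippet below is a minimal sketch, not the code from run_kselftest.sh: the `filter_skipped` helper and the sample collection:test entries are hypothetical. It also swaps the patch's anchored regex (`grep -v "^${skipped}$"`) for `grep -Fxv`, on the assumption that a literal whole-line match is wanted, since a `.` in a test name such as `ping.sh` would otherwise match any character.

```shell
#!/bin/sh
# Hypothetical helper: remove every entry listed in $2 (space-separated)
# from the newline-separated list in $1 and print what is left.
filter_skipped() {
	available="$1"
	skip="$2"
	for skipped in $skip; do
		# -F: fixed string, -x: match the whole line, -v: invert the match.
		# "|| true" tolerates grep's exit status 1 when nothing is left.
		available="$(printf '%s\n' "$available" | grep -Fxv -- "$skipped" || true)"
	done
	printf '%s\n' "$available"
}

# Made-up COLLECTION:TEST entries, for illustration only.
available="net:ping.sh
net:pingXsh
timers:posix_timers"

result="$(filter_skipped "$available" "net:ping.sh")"
printf '%s\n' "$result"
```

Here `net:pingXsh` survives the filter. With the regex form it would be dropped as well, because the `.` in `net:ping.sh` also matches `X`.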
