November 2025 - Linux-kselftest-mirror

[PATCH bpf-next v3] selftests/bpf: Fix htab_update/reenter_update selftest failure

by Saket Kumar Bhaskar

Since commit 31158ad02ddb ("rqspinlock: Add deadlock detection and recovery") the updated path on re-entrancy now reports deadlock via -EDEADLK instead of the previous -EBUSY. Also, the way reentrancy was exercised (via fentry/lookup_elem_raw) has been fragile because lookup_elem_raw may be inlined (find_kernel_btf_id() will return -ESRCH). To fix this fentry is attached to bpf_obj_free_fields() instead of lookup_elem_raw() and: - The htab map is made to use a BTF-described struct val with a struct bpf_timer so that check_and_free_fields() reliably calls bpf_obj_free_fields() on element replacement. - The selftest is updated to do two updates to the same key (insert + replace) in prog_test. - The selftest is updated to align with expected errno with the kernel’s current behavior. Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com> --- Changes since v2: Addressed CI failures: * Initialize key to 0 before the first update. * Used pointer value to pass for update and memset rather than &value. v2: https://lore.kernel.org/all/20251114152653.356782-1-skb99@linux.ibm.com/ Changes since v1: Addressed comments from Alexei: * Fixed the scenario where test may fail when lookup_elem_raw() is inlined. v1: https://lore.kernel.org/all/20251106052628.349117-1-skb99@linux.ibm.com/ .../selftests/bpf/prog_tests/htab_update.c | 37 ++++++++++++++----- .../testing/selftests/bpf/progs/htab_update.c | 19 +++++++--- 2 files changed, 41 insertions(+), 15 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c index 2bc85f4814f4..d0b405eb2966 100644 --- a/tools/testing/selftests/bpf/prog_tests/htab_update.c +++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c @@ -15,17 +15,17 @@ struct htab_update_ctx { static void test_reenter_update(void) { struct htab_update *skel; - unsigned int key, value; + void *value = NULL; + unsigned int key, value_size; int err; skel = htab_update__open(); if (!ASSERT_OK_PTR(skel, "htab_update__open")) return; - /* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */ - bpf_program__set_autoload(skel->progs.lookup_elem_raw, true); + bpf_program__set_autoload(skel->progs.bpf_obj_free_fields, true); err = htab_update__load(skel); - if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err) + if (!ASSERT_TRUE(!err, "htab_update__load") || err) goto out; skel->bss->pid = getpid(); @@ -33,14 +33,33 @@ static void test_reenter_update(void) if (!ASSERT_OK(err, "htab_update__attach")) goto out; - /* Will trigger the reentrancy of bpf_map_update_elem() */ + value_size = bpf_map__value_size(skel->maps.htab); + + value = calloc(1, value_size); + if (!ASSERT_OK_PTR(value, "calloc value")) + goto out; + /* + * First update: plain insert. This should NOT trigger the re-entrancy + * path, because there is no old element to free yet. + */ key = 0; - value = 0; - err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0); - if (!ASSERT_OK(err, "add element")) + err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, value, BPF_ANY); + if (!ASSERT_OK(err, "first update (insert)")) + goto out; + + /* + * Second update: replace existing element with same key and trigger + * the reentrancy of bpf_map_update_elem(). + * check_and_free_fields() calls bpf_obj_free_fields() on the old + * value, which is where fentry program runs and performs a nested + * bpf_map_update_elem(), triggering -EDEADLK. + */ + memset(value, 0, value_size); + err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, value, BPF_ANY); + if (!ASSERT_OK(err, "second update (replace)")) goto out; - ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy"); + ASSERT_EQ(skel->bss->update_err, -EDEADLK, "no reentrancy"); out: htab_update__destroy(skel); } diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c index 7481bb30b29b..195d3b2fba00 100644 --- a/tools/testing/selftests/bpf/progs/htab_update.c +++ b/tools/testing/selftests/bpf/progs/htab_update.c @@ -6,24 +6,31 @@ char _license[] SEC("license") = "GPL"; +/* Map value type: has BTF-managed field (bpf_timer) */ +struct val { + struct bpf_timer t; + __u64 payload; +}; + struct { __uint(type, BPF_MAP_TYPE_HASH); __uint(max_entries, 1); - __uint(key_size, sizeof(__u32)); - __uint(value_size, sizeof(__u32)); + __type(key, __u32); + __type(value, struct val); } htab SEC(".maps"); int pid = 0; int update_err = 0; -SEC("?fentry/lookup_elem_raw") -int lookup_elem_raw(void *ctx) +SEC("?fentry/bpf_obj_free_fields") +int bpf_obj_free_fields(void *ctx) { - __u32 key = 0, value = 1; + __u32 key = 0; + struct val value = { .payload = 1 }; if ((bpf_get_current_pid_tgid() >> 32) != pid) return 0; - update_err = bpf_map_update_elem(&htab, &key, &value, 0); + update_err = bpf_map_update_elem(&htab, &key, &value, BPF_ANY); return 0; } -- 2.51.0

1 week

3
2
0 0

[PATCH] selftests: tracing: Add tprobe enable/disable testcase

by Masami Hiramatsu (Google)

From: Masami Hiramatsu (Google) <mhiramat(a)kernel.org> Commit 2867495dea86 ("tracing: tprobe-events: Register tracepoint when enable tprobe event") caused regression bug and tprobe did not work. To prevent similar problems, add a testcase which enables/disables a tprobe and check the results. Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org> --- .../test.d/dynevent/enable_disable_tprobe.tc | 40 ++++++++++++++++++++ 1 file changed, 40 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/enable_disable_tprobe.tc diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/enable_disable_tprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/enable_disable_tprobe.tc new file mode 100644 index 000000000000..c1f1cafa30f3 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/dynevent/enable_disable_tprobe.tc @@ -0,0 +1,40 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Generic dynamic event - enable/disable tracepoint probe events +# requires: dynamic_events "t[:[<group>/][<event>]] <tracepoint> [<args>]":README + +echo 0 > events/enable +echo > dynamic_events + +TRACEPOINT=sched_switch +ENABLEFILE=events/tracepoints/myprobe/enable + +:;: "Add tracepoint event on $TRACEPOINT" ;: + +echo "t:myprobe ${TRACEPOINT}" >> dynamic_events + +:;: "Check enable/disable to ensure it works" ;: + +echo 1 > $ENABLEFILE + +grep -q $TRACEPOINT trace + +echo 0 > $ENABLEFILE + +echo > trace + +! grep -q $TRACEPOINT trace + +:;: "Repeat enable/disable to ensure it works" ;: + +echo 1 > $ENABLEFILE + +grep -q $TRACEPOINT trace + +echo 0 > $ENABLEFILE + +echo > trace + +! grep -q $TRACEPOINT trace + +exit 0

1 week

4
7
0 0

[PATCH bpf-next v10 0/8] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps

by Leon Hwang

This patch set introduces the BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps, as the requirement of BPF_F_ALL_CPUS flag for percpu_array maps was discussed in the thread of "[PATCH bpf-next v3 0/4] bpf: Introduce global percpu data"[1]. The goal of BPF_F_ALL_CPUS flag is to reduce data caching overhead in light skeletons by allowing a single value to be reused to update values across all CPUs. This avoids the M:N problem where M cached values are used to update a map on N CPUs kernel. The BPF_F_CPU flag is accompanied by *flags*-embedded cpu info, which specifies the target CPU for the operation: * For lookup operations: the flag field alongside cpu info enable querying a value on the specified CPU. * For update operations: the flag field alongside cpu info enable updating value for specified CPU. Links: [1] https://lore.kernel.org/bpf/20250526162146.24429-1-leon.hwang@linux.dev/ Changes: v9 -> v10: * Add tests to verify array and hash maps do not support BPF_F_CPU and BPF_F_ALL_CPUS flags. * Address comment from Andrii: * Copy map value using copy_map_value_long for percpu_cgroup_storage maps in a separate patch. v8 -> v9: * Change value type from u64 to u32 in selftests. * Address comments from Andrii: * Keep value_size unaligned and update everywhere for consistency when cpu flags are specified. * Update value by getting pointer for percpu hash and percpu cgroup_storage maps. v7 -> v8: * Address comments from Andrii: * Check BPF_F_LOCK when update percpu_array, percpu_hash and lru_percpu_hash maps. * Refactor flags check in __htab_map_lookup_and_delete_batch(). * Keep value_size unaligned and copy value using copy_map_value() in __htab_map_lookup_and_delete_batch() when BPF_F_CPU is specified. * Update warn message in libbpf's validate_map_op(). * Update comment of libbpf's bpf_map__lookup_elem(). v6 -> v7: * Get correct value size for percpu_hash and lru_percpu_hash in update_batch API. * Set 'count' as 'max_entries' in test cases for lookup_batch API. * Address comment from Alexei: * Move cpu flags check into bpf_map_check_op_flags(). v5 -> v6: * Move bpf_map_check_op_flags() from 'bpf.h' to 'syscall.c'. * Address comments from Alexei: * Drop the refactoring code of data copying logic for percpu maps. * Drop bpf_map_check_op_flags() wrappers. v4 -> v5: * Address comments from Andrii: * Refactor data copying logic for all percpu maps. * Drop this_cpu_ptr() micro-optimization. * Drop cpu check in libbpf's validate_map_op(). * Enhance bpf_map_check_op_flags() using *allowed flags* instead of 'extra_flags_mask'. v3 -> v4: * Address comments from Andrii: * Remove unnecessary map_type check in bpf_map_value_size(). * Reduce code churn. * Remove unnecessary do_delete check in __htab_map_lookup_and_delete_batch(). * Introduce bpf_percpu_copy_to_user() and bpf_percpu_copy_from_user(). * Rename check_map_flags() to bpf_map_check_op_flags() with extra_flags_mask. * Add human-readable pr_warn() explanations in validate_map_op(). * Use flags in bpf_map__delete_elem() and bpf_map__lookup_and_delete_elem(). * Drop "for alignment reasons". v3 link: https://lore.kernel.org/bpf/20250821160817.70285-1-leon.hwang@linux.dev/ v2 -> v3: * Address comments from Alexei: * Use BPF_F_ALL_CPUS instead of BPF_ALL_CPUS magic. * Introduce these two cpu flags for all percpu maps. * Address comments from Jiri: * Reduce some unnecessary u32 cast. * Refactor more generic map flags check function. * A code style issue. v2 link: https://lore.kernel.org/bpf/20250805163017.17015-1-leon.hwang@linux.dev/ v1 -> v2: * Address comments from Andrii: * Embed cpu info as high 32 bits of *flags* totally. * Use ERANGE instead of E2BIG. * Few format issues. Leon Hwang (8): bpf: Introduce internal bpf_map_check_op_flags helper function bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps bpf: Copy map value using copy_map_value_long for percpu_cgroup_storage maps bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags include/linux/bpf-cgroup.h | 4 +- include/linux/bpf.h | 44 ++- include/uapi/linux/bpf.h | 2 + kernel/bpf/arraymap.c | 29 +- kernel/bpf/hashtab.c | 94 ++++-- kernel/bpf/local_storage.c | 27 +- kernel/bpf/syscall.c | 65 ++-- tools/include/uapi/linux/bpf.h | 2 + tools/lib/bpf/bpf.h | 8 + tools/lib/bpf/libbpf.c | 26 +- tools/lib/bpf/libbpf.h | 21 +- .../selftests/bpf/prog_tests/percpu_alloc.c | 312 ++++++++++++++++++ .../selftests/bpf/progs/percpu_alloc_array.c | 32 ++ 13 files changed, 562 insertions(+), 104 deletions(-) -- 2.51.2

1 week

2
10
0 0

[PATCH v6 0/9] futex: Create {set,get}_robust_list2() syscalls

by André Almeida

Hello, This version is a complete rewrite of the syscall (thanks Thomas for the suggestions!). * Use case The use-case for the new syscalls is detailed in the last patch version: https://lore.kernel.org/lkml/20250626-tonyk-robust_futex-v5-0-179194dbde8f@… * The syscall interface Documented at patches 3/9 "futex: Create set_robust_list2() syscall" and 4/9 "futex: Create get_robust_list2() syscall". * Testing I expanded the current robust list selftest to use the new interface, and also ported the original syscall to use the new syscall internals, and everything survived the tests. * Changelog Changes from v5: - Complete interface rewrite, there are so many changes but the main ones are the following points - Array of robust lists now has a static size, allocated once during the first usage of the list - Now that the list of robust lists have a fixed size, I removed the logic of having a command for creating a new index on the list. To simplify things for everyone, userspace just need to call set_robust_list2(head, 32-bit/64-bit type, index). - Created get_robust_list2() - The new code can be better integrated with the original interface - v5: https://lore.kernel.org/r/20250626-tonyk-robust_futex-v5-0-179194dbde8f@iga… Feedback is very welcomed! --- André Almeida (9): futex: Use explicit sizes for compat_robust_list structs futex: Make exit_robust_list32() unconditionally available for 64-bit kernels futex: Create set_robust_list2() syscall futex: Create get_robust_list2() syscall futex: Wire up set_robust_list2 syscall futex: Wire up get_robust_list2 syscall selftests/futex: Expand for set_robust_list2() selftests/futex: Expand for get_robust_list2() futex: Use new robust list API internally arch/alpha/kernel/syscalls/syscall.tbl | 2 + arch/arm/tools/syscall.tbl | 2 + arch/m68k/kernel/syscalls/syscall.tbl | 2 + arch/microblaze/kernel/syscalls/syscall.tbl | 2 + arch/mips/kernel/syscalls/syscall_n32.tbl | 2 + arch/mips/kernel/syscalls/syscall_n64.tbl | 2 + arch/mips/kernel/syscalls/syscall_o32.tbl | 2 + arch/parisc/kernel/syscalls/syscall.tbl | 2 + arch/powerpc/kernel/syscalls/syscall.tbl | 2 + arch/s390/kernel/syscalls/syscall.tbl | 2 + arch/sh/kernel/syscalls/syscall.tbl | 2 + arch/sparc/kernel/syscalls/syscall.tbl | 2 + arch/x86/entry/syscalls/syscall_32.tbl | 2 + arch/x86/entry/syscalls/syscall_64.tbl | 2 + arch/xtensa/kernel/syscalls/syscall.tbl | 2 + include/linux/compat.h | 13 +- include/linux/futex.h | 30 +- include/linux/sched.h | 6 +- include/uapi/asm-generic/unistd.h | 7 +- include/uapi/linux/futex.h | 26 ++ kernel/futex/core.c | 140 ++++-- kernel/futex/syscalls.c | 134 +++++- kernel/sys_ni.c | 2 + scripts/syscall.tbl | 1 + .../selftests/futex/functional/robust_list.c | 504 +++++++++++++++++++-- 25 files changed, 788 insertions(+), 105 deletions(-) --- base-commit: c42ba5a87bdccbca11403b7ca8bad1a57b833732 change-id: 20250225-tonyk-robust_futex-60adeedac695 Best regards, -- André Almeida <andrealmeid(a)igalia.com>

1 week, 1 day

4
18
0 0

[PATCH 0/2] selftests/nolibc: fix loongarch build with recent versions of clang

by Thomas Weißschuh

LLVM 21 switched to -mcmodel=medium for LoongArch64 compilations. This code model uses R_LARCH_ECALL36 relocations which might not be supported by GNU ld which the nolibc testsuite uses by default. Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> --- Thomas Weißschuh (2): selftests/nolibc: use lld to link loongarch binaries selftests/nolibc: error out on linker warnings tools/testing/selftests/nolibc/Makefile.nolibc | 1 + tools/testing/selftests/nolibc/run-tests.sh | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) --- base-commit: 6059e06967aaac9bf736c6cec75b9bccaf5bbe18 change-id: 20251121-nolibc-lld-f32af4983cc0 Best regards, -- Thomas Weißschuh <linux(a)weissschuh.net>

1 week, 2 days

2
3
0 0

[PATCH bpf v1] selftests: test_tag: prog_tag is calculated using SHA256.

by Xing Guo

commit 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256") changed digest of prog_tag to SHA256 but forgot to update tests correspondingly. This patch helps fix it. Fixes: 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256") Signed-off-by: Xing Guo <higuoxing(a)gmail.com> --- tools/testing/selftests/bpf/test_tag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_tag.c b/tools/testing/selftests/bpf/test_tag.c index 5546b05a0486..f1300047c1e0 100644 --- a/tools/testing/selftests/bpf/test_tag.c +++ b/tools/testing/selftests/bpf/test_tag.c @@ -116,7 +116,7 @@ static void tag_from_alg(int insns, uint8_t *tag, uint32_t len) static const struct sockaddr_alg alg = { .salg_family = AF_ALG, .salg_type = "hash", - .salg_name = "sha1", + .salg_name = "sha256", }; int fd_base, fd_alg, ret; ssize_t size; -- 2.51.2

1 week, 3 days

2
1
0 0

[PATCH] selftests/iommu: Fix array-bounds warning in get_hw_info

by Nirbhay Sharma

GCC warns about potential out-of-bounds access when the test provides a buffer smaller than struct iommu_test_hw_info: iommufd_utils.h:817:37: warning: array subscript 'struct iommu_test_hw_info[0]' is partly outside array bounds of 'struct iommu_test_hw_info_buffer_smaller[1]' [-Warray-bounds=] 817 | assert(!info->flags); | ~~~~^~~~~~~ The warning occurs because 'info' is cast to a pointer to the full 8-byte struct at the top of the function, but the buffer_smaller test case passes only a 4-byte buffer. While the code correctly checks data_len before accessing each field, GCC's flow analysis with inlining doesn't recognize that the size check protects the access. Fix this by accessing fields through appropriately-typed pointers that match the actual field sizes (__u32), declared only after the bounds check. This makes the relationship between the size check and memory access explicit to the compiler. Signed-off-by: Nirbhay Sharma <nirbhay.lkd(a)gmail.com> --- tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 9f472c20c190..37c1b994008c 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -770,7 +770,6 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, __u32 data_type, void *data, size_t data_len, uint32_t *capabilities, uint8_t *max_pasid) { - struct iommu_test_hw_info *info = (struct iommu_test_hw_info *)data; struct iommu_hw_info cmd = { .size = sizeof(cmd), .dev_id = device_id, @@ -810,11 +809,19 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, __u32 data_type, } } - if (info) { - if (data_len >= offsetofend(struct iommu_test_hw_info, test_reg)) - assert(info->test_reg == IOMMU_HW_INFO_SELFTEST_REGVAL); - if (data_len >= offsetofend(struct iommu_test_hw_info, flags)) - assert(!info->flags); + if (data) { + if (data_len >= offsetofend(struct iommu_test_hw_info, + test_reg)) { + __u32 *test_reg = (__u32 *)data + 1; + + assert(*test_reg == IOMMU_HW_INFO_SELFTEST_REGVAL); + } + if (data_len >= offsetofend(struct iommu_test_hw_info, + flags)) { + __u32 *flags = data; + + assert(!*flags); + } } if (max_pasid) -- 2.48.1

1 week, 3 days

2
1
0 0

[PATCH v2 00/10] KVM: nVMX: Improve performance for unmanaged guest memory

by griffoul＠gmail.com

From: Fred Griffoul <fgriffo(a)amazon.co.uk> This patch series addresses both performance and correctness issues in nested VMX when handling guest memory. During nested VMX operations, L0 (KVM) accesses specific L1 guest pages to manage L2 execution. These pages fall into two categories: pages accessed only by L0 (such as the L1 MSR bitmap page or the eVMCS page), and pages passed to the L2 guest via vmcs02 (such as APIC access, virtual APIC, and posted interrupt descriptor pages). The current implementation uses kvm_vcpu_map/unmap, which causes two issues. First, the current approach is missing proper invalidation handling in critical scenarios. Enlightened VMCS (eVMCS) pages can become stale when memslots are modified, as there is no mechanism to invalidate the cached mappings. Similarly, APIC access and virtual APIC pages can be migrated by the host, but without proper notification through mmu_notifier callbacks, the mappings become invalid and can lead to incorrect behavior. Second, for unmanaged guest memory (memory not directly mapped by the kernel, such as memory passed with the mem= parameter or guest_memfd for non-CoCo VMs), this workflow invokes expensive memremap/memunmap operations on every L2 VM entry/exit cycle. This creates significant overhead that impacts nested virtualization performance. This series replaces kvm_host_map with gfn_to_pfn_cache in nested VMX. The pfncache infrastructure maintains persistent mappings as long as the page GPA does not change, eliminating the memremap/memunmap overhead on every VM entry/exit cycle. Additionally, pfncache provides proper invalidation handling via mmu_notifier callbacks and memslots generation check, ensuring that mappings are correctly updated during both memslot updates and page migration events. As an example, a microbenchmark using memslot_perf_test with 8192 memslots demonstrates huge improvements in nested VMX operations with unmanaged guest memory: Before After Improvement map: 26.12s 1.54s ~17x faster unmap: 40.00s 0.017s ~2353x faster unmap chunked: 10.07s 0.005s ~2014x faster The series is organized as follows: Patches 1-5 handle the L1 MSR bitmap page and system pages (APIC access, virtual APIC, and posted interrupt descriptor). Patch 1 converts the MSR bitmap to use gfn_to_pfn_cache. Patches 2-3 restore and complete "guest-uses-pfn" support in pfncache. Patch 4 converts the system pages to use gfn_to_pfn_cache. Patch 5 adds a selftest for cache invalidation and memslot updates. Patches 6-7 add enlightened VMCS support. Patch 6 avoids accessing eVMCS fields after they are copied into the cached vmcs12 structure. Patch 7 converts eVMCS page mapping to use gfn_to_pfn_cache. Patches 8-10 implement persistent nested context to handle L2 vCPU multiplexing and migration between L1 vCPUs. Patch 8 introduces the nested context management infrastructure. Patch 9 integrates pfncache with persistent nested context. Patch 10 adds a selftest for this L2 vCPU context switching. v2: - Extended series to support enlightened VMCS (eVMCS). - Added persistent nested context for improved L2 vCPU handling. - Added additional selftests. Suggested-by: dwmw(a)amazon.co.uk Fred Griffoul (10): KVM: nVMX: Implement cache for L1 MSR bitmap KVM: pfncache: Restore guest-uses-pfn support KVM: x86: Add nested state validation for pfncache support KVM: nVMX: Implement cache for L1 APIC pages KVM: selftests: Add nested VMX APIC cache invalidation test KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry KVM: nVMX: Replace evmcs kvm_host_map with pfncache KVM: x86: Add nested context management KVM: nVMX: Use nested context for pfncache persistence KVM: selftests: Add L2 vcpu context switch test arch/x86/include/asm/kvm_host.h | 32 ++ arch/x86/include/uapi/asm/kvm.h | 2 + arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/nested.c | 199 ++++++++ arch/x86/kvm/vmx/hyperv.c | 5 +- arch/x86/kvm/vmx/hyperv.h | 33 +- arch/x86/kvm/vmx/nested.c | 463 ++++++++++++++---- arch/x86/kvm/vmx/vmx.c | 8 + arch/x86/kvm/vmx/vmx.h | 16 +- arch/x86/kvm/x86.c | 19 +- include/linux/kvm_host.h | 34 +- include/linux/kvm_types.h | 1 + tools/testing/selftests/kvm/Makefile.kvm | 2 + .../selftests/kvm/x86/vmx_apic_update_test.c | 302 ++++++++++++ .../selftests/kvm/x86/vmx_l2_switch_test.c | 416 ++++++++++++++++ virt/kvm/kvm_main.c | 3 +- virt/kvm/kvm_mm.h | 6 +- virt/kvm/pfncache.c | 43 +- 18 files changed, 1467 insertions(+), 119 deletions(-) create mode 100644 arch/x86/kvm/nested.c create mode 100644 tools/testing/selftests/kvm/x86/vmx_apic_update_test.c create mode 100644 tools/testing/selftests/kvm/x86/vmx_l2_switch_test.c -- 2.43.0

1 week, 3 days

2
13
0 0

[PATCH bpf-next v1 0/3] bpf: Fix FIONREAD and copied_seq issues

by Jiayuan Chen

syzkaller reported a bug [1] where a socket using sockmap, after being unloaded, exposed incorrect copied_seq calculation. The selftest I provided can be used to reproduce the issue reported by syzkaller. TCP recvmsg seq # bug 2: copied E92C873, seq E68D125, rcvnxt E7CEB7C, fl 40 WARNING: CPU: 1 PID: 5997 at net/ipv4/tcp.c:2724 tcp_recvmsg_locked+0xb2f/0x2910 net/ipv4/tcp.c:2724 Call Trace: <TASK> receive_fallback_to_copy net/ipv4/tcp.c:1968 [inline] tcp_zerocopy_receive+0x131a/0x2120 net/ipv4/tcp.c:2200 do_tcp_getsockopt+0xe28/0x26c0 net/ipv4/tcp.c:4713 tcp_getsockopt+0xdf/0x100 net/ipv4/tcp.c:4812 do_sock_getsockopt+0x34d/0x440 net/socket.c:2421 __sys_getsockopt+0x12f/0x260 net/socket.c:2450 __do_sys_getsockopt net/socket.c:2457 [inline] __se_sys_getsockopt net/socket.c:2454 [inline] __x64_sys_getsockopt+0xbd/0x160 net/socket.c:2454 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f A sockmap socket maintains its own receive queue (ingress_msg) which may contain data from either its own protocol stack or forwarded from other sockets. FD1:read() -- FD1->copied_seq++ | [read data] | [enqueue data] v [sockmap] -> ingress to self -> ingress_msg queue FD1 native stack ------> ^ -- FD1->rcv_nxt++ -> redirect to other | [enqueue data] | | | ingress to FD1 v ^ ... | [sockmap] FD2 native stack The issue occurs when reading from ingress_msg: we update tp->copied_seq by default, but if the data comes from other sockets (not the socket's own protocol stack), tcp->rcv_nxt remains unchanged. Later, when converting back to a native socket, reads may fail as copied_seq could be significantly larger than rcv_nxt. Additionally, FIONREAD calculation based on copied_seq and rcv_nxt is insufficient for sockmap sockets, requiring separate field tracking. [1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983 Jiayuan Chen (3): bpf, sockmap: Fix incorrect copied_seq calculation bpf, sockmap: Fix FIONREAD for sockmap bpf, selftest: Add tests for FIONREAD and copied_seq include/linux/skmsg.h | 71 ++++++- net/core/skmsg.c | 20 +- net/ipv4/tcp_bpf.c | 26 ++- net/ipv4/udp_bpf.c | 25 ++- .../selftests/bpf/prog_tests/sockmap_basic.c | 192 +++++++++++++++++- .../bpf/progs/test_sockmap_pass_prog.c | 8 + 6 files changed, 325 insertions(+), 17 deletions(-) -- 2.43.0

1 week, 3 days

3
8
0 0

[PATCH bpf-next v2 0/2] selftests/bpf: networking test cleanups

by Hoyeon Lee

This series finishes the sockaddr_storage migration in the networking selftests by removing the remaining open-coded IPv4/IPv6 wrappers (addr_port/tuple in cls_redirect, sa46 in select_reuseport). The tests now use sockaddr_storage directly. No other custom socket-address wrappers remain after this series, so the churn stops here and behavior is unchanged. --- Changes in v2: - Drop the tuple wrapper entirely in cls_redirect and rely on ss_family - Limit the series to patches 1/2 (3/4 applied; 5 sent separately) Hoyeon Lee (2): selftests/bpf: use sockaddr_storage directly in cls_redirect test selftests/bpf: use sockaddr_storage instead of sa46 in select_reuseport test .../selftests/bpf/prog_tests/cls_redirect.c | 122 ++++++------------ .../bpf/prog_tests/select_reuseport.c | 67 +++++----- 2 files changed, 77 insertions(+), 112 deletions(-) -- 2.51.1

1 week, 3 days

3
4
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror November 2025