Linux-kselftest-mirror - lists.linaro.org

[PATCH RESEND] selftests/futex: skip tests if shmget unsupported

by Carlos Llamas

On systems where the shmget() syscall is not supported, tests like anon_page and shared_waitv will fail. Skip these tests in such cases to allow the rest of the test suite to run.

Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
 tools/testing/selftests/futex/functional/futex_wait.c | 2 ++
 tools/testing/selftests/futex/functional/futex_waitv.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/tools/testing/selftests/futex/functional/futex_wait.c b/tools/testing/selftests/futex/functional/futex_wait.c
index 152ca4612886..1269642bb662 100644
--- a/tools/testing/selftests/futex/functional/futex_wait.c
+++ b/tools/testing/selftests/futex/functional/futex_wait.c
@@ -71,6 +71,8 @@ TEST(anon_page)
 	/* Testing an anon page shared memory */
 	shm_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
 	if (shm_id < 0) {
+		if (errno == ENOSYS)
+			ksft_exit_skip("shmget syscall not supported\n");
 		perror("shmget");
 		exit(1);
 	}
diff --git a/tools/testing/selftests/futex/functional/futex_waitv.c b/tools/testing/selftests/futex/functional/futex_waitv.c
index c684b10eb76e..3bc4e5dc70e7 100644
--- a/tools/testing/selftests/futex/functional/futex_waitv.c
+++ b/tools/testing/selftests/futex/functional/futex_waitv.c
@@ -86,6 +86,8 @@ TEST(shared_waitv)
 		int shm_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
 
 		if (shm_id < 0) {
+			if (errno == ENOSYS)
+				ksft_exit_skip("shmget syscall not supported\n");
 			perror("shmget");
 			exit(1);
 		}
-- 
2.52.0.rc1.455.g30608eb744-goog

[PATCH net v5 0/3] mptcp: Fix conflicts between MPTCP and sockmap

by Jiayuan Chen

Overall, we encountered a warning [1] that can be triggered by running the selftest I provided.

sockmap works by replacing the sk_data_ready, recvmsg and sendmsg operations and implementing fast socket-level forwarding logic:

1. Users can obtain file descriptors through the userspace socket()/accept() interfaces, then call the BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs) to replace handlers when TCP connections enter the ESTABLISHED state (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB).

However, when combined with MPTCP, an issue arises: MPTCP creates subflow sk's and performs TCP handshakes, so the BPF program obtains subflow sk's and may incorrectly replace their sk_prot. We need to reject such operations. In patch 1, we set psock_update_sk_prot to NULL in the subflow's custom sk_prot.

Additionally, if the server's listening socket has MPTCP enabled and the client's TCP also uses MPTCP, we should allow the combination of subflow and sockmap. This is because the latest Golang programs have enabled MPTCP for listening sockets by default [2]. For programs already using sockmap, upgrading Golang should not cause sockmap functionality to fail. Patch 2 prevents the WARNING from occurring.

Despite these patches fixing stream corruption, users of sockmap must set GODEBUG=multipathtcp=0 to disable MPTCP until sockmap fully supports it.

[1] truncated warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
 <TASK>
 do_accept+0xeb/0x190
 ? __x64_sys_pselect6+0x61/0x80
 ? _raw_spin_unlock+0x12/0x30
 ? alloc_fd+0x11e/0x190
 __sys_accept4+0x8c/0x100
 __x64_sys_accept+0x1f/0x30
 x64_sys_call+0x202f/0x20f0
 do_syscall_64+0x72/0x9a0
 ? switch_fpu_return+0x60/0xf0
 ? irqentry_exit_to_user_mode+0xdb/0x1e0
 ? irqentry_exit+0x3f/0x50
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

[2]: https://go-review.googlesource.com/c/go/+/607715

---
v4 -> v5: Dropped redundant selftest code, updated the Fixes tag, and added a Reviewed-by tag.
v3 -> v4: Addressed questions from Matthieu and Paolo, explained sockmap's operational mechanism, and finalized the changes.
v2 -> v3: Adopted Jakub Sitnicki's suggestions: atomic retrieval of sk_family is required.
v1 -> v2: Had initial discussion with Matthieu on sockmap and MPTCP technical details.

v4: https://lore.kernel.org/bpf/20251105113625.148900-1-jiayuan.chen@linux.dev/
v3: https://lore.kernel.org/bpf/20251023125450.105859-1-jiayuan.chen@linux.dev/
v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/…
v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linu…

Jiayuan Chen (3):
  mptcp: disallow MPTCP subflows from sockmap
  net,mptcp: fix proto fallback detection with BPF
  selftests/bpf: Add mptcp test with sockmap

 net/mptcp/protocol.c | 6 +-
 net/mptcp/subflow.c | 8 +
 .../testing/selftests/bpf/prog_tests/mptcp.c | 141 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c | 43 ++++++
 4 files changed, 196 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c

base-commit: 8c0726e861f3920bac958d76cf134b5a3aa14ce4
-- 
2.43.0
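Patch 1 relies on sockmap refusing any socket whose proto does not provide a psock_update_sk_prot hook. A minimal user-space model of that check follows; the struct, the names and the -EINVAL return value are assumptions chosen for illustration and do not reproduce net/core/sock_map.c.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of the hook sockmap relies on.
 * In the kernel the pointer lives in struct proto; here it is a plain
 * function pointer so the rejection logic can be shown in isolation.
 */
struct model_proto {
	const char *name;
	int (*psock_update_sk_prot)(int restore);
};

static int tcp_update_model(int restore)
{
	(void)restore;
	return 0; /* pretend the protocol ops were swapped successfully */
}

/* Plain TCP sockets expose the hook. */
const struct model_proto model_tcp = { "tcp", tcp_update_model };

/* MPTCP subflows use a custom proto with the hook cleared (patch 1). */
const struct model_proto model_subflow = { "mptcp_subflow", NULL };

/* Adding a socket to the map fails when its proto offers no hook. */
int model_sockmap_add(const struct model_proto *prot)
{
	if (!prot->psock_update_sk_prot)
		return -EINVAL; /* illustrative error code */
	return prot->psock_update_sk_prot(0);
}
```

Clearing the pointer in the subflow's private proto, rather than adding an MPTCP-specific check inside sockmap, keeps the rejection local to MPTCP.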

Re: [PATCH] selftests/cgroup: conform test to TAP format output

by Sebastian Chlad

On Fri, Nov 14, 2025 at 4:59 AM Guopeng Zhang <zhangguopeng(a)kylinos.cn> wrote:
>
> Hi Michal,
>
> Thanks for reviewing and pointing out [1].
>
> > Could you please explain more why is the TAP layout beneficial?
> > (I understand selftest are for oneself, i.e. human readable only by default.)
>
> Actually, selftests are no longer just something for developers to view locally; they are now extensively
> run in CI and stable branch regression testing. Using a standardized layout means that general test runners
> and CI systems can parse the cgroup test results without any special handling.

I second that. In fact, we do run some of those tests in our CI, e.g. https://openqa.opensuse.org/tests/5453031#external
We added this: https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Parser/Format/… to our CI, but frankly the use of KTAP across the selftests is very inconsistent, so we need to post-process some of the output files quite a lot. Therefore, the more standardized the output, the better for any CI.

Small ask: should we amend the commit message to say KTAP?

That being said, the cgroup tests produce nice output which is easy to parse and gives us no issues in our CI, apart from the shell tests, specifically test_cpuset_prs.sh.
We currently run the cgroup tests only internally because some of them tend to fail when crossing resource-usage boundaries and don’t provide clear information about by how much. That ties into my earlier effort Michal linked here: https://lore.kernel.org/all/rua6ubri67gh3b7atarbm5mggqgjyh6646mzkry2n2547jn…
I’ll try to add the cgroup tests to the public openSUSE CI and will test your patches.

>
> TAP provides a structured format that is both human-readable and machine-readable. The plan/result lines are parsed by tools,
> while the diagnostic lines can still contain human-readable debug information. Over time, other selftest suites (such as mm, KVM, mptcp, etc.)
> have also been converted to TAP-style output, so this change just brings the cgroup tests in line with that broader direction.
>
> > Or is this part of some tree-wide effort?
>
> This patch is not part of a formal, tree-wide conversion series I am running; it is an incremental step to align the
> cgroup C tests with the existing TAP usage. I started here because these tests already use ksft_test_result_*() and
> only require minor changes to generate proper TAP output.
>
> > I'm asking to better asses whether also the scripts listed in
> > Makefile:TEST_PROGS should be converted too.
>
> I agree that having them produce TAP output would benefit tooling and CI. I did not want to mix
> that into this change, but if you and other maintainers think this direction is reasonable,
> I would be happy to follow up and convert the cgroup shell tests to TAP as well.
>
> Thanks again for your review.
>
> Best regards,
> Guopeng
>
>
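To illustrate why consistent output matters to a CI harness, here is a minimal sketch of the per-line classification such a parser performs. It assumes result lines of the form printed by kselftest.h ("ok N name", "not ok N name", "ok N # SKIP name"); the function and enum are hypothetical, and the sketch deliberately ignores TAP features such as TODO directives, bare "ok" lines without a number, and subtests.

```c
#include <string.h>

/* Hypothetical classification of one line of (K)TAP output. */
enum tap_result { TAP_NONE, TAP_PASS, TAP_FAIL, TAP_SKIP };

/*
 * Classify a single output line the way a simple CI parser might:
 * "ok N ..." is a pass (or a skip if it carries a "# SKIP" directive),
 * "not ok N ..." is a failure, and anything else (version header,
 * plan line, "# diagnostic" lines) is ignored.
 */
enum tap_result tap_classify(const char *line)
{
	if (strncmp(line, "not ok ", 7) == 0)
		return TAP_FAIL;
	if (strncmp(line, "ok ", 3) == 0)
		return strstr(line, "# SKIP") ? TAP_SKIP : TAP_PASS;
	return TAP_NONE;
}
```

A suite that prints free-form "PASS:"/"FAIL:" text instead falls through to TAP_NONE here, which is exactly the kind of output that forces the post-processing described above.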

[PATCH] selftests/cgroup: conform test to TAP format output

by Guopeng Zhang

Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages.

Signed-off-by: Guopeng Zhang <zhangguopeng(a)kylinos.cn>
---
 tools/testing/selftests/cgroup/test_core.c | 7 ++++---
 tools/testing/selftests/cgroup/test_cpu.c | 7 ++++---
 tools/testing/selftests/cgroup/test_cpuset.c | 7 ++++---
 tools/testing/selftests/cgroup/test_freezer.c | 7 ++++---
 tools/testing/selftests/cgroup/test_kill.c | 7 ++++---
 tools/testing/selftests/cgroup/test_kmem.c | 7 ++++---
 tools/testing/selftests/cgroup/test_memcontrol.c | 7 ++++---
 tools/testing/selftests/cgroup/test_zswap.c | 7 ++++---
 8 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c
index 5e5b8c4b8c0e..102262555a59 100644
--- a/tools/testing/selftests/cgroup/test_core.c
+++ b/tools/testing/selftests/cgroup/test_core.c
@@ -923,8 +923,10 @@ struct corecg_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), &nsdelegate)) {
 		if (setup_named_v1_root(root, sizeof(root), CG_NAMED_NAME))
 			ksft_exit_skip("cgroup v2 isn't mounted and could not setup named v1 hierarchy\n");
@@ -946,12 +948,11 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
 	cleanup_named_v1_root(root);
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_cpu.c b/tools/testing/selftests/cgroup/test_cpu.c
index 7d77d3d43c8e..c83f05438d7c 100644
--- a/tools/testing/selftests/cgroup/test_cpu.c
+++ b/tools/testing/selftests/cgroup/test_cpu.c
@@ -796,8 +796,10 @@ struct cpucg_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -814,11 +816,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_cpuset.c b/tools/testing/selftests/cgroup/test_cpuset.c
index 8094091a5857..c5cf8b56ceb8 100644
--- a/tools/testing/selftests/cgroup/test_cpuset.c
+++ b/tools/testing/selftests/cgroup/test_cpuset.c
@@ -247,8 +247,10 @@ struct cpuset_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -265,11 +267,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_freezer.c b/tools/testing/selftests/cgroup/test_freezer.c
index 714c963aa3f5..97fae92c8387 100644
--- a/tools/testing/selftests/cgroup/test_freezer.c
+++ b/tools/testing/selftests/cgroup/test_freezer.c
@@ -1488,8 +1488,10 @@ struct cgfreezer_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
@@ -1501,11 +1503,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_kill.c b/tools/testing/selftests/cgroup/test_kill.c
index a4dd326ced79..c8c9d306925b 100644
--- a/tools/testing/selftests/cgroup/test_kill.c
+++ b/tools/testing/selftests/cgroup/test_kill.c
@@ -274,8 +274,10 @@ struct cgkill_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
@@ -287,11 +289,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 005a142f3492..ca38525484e3 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -421,8 +421,10 @@ struct kmem_test {
 int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -446,11 +448,10 @@ int main(int argc, char **argv)
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 2e9d78ab641c..4e1647568c5b 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -1650,8 +1650,10 @@ struct memcg_test {
 int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
-	int i, proc_status, ret = EXIT_SUCCESS;
+	int i, proc_status;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -1685,11 +1687,10 @@ int main(int argc, char **argv)
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index ab865d900791..64ebc3f3f203 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -597,8 +597,10 @@ static bool zswap_configured(void)
 int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -625,11 +627,10 @@ int main(int argc, char **argv)
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
-- 
2.25.1
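The conversion follows one pattern in every file: announce the plan up front, then let the framework compute the exit status from its pass/skip/fail counters, which is why the ret = EXIT_FAILURE bookkeeping disappears. The stand-alone sketch below mimics that flow without kselftest.h. run_tap(), struct tap_test and the T_* codes are hypothetical, and the rule that a run succeeds only when passes plus skips cover the whole plan is an assumption about ksft_finished() worth checking against kselftest.h.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical result codes a test function can return. */
enum { T_PASS, T_SKIP, T_FAIL };

struct tap_test {
	int (*fn)(void);
	const char *name;
};

/* Example test cases used to exercise the runner. */
int always_pass(void) { return T_PASS; }
int always_skip(void) { return T_SKIP; }
int always_fail(void) { return T_FAIL; }

/*
 * Print a TAP header, a plan and one result line per test, then derive
 * the exit status from the counters instead of a separate "ret"
 * variable: 0 if every planned test passed or was skipped, else 1.
 */
int run_tap(const struct tap_test *tests, size_t n)
{
	size_t ok = 0;

	printf("TAP version 13\n");
	printf("1..%zu\n", n);
	for (size_t i = 0; i < n; i++) {
		switch (tests[i].fn()) {
		case T_PASS:
			printf("ok %zu %s\n", i + 1, tests[i].name);
			ok++;
			break;
		case T_SKIP:
			printf("ok %zu # SKIP %s\n", i + 1, tests[i].name);
			ok++;
			break;
		default:
			printf("not ok %zu %s\n", i + 1, tests[i].name);
			break;
		}
	}
	return ok == n ? 0 : 1;
}
```

Deriving the status from counters also catches tests that silently never report, because they leave the count short of the declared plan.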

[PATCH v9 0/9] liveupdate: Rework KHO for in-kernel users

by Pasha Tatashin

Changelog:
v9:
- Added Reviewed-by tags and addressed comments from Mike Rapoport and Pratyush Yadav.
- Dropped the patch that moves abort/finalize to the public header, per Mike's request.
- Added a patch from Zhu Yanjun to output errors by name.

This series applies against akpm's mm-unstable branch.

This series refactors the KHO framework to better support in-kernel users like the upcoming LUO. The current design, which relies on a notifier chain and debugfs for control, is too restrictive for direct programmatic use.

The core of this rework is the removal of the notifier chain in favor of a direct registration API. This decouples clients from the shutdown-time finalization sequence, allowing them to manage their preserved state more flexibly and at any time.

In support of this new model, this series also:
- Makes the debugfs interface optional.
- Introduces APIs to unpreserve memory and fixes a bug in the abort path where client state was being incorrectly discarded. Note that this is an interim step, as a more comprehensive fix is planned as part of the stateless KHO work [1].
- Moves all KHO code into a new kernel/liveupdate/ directory to consolidate live update components.

[1] https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com

Mike Rapoport (Microsoft) (1):
  kho: drop notifiers

Pasha Tatashin (7):
  kho: make debugfs interface optional
  kho: add interfaces to unpreserve folios, page ranges, and vmalloc
  memblock: Unpreserve memory in case of error
  test_kho: Unpreserve memory in case of error
  kho: don't unpreserve memory during abort
  liveupdate: kho: move to kernel/liveupdate
  MAINTAINERS: update KHO maintainers

Zhu Yanjun (1):
  liveupdate: kho: Use %pe format specifier for error pointer printing

 Documentation/core-api/kho/concepts.rst | 2 +-
 MAINTAINERS | 4 +-
 include/linux/kexec_handover.h | 46 +-
 init/Kconfig | 2 +
 kernel/Kconfig.kexec | 24 -
 kernel/Makefile | 3 +-
 kernel/kexec_handover_internal.h | 16 -
 kernel/liveupdate/Kconfig | 39 ++
 kernel/liveupdate/Makefile | 5 +
 kernel/{ => liveupdate}/kexec_handover.c | 532 +++++++-----------
 .../{ => liveupdate}/kexec_handover_debug.c | 0
 kernel/liveupdate/kexec_handover_debugfs.c | 221 ++++++++
 kernel/liveupdate/kexec_handover_internal.h | 56 ++
 lib/test_kho.c | 128 +++--
 mm/memblock.c | 93 +--
 tools/testing/selftests/kho/vmtest.sh | 1 +
 16 files changed, 690 insertions(+), 482 deletions(-)
 delete mode 100644 kernel/kexec_handover_internal.h
 create mode 100644 kernel/liveupdate/Kconfig
 create mode 100644 kernel/liveupdate/Makefile
 rename kernel/{ => liveupdate}/kexec_handover.c (80%)
 rename kernel/{ => liveupdate}/kexec_handover_debug.c (100%)
 create mode 100644 kernel/liveupdate/kexec_handover_debugfs.c
 create mode 100644 kernel/liveupdate/kexec_handover_internal.h

base-commit: 9ef7b034116354ee75502d1849280a4d2ff98a7c
-- 
2.51.1.930.gacf6e81ea2-goog
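Two of the patches (memblock and test_kho) use the new unpreserve interfaces to back out of a partially completed preservation when an error occurs. The sketch below models that all-or-nothing pattern on a toy page bitmap; model_preserve(), model_unpreserve() and preserve_all() are hypothetical stand-ins, not KHO functions, and the real interfaces operate on folios, page ranges and vmalloc areas rather than a bool array.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 64

/* Hypothetical stand-in for the set of pages marked for preservation. */
static bool preserved[MODEL_NPAGES];

struct page_range {
	size_t pfn;
	size_t nr;
};

/* Mark a range preserved; reject it up front if it is out of bounds. */
static int model_preserve(size_t pfn, size_t nr)
{
	if (pfn > MODEL_NPAGES || nr > MODEL_NPAGES - pfn)
		return -1;
	for (size_t i = pfn; i < pfn + nr; i++)
		preserved[i] = true;
	return 0;
}

/* Counterpart that lets callers back out of a partial preservation. */
static void model_unpreserve(size_t pfn, size_t nr)
{
	for (size_t i = pfn; i < pfn + nr; i++)
		preserved[i] = false;
}

/* Preserve every range or none: on failure, undo the ranges already done. */
int preserve_all(const struct page_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (model_preserve(r[i].pfn, r[i].nr)) {
			while (i--)
				model_unpreserve(r[i].pfn, r[i].nr);
			return -1;
		}
	}
	return 0;
}

/* Number of pages currently marked as preserved. */
size_t preserved_count(void)
{
	size_t count = 0;

	for (size_t i = 0; i < MODEL_NPAGES; i++)
		count += preserved[i];
	return count;
}
```

Without an unpreserve counterpart, a caller that fails halfway has no way to drop the ranges it already registered, which is the gap the new interfaces close.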

[GIT PULL] kselftest fixes update for Linux 6.18-rc6

by Shuah Khan

Hi Linus,

Please pull this kselftest fixes update for Linux 6.18-rc6.

Fixes the event-filter-function.tc tracing test failure caused when a first run to sample events triggers kmem_cache_free, which interferes with the rest of the test. Fix this by calling sample_events twice to eliminate the kmem_cache_free-related noise from the sampling.

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 920aa3a7705a061cb3004572d8b7932b54463dbf:

  selftests: cachestat: Fix warning on declaration under label (2025-10-22 09:23:18 -0600)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.18-rc6

for you to fetch changes up to dd4adb986a86727ed8f56c48b6d0695f1e211e65:

  selftests/tracing: Run sample events to clear page cache events (2025-11-10 18:00:07 -0700)

----------------------------------------------------------------
linux_kselftest-fixes-6.18-rc6

Fixes the event-filter-function.tc tracing test failure caused when a first run to sample events triggers kmem_cache_free, which interferes with the rest of the test. Fix this by calling sample_events twice to eliminate the kmem_cache_free-related noise from the sampling.

----------------------------------------------------------------
Steven Rostedt (1):
      selftests/tracing: Run sample events to clear page cache events

 tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------

[PATCH v2] selftests/mm/uffd: remove static address usage in shmem_allocate_area()

by Mehdi Ben Hadj Khelifa

The current shmem_allocate_area() implementation uses a hardcoded virtual base address (BASE_PMD_ADDR) as a hint for mmap() when creating shmem-backed test areas. This approach is fragile and may fail on systems with ASLR or different virtual memory layouts, where the chosen address is unavailable.

Replace the static base address with a dynamically reserved address range obtained via mmap(NULL, ..., PROT_NONE). The memfd-backed areas and their alias are then mapped into that reserved region using MAP_FIXED, preserving the original layout and aliasing semantics while avoiding collisions with unrelated mappings.

This change improves robustness and portability of the test suite without altering its behavior or coverage.

Suggested-by: Mike Rapoport <rppt(a)kernel.org>
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa(a)gmail.com>
---
Testing (retested):
A diff between running the mm selftests on 6.18-rc5 before and after the change shows no regression on x86_64 with 32GB DDR5 RAM.

ChangeLog:
Changes from v1:
- Implemented Mike's suggestions to make the cleanup code clearer.

Link: https://lore.kernel.org/all/20251111205739.420009-1-mehdi.benhadjkheli…

 tools/testing/selftests/mm/uffd-common.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 994fe8c03923..edd02328f77b 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -10,7 +10,6 @@
 uffd_test_ops_t *uffd_test_ops;
 uffd_test_case_ops_t *uffd_test_case_ops;
 
-#define BASE_PMD_ADDR ((void *)(1UL << 30))
 
 /* pthread_mutex_t starts at page offset 0 */
 pthread_mutex_t *area_mutex(char *area, unsigned long nr, uffd_global_test_opts_t *gopts)
@@ -142,30 +141,37 @@ static int shmem_allocate_area(uffd_global_test_opts_t *gopts, void **alloc_area
 	unsigned long offset = is_src ? 0 : bytes;
 	char *p = NULL, *p_alias = NULL;
 	int mem_fd = uffd_mem_fd_create(bytes * 2, false);
+	size_t region_size = bytes * 2 + hpage_size;
 
-	/* TODO: clean this up.  Use a static addr is ugly */
-	p = BASE_PMD_ADDR;
-	if (!is_src)
-		/* src map + alias + interleaved hpages */
-		p += 2 * (bytes + hpage_size);
+	void *reserve = mmap(NULL, region_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
+			     -1, 0);
+	if (reserve == MAP_FAILED) {
+		close(mem_fd);
+		return -errno;
+	}
+
+	p = reserve;
 	p_alias = p;
 	p_alias += bytes;
 	p_alias += hpage_size;  /* Prevent src/dst VMA merge */
 
-	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			   mem_fd, offset);
 	if (*alloc_area == MAP_FAILED) {
 		*alloc_area = NULL;
+		munmap(reserve, region_size);
+		close(mem_fd);
 		return -errno;
 	}
 	if (*alloc_area != p)
 		err("mmap of memfd failed at %p", p);
 
-	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			  mem_fd, offset);
 	if (area_alias == MAP_FAILED) {
-		munmap(*alloc_area, bytes);
 		*alloc_area = NULL;
+		munmap(reserve, region_size);
+		close(mem_fd);
 		return -errno;
 	}
 	if (area_alias != p_alias)
-- 
2.51.2

[PATCH] selftests/mm/uffd: remove static address usage in shmem_allocate_area()

by Mehdi Ben Hadj Khelifa

The current shmem_allocate_area() implementation uses a hardcoded virtual base address (BASE_PMD_ADDR) as a hint for mmap() when creating shmem-backed test areas. This approach is fragile and may fail on systems with ASLR or different virtual memory layouts, where the chosen address is unavailable.

Replace the static base address with a dynamically reserved address range obtained via mmap(NULL, ..., PROT_NONE). The memfd-backed areas and their alias are then mapped into that reserved region using MAP_FIXED, preserving the original layout and aliasing semantics while avoiding collisions with unrelated mappings.

This change improves robustness and portability of the test suite without altering its behavior or coverage.

Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa(a)gmail.com>
---
Testing:
A diff between running the mm selftests on 6.18-rc5 before and after the change shows no regression on x86_64 with 32GB DDR5 RAM.

 tools/testing/selftests/mm/uffd-common.c | 25 +++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 994fe8c03923..492b21c960bb 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -6,11 +6,11 @@
  */
 
 #include "uffd-common.h"
+#include "asm-generic/mman-common.h"
 
 uffd_test_ops_t *uffd_test_ops;
 uffd_test_case_ops_t *uffd_test_case_ops;
 
-#define BASE_PMD_ADDR ((void *)(1UL << 30))
 
 /* pthread_mutex_t starts at page offset 0 */
 pthread_mutex_t *area_mutex(char *area, unsigned long nr, uffd_global_test_opts_t *gopts)
@@ -142,30 +142,37 @@ static int shmem_allocate_area(uffd_global_test_opts_t *gopts, void **alloc_area
 	unsigned long offset = is_src ? 0 : bytes;
 	char *p = NULL, *p_alias = NULL;
 	int mem_fd = uffd_mem_fd_create(bytes * 2, false);
+	size_t region_size = bytes * 2 + hpage_size;
 
-	/* TODO: clean this up.  Use a static addr is ugly */
-	p = BASE_PMD_ADDR;
-	if (!is_src)
-		/* src map + alias + interleaved hpages */
-		p += 2 * (bytes + hpage_size);
+	void *reserve = mmap(NULL, region_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
+			     -1, 0);
+	if (reserve == MAP_FAILED) {
+		close(mem_fd);
+		return -errno;
+	}
+
+	p = (char *)reserve;
 	p_alias = p;
 	p_alias += bytes;
 	p_alias += hpage_size;  /* Prevent src/dst VMA merge */
 
-	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			   mem_fd, offset);
 	if (*alloc_area == MAP_FAILED) {
+		munmap(reserve, region_size);
 		*alloc_area = NULL;
+		close(mem_fd);
 		return -errno;
 	}
 	if (*alloc_area != p)
 		err("mmap of memfd failed at %p", p);
 
-	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			  mem_fd, offset);
 	if (area_alias == MAP_FAILED) {
-		munmap(*alloc_area, bytes);
+		munmap(reserve, region_size);
 		*alloc_area = NULL;
+		close(mem_fd);
 		return -errno;
 	}
 	if (area_alias != p_alias)
-- 
2.51.2

[PATCH v3 0/2] libbpf: fix BTF dedup to support recursive typedef

by Paul Houssel

Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm.

This patch extends btf_dedup_struct_types() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows.

Changes in v3:
1. Patch 1: Adjusted the comment of btf_dedup_ref_type() to refer to typedef as well.
2. Patch 2: Updated the "dedup: recursive typedef" test to include a duplicated version of the types, to make sure deduplication still happens in this case.

Changes in v2:
1. Patch 1: Refactored the code to avoid duplicating existing logic. Instead of adding a new function, we modify the existing btf_dedup_struct_type() function to handle the BTF_KIND_TYPEDEF case. Calls to btf_hash_struct() and btf_shallow_equal_struct() are replaced with calls to functions that select btf_hash_struct() / btf_hash_typedef() based on the type.
2. Patch 2: Added tests.

v2: https://lore.kernel.org/lkml/cover.1762956564.git.paul.houssel@orange.com/
v1: https://lore.kernel.org/lkml/20251107153408.159342-1-paulhoussel2@gmail.com/

Paul Houssel (2):
  libbpf: fix BTF dedup to support recursive typedef definitions
  selftests/bpf: add BTF dedup tests for recursive typedef definitions

 tools/lib/bpf/btf.c | 73 +++++++++++++++-----
 tools/testing/selftests/bpf/prog_tests/btf.c | 65 +++++++++++++++++
 2 files changed, 121 insertions(+), 17 deletions(-)

-- 
2.51.0
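The essence of handling a self-referential typedef, such as a Go definition along the lines of `type T *T` (in BTF: a typedef pointing to a pointer that points back to the typedef), is to record a tentative equivalence before recursing, so that revisiting the same type ends the walk instead of looping forever. The model below shows that idea in isolation. It is a hypothetical simplification, not libbpf's btf_dedup code, which also hashes types and canonicalizes whole type graphs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_TYPES 16

/* Hypothetical, stripped-down type graph: just enough kinds for a cycle. */
enum model_kind { M_INT, M_PTR, M_TYPEDEF };

struct model_type {
	enum model_kind kind;
	const char *name; /* used by M_INT and M_TYPEDEF */
	int ref;          /* referenced type id, used by M_PTR and M_TYPEDEF */
};

/* assumed[a] == b records the working hypothesis "type a matches type b". */
static int assumed[MODEL_MAX_TYPES];

static bool equiv(const struct model_type *t, int a, int b)
{
	/* Revisiting a type mid-comparison: trust the existing hypothesis. */
	if (assumed[a] >= 0)
		return assumed[a] == b;
	if (t[a].kind != t[b].kind)
		return false;
	assumed[a] = b;

	switch (t[a].kind) {
	case M_INT:
		return strcmp(t[a].name, t[b].name) == 0;
	case M_PTR:
		return equiv(t, t[a].ref, t[b].ref);
	case M_TYPEDEF:
		return strcmp(t[a].name, t[b].name) == 0 &&
		       equiv(t, t[a].ref, t[b].ref);
	}
	return false;
}

/* Compare the graphs rooted at a and b; cycles terminate via assumed[]. */
bool types_equiv(const struct model_type *t, size_t n, int a, int b)
{
	if (n > MODEL_MAX_TYPES)
		return false;
	for (size_t i = 0; i < n; i++)
		assumed[i] = -1;
	return equiv(t, a, b);
}
```

Without the hypothesis table, comparing two copies of the cyclic typedef would recurse typedef, pointer, typedef, ... indefinitely; with it, the second visit to the typedef returns the recorded answer.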

[PATCH v2 0/4] KVM ARM64 pre_fault_memory

by Jack Thomson

From: Jack Thomson <jackabt(a)amazon.com>

This patch series adds ARM64 support for the KVM_PRE_FAULT_MEMORY feature, which was previously only available on x86 [1]. This allows us to reduce the number of stage-2 faults during execution. This is of benefit in post-copy migration scenarios, particularly in memory-intensive applications, where we are experiencing high latencies due to the stage-2 faults.

Patch Overview:
- The first patch adds support for the KVM_PRE_FAULT_MEMORY ioctl on arm64.
- The second patch fixes an issue with unaligned mmap allocations in the selftests.
- The third patch updates the pre_fault_memory_test to support arm64.
- The last patch extends the pre_fault_memory_test to cover different VM memory backings.

=== Changes Since v1 [2] ===
Addressing feedback from Oliver:
- No pre-fault flag is passed to user_mem_abort() or gmem_abort() now that aborts are synthesized.
- Removed the retry loop from kvm_arch_vcpu_pre_fault_memory().

[1]: https://lore.kernel.org/kvm/20240710174031.312055-1-pbonzini@redhat.com
[2]: https://lore.kernel.org/all/20250911134648.58945-1-jackabt.amazon@gmail.com

Jack Thomson (4):
  KVM: arm64: Add pre_fault_memory implementation
  KVM: selftests: Fix unaligned mmap allocations
  KVM: selftests: Enable pre_fault_memory_test for arm64
  KVM: selftests: Add option for different backing in pre-fault tests

 Documentation/virt/kvm/api.rst | 3 +-
 arch/arm64/kvm/Kconfig | 1 +
 arch/arm64/kvm/arm.c | 1 +
 arch/arm64/kvm/mmu.c | 73 +++++++++++-
 tools/testing/selftests/kvm/Makefile.kvm | 1 +
 tools/testing/selftests/kvm/lib/kvm_util.c | 12 +-
 .../selftests/kvm/pre_fault_memory_test.c | 110 +++++++++++++-----
 7 files changed, 163 insertions(+), 38 deletions(-)

base-commit: 42188667be387867d2bf763d028654cbad046f7b
-- 
2.43.0
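From userspace, pre-faulting is a vCPU ioctl over a guest-physical range. The sketch below shows the retry loop a VMM might wrap around it. It assumes the struct kvm_pre_fault_memory layout (gpa, size, flags) and the partial-progress behaviour documented in Documentation/virt/kvm/api.rst (gpa and size are updated in place to describe the remaining range); treating EINTR and EAGAIN as transient is an assumption, and prefault_all() is a hypothetical helper. When the installed headers predate the ioctl, the sketch compiles to a -ENOSYS stub.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Pre-fault guest memory [gpa, gpa + size) through a vCPU fd.
 * Returns 0 once the whole range is mapped, or a negative errno.
 */
int prefault_all(int vcpu_fd, uint64_t gpa, uint64_t size)
{
#ifdef KVM_PRE_FAULT_MEMORY
	struct kvm_pre_fault_memory range = { .gpa = gpa, .size = size };

	while (range.size) {
		/* On partial progress KVM advances gpa and shrinks size. */
		if (ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range) < 0 &&
		    errno != EINTR && errno != EAGAIN)
			return -errno;
	}
	return 0;
#else
	/* Headers too old to know about the ioctl. */
	(void)vcpu_fd;
	(void)gpa;
	(void)size;
	return -ENOSYS;
#endif
}
```

Looping on the remaining size rather than on the return value matters because a call that maps only part of the range can still report success.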

[PATCH v5] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes. The tests verify that: 1. SOCK_STREAM returns EOF when the peer closes normally. 2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data. 3. SOCK_SEQPACKET returns EOF when the peer closes normally. 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. This follows up on review feedback suggesting a selftest to clarify Linux’s semantics. Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com> Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com> --- Changelog: changes made in v4 to v5: 1. Moved the send() call before the socket type check in Test 2 to ensure the unread data behavior is tested for SOCK_DGRAM as well. 2. Removed the misleading commend about accept() for clarity. 3. Applied indentation fixes for style consistency (alignment with open parenthesis). 4. Minor comment and formatting cleanups for clarity and adherence to kernel coding style. 
tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/af_unix/Makefile | 1 + .../selftests/net/af_unix/unix_connreset.c | 178 ++++++++++++++++++ 3 files changed, 180 insertions(+) create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 439101b518ee..e89a60581a13 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -65,3 +65,4 @@ udpgso udpgso_bench_rx udpgso_bench_tx unix_connect +unix_connreset diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index de805cbbdf69..5826a8372451 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -7,6 +7,7 @@ TEST_GEN_PROGS := \ scm_pidfd \ scm_rights \ unix_connect \ + unix_connreset \ # end of TEST_GEN_PROGS include ../../lib.mk diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c new file mode 100644 index 000000000000..9cb0f48597eb --- /dev/null +++ b/tools/testing/selftests/net/af_unix/unix_connreset.c @@ -0,0 +1,178 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Selftest for AF_UNIX socket close and ECONNRESET behaviour. + * + * This test verifies: + * 1. SOCK_STREAM returns EOF when the peer closes normally. + * 2. SOCK_STREAM returns ECONNRESET if peer closes with unread data. + * 3. SOCK_SEQPACKET returns EOF when the peer closes normally. + * 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. + * 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. + * + * These tests document the intended Linux behaviour. 
+ * + */ + +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <errno.h> +#include <sys/socket.h> +#include <sys/un.h> +#include "../../kselftest_harness.h" + +#define SOCK_PATH "/tmp/af_unix_connreset.sock" + +static void remove_socket_file(void) +{ + unlink(SOCK_PATH); +} + +FIXTURE(unix_sock) +{ + int server; + int client; + int child; +}; + +FIXTURE_VARIANT(unix_sock) +{ + int socket_type; + const char *name; +}; + +FIXTURE_VARIANT_ADD(unix_sock, stream) { + .socket_type = SOCK_STREAM, + .name = "SOCK_STREAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, dgram) { + .socket_type = SOCK_DGRAM, + .name = "SOCK_DGRAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, seqpacket) { + .socket_type = SOCK_SEQPACKET, + .name = "SOCK_SEQPACKET", +}; + +FIXTURE_SETUP(unix_sock) +{ + struct sockaddr_un addr = {}; + int err; + + addr.sun_family = AF_UNIX; + strcpy(addr.sun_path, SOCK_PATH); + remove_socket_file(); + + self->server = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->server); + + err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + err = listen(self->server, 1); + ASSERT_EQ(0, err); + } + + self->client = socket(AF_UNIX, variant->socket_type | SOCK_NONBLOCK, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); +} + +FIXTURE_TEARDOWN(unix_sock) +{ + if ((variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) & self->child > 0) + close(self->child); + + close(self->client); + close(self->server); + remove_socket_file(); +} + +/* Test 1: peer closes normally */ +TEST_F(unix_sock, eof) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, 
self->child); + + close(self->child); + } else { + close(self->server); + } + + n = recv(self->client, buf, sizeof(buf), 0); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(0, n); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 2: peer closes with unread data */ +TEST_F(unix_sock, reset_unread_behavior) +{ + char buf[16] = {}; + ssize_t n; + + /* Send data that will remain unread */ + send(self->client, "hello", 5, 0); + + if (variant->socket_type == SOCK_DGRAM) { + /* No real connection, just close the server */ + close(self->server); + } else { + /* Accept client connection */ + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + + /* Peer closes before client reads */ + close(self->child); + } + + n = recv(self->client, buf, sizeof(buf), 0); + ASSERT_EQ(-1, n); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 3: closing unaccepted (embryo) server socket should reset client. */ +TEST_F(unix_sock, reset_closed_embryo) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_DGRAM) + SKIP(return, "This test only applies to SOCK_STREAM and SOCK_SEQPACKET"); + + /* Close server without accept()ing */ + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); +} + +TEST_HARNESS_MAIN + -- 2.43.0
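The EOF and ECONNRESET semantics the test asserts can also be observed without a filesystem path by using socketpair(). The following is a minimal standalone sketch, not part of the patch: the helper recv_after_peer_close() is hypothetical, it covers only SOCK_STREAM and SOCK_SEQPACKET, and the expected results are the Linux behaviour described in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect two AF_UNIX sockets with socketpair(),
 * optionally leave data unread in the peer's receive queue, close the
 * peer, then recv() on the surviving end.
 *
 * Returns 0 if recv() reported EOF, the errno value if recv() failed,
 * or -1 if setup failed or recv() unexpectedly returned data.
 */
static int recv_after_peer_close(int type, int leave_unread)
{
	char buf[16];
	int sv[2];
	ssize_t n;
	int ret = -1;

	/* Only the connection-oriented types report ECONNRESET. */
	assert(type == SOCK_STREAM || type == SOCK_SEQPACKET);

	if (socketpair(AF_UNIX, type, 0, sv) < 0)
		return -1;

	/* Data sent on sv[0] is queued on sv[1], which never reads it. */
	if (leave_unread && send(sv[0], "hello", 5, 0) != 5) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* The peer closes, possibly with "hello" still queued. */
	close(sv[1]);

	n = recv(sv[0], buf, sizeof(buf), 0);
	if (n == 0)
		ret = 0;	/* orderly EOF */
	else if (n < 0)
		ret = errno;	/* ECONNRESET when data was left unread */

	close(sv[0]);
	return ret;
}
```

For example, recv_after_peer_close(SOCK_STREAM, 0) is expected to return 0 and recv_after_peer_close(SOCK_STREAM, 1) to return ECONNRESET; the SOCK_SEQPACKET calls behave the same way.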

1 month, 4 weeks


[PATCH v4] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes. The tests verify that: 1. SOCK_STREAM returns EOF when the peer closes normally. 2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data. 3. SOCK_SEQPACKET returns EOF when the peer closes normally. 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. This follows up on review feedback suggesting a selftest to clarify Linux’s semantics. Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com> Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com> --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/af_unix/Makefile | 1 + .../selftests/net/af_unix/unix_connreset.c | 178 ++++++++++++++++++ 3 files changed, 180 insertions(+) create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 439101b518ee..e89a60581a13 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -65,3 +65,4 @@ udpgso udpgso_bench_rx udpgso_bench_tx unix_connect +unix_connreset diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index de805cbbdf69..5826a8372451 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -7,6 +7,7 @@ TEST_GEN_PROGS := \ scm_pidfd \ scm_rights \ unix_connect \ + unix_connreset \ # end of TEST_GEN_PROGS include ../../lib.mk diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c new file mode 100644 index 000000000000..9413f8a0814f --- /dev/null +++ b/tools/testing/selftests/net/af_unix/unix_connreset.c @@ -0,0 +1,178 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Selftest for AF_UNIX socket 
close and ECONNRESET behaviour. + * + * This test verifies: + * 1. SOCK_STREAM returns EOF when the peer closes normally. + * 2. SOCK_STREAM returns ECONNRESET if peer closes with unread data. + * 3. SOCK_SEQPACKET returns EOF when the peer closes normally. + * 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. + * 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. + * + * These tests document the intended Linux behaviour. + * + */ + +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <errno.h> +#include <sys/socket.h> +#include <sys/un.h> +#include "../../kselftest_harness.h" + +#define SOCK_PATH "/tmp/af_unix_connreset.sock" + +static void remove_socket_file(void) +{ + unlink(SOCK_PATH); +} + +FIXTURE(unix_sock) +{ + int server; + int client; + int child; +}; + +FIXTURE_VARIANT(unix_sock) +{ + int socket_type; + const char *name; +}; + +FIXTURE_VARIANT_ADD(unix_sock, stream) { + .socket_type = SOCK_STREAM, + .name = "SOCK_STREAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, dgram) { + .socket_type = SOCK_DGRAM, + .name = "SOCK_DGRAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, seqpacket) { + .socket_type = SOCK_SEQPACKET, + .name = "SOCK_SEQPACKET", +}; + +FIXTURE_SETUP(unix_sock) +{ + struct sockaddr_un addr = {}; + int err; + + addr.sun_family = AF_UNIX; + strcpy(addr.sun_path, SOCK_PATH); + remove_socket_file(); + + self->server = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->server); + + err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + err = listen(self->server, 1); + ASSERT_EQ(0, err); + } + + self->client = socket(AF_UNIX, variant->socket_type | SOCK_NONBLOCK, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); +} + +FIXTURE_TEARDOWN(unix_sock) +{ + 
if ((variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) & self->child > 0) + close(self->child); + + close(self->client); + close(self->server); + remove_socket_file(); +} + +/* Test 1: peer closes normally */ +TEST_F(unix_sock, eof) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + + close(self->child); + } else { + close(self->server); + } + + n = recv(self->client, buf, sizeof(buf), 0); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(0, n); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 2: peer closes with unread data */ +TEST_F(unix_sock, reset_unread_behavior) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_DGRAM) { + /* No real connection, just close the server */ + close(self->server); + } else { + /* Establish full connection first */ + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + + /* Send data that will remain unread */ + send(self->client, "hello", 5, 0); + + /* Peer closes before client reads */ + close(self->child); + } + + n = recv(self->client, buf, sizeof(buf), 0); + ASSERT_EQ(-1, n); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 3: closing unaccepted (embryo) server socket should reset client. 
*/ +TEST_F(unix_sock, reset_closed_embryo) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_DGRAM) + SKIP(return, "This test only applies to SOCK_STREAM and SOCK_SEQPACKET"); + + /* Close server without accept()ing */ + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); +} + +TEST_HARNESS_MAIN + -- 2.43.0
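Test 3 (reset_closed_embryo) can likewise be reproduced on its own. In this sketch the helper name is hypothetical, and the listener is autobound to an abstract-namespace address, a Linux feature triggered by passing sizeof(sa_family_t) as the bind() address length, which avoids the fixed /tmp path the selftest uses. The client connects but is never accept()ed before the listener is closed.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue a connection on a listening AF_UNIX socket,
 * close the listener without accept()ing it, then recv() on the client.
 *
 * Returns 0 on EOF, the errno value if recv() failed, or -1 if setup
 * failed or recv() unexpectedly returned data.
 */
static int recv_after_embryo_close(int type)
{
	struct sockaddr_un addr;
	socklen_t len = sizeof(addr);
	int srv, cli = -1, ret = -1;
	char buf[16];
	ssize_t n;

	assert(type == SOCK_STREAM || type == SOCK_SEQPACKET);

	srv = socket(AF_UNIX, type, 0);
	if (srv < 0)
		return -1;

	/* Autobind: an address length of sizeof(sa_family_t) picks an abstract name. */
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	if (bind(srv, (struct sockaddr *)&addr, sizeof(sa_family_t)) < 0 ||
	    getsockname(srv, (struct sockaddr *)&addr, &len) < 0 ||
	    listen(srv, 1) < 0)
		goto out;

	cli = socket(AF_UNIX, type, 0);
	if (cli < 0 || connect(cli, (struct sockaddr *)&addr, len) < 0)
		goto out;

	/* Close the listener while the connection is still an unaccepted embryo. */
	close(srv);
	srv = -1;

	n = recv(cli, buf, sizeof(buf), 0);
	if (n == 0)
		ret = 0;
	else if (n < 0)
		ret = errno;
out:
	if (cli >= 0)
		close(cli);
	if (srv >= 0)
		close(srv);
	return ret;
}
```

On Linux both connection-oriented types are expected to report ECONNRESET here, matching the assertion in reset_closed_embryo.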

1 month, 4 weeks


[PATCH v1] cpuset: Avoid unnecessary partition invalidation

by Sun Shaojie

Currently, when a non-exclusive cpuset's "cpuset.cpus" overlaps with a partitioned sibling, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. This can be observed in specific configuration sequences:

Case 1: Partition created first, then non-exclusive cpuset overlaps
#1> mkdir -p /sys/fs/cgroup/A1
#2> echo "0-1" > /sys/fs/cgroup/A1/cpuset.cpus
#3> echo "root" > /sys/fs/cgroup/A1/cpuset.cpus.partition
#4> mkdir -p /sys/fs/cgroup/B1
#5> echo "0-3" > /sys/fs/cgroup/B1/cpuset.cpus
// A1's partition becomes "root invalid" - this is unnecessary

Case 2: Non-exclusive cpuset exists first, then partition created
#1> mkdir -p /sys/fs/cgroup/B1
#2> echo "0-1" > /sys/fs/cgroup/B1/cpuset.cpus
#3> mkdir -p /sys/fs/cgroup/A1
#4> echo "0-1" > /sys/fs/cgroup/A1/cpuset.cpus
#5> echo "root" > /sys/fs/cgroup/A1/cpuset.cpus.partition
// A1's partition becomes "root invalid" - this is unnecessary

In Case 1, the effective CPU mask of B1 can differ from its requested mask. B1 can use CPUs 2-3 which don't overlap with A1's exclusive CPUs (0-1), thus not violating A1's exclusivity requirement. In Case 2, B1 can inherit the effective CPUs from its parent, so there is no need to invalidate A1's partition state.

This patch relaxes the overlap check to only consider conflicts between partitioned siblings, not between a partitioned cpuset and a regular non-exclusive one.

Signed-off-by: Sun Shaojie <sunshaojie(a)kylinos.cn>
--- kernel/cgroup/cpuset.c | 8 ++++---- tools/testing/selftests/cgroup/test_cpuset_prs.sh | 10 +++++----- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..e0d27c9a101a 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -586,14 +586,14 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: - * 1.
If either cpuset is CPU exclusive, they must be mutually exclusive + * 1. If both cpusets are exclusive, they must be mutually exclusive * 2. exclusive_cpus masks cannot intersect between cpusets * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs */ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) { - /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) + /* If both cpusets are exclusive, check if they are mutually exclusive */ + if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2)) return !cpusets_are_exclusive(cs1, cs2); /* Exclusive_cpus cannot intersect */ @@ -695,7 +695,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) goto out; /* - * If either I or some sibling (!= me) is exclusive, we can't + * If both I and some sibling (!= me) are exclusive, we can't * overlap. exclusive_cpus cannot overlap with each other if set. */ ret = -EINVAL; diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..903dddfe88d7 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -269,7 +269,7 @@ TEST_MATRIX=( " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" - " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2" + " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . 
P2 0 B1:4-5 B1:P2 4-5" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" @@ -318,7 +318,7 @@ TEST_MATRIX=( # Invalid to valid local partition direct transition tests " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3" " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3" - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0" + " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0 0-4" " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3" # Local partition invalidation tests @@ -388,10 +388,10 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1" - # A non-exclusive cpuset.cpus change will invalidate partition and its siblings - " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" + # A non-exclusive cpuset.cpus change will not invalidate partition and its siblings + " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0" " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1" - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1" + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1" # cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5" -- 2.25.1
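The effect of the one-operator change in cpus_excl_conflict() can be seen in a small user-space model. Everything here is hypothetical: masks are uint64_t bitmaps (bit n is CPU n), the struct stands in for struct cpuset, and claimed() assumes the rule-1 comparison uses a cpuset's exclusive_cpus when set and its cpus_allowed otherwise. It is a sketch of the three rules listed in the diff's comment, not the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n of each mask stands for CPU n. */
struct model_cpuset {
	bool cpu_exclusive;		/* stands in for is_cpu_exclusive() */
	uint64_t cpus_allowed;		/* cpuset.cpus */
	uint64_t exclusive_cpus;	/* cpuset.cpus.exclusive, 0 if unset */
};

static bool subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/* Assumption: the CPUs a cpuset claims for the rule-1 comparison. */
static uint64_t claimed(const struct model_cpuset *cs)
{
	return cs->exclusive_cpus ? cs->exclusive_cpus : cs->cpus_allowed;
}

/*
 * Follows the three rules listed in the cpus_excl_conflict() comment.
 * patched == false uses the old "either is exclusive" test (||),
 * patched == true uses the new "both are exclusive" test (&&).
 */
static bool model_excl_conflict(const struct model_cpuset *a,
				const struct model_cpuset *b, bool patched)
{
	bool check = patched ? (a->cpu_exclusive && b->cpu_exclusive)
			     : (a->cpu_exclusive || b->cpu_exclusive);

	/* Rule 1: exclusive cpusets must be mutually exclusive. */
	if (check)
		return (claimed(a) & claimed(b)) != 0;

	/* Rule 2: exclusive_cpus masks cannot intersect. */
	if (a->exclusive_cpus & b->exclusive_cpus)
		return true;

	/* Rule 3: allowed CPUs of one cannot be a subset of the other's exclusive CPUs. */
	if (a->cpus_allowed && subset(a->cpus_allowed, b->exclusive_cpus))
		return true;
	if (b->cpus_allowed && subset(b->cpus_allowed, a->exclusive_cpus))
		return true;

	return false;
}
```

With A1 = {exclusive, CPUs 0-1} and B1 = {non-exclusive, CPUs 0-3}, as in Case 1, the old rule reports a conflict and the patched rule does not, while two exclusive siblings that share a CPU still conflict.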

1 month, 4 weeks


[PATCH net-next v3 0/6] netconsole: support automatic target recovery

by Andre Carvalho

This patchset introduces target resume capability to netconsole, allowing it to recover targets when the underlying low-level interface comes back online.

The patchset starts by refactoring the netconsole state representation in order to allow representing deactivated targets (targets that are disabled due to interfaces going down).

It then modifies netconsole to handle NETDEV_UP events for such targets and set up netpoll. Targets are matched with incoming interfaces depending on how they were initially bound in netconsole (by MAC address or interface name).

The patchset includes a selftest that validates netconsole target state transitions and that the target is functional after being resumed.

Signed-off-by: Andre Carvalho <asantostc(a)gmail.com>
---
Changes in v3:
- Resume by MAC address or interface name depending on how the target was created.
- Attempt to resume the target without holding the target list lock, by moving the target to a temporary list. This is required as netpoll may attempt to allocate memory.
- Link to v2: https://lore.kernel.org/r/20250921-netcons-retrigger-v2-0-a0e84006237f@gmai…

Changes in v2:
- Attempt to resume the target in the same thread, instead of using a workqueue.
- Add wrapper around __netpoll_setup (patch 4).
- Renamed resume_target to maybe_resume_target and moved the conditionals inside its implementation, keeping the code clearer.
- Verify that the device address matches the target MAC address when the target was set up using a MAC address.
- Update selftest to cover targets bound by MAC address and interface name.
- Fix typo in selftest comment and sort tests alphabetically in Makefile.
- Link to v1: https://lore.kernel.org/r/20250909-netcons-retrigger-v1-0-3aea904926cf@gmai… --- Andre Carvalho (4): netconsole: convert 'enabled' flag to enum for clearer state management netpoll: add wrapper around __netpoll_setup with dev reference netconsole: resume previously deactivated target selftests: netconsole: validate target resume Breno Leitao (2): netconsole: add target_state enum netconsole: add STATE_DEACTIVATED to track targets disabled by low level drivers/net/netconsole.c | 126 ++++++++++++++++----- include/linux/netpoll.h | 1 + net/core/netpoll.c | 20 ++++ tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 ++++- .../selftests/drivers/net/netcons_resume.sh | 92 +++++++++++++++ 6 files changed, 238 insertions(+), 32 deletions(-) --- base-commit: a0c3aefb08cd81864b17c23c25b388dba90b9dad change-id: 20250816-netcons-retrigger-a4f547bfc867 Best regards, -- Andre Carvalho <asantostc(a)gmail.com>
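The state handling described in this cover letter can be pictured as a small state machine. The sketch below is a hypothetical user-space model, not netconsole code: only STATE_DEACTIVATED and the maybe_resume_target name come from the cover letter, the other state names are placeholders, netpoll setup is assumed to succeed, and the model resumes only targets in STATE_DEACTIVATED.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical states; only STATE_DEACTIVATED is named in the cover letter. */
enum target_state {
	STATE_DISABLED,		/* disabled by the user */
	STATE_ENABLED,
	STATE_DEACTIVATED,	/* disabled because the interface went down */
};

struct model_target {
	enum target_state state;
	bool bound_by_mac;	/* created with a MAC rather than a name */
	char dev_name[16];
	unsigned char dev_mac[6];
};

/* Interface went down (matched by name for simplicity). */
static void model_netdev_down(struct model_target *t, const char *ifname)
{
	if (t->state == STATE_ENABLED && strcmp(t->dev_name, ifname) == 0)
		t->state = STATE_DEACTIVATED;
}

/*
 * Interface came up (NETDEV_UP): resume a deactivated target if the
 * device matches the way the target was originally bound.
 */
static bool model_maybe_resume_target(struct model_target *t,
				      const char *ifname,
				      const unsigned char mac[6])
{
	bool match;

	if (t->state != STATE_DEACTIVATED)
		return false;

	match = t->bound_by_mac ? memcmp(t->dev_mac, mac, 6) == 0
				: strcmp(t->dev_name, ifname) == 0;
	if (!match)
		return false;

	/* The real code would set up netpoll here; assume that succeeds. */
	t->state = STATE_ENABLED;
	return true;
}
```

A MAC-bound target in this model resumes even if the interface comes back under a different name, while a name-bound target only resumes on the same name.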

2 months


[PATCH v3 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges

by Alex Mastro

Not all IOMMUs support the same virtual address width as the processor; for instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, using the virtual address as the IOVA and mappings at the top of the address space both fail.

VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator.

Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. This latter change doesn't test quite the same absolute end-of-address-space behavior, but still seems to have some value.

This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2].

Given David's plans to split IOMMU concerns from devices as described in [3], this series' home for `struct iova_allocator` and the IOVA range helpers is likely to be short-lived, since they reside in vfio_pci_device.c. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code, once such a place exists.
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t [2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/ [3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/ To: Alex Williamson <alex(a)shazbot.org> To: David Matlack <dmatlack(a)google.com> To: Shuah Khan <shuah(a)kernel.org> To: Jason Gunthorpe <jgg(a)ziepe.ca> Cc: kvm(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Alex Mastro <amastro(a)fb.com> Changes in v3: - Update capability chain cycle detection - Clarify the iova=vaddr commit message - Link to v2: https://lore.kernel.org/r/20251111-iova-ranges-v2-0-0fa267ff9b78@fb.com Changes in v2: - Fix various nits - calloc() where appropriate - Update overflow test to run regardless of iova range constraints - Change iova_allocator_init() to return an allocated struct - Unfold iova_allocator_alloc() - Fix iova allocator initial state bug - Update vfio_pci_driver_test to use iova allocator - Link to v1: https://lore.kernel.org/r/20251110-iova-ranges-v1-0-4d441cf5bf6d@fb.com --- Alex Mastro (4): vfio: selftests: add iova range query helpers vfio: selftests: fix map limit tests to use last available iova vfio: selftests: add iova allocator vfio: selftests: replace iova=vaddr with allocated iovas .../testing/selftests/vfio/lib/include/vfio_util.h | 19 +- tools/testing/selftests/vfio/lib/vfio_pci_device.c | 246 ++++++++++++++++++++- .../testing/selftests/vfio/vfio_dma_mapping_test.c | 20 +- .../testing/selftests/vfio/vfio_pci_driver_test.c | 12 +- 4 files changed, 288 insertions(+), 9 deletions(-) --- base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316 change-id: 20251110-iova-ranges-1c09549fbf63 Best regards, -- Alex Mastro <amastro(a)fb.com>
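The "simple allocator" mentioned above can be sketched as a bump allocator over an array of inclusive IOVA ranges. This is an illustration under stated assumptions, not the series' code: the struct and function names are hypothetical, ranges are consumed in array order, and the inclusive last field mirrors IOMMUFD's struct iommu_iova_range. The overflow checks matter because a valid range may end at UINT64_MAX, which is exactly the end-of-range case the limit tests care about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, last] range, like IOMMUFD's struct iommu_iova_range. */
struct model_iova_range {
	uint64_t start;
	uint64_t last;
};

/* Hypothetical bump allocator; not the series' struct iova_allocator. */
struct model_iova_allocator {
	const struct model_iova_range *ranges;
	size_t nranges;
	size_t idx;	/* range currently being carved */
	uint64_t next;	/* lowest unused IOVA in ranges[idx] */
};

static void model_iova_init(struct model_iova_allocator *a,
			    const struct model_iova_range *ranges, size_t n)
{
	a->ranges = ranges;
	a->nranges = n;
	a->idx = 0;
	a->next = n ? ranges[0].start : 0;
}

static void model_next_range(struct model_iova_allocator *a)
{
	if (++a->idx < a->nranges)
		a->next = a->ranges[a->idx].start;
}

/*
 * Store in *out an IOVA aligned to @align (a power of two) such that
 * [*out, *out + size - 1] lies inside one valid range. Returns false
 * once no remaining range can fit the request.
 */
static bool model_iova_alloc(struct model_iova_allocator *a, uint64_t size,
			     uint64_t align, uint64_t *out)
{
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	while (a->idx < a->nranges) {
		uint64_t last = a->ranges[a->idx].last;
		uint64_t iova = (a->next + align - 1) & ~(align - 1);

		/* iova < next means rounding up wrapped past UINT64_MAX. */
		if (iova >= a->next && iova <= last && size - 1 <= last - iova) {
			*out = iova;
			if (size - 1 == last - iova)
				model_next_range(a);	/* range exhausted */
			else
				a->next = iova + size;
			return true;
		}
		model_next_range(a);
	}
	return false;
}
```

Checking `size - 1 <= last - iova` instead of computing `iova + size` avoids overflow when a range ends at the very top of the 64-bit space.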

2 months


[PATCH v3] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes. The tests verify that: 1. SOCK_STREAM returns EOF when the peer closes normally. 2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data. 3. SOCK_SEQPACKET returns EOF when the peer closes normally. 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. This follows up on review feedback suggesting a selftest to clarify Linux’s semantics. Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com> Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com> --- tools/testing/selftests/net/af_unix/Makefile | 1 + .../selftests/net/af_unix/unix_connreset.c | 179 ++++++++++++++++++ 2 files changed, 180 insertions(+) create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index de805cbbdf69..5826a8372451 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -7,6 +7,7 @@ TEST_GEN_PROGS := \ scm_pidfd \ scm_rights \ unix_connect \ + unix_connreset \ # end of TEST_GEN_PROGS include ../../lib.mk diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c new file mode 100644 index 000000000000..6f43435d96e2 --- /dev/null +++ b/tools/testing/selftests/net/af_unix/unix_connreset.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Selftest for AF_UNIX socket close and ECONNRESET behaviour. + * + * This test verifies: + * 1. SOCK_STREAM returns EOF when the peer closes normally. + * 2. SOCK_STREAM returns ECONNRESET if peer closes with unread data. + * 3. SOCK_SEQPACKET returns EOF when the peer closes normally. + * 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. + * 5. 
SOCK_DGRAM does not return ECONNRESET when the peer closes. + * + * These tests document the intended Linux behaviour. + * + */ + +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <errno.h> +#include <sys/socket.h> +#include <sys/un.h> +#include "../../kselftest_harness.h" + +#define SOCK_PATH "/tmp/af_unix_connreset.sock" + +static void remove_socket_file(void) +{ + unlink(SOCK_PATH); +} + +FIXTURE(unix_sock) +{ + int server; + int client; + int child; +}; + +FIXTURE_VARIANT(unix_sock) +{ + int socket_type; + const char *name; +}; + +/* Define variants: stream and datagram */ +FIXTURE_VARIANT_ADD(unix_sock, stream) { + .socket_type = SOCK_STREAM, + .name = "SOCK_STREAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, dgram) { + .socket_type = SOCK_DGRAM, + .name = "SOCK_DGRAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, seqpacket) { + .socket_type = SOCK_SEQPACKET, + .name = "SOCK_SEQPACKET", +}; + +FIXTURE_SETUP(unix_sock) +{ + struct sockaddr_un addr = {}; + int err; + + addr.sun_family = AF_UNIX; + strcpy(addr.sun_path, SOCK_PATH); + remove_socket_file(); + + self->server = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->server); + + err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + err = listen(self->server, 1); + ASSERT_EQ(0, err); + + self->client = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + } else { + /* Datagram: bind and connect only */ + self->client = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + } +} + 
+FIXTURE_TEARDOWN(unix_sock) +{ + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) + close(self->child); + + close(self->client); + close(self->server); + remove_socket_file(); +} + +/* Test 1: peer closes normally */ +TEST_F(unix_sock, eof) +{ + char buf[16] = {}; + ssize_t n; + + /* Peer closes normally */ + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) + close(self->child); + else + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno)); + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(0, n); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 2: peer closes with unread data */ +TEST_F(unix_sock, reset_unread) +{ + char buf[16] = {}; + ssize_t n; + + /* Send data that will remain unread by client */ + send(self->client, "hello", 5, 0); + close(self->child); + + n = recv(self->client, buf, sizeof(buf), 0); + TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno)); + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 3: SOCK_DGRAM peer close */ +TEST_F(unix_sock, dgram_reset) +{ + char buf[16] = {}; + ssize_t n; + + send(self->client, "hello", 5, 0); + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno)); + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +TEST_HARNESS_MAIN + -- 2.43.0
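Item 5 (SOCK_DGRAM does not return ECONNRESET) can be checked in isolation with a datagram socketpair(). This minimal sketch uses a hypothetical helper; MSG_DONTWAIT is used because a datagram socket has no connection state to reset, so a blocking recv() would simply wait for data that never arrives.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: close one end of an AF_UNIX SOCK_DGRAM pair,
 * optionally after sending it data it never reads, then try a
 * non-blocking recv() on the other end.
 *
 * Returns the errno value if recv() failed, 0 if it returned 0,
 * or -1 if setup failed or data was unexpectedly received.
 */
static int dgram_recv_after_peer_close(int leave_unread)
{
	char buf[16];
	int sv[2];
	ssize_t n;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* The datagram lands in sv[1]'s queue and is never read. */
	if (leave_unread && send(sv[0], "hello", 5, 0) != 5) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	close(sv[1]);

	/* Nothing is queued for sv[0], so the call cannot return data. */
	n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
	if (n == 0)
		ret = 0;
	else if (n < 0)
		ret = errno;

	close(sv[0]);
	return ret;
}
```

In both cases the expected result is EAGAIN rather than ECONNRESET, matching the SOCK_DGRAM branch of the selftest.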

2 months


[PATCH v2 0/2] libbpf: fix BTF dedup to support recursive typedef

by Paul Houssel

Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm.

This patch extends btf_dedup_struct_types() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows.

Changes in v2:
1. Patch 1: Refactored the code to avoid duplicating existing logic. Instead of adding a new function (), we modify the existing btf_dedup_struct_type() function to handle the BTF_KIND_TYPEDEF case. Calls to btf_hash_struct() and btf_shallow_equal_struct() are replaced with calls to functions that select btf_hash_struct() / btf_hash_typedef() based on the type.
2. Patch 2: Added tests.

v1: https://lore.kernel.org/lkml/20251107153408.159342-1-paulhoussel2@gmail.com/

Paul Houssel (2): libbpf: fix BTF dedup to support recursive typedef definitions selftests/bpf: add BTF dedup tests for recursive typedef definitions tools/lib/bpf/btf.c | 59 +++++++++++++++---- tools/testing/selftests/bpf/prog_tests/btf.c | 61 ++++++++++++++++++++ 2 files changed, 110 insertions(+), 10 deletions(-) -- 2.51.0
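The core difficulty is that a cycle such as typedef T -> pointer -> T, which a self-referential Go declaration can produce but C cannot express, makes a naive structural comparison recurse forever. The sketch below shows the general hypothesis-map technique for comparing cyclic type graphs: assume a candidate equals a canonical type before recursing, and accept that assumption if the walk returns to the same pair. The types and names are hypothetical and greatly simplified; this is not libbpf's implementation.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified type graph; not libbpf's struct btf_type. */
enum model_kind { MODEL_INT, MODEL_PTR, MODEL_TYPEDEF };

struct model_type {
	enum model_kind kind;
	const char *name;	/* NULL for anonymous kinds such as pointers */
	int ref;		/* id of the referenced type, -1 if none */
};

#define MODEL_MAX_TYPES 16

/*
 * hypot[cand] records the canonical id that candidate type cand is
 * currently assumed to be equivalent to (-1 = no assumption yet).
 * Callers reset every entry to -1 before each top-level comparison.
 */
static bool model_equiv(const struct model_type *t, int cand, int canon,
			int hypot[MODEL_MAX_TYPES])
{
	const struct model_type *a = &t[cand];
	const struct model_type *b = &t[canon];

	/* Already assumed: consistent only if it was the same canonical type. */
	if (hypot[cand] != -1)
		return hypot[cand] == canon;
	hypot[cand] = canon;

	if (a->kind != b->kind)
		return false;
	if ((a->name == NULL) != (b->name == NULL))
		return false;
	if (a->name && strcmp(a->name, b->name) != 0)
		return false;
	if (a->ref == -1 || b->ref == -1)
		return a->ref == b->ref;
	return model_equiv(t, a->ref, b->ref, hypot);
}
```

Comparing two copies of the T -> pointer -> T loop terminates with "equivalent" on the second visit to the starting pair, whereas a loop named U is rejected at the first name check.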

2 months


[PATCHSET v10 sched_ext/for-6.19] Add a deadline server for sched_ext tasks

by Andrea Righi

sched_ext tasks can be starved by long-running RT tasks, especially since RT throttling was replaced by deadline servers to boost only SCHED_NORMAL tasks.

Several users in the community have reported issues with RT stalling sched_ext tasks. This is fairly common on distributions or environments where applications like video compositors, audio services, etc. run as RT tasks by default.

Example trace (showing a per-CPU kthread stalled due to the sway Wayland compositor running as an RT task):

  runnable task stall (kworker/0:0[106377] failed to run for 5.043s)
  ...
  CPU 0 : nr_run=3 flags=0xd cpu_rel=0 ops_qseq=20646200 pnt_seq=45388738
          curr=sway[994] class=rt_sched_class

    R kworker/0:0[106377] -5043ms
        scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=0/0
        sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=0
        slice=20000000 cpus=01

This is often perceived as a bug in the BPF schedulers, but in reality schedulers can't do much: RT tasks run outside their control and can potentially consume 100% of the CPU bandwidth.

Fix this by adding a sched_ext deadline server, so that sched_ext tasks are also boosted and do not suffer starvation.

Two kselftests are also provided to verify the starvation fixes and that bandwidth allocation is correct.
== Highlights in this version ==

- wait for inactive_task_timer() to fire before removing the bandwidth reservation (Juri/Peter: please check whether this new dl_server_remove_params() implementation makes sense to you)
- removed the explicit dl_server_stop() from dequeue_task_scx() and rely on the delayed stop behavior (Juri/Peter: ditto)

This patchset is also available in the following git branch:

  git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git scx-dl-server

Changes in v10:
- reordered patches to better isolate sched_ext changes vs sched/deadline changes (Andrea Righi)
- define ext_server only with CONFIG_SCHED_CLASS_EXT=y (Andrea Righi)
- add WARN_ON_ONCE(!cpus) check in dl_server_apply_params() (Andrea Righi)
- wait for inactive_task_timer to fire before removing the bandwidth reservation (Juri Lelli)
- remove explicit dl_server_stop() in dequeue_task_scx() to reduce timer reprogramming overhead (Juri Lelli)
- do not restart pick_task() when invoked by the dl_server (Tejun Heo)
- rename rq_dl_server to dl_server (Peter Zijlstra)
- fixed a missing dl_server start in dl_server_on() (Christian Loehle)
- add a comment to the rt_stall selftest to better explain the 4% threshold (Emil Tsalapatis)

Changes in v9:
- Drop the ->balance() logic as its functionality is now integrated into ->pick_task(), allowing dl_server to call pick_task_scx() directly
- Link to v8: https://lore.kernel.org/all/20250903095008.162049-1-arighi@nvidia.com/

Changes in v8:
- Add tj's patch to decouple balance and pick_task and avoid changing sched/core callbacks to propagate @rf
- Simplify dl_se->dl_server check (suggested by PeterZ)
- Small coding style fixes in the kselftests
- Link to v7: https://lore.kernel.org/all/20250809184800.129831-1-joelagnelf@nvidia.com/

Changes in v7:
- Rebased to Linus master
- Link to v6: https://lore.kernel.org/all/20250702232944.3221001-1-joelagnelf@nvidia.com/

Changes in v6:
- Added Acks to a few patches
- Fixed a few nits suggested by Tejun
- Link to v5: https://lore.kernel.org/all/20250620203234.3349930-1-joelagnelf@nvidia.com/

Changes in v5:
- Added a kselftest (total_bw) to sched_ext to verify bandwidth values from debugfs
- Addressed a comment from Andrea about redundant rq clock invalidation
- Link to v4: https://lore.kernel.org/all/20250617200523.1261231-1-joelagnelf@nvidia.com/

Changes in v4:
- Fixed issues with hotplugged CPUs having their DL server bandwidth altered due to loading SCX
- Fixed other issues
- Rebased on Linus master
- All sched_ext kselftests reliably pass now; also verified that the total_bw in debugfs (CONFIG_SCHED_DEBUG) is conserved with these patches
- Link to v3: https://lore.kernel.org/all/20250613051734.4023260-1-joelagnelf@nvidia.com/

Changes in v3:
- Removed code duplication in debugfs. Made ext interface separate
- Fixed issue where rq_lock_irqsave was not used in the relinquish patch
- Fixed running bw accounting issue in dl_server_remove_params
- Link to v2: https://lore.kernel.org/all/20250602180110.816225-1-joelagnelf@nvidia.com/

Changes in v2:
- Fixed a hang related to using rq_lock instead of rq_lock_irqsave
- Added support to remove BW of DL servers when they are switched to/from EXT
- Link to v1: https://lore.kernel.org/all/20250315022158.2354454-1-joelagnelf@nvidia.com/

Andrea Righi (5):
  sched/deadline: Add support to initialize and remove dl_server bandwidth
  sched_ext: Add a DL server for sched_ext tasks
  sched/deadline: Account ext server bandwidth
  sched_ext: Selectively enable ext and fair DL servers
  selftests/sched_ext: Add test for sched_ext dl_server

Joel Fernandes (6):
  sched/debug: Fix updating of ppos on server write ops
  sched/debug: Stop and start server based on if it was active
  sched/deadline: Clear the defer params
  sched/deadline: Add a server arg to dl_server_update_idle_time()
  sched/debug: Add support to change sched_ext server params
  selftests/sched_ext: Add test for DL server total_bw consistency

 kernel/sched/core.c | 3 +
 kernel/sched/deadline.c | 169 +++++++++++---
 kernel/sched/debug.c | 171 +++++++++++---
 kernel/sched/ext.c | 144 +++++++++++-
 kernel/sched/fair.c | 2 +-
 kernel/sched/idle.c | 2 +-
 kernel/sched/sched.h | 8 +-
 kernel/sched/topology.c | 5 +
 tools/testing/selftests/sched_ext/Makefile | 2 +
 tools/testing/selftests/sched_ext/rt_stall.bpf.c | 23 ++
 tools/testing/selftests/sched_ext/rt_stall.c | 222 ++++++++++++++++++
 tools/testing/selftests/sched_ext/total_bw.c | 281 +++++++++++++++++++++++
 12 files changed, 955 insertions(+), 77 deletions(-)
 create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
 create mode 100644 tools/testing/selftests/sched_ext/total_bw.c

2 months

[PATCH 0/9] mm/damon: misc cleanups

by SeongJae Park

Yet another batch of misc cleanups and refactoring for DAMON code, tests, and documents.

The first two patches (1 and 2) rename DAMOS core filter related code for readability. The following three patches (3-5) refactor page table walk callback functions in DAMON, as suggested by Hugh and David, and as I promised. The next two patches (6 and 7) refactor the DAMON core layer kunit test and the sysfs interface selftest to be simpler and deduplicated. The final two patches (8 and 9) fix Sphinx and grammatical errors in the documents.

SeongJae Park (9):
  mm/damon: rename damos core filter helpers to have word core
  mm/damon: rename damos->filters to damos->core_filters
  mm/damon/vaddr: cleanup using pmd_trans_huge_lock()
  mm/damon/vaddr: use vm_normal_folio{,_pmd}() instead of damon_get_folio()
  mm/damon/vaddr: consistently use only pmd_entry for damos_migrate
  mm/damon/tests/core-kunit: remove DAMON_MIN_REGION redefinition
  selftests/damon/sysfs.py: merge DAMON status dumping into commitment assertion
  Docs/mm/damon/maintainer-profile: fix a typo on mm-untable link
  Docs/mm/damon/maintainer-profile: fix grammartical errors

 .clang-format | 4 +-
 Documentation/mm/damon/maintainer-profile.rst | 10 +-
 include/linux/damon.h | 14 +-
 mm/damon/core.c | 25 ++-
 mm/damon/tests/core-kunit.h | 59 ++++----
 mm/damon/vaddr.c | 143 +++++++-----------
 .../selftests/damon/drgn_dump_damon_status.py | 8 +-
 tools/testing/selftests/damon/sysfs.py | 45 ++----
 8 files changed, 121 insertions(+), 187 deletions(-)

base-commit: 4e9ec347bc14de636aec3014dee3b5d279ca33bf
-- 
2.47.3

2 months

[PATCH bpf-next] selftests/bpf: Fix htab_update/reenter_update selftest failure

by Saket Kumar Bhaskar

Since commit 31158ad02ddb ("rqspinlock: Add deadlock detection and recovery"), the update path on re-entrancy reports the deadlock via -EDEADLK instead of the previous -EBUSY. The selftest is updated so that the expected errno matches the kernel's current behavior.

Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com>
---
 tools/testing/selftests/bpf/prog_tests/htab_update.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
index 2bc85f4814f4..98d52bb1446f 100644
--- a/tools/testing/selftests/bpf/prog_tests/htab_update.c
+++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
@@ -40,7 +40,7 @@ static void test_reenter_update(void)
 	if (!ASSERT_OK(err, "add element"))
 		goto out;
 
-	ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
+	ASSERT_EQ(skel->bss->update_err, -EDEADLK, "no reentrancy");
 out:
 	htab_update__destroy(skel);
 }
-- 
2.51.0

2 months

[PATCH v5 net-next 00/14] AccECN protocol case handling series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>

Hello,

Please find the v5 AccECN case handling patch series. It covers the handling of several exceptional cases of the Accurate ECN spec (RFC9768), adds new identifiers to be used by CC modules, adds ecn_delta into rate_sample, keeps the ACE counter for computation, and so on.

This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

Best regards,
Chia-Yu

---
v5:
- Move the previous #11 in v4 to a later patch after discussion with the RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav(a)nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet(a)google.com>)
- Add Fixes: tag to #7 (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message of #8 and the if condition check (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #13 (Paolo Abeni <pabeni(a)redhat.com>)

v4:
- Add the previous #13 in v2 back after discussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.

v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni(a)redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into an individual flag and add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni(a)redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni(a)redhat.com>)
- Move patch #3 in v2 to a later Prague patch series and remove patch #13 in v2. (Paolo Abeni <pabeni(a)redhat.com>)

---
Chia-Yu Chang (12):
  net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  selftests/net: gro: add self-test for TCP CWR flag
  tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: move increment of num_retrans
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
  tcp: accecn: enable AccECN

Ilpo Järvinen (2):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN

 Documentation/networking/ip-sysctl.rst | 4 +-
 .../networking/net_cachelines/tcp_sock.rst | 1 +
 include/linux/skbuff.h | 13 ++-
 include/linux/tcp.h | 4 +-
 include/net/inet_ecn.h | 20 +++-
 include/net/tcp.h | 32 ++++++-
 include/net/tcp_ecn.h | 92 ++++++++++++++-----
 net/ipv4/sysctl_net_ipv4.c | 4 +-
 net/ipv4/tcp.c | 2 +
 net/ipv4/tcp_cong.c | 10 +-
 net/ipv4/tcp_input.c | 37 +++++++-
 net/ipv4/tcp_minisocks.c | 40 +++++---
 net/ipv4/tcp_offload.c | 3 +-
 net/ipv4/tcp_output.c | 42 ++++++---
 tools/testing/selftests/net/gro.c | 80 +++++++++++-----
 15 files changed, 294 insertions(+), 90 deletions(-)

-- 
2.34.1

2 months

[PATCH net-next v4 00/12] selftests/vsock: refactor and improve vmtest infrastructure

by Bobby Eshleman

Hey all,

This patch series refactors the vsock selftest VM infrastructure to improve test run times, improve logging, and prepare for future tests which make heavy usage of these refactored functions and have new requirements such as simultaneous QEMU processes.

These patches were broken off from this prior series:
https://lore.kernel.org/all/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.co…

To: Stefano Garzarella <sgarzare(a)redhat.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: virtualization(a)lists.linux.dev
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Simon Horman <horms(a)kernel.org>

Changes in v4:
- fix messed up rebase (wrt check_result() and shared_vm_test() patches)
- more consistent variable quotes style
- use associative array for pidfiles, remove after terminate
- Link to v3: https://lore.kernel.org/r/20251106-vsock-selftests-fixes-and-improvements-v…

Changes in v3:
- see per-patch changes
- Link to v2: https://lore.kernel.org/all/20251104-vsock-selftests-fixes-and-improvements…

Changes in v2:
- remove "Fixes" for some patches because they do not fix bugs in kselftest runs (some fix bugs only when using bash args that kselftest does not use or otherwise prepare functions for new usage)
- broke out one fixes patch for "net"
- per-patch changes
- add patch for shellcheck declaration to disable false positives
- Link to v1: https://lore.kernel.org/r/20251022-vsock-selftests-fixes-and-improvements-v…

---
Bobby Eshleman (12):
  selftests/vsock: improve logging in vmtest.sh
  selftests/vsock: make wait_for_listener() work even if pipefail is on
  selftests/vsock: reuse logic for vsock_test through wrapper functions
  selftests/vsock: avoid multi-VM pidfile collisions with QEMU
  selftests/vsock: do not unconditionally die if qemu fails
  selftests/vsock: speed up tests by reducing the QEMU pidfile timeout
  selftests/vsock: add check_result() for pass/fail counting
  selftests/vsock: identify and execute tests that can re-use VM
  selftests/vsock: add BUILD=0 definition
  selftests/vsock: add 1.37 to tested virtme-ng versions
  selftests/vsock: add vsock_loopback module loading
  selftests/vsock: disable shellcheck SC2317 and SC2119

 tools/testing/selftests/vsock/vmtest.sh | 346 +++++++++++++++++++++-----------
 1 file changed, 233 insertions(+), 113 deletions(-)
---
base-commit: a0c3aefb08cd81864b17c23c25b388dba90b9dad
change-id: 20251021-vsock-selftests-fixes-and-improvements-057440ffb2fa

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)meta.com>

2 months

[PATCH net-next v14 0/7] bonding: Extend arp_ip_target format to allow for a list of vlan tags.

by David Wilder

The current implementation of the arp monitor builds a list of vlan-tags by following the chain of net_devices above the bond. See bond_verify_device_path(). Unfortunately, with some configurations, this is not possible. One example is when an ovs switch is configured above the bond.

This change extends the "arp_ip_target" parameter format to allow for a list of vlan tags to be included for each arp target. This new list of tags is optional and may be omitted to preserve the current format and process of discovering vlans.

The new format for arp_ip_target is:
arp_ip_target ipv4-address[vlan-tag\...],...

For example:
arp_ip_target 10.0.0.1[10/20]
arp_ip_target 10.0.0.1[]   (used to disable vlan discovery)

Changes since V13
Thanks for the help Paolo:
- Changed first argument of bond_option_arp_ip_target_add() to a const.
- Changed first argument of bond_arp_target_to_string() to a const.
- Added a compile-time check of the size argument to bond_arp_target_to_string(): BUILD_BUG_ON(size != BOND_OPTION_STRING_MAX_SIZE);
- In bond_arp_send_all() I changed the condition for both the allocation and the free calls to be the same to improve the clarity of the code.
- Removed extra tab in bond_fill_info().
- Updated bond_get_size() to reflect the increased payload for the arp_ip_target option.
- Corrected indentation and alignment in bond-arp-ip-target.sh.

Changes since V12
Fixed uninitialized variable in bond_option_arp_ip_targets_set() (patch 4) causing a CI failure.

Changes since V11
No change.

Changes since V10
Thanks Paolo:
- 1/7 Changed the layout of struct bond_arp_target to reduce the size of the struct.
- 3/7 Fixed format 'size-num' -> 'size - num'
- 7/7 Updated selftest (bond-arp-ip-target.sh). Removed sleep 10 in check_failure_count(). Added a call to tc to verify arp probes are reaching the target interface. Then I verify that the Link Failure counts are not increasing over "time". Arp probes are sent every 100ms; two missed probes will trigger a Link failure. A one second wait between checking counts should be more than sufficient. This speeds up the execution of the test.
Thanks Nikolay:
- 4/7 In bond_option_arp_ip_targets_clear() I changed the definition of empty_target to empty_target = {}.
- bond_validate_tags() now verifies input is a multiple of sizeof(struct bond_vlan_tag). Updated VID validity check to use (!tags->vlan_id || tags->vlan_id >= VLAN_VID_MASK) as suggested.
- In bond_option_arp_ip_targets_set() removed the redundant length check of target.target_ip.
- Added kfree(target.tags) when bond_option_arp_ip_target_add() results in an error.
- Removed the caching of struct bond_vlan_tag returned by bond_verify_device_path(); Nikolay pointed out that caching tags prevented the detection of VLAN configuration changes. Added a kfree(tags) for tags allocated in bond_verify_device_path().

Jay, Nikolay and I had a discussion regarding locking when adding, deleting or changing vlan tags. Jay pointed out that user-supplied tags are stashed in the bond configuration and can only be changed from user space. Because netlink always operates with RTNL held, these changes can be done safely in an RCU manner. If user space provides tags and then replumbs things, it is up to user space to update the tags in a safe manner. I was concerned about changing options on a configured bond. I found that an attempt to change a bond's configuration (using "ip set") is aborted if the bond's state is "UP" or it has slaves configured. Therefore the configuration and operational sides of a bond are separated. I agree with Jay that the existing locking scheme is sufficient.

Changes since V9
Fix kdoc build error.

Changes since V8:
Moved the #define BOND_MAX_VLAN_TAGS from patch 6 to patch 3. Thanks Simon for catching the bisection break.

Changes since V7:
These changes should eliminate the CI failures I have been seeing.
1) Patch 2: changed type of bond_opt_value.extra_len to size_t.
2) Patch 4: added bond_validate_tags() to validate the array of bond_vlan_tag provided by the user.

Changes since V6:
1) I made a number of changes to fix the failure seen in the kernel CI. I am still unable to reproduce this failure; hopefully I have fixed it. These changes are in patch #4, in the functions bond_option_arp_ip_targets_clear() and bond_option_arp_ip_targets_set().

Changes since V5:
Only the last 2 patches have changed since V5.
1) Fixed sparse warning in bond_fill_info().
2) Also in bond_fill_info(), I resolved data.addr being uninitialized when the if condition is not met. Thank you Simon for catching this. Note: the change is different from what I shared earlier.
3) Fixed shellcheck warnings in the test script: blocked the source warning, ignored specific unassigned references and exported ALL_TESTS to resolve a reference warning.

Changes since V4:
1) Dropped changes to the proc and sysfs APIs to bonding. These APIs do not need to be updated to support the new functionality. Netlink and iproute2 have been updated to do the right thing, but the other APIs are more or less frozen in the past.
2) Jakub reported a warning triggered in bond_info_seq_show() during testing. I was unable to reproduce this warning or identify it with code inspection. However, all my changes to bond_info_seq_show() have been dropped as unnecessary (see above). Hopefully this will resolve the issue.
3) The selftest script has been updated based on the results of shellcheck. Two unresolved references that are not possible to resolve are all that remain.
4) A patch was added updating bond_info_fill() to support the "ip -d show <bond-device>" command. The inclusion of a list of vlan tags is optional. The new logic preserves both forward and backward compatibility with the kernel and iproute2 versions.

Changes since V3:
1) Moved the parsing of the extended arp_ip_target out of the kernel and into userspace (ip command). A separate patch to iproute2 will follow shortly.
2) Split up the patch set to make review easier.

Please see the iproute changes in a separate posting.

Thank you for your time and reviews.

Signed-off-by: David Wilder <wilder(a)us.ibm.com>

David Wilder (7):
  bonding: Adding struct bond_arp_target
  bonding: Adding extra_len field to struct bond_opt_value.
  bonding: arp_ip_target helpers.
  bonding: Processing extended arp_ip_target from user space.
  bonding: Update to bond_arp_send_all() to use supplied vlan tags
  bonding: Update for extended arp_ip_target format.
  bonding: Selftest and documentation for the arp_ip_target parameter.

 Documentation/networking/bonding.rst | 11 +
 drivers/net/bonding/bond_main.c | 48 +++--
 drivers/net/bonding/bond_netlink.c | 39 +++-
 drivers/net/bonding/bond_options.c | 146 ++++++++++---
 drivers/net/bonding/bond_procfs.c | 4 +-
 drivers/net/bonding/bond_sysfs.c | 4 +-
 include/net/bond_options.h | 29 ++-
 include/net/bonding.h | 67 +++++-
 .../selftests/drivers/net/bonding/Makefile | 1 +
 .../drivers/net/bonding/bond-arp-ip-target.sh | 204 ++++++++++++++++++
 10 files changed, 474 insertions(+), 79 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-arp-ip-target.sh
-- 
2.50.1

2 months

[PATCH v22 00/28] riscv control-flow integrity for usermode

by Deepak Gupta via B4 Relay

v22: Fix a build error caused by -march=zicfiss being picked up with gcc-13 and above, even though gcc-13 does not actually do any codegen for zicfiss or recognize its instructions. v22 makes CONFIG_RISCV_USER_CFI depend on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.

v21: Fixed build errors.

Basics and overview
===================

Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffers from memory corruption issues. Attackers can exploit these issues to bend the control flow of the program and eventually gain control (by making their payload executable). Attackers perform such attacks by leveraging call sites which rely on indirect calls, or return sites which rely on obtaining the return address from stack memory.

To mitigate such attacks, the risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad`; otherwise the CPU raises a software check exception (a new CPU exception cause code on riscv).

Similarly, for return flow, the risc-v extension zicfiss extends the architecture with:
- an `sspush` instruction to push the return address onto a shadow stack
- an `sspopchk` instruction to pop the return address from the shadow stack and compare it with the input operand (i.e. the return address on the stack)
- `sspopchk` raising a software check exception if the comparison above is a mismatch
- a protection mechanism by which the shadow stack is not writeable via regular store instructions

More information and details can be found in the extensions' GitHub repo [1].

The equivalent of the landing pad (zicfilp) on x86 is the `ENDBRANCH` instruction in Intel CET [3], and on arm it is branch target identification (BTI) [4]. Similarly, x86's Intel CET has a shadow stack [5] and arm64 has the guarded control stack (GCS) [6], both of which are very similar to risc-v's zicfiss shadow stack.

x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================

This series picks up Samuel Holland's envcfg changes [2] as well. If those are being applied independently, they should be removed from this series.

Enabling:
In order to maintain compatibility and not break anything in user mode, the kernel doesn't enable the control flow integrity CPU extensions on a binary by default. Instead, it exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (the loader) to enumerate whether all objects in its address space are compiled with shadow stack and landing pad support, and to enable the feature accordingly. If a library is later loaded with `dlopen`, user mode can decide again. It can disable the feature if the incoming library is not compiled with support, OR terminate the task if the user mode policy strictly requires every object in the address space to be compiled with the control flow integrity CPU feature. Using prctl to enable the shadow stack results in allocating a shadow stack from virtual memory and activating it for the user address space. x86 and arm64 follow the same direction for similar reasons.

clone/fork:
On clone and fork, the cfi state of a task is inherited by the child. The shadow stack is part of virtual memory and is writeable memory from the kernel's perspective (writeable via a restricted set of instructions, aka shadow stack instructions). Thus the kernel changes ensure that this memory is converted to read-only when fork/clone happens, and COWed when a fault is taken due to sspush, sspopchk or ssamoswap. If `CLONE_VM` is specified and the shadow stack is to be enabled, the kernel automatically allocates a shadow stack for that clone call.

map_shadow_stack:
x86 introduced the `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space.
It is useful for allocating shadow stacks for different contexts managed by a single thread (green threads or contexts). risc-v implements this system call as well.

signal management:
If the shadow stack is enabled for a task, the kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that the original execution can be resumed. The resume context is prepared by the kernel, but it lives in user space memory and is subject to memory corruption. An attacker can exploit corruption bugs in this race window to perform an arbitrary sigreturn and eventually bypass the cfi mechanism. Another issue is how to ensure that the cfi related state in the sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.

In order to mitigate control-flow hijacking, the kernel prepares a token and places it on the shadow stack before signal delivery, and places the address of the token in the sigcontext structure. During sigreturn, the kernel obtains the address of the token from the sigcontext structure, reads the token from the shadow stack, validates it, and only then allows sigreturn to succeed.

The compatibility issue is solved by adopting the dynamic sigcontext management introduced for the vector extension. This series refactors the code a little to make future sigcontext management easier (as proposed by Andy Chiu from SiFive).

config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks up the kernel support for user control flow integrity. This option is presented only if the toolchain has shadow stack and landing pad support, and it is guarded by toolchain support on purpose: eventually the vDSO also needs to be compiled with shadow stack and landing pad support. vDSO compile patches are not included as of now because the landing pad labeling scheme has yet to settle for the usermode runtime.
For more information on kernel interactions with respect to zicfilp and zicfiss, the patch series adds documentation for `zicfilp` and `zicfiss` in the following files:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst

How to test this series
=======================

Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)

Qemu
----
Get the latest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)

Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic

Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)

Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true

References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/

To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org

changelog
---------
v22:
- CONFIG_RISCV_USER_CFI was by default "n".
  With dual vdso support it is default "y" (if the toolchain supports it).
  This version partially fixes a build error due to "-march=zicfiss" being picked up with gcc-13. gcc-13 only recognizes the flag; it does not actually do any codegen for zicfiss or recognize its instructions. The change in v22 makes CONFIG_RISCV_USER_CFI depend on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.
- picked up tags and some cosmetic changes in the commit message for the dual vdso patch.

v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h.
  Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h.
  vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.

v20:
- rebased on v6.18-rc1.
- Added support for two vDSOs. If `CONFIG_RISCV_USER_CFI` is selected, two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). The kernel exposes the RVA23 vDSO if the hardware/CPU implements zimop, else it exposes the existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"

v19:
- riscv_nousercfi was `int`. Changed it to unsigned long. Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restoring ssp on return to usermode was being done before `riscv_v_context_nesting_end` on the trap exit path. If kernel shadow stack were enabled, the kernel would operate on the user shadow stack and panic (as I found in my testing of the kcfi patch series). So fixed that.

v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in the sstatus image in pt_regs
- vdso was missing the shadow stack elf note for object files. Added that. An additional asm file for vdso needed the elf marker flag. The toolchain should complain if `-fcf-protection=full` is given and the marker is missing for an object generated from an asm file. Asked toolchain folks to fix this, although there is no reason to gate the merge on that.
- Split up compile options for march and fcf-protection in vdso Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu.
  Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI

v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took the below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling"
  https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/

v16:
- If FWFT is not implemented or returns an error for shadow stack activation, then no_usercfi is set to disable the shadow stack, although extension validation and activation should normally catch this. Fixed this bug for both zicfilp and zicfiss. Thanks to Charlie Jenkins for reporting this.
- If the toolchain doesn't support cfi, the cfi kselftest shouldn't build. Suggested by Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested keeping it off until there is more hardware availability with the RVA23 profile and zimop/zcmop implemented. Otherwise this would start breaking people's workflows.
- Includes the fix for the FWFT definitions in asm-offsets.c erroring out when "!RV64 and !SBI".

v15:
- Toolchain has been updated to include the `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile vDSO and selftest with the `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. Fixed that.

v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches.
  Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of the thread_info block so that the current placement of thread_info member fields within a single cacheline is not disturbed.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr use riscv_has_extension_unlikely()
- uses nops(count) to create the nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it.
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows disabling zicfilp and zicfiss independently. Updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest.
- cosmetic and grammatical changes to documentation.

v12:
- It seems I had accidentally squashed the arch agnostic indirect branch tracking prctl and the riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li.
- Some minor cleanup in kselftests as suggested by Zong Li.

v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. Fixed that. Changed `lpad 1` to `lpad 0`.

v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where the VM_SHADOW_STACK flag is used anyway. Dropping this patch to expedite merging in the riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add a single vdso object with cfi enabled. But a vdso object using the zero-labeled landing pad scheme is the least common denominator and should work with zero-labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped Samuel Holland's `envcfg` context switch patches. They are in palmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv". Instead using `deactivate_mm` flow to clean up. See here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes the compile issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0. Any future evolution of this interface should accordingly define how arg should be set up.
- `mm/mmap.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack.
- Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: an issue in my workflow caused the version number to be picked up incorrectly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on a per-thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on a per-task basis, even though the CPU didn't implement it. Fixed in this series.
- dt-bindings: as suggested, split into a separate commit. Fixed the messaging that the spec is in public review.
- arch_is_shadow_stack change: arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe: zicfiss / zicfilp, if present, will get enumerated in hwprobe
- selftests: as suggested, added object and binary filenames to .gitignore. The selftest binary needs to be compiled with a cfi-enabled compiler anyway, which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Changes in v22: - Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@ri… Changes in v21: - Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@ri… Changes in v20: - Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@ri… Changes in v19: - Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri… Changes in v18: - Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri… Changes in v17: - Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri… Changes in v16: - Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri… Changes in v15: - changelog posted just below cover letter - Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri… Changes in v14: - changelog posted just below cover letter - Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri… Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri… Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Deepak Gupta (26): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / 
zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call arch/riscv: dual vdso creation logic and select vdso based on hw riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad and shadow stack note Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 22 + arch/riscv/Makefile | 8 +- arch/riscv/configs/hardening.config | 4 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + 
arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vdso.h | 13 +- arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 38 ++ arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 54 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso.c | 7 + arch/riscv/kernel/vdso/Makefile | 40 +- arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +- arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/note.S | 3 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +- arch/riscv/kernel/vdso_cfi/Makefile | 25 + arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 62 files changed, 2475 
insertions(+), 41 deletions(-) --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug
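
The series implements the arch-agnostic shadow stack prctls on riscv, while enabling CFI for user programs is left to the user runtime. As a minimal sketch, assuming the `PR_GET_SHADOW_STACK_STATUS` and `PR_SHADOW_STACK_ENABLE` values from upstream `<linux/prctl.h>` (with fallbacks for older headers), a program can query its own shadow stack state as below; `shadow_stack_enabled()` is a hypothetical helper name. Enabling is deliberately not shown: a shadow stack switched on partway down a call chain has no entries for the frames already active, so returning from them can fault, which is why enabling is best done by the C runtime at startup.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values follow include/uapi/linux/prctl.h. */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Hypothetical helper: query the calling thread's shadow stack status via
 * the arch-agnostic prctl. Returns 1 if enabled, 0 if disabled, or a
 * negative errno when the kernel or CPU does not support the request.
 */
static int shadow_stack_enabled(void)
{
	unsigned long status = 0;

	/* Unused arguments must be zero; pass them as unsigned long. */
	if (prctl(PR_GET_SHADOW_STACK_STATUS, &status, 0UL, 0UL, 0UL) != 0)
		return -errno;
	return (status & PR_SHADOW_STACK_ENABLE) ? 1 : 0;
}
```

On a kernel or CPU without shadow stack support the helper typically returns -EINVAL rather than 0, so callers can distinguish "unsupported" from "supported but off".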

2 months

[PATCH net 0/6] selftests: mptcp: join: fix some flaky tests

by Matthieu Baerts (NGI0)

When looking at the recent CI results on NIPA and MPTCP CIs, a few MPTCP Join tests are marked as unstable. Here are some fixes for that.
- Patch 1: a small fix for mptcp_connect.sh, printing a note as initially intended. For >= v5.13.
- Patch 2: avoid unexpected reset when closing subflows. For >= v5.13.
- Patches 3-4: longer transfer when not waiting for the end. For >= v5.18.
- Patch 5: read all received data when expecting a reset. For >= v6.1.
- Patch 6: a fix to properly kill background tasks. For >= v6.5.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org> --- Matthieu Baerts (NGI0) (6): selftests: mptcp: connect: fix fallback note due to OoO selftests: mptcp: join: rm: set backup flag selftests: mptcp: join: endpoints: longer transfer selftests: mptcp: join: userspace: longer transfer selftests: mptcp: connect: trunc: read all recv data selftests: mptcp: join: properly kill background tasks tools/testing/selftests/net/mptcp/mptcp_connect.c | 18 +++-- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 90 +++++++++++----------- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 21 +++++ 4 files changed, 80 insertions(+), 51 deletions(-) --- base-commit: 96a9178a29a6b84bb632ebeb4e84cf61191c73d5 change-id: 20251108-net-mptcp-sft-join-unstable-5a28cdb6ea54 Best regards, -- Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

2 months

[PATCH net-next v3 00/11] selftests/vsock: refactor and improve vmtest infrastructure

by Bobby Eshleman

Hey all, This patch series refactors the vsock selftest VM infrastructure to improve test run times, improve logging, and prepare for future tests which make heavy usage of these refactored functions and have new requirements such as simultaneous QEMU processes. These patches were broken off from this prior series: https://lore.kernel.org/all/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.co… To: Stefano Garzarella <sgarzare(a)redhat.com> To: Shuah Khan <shuah(a)kernel.org> Cc: virtualization(a)lists.linux.dev Cc: netdev(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Changes in v3: - see per-patch changes Changes in v2: - remove "Fixes" for some patches because they do not fix bugs in kselftest runs (some fix bugs only when using bash args that kselftest does not use or otherwise prepare functions for new usage) - broke out one fixes patch for "net" - per-patch changes - add patch for shellcheck declaration to disable false positives - Link to v1: https://lore.kernel.org/r/20251022-vsock-selftests-fixes-and-improvements-v… --- Bobby Eshleman (11): selftests/vsock: improve logging in vmtest.sh selftests/vsock: make wait_for_listener() work even if pipefail is on selftests/vsock: reuse logic for vsock_test through wrapper functions selftests/vsock: avoid multi-VM pidfile collisions with QEMU selftests/vsock: do not unconditionally die if qemu fails selftests/vsock: speed up tests by reducing the QEMU pidfile timeout selftests/vsock: add check_result() for pass/fail counting selftests/vsock: add BUILD=0 definition selftests/vsock: add 1.37 to tested virtme-ng versions selftests/vsock: add vsock_loopback module loading selftests/vsock: disable shellcheck SC2317 and SC2119 tools/testing/selftests/vsock/vmtest.sh | 355 ++++++++++++++++++++++---------- 1 file changed, 243 insertions(+), 112 deletions(-) --- base-commit: 8a25a2e34157d882032112e4194ccdfb29c499e8 change-id: 20251021-vsock-selftests-fixes-and-improvements-057440ffb2fa 
Best regards, -- Bobby Eshleman <bobbyeshleman(a)meta.com>

2 months

[PATCH][next] selftests/bpf: test_xsk: Fix spelling mistake "conigure" -> "configure"

by Colin Ian King

There is a spelling mistake in an ASSERT_OK message. Fix it. Signed-off-by: Colin Ian King <coking(a)nvidia.com> --- tools/testing/selftests/bpf/prog_tests/xsk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xsk.c b/tools/testing/selftests/bpf/prog_tests/xsk.c index dd4c35c0e428..04f9a5e73e5e 100644 --- a/tools/testing/selftests/bpf/prog_tests/xsk.c +++ b/tools/testing/selftests/bpf/prog_tests/xsk.c @@ -74,7 +74,7 @@ static void test_xsk(const struct test_spec *test_to_run, enum test_mode mode) if (!ASSERT_OK_PTR(ifobj_rx, "create ifobj_rx")) goto delete_tx; - if (!ASSERT_OK(configure_ifobj(ifobj_tx, ifobj_rx), "conigure ifobj")) + if (!ASSERT_OK(configure_ifobj(ifobj_tx, ifobj_rx), "configure ifobj")) goto delete_rx; ret = get_hw_ring_size(ifobj_tx->ifname, &ifobj_tx->ring); -- 2.51.0

2 months

[PATCH v2 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges

by Alex Mastro

Not all IOMMUs support the same virtual address width as the processor; for instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, using the virtual address as the IOVA and mappings at the top of the address space both fail. VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator and record the maximum supported IOVA address. Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. This latter change doesn't test quite the same absolute end-of-address-space behavior but still seems to have some value. Testing for overflow is skipped when a reduced address space is supported, as the desired errno is not generated. This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2]. Given David's plans to split IOMMU concerns from devices as described in [3], the home this series gives `struct iova_allocator` and the IOVA range helpers is likely to be short-lived, since they reside in vfio_pci_device.c. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code, once such a place exists.
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t [2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/ [3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/ To: Alex Williamson <alex(a)shazbot.org> To: David Matlack <dmatlack(a)google.com> To: Shuah Khan <shuah(a)kernel.org> To: Jason Gunthorpe <jgg(a)ziepe.ca> Cc: kvm(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Alex Mastro <amastro(a)fb.com> Changes in v2: - Fix various nits - calloc() where appropriate - Update overflow test to run regardless of iova range constraints - Change iova_allocator_init() to return an allocated struct - Unfold iova_allocator_alloc() - Fix iova allocator initial state bug - Update vfio_pci_driver_test to use iova allocator - Link to v1: https://lore.kernel.org/r/20251110-iova-ranges-v1-0-4d441cf5bf6d@fb.com --- Alex Mastro (4): vfio: selftests: add iova range query helpers vfio: selftests: fix map limit tests to use last available iova vfio: selftests: add iova allocator vfio: selftests: replace iova=vaddr with allocated iovas .../testing/selftests/vfio/lib/include/vfio_util.h | 19 +- tools/testing/selftests/vfio/lib/vfio_pci_device.c | 241 ++++++++++++++++++++- .../testing/selftests/vfio/vfio_dma_mapping_test.c | 20 +- .../testing/selftests/vfio/vfio_pci_driver_test.c | 12 +- 4 files changed, 283 insertions(+), 9 deletions(-) --- base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316 change-id: 20251110-iova-ranges-1c09549fbf63 Best regards, -- Alex Mastro <amastro(a)fb.com>
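
The "simple allocator" idea above can be sketched as a bump allocator over sorted ranges. This is a minimal sketch under stated assumptions, not the series' `struct iova_allocator`: `struct iova_range`, `struct iova_bump` and the functions below are hypothetical, and the ranges are assumed to be sorted and inclusive (`[start, last]`), the form we assume the VFIO and IOMMUFD range queries report. It hands out aligned IOVAs in increasing order, never reuses space it skipped, and records the highest usable IOVA for limit tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive range [start, last]. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/* Hypothetical bump allocator over ranges sorted by start address. */
struct iova_bump {
	const struct iova_range *ranges;
	size_t nranges;
	size_t idx;    /* range currently being carved up */
	uint64_t next; /* lowest IOVA not yet handed out in ranges[idx] */
};

static void iova_bump_init(struct iova_bump *a, const struct iova_range *r,
			   size_t n)
{
	assert(n == 0 || r != NULL);
	a->ranges = r;
	a->nranges = n;
	a->idx = 0;
	a->next = n ? r[0].start : 0;
}

/*
 * Allocate size bytes aligned to align (a power of two). Returns 0 and
 * stores the IOVA in *iova, or -1 if no remaining range fits. State is
 * only updated on success, so a failed large request does not consume
 * ranges that could still serve smaller ones.
 */
static int iova_bump_alloc(struct iova_bump *a, uint64_t size, uint64_t align,
			   uint64_t *iova)
{
	size_t idx = a->idx;
	uint64_t next = a->next;

	if (size == 0 || align == 0 || (align & (align - 1)) != 0)
		return -1;

	for (; idx < a->nranges; idx++) {
		const struct iova_range *r = &a->ranges[idx];
		uint64_t base = next > r->start ? next : r->start;
		uint64_t aligned = (base + align - 1) & ~(align - 1);

		/* aligned < base means rounding up wrapped past 2^64. */
		if (aligned < base || aligned > r->last ||
		    size - 1 > r->last - aligned)
			continue;

		*iova = aligned;
		if (size - 1 == r->last - aligned) {
			/* Range exhausted: continue from the next one. */
			a->idx = idx + 1;
			a->next = idx + 1 < a->nranges ? a->ranges[idx + 1].start : 0;
		} else {
			a->idx = idx;
			a->next = aligned + size;
		}
		return 0;
	}
	return -1;
}

/* Highest usable IOVA: the end of the last (highest) range. */
static uint64_t iova_max(const struct iova_range *r, size_t n)
{
	return n ? r[n - 1].last : 0;
}
```

A test that previously used the buffer's virtual address as its IOVA would instead call `iova_bump_alloc()` with the mapping size and page size, and a limit test would target `iova_max()` rather than the top of the 64-bit address space.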

2 months

[PATCH 00/11] mm/damon/tests: add more tests for online parameters commit

by SeongJae Park

A DAMON feature called parameters "commit" allows DAMON API callers and ABI users to update nearly every DAMON parameter while DAMON is running. This is being used for flexible DAMON use cases such as taking a snapshot of the monitoring results with minimum overhead, or adjusting access-aware system operations (DAMOS) for user-space driven auto-tuning or investigations. Compared to the usefulness of the feature and the size of the implementation, the test coverage is small. Only the filter commit part has a test, with a single case: damos_test_commit_filter(). We have found and fixed a few bugs in the feature in the past, and the single existing test was added to avoid reintroducing one of them. Add more unit tests for the feature. The first four patches (1-4) refactor and extend the existing test for DAMOS filter commit to cover multiple test cases. The next three patches (5-7) add tests for DAMOS quota commit. The next two patches (8 and 9) refactor damos_commit_dests() for ease of code reading and test writing, and add a new unit test for the refactored, test-friendly function. The final two patches (10 and 11) add new unit tests for damos_commit() and damon_commit_target_regions().
SeongJae Park (11): mm/damon/tests/core-kunit: remove dynamic allocs on damos_test_commit_filter() mm/damon/tests/core-kunit: split out damos_test_commit_filter() core logic mm/damon/tests/core-kunit: extend damos_test_commit_filter_for() for union fields mm/damon/tests/core-kunit: add test cases to damos_test_commit_filter() mm/damon/tests/core-kunit: add damos_commit_quota_goal() test mm/damon/tests/core-kunit: add damos_commit_quota_goals() test mm/damon/tests/core-kunit: add damos_commit_quota() test mm/damon/core: pass migrate_dests to damos_commit_dests() mm/damon/tests/core-kunit: add damos_commit_dests() test mm/damon/tests/core-kunit: add damos_commit() test mm/damon/tests/core-kunit: add damon_commit_target_regions() test mm/damon/core.c | 38 ++- mm/damon/tests/core-kunit.h | 544 +++++++++++++++++++++++++++++++++++- 2 files changed, 547 insertions(+), 35 deletions(-) base-commit: 620a4c1c5116eb811807ea7e63d61846015f69c8 -- 2.47.3
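
The commit tests described above follow a table-driven pattern: commit a source filter into a destination, then check that every field was copied, including whichever union member the filter type selects. The user-space sketch below illustrates that pattern only. `struct filter`, `commit_filter()` and `commit_filter_ok()` are hypothetical simplifications, not DAMON's `struct damos_filter` or the series' KUnit helpers (the real tests run under KUnit and use assertions such as `KUNIT_EXPECT_EQ()`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not DAMON's real definitions. */
enum filter_type { FILTER_MEMCG, FILTER_ADDR, FILTER_TARGET };

struct addr_range {
	unsigned long start, end;
};

struct filter {
	enum filter_type type;
	bool matching;
	bool allow;
	union { /* the active member depends on type */
		unsigned short memcg_id;
		struct addr_range addr_range;
		int target_idx;
	};
};

/* Copy every parameter from src into dst, including the active union member. */
static void commit_filter(struct filter *dst, const struct filter *src)
{
	dst->type = src->type;
	dst->matching = src->matching;
	dst->allow = src->allow;
	switch (src->type) {
	case FILTER_MEMCG:
		dst->memcg_id = src->memcg_id;
		break;
	case FILTER_ADDR:
		dst->addr_range = src->addr_range;
		break;
	case FILTER_TARGET:
		dst->target_idx = src->target_idx;
		break;
	}
}

/*
 * Test helper in the spirit of a *_for() function: dst is passed by value
 * so no dynamic allocation is needed; commit src into it, then compare the
 * common fields and the union member selected by src->type.
 */
static bool commit_filter_ok(struct filter dst, const struct filter *src)
{
	commit_filter(&dst, src);
	if (dst.type != src->type || dst.matching != src->matching ||
	    dst.allow != src->allow)
		return false;
	switch (src->type) {
	case FILTER_MEMCG:
		return dst.memcg_id == src->memcg_id;
	case FILTER_ADDR:
		return dst.addr_range.start == src->addr_range.start &&
		       dst.addr_range.end == src->addr_range.end;
	case FILTER_TARGET:
		return dst.target_idx == src->target_idx;
	}
	return false;
}
```

Adding a case then only means adding a row to the table of source filters, which is the kind of extension the first four patches describe.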

2 months

[PATCH 00/10] selftests: vDSO: Stop using libc types for vDSO calls

by Thomas Weißschuh

Currently the vDSO selftests use the time-related types from libc. This works on glibc by chance today but will break with other libc implementations or on distributions which switch to 64-bit times everywhere. The kernel's UAPI headers provide the proper types to use with the vDSO (and raw syscalls) but are not necessarily compatible with libc types. Introduce a new header which makes the UAPI headers compatible with the libc. Also contains some related cleanups. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Thomas Weißschuh (10): Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers" selftests: vDSO: Introduce vdso_types.h selftests: vDSO: vdso_test_abi: Use types from vdso_types.h selftests: vDSO: vdso_test_abi: Provide compatibility with 32-bit musl selftests: vDSO: vdso_test_gettimeofday: Remove nolibc checks selftests: vDSO: vdso_test_gettimeofday: Use types from vdso_types.h selftests: vDSO: vdso_test_correctness: Drop SYS_getcpu fallbacks selftests: vDSO: vdso_test_correctness: Use types from vdso_types.h selftests: vDSO: vdso_test_correctness: Provide compatibility with 32-bit musl selftests: vDSO: vdso_test_correctness: Use facilities from parse_vdso.c tools/testing/selftests/vDSO/Makefile | 6 +- tools/testing/selftests/vDSO/parse_vdso.c | 3 +- tools/testing/selftests/vDSO/vdso_test_abi.c | 35 ++++----- .../testing/selftests/vDSO/vdso_test_correctness.c | 85 +++++++++------------- .../selftests/vDSO/vdso_test_gettimeofday.c | 9 +-- tools/testing/selftests/vDSO/vdso_types.h | 70 ++++++++++++++++++ 6 files changed, 121 insertions(+), 87 deletions(-) --- base-commit: 8c6abf7bda867b82f8a6d60a0d5ce9cb1da6c433 change-id: 20251110-vdso-test-types-68ce0c712b79 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
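
The point about libc types versus UAPI types can be illustrated with a raw syscall. A minimal sketch, assuming a Linux system with kernel UAPI headers installed: it reads a clock into `struct __kernel_timespec` from `<linux/time_types.h>` instead of libc's `struct timespec`, using `SYS_clock_gettime64` where the libc defines it (32-bit ABIs) and plain `SYS_clock_gettime` otherwise. `uapi_clock_gettime64()` is a hypothetical name; this is not the new `vdso_types.h`, whose contents are not reproduced here, and it goes through the syscall rather than the vDSO.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/time_types.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a clock into the kernel's UAPI timespec,
 * whose layout does not depend on the libc or on its choice of time_t.
 */
static long uapi_clock_gettime64(clockid_t clk, struct __kernel_timespec *ts)
{
#ifdef SYS_clock_gettime64
	/* 32-bit ABIs: the *64 variant takes struct __kernel_timespec. */
	return syscall(SYS_clock_gettime64, clk, ts);
#else
	/* 64-bit ABIs: clock_gettime already uses 64-bit time. */
	return syscall(SYS_clock_gettime, clk, ts);
#endif
}
```

The `#ifdef` matters on 32-bit ABIs, where plain `SYS_clock_gettime` uses the older 32-bit time layout.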

2 months

[PATCH] selftest: net: fix variable sized type not at the end of struct warnings

by Ankit Khushwaha

Some network selftests define structs in which a member with a variable-sized type is not the last member, which triggers clang's -Wgnu-variable-sized-type-not-at-end warning. warning: timestamping.c:285:18: warning: field 'cm' with variable sized type 'struct cmsghdr' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end] 285 | struct cmsghdr cm; | ^ ipsec.c:835:5: warning: field 'u' with variable sized type 'union (unnamed union at ipsec.c:831:3)' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end] 835 | } u; | ^ This patch moves these fields to the end of their structs to fix these warnings. Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com> --- tools/testing/selftests/net/ipsec.c | 2 +- tools/testing/selftests/net/timestamping.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c index 0ccf484b1d9d..36083c8f884f 100644 --- a/tools/testing/selftests/net/ipsec.c +++ b/tools/testing/selftests/net/ipsec.c @@ -828,12 +828,12 @@ static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz, struct xfrm_desc *desc) { struct { + char buf[XFRM_ALGO_KEY_BUF_SIZE]; union { struct xfrm_algo alg; struct xfrm_algo_aead aead; struct xfrm_algo_auth auth; } u; - char buf[XFRM_ALGO_KEY_BUF_SIZE]; } alg = {}; size_t alen, elen, clen, aelen; unsigned short type; diff --git a/tools/testing/selftests/net/timestamping.c b/tools/testing/selftests/net/timestamping.c index 044bc0e9ed81..ad2be2143698 100644 --- a/tools/testing/selftests/net/timestamping.c +++ b/tools/testing/selftests/net/timestamping.c @@ -282,8 +282,8 @@ static void recvpacket(int sock, int recvmsg_flags, struct iovec entry; struct sockaddr_in from_addr; struct { - struct cmsghdr cm; char control[512]; + struct cmsghdr cm; } control; int res; -- 2.51.0
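
For control-message buffers like the one in timestamping.c, a common alternative to placing `struct cmsghdr` inside a struct is the union idiom shown in the cmsg(3) man page: a `char` buffer sized with `CMSG_SPACE()` overlaid with a `struct cmsghdr` purely to get its alignment. A minimal sketch of that idiom follows. It passes a file descriptor with `SCM_RIGHTS` only because that is easy to exercise, not because the selftest does so; `union fd_cmsg_buf`, `send_fd()` and `recv_fd()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Control buffer for one int payload. The union gives the buffer the
 * alignment of struct cmsghdr without putting the header in the middle
 * of a struct (the idiom from cmsg(3)).
 */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one file descriptor over a UNIX socket; returns 0 or -1. */
static int send_fd(int sock, int fd)
{
	union fd_cmsg_buf u;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cm;

	memset(&u, 0, sizeof(u));
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on error. */
static int recv_fd(int sock)
{
	union fd_cmsg_buf u;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cm;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (!cm || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```

The same union shape works for a receive-only buffer such as a 512-byte timestamp control area: the `char` array provides the storage, and the `struct cmsghdr` member only contributes alignment.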

2 months

[PATCH v5 00/34] sparc64: vdso: Switch to the generic vDSO library

by Thomas Weißschuh

The generic vDSO provides a lot of common functionality shared between different architectures. SPARC is the last architecture not using it, preventing some necessary code cleanup. Make use of the generic infrastructure. Follow-up to and replacement for Arnd's SPARC vDSO removal patches: https://lore.kernel.org/lkml/20250707144726.4008707-1-arnd@kernel.org/ SPARC64 cannot map .bss into userspace, so the vDSO datapages are switched over to be allocated dynamically. This requires changes to the s390 and random subsystem vDSO initialization as preparation. The random subsystem changes in turn require some cleanup of the vDSO headers to avoid ending up as an ugly #ifdef mess. Tested on a Niagara T4 and QEMU. This has a semantic conflict with my series "vdso: Reject absolute relocations during build" [0]. The last patch of this series expects all users of the generic vDSO library to use the vdsocheck tool. This is not the case (yet) for SPARC64. I do have the patches for the integration; the specifics will depend on which series is applied first. Based on v6.18-rc1. [0] https://lore.kernel.org/lkml/20250812-vdso-absolute-reloc-v4-0-61a8b615e5ec… Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Changes in v5: - Merge the patches for 'struct page' mapping and dynamic allocation - Zero out newly-allocated data pages - Pick up review tags - Link to v4: https://lore.kernel.org/r/20251014-vdso-sparc64-generic-2-v4-0-e0607bf49dea… Changes in v4: - Rebase on v6.18-rc1.
- Keep inclusion of asm/clocksource.h from linux/clocksource.h - Reword description of "s390/time: Set up vDSO datapage later" - Link to v3: https://lore.kernel.org/r/20250917-vdso-sparc64-generic-2-v3-0-3679b1bc8ee8… Changes in v3: - Allocate vDSO data pages dynamically (and lots of preparations for that) - Drop clock_getres() - Fix 32bit clock_gettime() syscall fallback - Link to v2: https://lore.kernel.org/r/20250815-vdso-sparc64-generic-2-v2-0-b5ff80672347… Changes in v2: - Rebase on v6.17-rc1 - Drop RFC state - Fix typo in commit message - Drop duplicate 'select GENERIC_TIME_VSYSCALL' - Merge "sparc64: time: Remove architecture-specific clocksource data" into the main conversion patch. It violated the check in __clocksource_register_scale() - Link to v1: https://lore.kernel.org/r/20250724-vdso-sparc64-generic-2-v1-0-e376a3bd24d1… --- Arnd Bergmann (1): clocksource: remove ARCH_CLOCKSOURCE_DATA Thomas Weißschuh (33): selftests: vDSO: vdso_test_correctness: Handle different tv_usec types arm64: vDSO: getrandom: Explicitly include asm/alternative.h arm64: vDSO: gettimeofday: Explicitly include vdso/clocksource.h arm64: vDSO: compat_gettimeofday: Add explicit includes ARM: vdso: gettimeofday: Add explicit includes powerpc/vdso/gettimeofday: Explicitly include vdso/time32.h powerpc/vdso: Explicitly include asm/cputable.h and asm/feature-fixups.h LoongArch: vDSO: Explicitly include asm/vdso/vdso.h MIPS: vdso: Add include guard to asm/vdso/vdso.h MIPS: vdso: Explicitly include asm/vdso/vdso.h random: vDSO: Add explicit includes vdso/gettimeofday: Add explicit includes vdso/helpers: Explicitly include vdso/processor.h vdso/datapage: Remove inclusion of gettimeofday.h vdso/datapage: Trim down unnecessary includes random: vDSO: trim vDSO includes random: vDSO: remove ifdeffery random: vDSO: split out datapage update into helper functions random: vDSO: only access vDSO datapage after random_init() s390/time: Set up vDSO datapage later vdso/datastore: Reduce scope of some 
variables in vvar_fault() vdso/datastore: Drop inclusion of linux/mmap_lock.h vdso/datastore: Allocate data pages dynamically sparc64: vdso: Link with -z noexecstack sparc64: vdso: Remove obsolete "fake section table" reservation sparc64: vdso: Replace code patching with runtime conditional sparc64: vdso: Move hardware counter read into header sparc64: vdso: Move syscall fallbacks into header sparc64: vdso: Introduce vdso/processor.h sparc64: vdso: Switch to the generic vDSO library sparc64: vdso2c: Drop sym_vvar_start handling sparc64: vdso2c: Remove symbol handling sparc64: vdso: Implement clock_gettime64() arch/arm/include/asm/vdso/gettimeofday.h | 2 + arch/arm64/include/asm/vdso/compat_gettimeofday.h | 3 + arch/arm64/include/asm/vdso/gettimeofday.h | 2 + arch/arm64/kernel/vdso/vgetrandom.c | 2 + arch/loongarch/kernel/process.c | 1 + arch/loongarch/kernel/vdso.c | 1 + arch/mips/include/asm/vdso/vdso.h | 5 + arch/mips/kernel/vdso.c | 1 + arch/powerpc/include/asm/vdso/gettimeofday.h | 1 + arch/powerpc/include/asm/vdso/processor.h | 3 + arch/s390/kernel/time.c | 4 +- arch/sparc/Kconfig | 3 +- arch/sparc/include/asm/clocksource.h | 9 - arch/sparc/include/asm/processor.h | 3 + arch/sparc/include/asm/processor_32.h | 2 - arch/sparc/include/asm/processor_64.h | 25 -- arch/sparc/include/asm/vdso.h | 2 - arch/sparc/include/asm/vdso/clocksource.h | 10 + arch/sparc/include/asm/vdso/gettimeofday.h | 184 ++++++++++ arch/sparc/include/asm/vdso/processor.h | 41 +++ arch/sparc/include/asm/vdso/vsyscall.h | 10 + arch/sparc/include/asm/vvar.h | 75 ---- arch/sparc/kernel/Makefile | 1 - arch/sparc/kernel/time_64.c | 6 +- arch/sparc/kernel/vdso.c | 69 ---- arch/sparc/vdso/Makefile | 8 +- arch/sparc/vdso/vclock_gettime.c | 380 ++------------------- arch/sparc/vdso/vdso-layout.lds.S | 26 +- arch/sparc/vdso/vdso.lds.S | 2 - arch/sparc/vdso/vdso2c.c | 24 -- arch/sparc/vdso/vdso2c.h | 45 +-- arch/sparc/vdso/vdso32/vdso32.lds.S | 4 +- arch/sparc/vdso/vma.c | 274 +-------------- 
drivers/char/random.c | 71 ++-- include/linux/clocksource.h | 6 +- include/linux/vdso_datastore.h | 6 + include/vdso/datapage.h | 23 +- include/vdso/helpers.h | 1 + init/main.c | 2 + kernel/time/Kconfig | 4 - lib/vdso/datastore.c | 74 ++-- lib/vdso/getrandom.c | 3 + lib/vdso/gettimeofday.c | 17 + .../testing/selftests/vDSO/vdso_test_correctness.c | 8 +- 44 files changed, 449 insertions(+), 994 deletions(-) --- base-commit: 28b1ac5ccd8d4900a8f53f0e6e84d517a7ccc71f change-id: 20250722-vdso-sparc64-generic-2-25f2e058e92c Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
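
Whatever library an architecture uses to implement it, user space on Linux locates the vDSO through the `AT_SYSINFO_EHDR` auxiliary-vector entry; the vDSO selftests hand that same address to `parse_vdso.c`. A minimal sketch of finding the image and checking that it starts with the ELF magic is below; `vdso_base()` and `vdso_has_elf_magic()` are hypothetical helper names, and a NULL result is possible if the kernel was booted without a vDSO.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Base of the vDSO ELF image mapped into this process, or NULL if the
 * auxiliary vector does not report one. */
static const unsigned char *vdso_base(void)
{
	return (const unsigned char *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* The vDSO is a complete ELF shared object, so it begins with "\177ELF". */
static int vdso_has_elf_magic(const unsigned char *base)
{
	return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

From this base address a parser walks the ELF dynamic section to resolve symbols such as `__vdso_clock_gettime`, which is the job `parse_vdso.c` does for the selftests.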

2 months

[PATCH net v10 0/4] net: netpoll: fix memory leak and add comprehensive selftests

by Breno Leitao

Fix a memory leak in netpoll and introduce netconsole selftests that expose the issue when running with kmemleak detection enabled. This patchset includes a selftest for netpoll with multiple concurrent users (netconsole + bonding), which simulates the scenario from test[1] that originally demonstrated the issue allegedly fixed by commit efa95b01da18 ("netpoll: fix use after free") - a commit that is now being reverted. Sending this to the "net" branch because this is a fix, and the selftest might help validate backports. Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1] Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v10: - Get rid of the create_and_enable_dynamic_target() (Simon) - Link to v9: https://lore.kernel.org/r/20251106-netconsole_torture-v9-0-f73cd147c13c@deb… Changes in v9: - Reordered the config entries in tools/testing/selftests/drivers/net/bonding/config (NIPA) - Link to v8: https://lore.kernel.org/r/20251104-netconsole_torture-v8-0-5288440e2fa0@deb… Changes in v8: - Sending it again, now that commit 1a8fed52f7be1 ("netdevsim: set the carrier when the device goes up") has landed in net - Created one namespace for TX and one for RX (Paolo) - Used additional helpers to create and delete netdevsim (Paolo) - Link to v7: https://lore.kernel.org/r/20251003-netconsole_torture-v7-0-aa92fcce62a9@deb… Changes in v7: - Rebased on top of `net` - Link to v6: https://lore.kernel.org/r/20251002-netconsole_torture-v6-0-543bf52f6b46@deb… Changes in v6: - Expand the tests even more and some small fixups - Moved the test to bonding selftests - Link to v5: https://lore.kernel.org/r/20250918-netconsole_torture-v5-0-77e25e0a4eb6@deb… Changes in v5: - Set CONFIG_BONDING=m in selftests/drivers/net/config.
- Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb… Changes in v4: - Added an additional selftest to test multiple netpoll users in parallel - Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb… Changes in v3: - This patchset is a merge of the fix and the selftest together as recommended by Jakub. Changes in v2: - Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring the create_dynamic_target() (Jakub) - Move the "wait" to after all the messages has been sent. - Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb… --- Breno Leitao (4): net: netpoll: fix incorrect refcount handling causing incorrect cleanup selftest: netcons: refactor target creation selftest: netcons: create a torture test selftest: netcons: add test for netconsole over bonded interfaces net/core/netpoll.c | 7 +- tools/testing/selftests/drivers/net/Makefile | 1 + .../testing/selftests/drivers/net/bonding/Makefile | 2 + tools/testing/selftests/drivers/net/bonding/config | 4 + .../drivers/net/bonding/netcons_over_bonding.sh | 361 +++++++++++++++++++++ .../selftests/drivers/net/lib/sh/lib_netcons.sh | 78 ++++- .../selftests/drivers/net/netcons_torture.sh | 130 ++++++++ 7 files changed, 566 insertions(+), 17 deletions(-) --- base-commit: 7d1988a943850c584e8e2e4bcc7a3b5275024072 change-id: 20250902-netconsole_torture-8fc23f0aca99 Best regards, -- Breno Leitao <leitao(a)debian.org>

2 months

[PATCH] selftests/tracing: Run sample events to clear page cache events

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org> The tracing selftest "event-filter-function.tc" was failing because it first runs the "sample_events" function that triggers the kmem_cache_free event and it looks at what function was used during a call to "ls". But the first time it calls this, it could trigger events that are used to pull pages into the page cache. The rest of the test uses the function it finds during that call to see if it will be called in subsequent "sample_events" calls. But if there's no need to pull pages into the page cache, it will not trigger that function and the test will fail. Call "sample_events" twice to trigger all the page cache work before calling it again to find a function to use in subsequent checks. Cc: stable(a)vger.kernel.org Fixes: eb50d0f250e96 ("selftests/ftrace: Choose target function for filter test from samples") Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- .../selftests/ftrace/test.d/filter/event-filter-function.tc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc index c62165fabd0c..003f612f57b0 100644 --- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc +++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc @@ -20,6 +20,10 @@ sample_events() { echo 0 > tracing_on echo 0 > events/enable +# Clear functions caused by page cache; run sample_events twice +sample_events +sample_events + echo "Get the most frequently calling function" echo > trace sample_events -- 2.51.0

2 months

[PATCH 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges

by Alex Mastro

Not all IOMMUs support the same virtual address width as the processor; for instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, using the virtual address as the IOVA and mappings at the top of the address space both fail. VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator and record the maximum supported IOVA address. Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. This latter change doesn't test quite the same absolute end-of-address-space behavior but still seems to have some value. Testing for overflow is skipped when a reduced address space is supported, as the desired errno is not generated. This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2]. Given David's plans to split IOMMU concerns from devices as described in [3], this series' home for `struct iova_allocator` is likely to be short-lived, since it resides in vfio_pci_device.c. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code, once such a place exists.
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t [2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/ [3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/ Signed-off-by: Alex Mastro <amastro(a)fb.com> --- Alex Mastro (4): vfio: selftests: add iova range query helpers vfio: selftests: fix map limit tests to use last available iova vfio: selftests: add iova allocator vfio: selftests: update vfio_dma_mapping_test to allocate iovas .../testing/selftests/vfio/lib/include/vfio_util.h | 22 +- tools/testing/selftests/vfio/lib/vfio_pci_device.c | 226 ++++++++++++++++++++- .../testing/selftests/vfio/vfio_dma_mapping_test.c | 25 ++- 3 files changed, 268 insertions(+), 5 deletions(-) --- base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316 change-id: 20251110-iova-ranges-1c09549fbf63 Best regards, -- Alex Mastro <amastro(a)fb.com>
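
A minimal sketch of the kind of allocator described above, assuming the queried ranges are sorted, non-overlapping and carry inclusive upper bounds. It is not the series' `struct iova_allocator`; the names `iova_range`, `iova_alloc`, `iova_alloc_init()` and `iova_alloc_get()` are hypothetical. It hands out aligned IOVAs from the lowest range upwards and never frees, which is usually enough for a test that maps a few buffers; the maximum supported IOVA is then simply the `last` field of the final range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type: IOVAs in [start, last], both bounds inclusive. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/* Hypothetical bump allocator over a sorted, non-overlapping range array. */
struct iova_alloc {
	const struct iova_range *ranges;
	size_t nr_ranges;
	size_t idx;	/* index of the range currently being carved up */
	uint64_t next;	/* lowest IOVA not yet handed out in ranges[idx] */
};

static void iova_alloc_init(struct iova_alloc *a,
			    const struct iova_range *ranges, size_t nr)
{
	a->ranges = ranges;
	a->nr_ranges = nr;
	a->idx = 0;
	a->next = nr ? ranges[0].start : 0;
}

static void iova_alloc_next_range(struct iova_alloc *a)
{
	a->idx++;
	if (a->idx < a->nr_ranges)
		a->next = a->ranges[a->idx].start;
}

/*
 * Hand out @size bytes aligned to @align (a power of two). Returns true and
 * stores the IOVA in *iova, or false once every range is exhausted.
 */
static bool iova_alloc_get(struct iova_alloc *a, uint64_t size,
			   uint64_t align, uint64_t *iova)
{
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	while (a->idx < a->nr_ranges) {
		const struct iova_range *r = &a->ranges[a->idx];
		uint64_t base = a->next;
		uint64_t aligned = (base + align - 1) & ~(align - 1);

		/* Skip the range if aligning wrapped or the block does not fit. */
		if (aligned < base || aligned > r->last ||
		    size - 1 > r->last - aligned) {
			iova_alloc_next_range(a);
			continue;
		}

		*iova = aligned;
		/* aligned + size - 1 <= r->last here, so this cannot overflow. */
		if (aligned + (size - 1) == r->last)
			iova_alloc_next_range(a);
		else
			a->next = aligned + size;
		return true;
	}
	return false;
}
```

Typical (hypothetical) use: fill an array from VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE or IOMMU_IOAS_IOVA_RANGES, call iova_alloc_init() once, then call iova_alloc_get() for each buffer to be mapped instead of reusing its virtual address. Checking for the last IOVA explicitly, rather than computing `aligned + size`, keeps a range that ends at UINT64_MAX from wrapping the cursor back to zero.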

2 months

[PATCH] selftest/mm: fix pointer comparison in mremap_test

by Ankit Khushwaha

Pointer arithmetic with 'void *addr' and 'unsigned long long dest_alignment' triggers the following warning: mremap_test.c:1035:31: warning: pointer comparison always evaluates to false [-Wtautological-compare] 1035 | if (addr + c.dest_alignment < addr) { | ^ Typecast 'addr' to 'unsigned long long' to fix the pointer comparison. Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com> --- tools/testing/selftests/mm/mremap_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c index a95c0663a011..5ae0400176af 100644 --- a/tools/testing/selftests/mm/mremap_test.c +++ b/tools/testing/selftests/mm/mremap_test.c @@ -1032,7 +1032,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb, /* Don't destroy existing mappings unless expected to overlap */ while (!is_remap_region_valid(addr, c.region_size) && !c.overlapping) { /* Check for unsigned overflow */ - if (addr + c.dest_alignment < addr) { + if ((unsigned long long) addr + c.dest_alignment < (unsigned long long) addr) { ksft_print_msg("Couldn't find a valid region to remap to\n"); ret = -1; goto clean_up_src; -- 2.51.0
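
Some background on why the original check gets folded away, as a hedged sketch: C leaves pointer arithmetic that goes outside the bounds of an object undefined, so a compiler may assume `addr + c.dest_alignment` never wraps and treat the comparison as always false. Unsigned integer arithmetic is defined to wrap modulo 2^N, so doing the addition on an integer type keeps the overflow check meaningful. The helper below uses `uintptr_t` (the patch casts to `unsigned long long` instead); `addr_add_overflows()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr + off wraps past the top of the address space. */
static bool addr_add_overflows(const void *addr, unsigned long long off)
{
	uintptr_t base = (uintptr_t)addr;

	/* An offset wider than the address space always wraps. */
	if (off > UINTPTR_MAX)
		return true;
	/* Unsigned addition wraps modulo 2^N, so a wrapped sum is below base. */
	return base + (uintptr_t)off < base;
}
```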

2 months

[PATCH] mm/selftests: Fix -Wtautological-compare warning in mremap_test.c

by Wake Liu

The compiler warns about a tautological comparison in mremap_test.c: "pointer comparison always evaluates to false [-Wtautological-compare]" This occurs when checking for unsigned overflow: if (addr + c.dest_alignment < addr) Cast 'addr' to 'unsigned long long' to ensure the comparison is performed with a wider type, correctly detecting potential overflow and resolving the warning. Signed-off-by: Wake Liu <wakel(a)google.com> --- tools/testing/selftests/mm/mremap_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c index bf2863b102e3..c4933f4cbd48 100644 --- a/tools/testing/selftests/mm/mremap_test.c +++ b/tools/testing/selftests/mm/mremap_test.c @@ -1032,7 +1032,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb, /* Don't destroy existing mappings unless expected to overlap */ while (!is_remap_region_valid(addr, c.region_size) && !c.overlapping) { /* Check for unsigned overflow */ - if (addr + c.dest_alignment < addr) { + if ((unsigned long long)addr + c.dest_alignment < (unsigned long long)addr) { ksft_print_msg("Couldn't find a valid region to remap to\n"); ret = -1; goto clean_up_src; -- 2.51.2.1041.gc1ab5b90ca-goog

2 months

[PATCH net v9 0/4] net: netpoll: fix memory leak and add comprehensive selftests

by Breno Leitao

Fix a memory leak in netpoll and introduce netconsole selftests that expose the issue when running with kmemleak detection enabled. This patchset includes a selftest for netpoll with multiple concurrent users (netconsole + bonding), which simulates the scenario from test[1] that originally demonstrated the issue allegedly fixed by commit efa95b01da18 ("netpoll: fix use after free") - a commit that is now being reverted. Sending this to "net" branch because this is a fix, and the selftest might help with the backports validation. Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1] Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v9: - Reordered the config entries in tools/testing/selftests/drivers/net/bonding/config (NIPA) - Link to v8: https://lore.kernel.org/r/20251104-netconsole_torture-v8-0-5288440e2fa0@deb… Changes in v8: - Sending it again, now that commit 1a8fed52f7be1 ("netdevsim: set the carrier when the device goes up") has landed in net - Created one namespace for TX and one for RX (Paolo) - Used additional helpers to create and delete netdevsim (Paolo) - Link to v7: https://lore.kernel.org/r/20251003-netconsole_torture-v7-0-aa92fcce62a9@deb… Changes in v7: - Rebased on top of `net` - Link to v6: https://lore.kernel.org/r/20251002-netconsole_torture-v6-0-543bf52f6b46@deb… Changes in v6: - Expand the tests even more and some small fixups - Moved the test to bonding selftests - Link to v5: https://lore.kernel.org/r/20250918-netconsole_torture-v5-0-77e25e0a4eb6@deb… Changes in v5: - Set CONFIG_BONDING=m in selftests/drivers/net/config. - Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb… Changes in v4: - Added an additional selftest to test multiple netpoll users in parallel - Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb… Changes in v3: - This patchset is a merge of the fix and the selftest together as recommended by Jakub. 
Changes in v2: - Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring the create_dynamic_target() (Jakub) - Move the "wait" to after all the messages has been sent. - Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb… --- Breno Leitao (4): net: netpoll: fix incorrect refcount handling causing incorrect cleanup selftest: netcons: refactor target creation selftest: netcons: create a torture test selftest: netcons: add test for netconsole over bonded interfaces net/core/netpoll.c | 7 +- tools/testing/selftests/drivers/net/Makefile | 1 + .../testing/selftests/drivers/net/bonding/Makefile | 2 + tools/testing/selftests/drivers/net/bonding/config | 4 + .../drivers/net/bonding/netcons_over_bonding.sh | 361 +++++++++++++++++++++ .../selftests/drivers/net/lib/sh/lib_netcons.sh | 82 ++++- .../selftests/drivers/net/netcons_torture.sh | 130 ++++++++ 7 files changed, 569 insertions(+), 18 deletions(-) --- base-commit: 7d1988a943850c584e8e2e4bcc7a3b5275024072 change-id: 20250902-netconsole_torture-8fc23f0aca99 Best regards, -- Breno Leitao <leitao(a)debian.org>

2 months

[PATCH v2] selftest/mm: fix pointer comparison in mremap_test

by Ankit Khushwaha

Pointer arithmetic with 'void *addr' and 'ulong dest_alignment' triggers the following warning: mremap_test.c:1035:31: warning: pointer comparison always evaluates to false [-Wtautological-compare] 1035 | if (addr + c.dest_alignment < addr) { | ^ This warning is raised by clang version 20.1.8 (Fedora 20.1.8-4.fc42). Use 'void *tmp_addr' to do the pointer arithmetic. Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com> --- Changelog: v2: - use 'void *tmp_addr' for pointer arithmetic instead of typecasting 'addr' to 'unsigned long long' as suggested by Andrew. v1: https://lore.kernel.org/linux-kselftest/20251106104917.39890-1-ankitkhushwa… --- tools/testing/selftests/mm/mremap_test.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c index a95c0663a011..308576437228 100644 --- a/tools/testing/selftests/mm/mremap_test.c +++ b/tools/testing/selftests/mm/mremap_test.c @@ -994,7 +994,7 @@ static void mremap_move_multi_invalid_vmas(FILE *maps_fp, unsigned long page_siz static long long remap_region(struct config c, unsigned int threshold_mb, char *rand_addr) { - void *addr, *src_addr, *dest_addr, *dest_preamble_addr = NULL; + void *addr, *tmp_addr, *src_addr, *dest_addr, *dest_preamble_addr = NULL; unsigned long long t, d; struct timespec t_start = {0, 0}, t_end = {0, 0}; long long start_ns, end_ns, align_mask, ret, offset; @@ -1032,7 +1032,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb, /* Don't destroy existing mappings unless expected to overlap */ while (!is_remap_region_valid(addr, c.region_size) && !c.overlapping) { /* Check for unsigned overflow */ - if (addr + c.dest_alignment < addr) { + tmp_addr = addr + c.dest_alignment; + if (tmp_addr < addr) { ksft_print_msg("Couldn't find a valid region to remap to\n"); ret = -1; goto clean_up_src; -- 2.51.1

2 months

[PATCH net-next 0/3] Add YNL test framework and library improvements

by Hangbin Liu

This series enhances YNL tools with some functionalities and adds YNL selftest framework. Changes include: - Add MAC address parsing support in YNL library - Fix rt-rule spec consistency with other rt-* families - Add selftests covering CLI and ethtool functionality The tests provide usage examples and regression testing for YNL tools. Hangbin Liu (3): tools: ynl: Add MAC address parsing support netlink: specs: update rt-rule src/dst attribute types to support IPv4 addresses selftests: net: add YNL test framework Documentation/netlink/specs/rt-rule.yaml | 6 +- tools/net/ynl/pyynl/lib/ynl.py | 9 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/net/ynl/Makefile | 18 ++ tools/testing/selftests/net/ynl/cli.sh | 234 +++++++++++++++++++++ tools/testing/selftests/net/ynl/config | 6 + tools/testing/selftests/net/ynl/ethtool.sh | 188 +++++++++++++++++ tools/testing/selftests/net/ynl/settings | 1 + 8 files changed, 461 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/net/ynl/Makefile create mode 100755 tools/testing/selftests/net/ynl/cli.sh create mode 100644 tools/testing/selftests/net/ynl/config create mode 100755 tools/testing/selftests/net/ynl/ethtool.sh create mode 100644 tools/testing/selftests/net/ynl/settings -- 2.50.1
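
MAC address parsing of the kind the first patch adds comes down to turning the colon-separated text form into six bytes. The patch does this in Python inside ynl.py; the C function below is only an illustrative sketch of the same conversion, assuming the usual `aa:bb:cc:dd:ee:ff` notation, and `parse_mac()` is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_val(char c)
{
	return isdigit((unsigned char)c) ? c - '0'
					 : tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Hypothetical helper: parse "aa:bb:cc:dd:ee:ff" (either case) into six
 * bytes. Returns false on malformed input; @mac may be partially written.
 */
static bool parse_mac(const char *s, uint8_t mac[6])
{
	/* Six two-digit octets plus five separators. */
	if (strlen(s) != 17)
		return false;

	for (int i = 0; i < 6; i++) {
		const char *p = s + 3 * i;

		if (!isxdigit((unsigned char)p[0]) ||
		    !isxdigit((unsigned char)p[1]))
			return false;
		/* Every octet but the last must be followed by ':'. */
		if (i < 5 && p[2] != ':')
			return false;
		mac[i] = (uint8_t)(hex_val(p[0]) << 4 | hex_val(p[1]));
	}
	return true;
}
```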

2 months

[PATCH] tools/nolibc: add support for fchdir()

by Thomas Weißschuh

Add support for the file descriptor based variant of chdir(). Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- tools/include/nolibc/sys.h | 13 +++++++++++++ tools/testing/selftests/nolibc/nolibc-test.c | 2 ++ 2 files changed, 15 insertions(+) diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index c5564f57deec88b8aa70291fcf6f9ca4dbc1d03f..a4b0fdb9b641230174f5e62d62762f59af81a00e 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -118,6 +118,7 @@ void *sbrk(intptr_t inc) /* * int chdir(const char *path); + * int fchdir(int fildes); */ static __attribute__((unused)) @@ -132,6 +133,18 @@ int chdir(const char *path) return __sysret(sys_chdir(path)); } +static __attribute__((unused)) +int sys_fchdir(int fildes) +{ + return my_syscall1(__NR_fchdir, fildes); +} + +static __attribute__((unused)) +int fchdir(int fildes) +{ + return __sysret(sys_fchdir(fildes)); +} + /* * int chmod(const char *path, mode_t mode); diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c index 29de21595fc95341c2aa975375a8d471cb3933fc..5927a84466cc0ede3b99611e134a8c6b8ab91e72 100644 --- a/tools/testing/selftests/nolibc/nolibc-test.c +++ b/tools/testing/selftests/nolibc/nolibc-test.c @@ -1343,6 +1343,8 @@ int run_syscall(int min, int max) CASE_TEST(dup3_0); tmp = dup3(0, 100, 0); EXPECT_SYSNE(1, tmp, -1); close(tmp); break; CASE_TEST(dup3_m1); tmp = dup3(-1, 100, 0); EXPECT_SYSER(1, tmp, -1, EBADF); if (tmp != -1) close(tmp); break; CASE_TEST(execve_root); EXPECT_SYSER(1, execve("/", (char*[]){ [0] = "/", [1] = NULL }, NULL), -1, EACCES); break; + CASE_TEST(fchdir_stdin); EXPECT_SYSER(1, fchdir(STDIN_FILENO), -1, ENOTDIR); break; + CASE_TEST(fchdir_badfd); EXPECT_SYSER(1, fchdir(-1), -1, EBADF); break; CASE_TEST(file_stream); EXPECT_SYSZR(1, test_file_stream()); break; CASE_TEST(fork); EXPECT_SYSZR(1, test_fork(FORK_STANDARD)); break; CASE_TEST(getdents64_root); EXPECT_SYSNE(1, 
test_getdents64("/"), -1); break; --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20251107-nolibc-fchdir-2645c298a538 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
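
The two new test cases expect fchdir() to fail with ENOTDIR for a descriptor that is not a directory and with EBADF for an invalid one. A common use of the call is saving the working directory as a descriptor and returning to it later. A minimal sketch of that pattern, written against a hosted libc (it does not assume nolibc provides every other call used here); `cwd_save()` and `cwd_restore()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Remember the current working directory as an open descriptor. */
static int cwd_save(void)
{
	return open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/*
 * Switch back to the directory behind @fd and close it. Returns 0 on
 * success, or -1 with errno from fchdir(): EBADF for an invalid
 * descriptor, ENOTDIR if it does not refer to a directory.
 */
static int cwd_restore(int fd)
{
	int ret = fchdir(fd);
	int err = errno;

	if (fd >= 0)
		close(fd);
	errno = err;	/* report fchdir()'s error, not close()'s */
	return ret;
}
```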

2 months

[PATCH net-next v3 0/5] psp: track stats from core and provide a driver stats api

by Daniel Zahka

This series introduces stats counters for psp. Device key rotations and so-called 'stale-events' are common to all drivers and are tracked by the core. A driver-facing API is provided for reporting stats required by the "Implementation Requirements" section of the PSP Architecture Specification. Drivers must implement these stats. Lastly, implementations of the driver stats API for mlx5 and netdevsim are included. Here is the output of running the psp selftest suite and then printing out stats with the ynl cli on a system with a psp-capable CX7: $ ./ksft-psp-stats/drivers/net/psp.py TAP version 13 1..28 ok 1 psp.test_case # SKIP Test requires IPv4 connectivity ok 2 psp.data_basic_send_v0_ip6 ok 3 psp.test_case # SKIP Test requires IPv4 connectivity ok 4 psp.data_basic_send_v1_ip6 ok 5 psp.test_case # SKIP Test requires IPv4 connectivity ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128') ok 7 psp.test_case # SKIP Test requires IPv4 connectivity ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256') ok 9 psp.test_case # SKIP Test requires IPv4 connectivity ok 10 psp.data_mss_adjust_ip6 ok 11 psp.dev_list_devices ok 12 psp.dev_get_device ok 13 psp.dev_get_device_bad ok 14 psp.dev_rotate ok 15 psp.dev_rotate_spi ok 16 psp.assoc_basic ok 17 psp.assoc_bad_dev ok 18 psp.assoc_sk_only_conn ok 19 psp.assoc_sk_only_mismatch ok 20 psp.assoc_sk_only_mismatch_tx ok 21 psp.assoc_sk_only_unconn ok 22 psp.assoc_version_mismatch ok 23 psp.assoc_twice ok 24 psp.data_send_bad_key ok 25 psp.data_send_disconnect ok 26 psp.data_stale_key ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim # Totals: pass:19 fail:0 xfail:2 xpass:0 skip:7 error:0 # # Responder logs (0): # STDERR: # Set PSP enable on device 1 to 0x3 # Set PSP enable on device 1 to 0x0 $ cd ynl/ $ ./pyynl/cli.py --spec netlink/specs/psp.yaml --dump get-stats [{'dev-id': 1,
'key-rotations': 5, 'rx-auth-fail': 21, 'rx-bad': 0, 'rx-bytes': 11844, 'rx-error': 0, 'rx-packets': 94, 'stale-events': 6, 'tx-bytes': 1128456, 'tx-error': 0, 'tx-packets': 780}] CHANGES: v3: - simplify error path in accel_psp_fs_init_tx() - avoid casting argument in mlx5e_accel_psp_fs_get_stats_fill() - delete unused member stats member in mlx5e_psp - remove zero length array from psp_dev_stats v2: https://lore.kernel.org/netdev/20251028000018.3869664-1-daniel.zahka@gmail.… - don't return skb->len from psp_nl_get_stats_dumpit() on success and EMSGSIZE - use %pe to print PTR_ERR() v1: https://lore.kernel.org/netdev/20251022193739.1376320-1-daniel.zahka@gmail.… Daniel Zahka (2): selftests: drv-net: psp: add assertions on core-tracked psp dev stats netdevsim: implement psp device stats Jakub Kicinski (3): psp: report basic stats from the core psp: add stats from psp spec to driver facing api net/mlx5e: Add PSP stats support for Rx/Tx flows Documentation/netlink/specs/psp.yaml | 95 +++++++ .../mellanox/mlx5/core/en_accel/psp.c | 233 ++++++++++++++++-- .../mellanox/mlx5/core/en_accel/psp.h | 16 ++ .../mellanox/mlx5/core/en_accel/psp_rxtx.c | 1 + .../net/ethernet/mellanox/mlx5/core/en_main.c | 5 + drivers/net/netdevsim/netdevsim.h | 5 + drivers/net/netdevsim/psp.c | 27 ++ include/net/psp/types.h | 32 +++ include/uapi/linux/psp.h | 18 ++ net/psp/psp-nl-gen.c | 19 ++ net/psp/psp-nl-gen.h | 2 + net/psp/psp_main.c | 3 +- net/psp/psp_nl.c | 93 +++++++ net/psp/psp_sock.c | 4 +- tools/testing/selftests/drivers/net/psp.py | 13 + 15 files changed, 549 insertions(+), 17 deletions(-) -- 2.47.3

2 months

[PATCH net v3] selftests: net: local_termination: Wait for interfaces to come up

by A. Sverdlin

From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com> It seems that most of the tests prepare the interfaces once before the test run (setup_prepare()), rely on setup_wait() to wait for the link and only then run the test(s). local_termination brings the physical interfaces down and up during the test run but never waits for them to come up. If auto-negotiation takes a few seconds, the first test packets are lost, which leads to false-negative test results. Use setup_wait() in run_test() to make sure auto-negotiation has completed after all simple_if_init() calls on physical interfaces, so that test packets are not lost because of the race against link establishment. Fixes: 90b9566aa5cd3f ("selftests: forwarding: add a test for local_termination.sh") Reviewed-by: Vladimir Oltean <vladimir.oltean(a)nxp.com> Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com> --- Changelog: v3: - moved setup_wait() from individual test groups into run_test() v2: - replaced "setup_wait_dev $h1; setup_wait_dev $h2" with setup_wait() tools/testing/selftests/net/forwarding/local_termination.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/net/forwarding/local_termination.sh b/tools/testing/selftests/net/forwarding/local_termination.sh index ecd34f364125c..892895659c7e4 100755 --- a/tools/testing/selftests/net/forwarding/local_termination.sh +++ b/tools/testing/selftests/net/forwarding/local_termination.sh @@ -176,6 +176,8 @@ run_test() local rcv_dmac=$(mac_get $rcv_if_name) local should_receive + setup_wait + tcpdump_start $rcv_if_name mc_route_prepare $send_if_name -- 2.51.1
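
setup_wait() itself is a shell helper in the forwarding selftest library, and how it detects the link is not reproduced here. The C sketch below only illustrates the general idea of polling until a link reports up, with a timeout, before any test traffic is sent. It assumes the state is read from a sysfs-style file such as /sys/class/net/<ifname>/operstate; the function names and the 100 ms polling interval are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Read the first line of @path and report whether it says "up". */
static bool state_is_up(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	bool up = false;

	if (!f)
		return false;
	if (fgets(buf, sizeof(buf), f)) {
		buf[strcspn(buf, "\n")] = '\0';
		up = strcmp(buf, "up") == 0;
	}
	fclose(f);
	return up;
}

/* Poll @path every 100 ms until it reads "up" or @timeout_ms has passed. */
static bool wait_for_link_up(const char *path, long timeout_ms)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 100L * 1000 * 1000 };
	long waited = 0;

	for (;;) {
		if (state_is_up(path))
			return true;
		if (waited >= timeout_ms)
			return false;
		nanosleep(&step, NULL);
		waited += 100;
	}
}
```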

2 months

[PATCH net-next 0/4] netconsole: Allow userdata buffer to grow dynamically

by Gustavo Luiz Duarte

The current netconsole implementation allocates a static buffer for extradata (userdata + sysdata) with a fixed size of MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS bytes for every target, regardless of whether userspace actually uses this feature. This forces us to keep MAX_EXTRADATA_ITEMS small (16), which is restrictive for users who need to attach more metadata to their log messages. This patch series enables dynamic allocation of the userdata buffer, allowing it to grow on-demand based on actual usage. The series: 1. Refactors send_fragmented_body() to simplify handling of separated userdata and sysdata (patch 1/4) 2. Splits userdata and sysdata into separate buffers (patch 2/4) 3. Implements dynamic allocation for the userdata buffer (patch 3/4) 4. Increases MAX_USERDATA_ITEMS from 16 to 256 now that we can do so without memory waste (patch 4/4) Benefits: - No memory waste when userdata is not used - Targets that use userdata only consume what they need - Users can attach significantly more metadata without impacting systems that don't use this feature Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com> --- Gustavo Luiz Duarte (4): netconsole: Simplify send_fragmented_body() netconsole: Split userdata and sysdata netconsole: Dynamic allocation of userdata buffer netconsole: Increase MAX_USERDATA_ITEMS drivers/net/netconsole.c | 338 +++++++++------------ .../selftests/drivers/net/netcons_overflow.sh | 2 +- 2 files changed, 152 insertions(+), 188 deletions(-) --- base-commit: 89aec171d9d1ab168e43fcf9754b82e4c0aef9b9 change-id: 20251007-netconsole_dynamic_extradata-21bd9d726568 Best regards, -- Gustavo Duarte <gustavold(a)meta.com>
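
Growing a buffer on demand, instead of reserving MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS bytes up front for every target, can be sketched in user space as follows. Kernel code would use kernel allocators rather than realloc(), and the struct and function names (`userdata_buf`, `userdata_append()`, `userdata_free()`) are hypothetical; the sketch only shows the allocation pattern. Nothing is allocated until the first entry is stored, and capacity doubles as more entries arrive.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-target userdata buffer. */
struct userdata_buf {
	char *data;	/* NULL until something is stored */
	size_t len;	/* bytes in use */
	size_t cap;	/* bytes allocated */
};

/*
 * Append @n bytes from @src, growing the allocation (64 bytes first, then
 * doubling) only when it is too small. Returns 0 on success or -1 if the
 * allocation fails, in which case the buffer is left unchanged.
 */
static int userdata_append(struct userdata_buf *b, const char *src, size_t n)
{
	if (n == 0)
		return 0;
	if (n > SIZE_MAX - b->len)
		return -1;

	if (b->len + n > b->cap) {
		size_t cap = b->cap ? b->cap : 64;
		char *p;

		while (cap < b->len + n)
			cap = cap > SIZE_MAX / 2 ? b->len + n : cap * 2;
		p = realloc(b->data, cap);
		if (!p)
			return -1;
		b->data = p;
		b->cap = cap;
	}
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}

/* Release the buffer; a target that never stored userdata frees nothing. */
static void userdata_free(struct userdata_buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = 0;
	b->cap = 0;
}
```

Doubling keeps the number of reallocations logarithmic in the final size, while targets that never use userdata keep a NULL pointer and pay no memory at all.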

2 months

[PATCH] kselftest/arm64: Align zt-test register dumps

by Mark Rutland

The zt-test output is awkward to read, as the 'Expected' value isn't dumped on its own line and isn't aligned with the 'Got' value beneath. For example: Mismatch: PID=5281, iteration=3270249 Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469] Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] SVCR: 2 Add a newline, matching the other FPSIMD/SVE/SME tests, so that we get output that can be read more easily: Mismatch: PID=5281, iteration=3270249 Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469] Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] SVCR: 2 Admittedly this isn't all that important when the 'Got' value is all zeroes, but otherwise this would be a major help for identifying which portion of the 'Got' value is not as expected. Signed-off-by: Mark Rutland <mark.rutland(a)arm.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Mark Brown <broonie(a)kernel.org> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Will Deacon <will(a)kernel.org> Cc: linux-arm-kernel(a)lists.infradead.org Cc: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/arm64/fp/zt-test.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/arm64/fp/zt-test.S b/tools/testing/selftests/arm64/fp/zt-test.S index 38080f3c32804..a8df057716707 100644 --- a/tools/testing/selftests/arm64/fp/zt-test.S +++ b/tools/testing/selftests/arm64/fp/zt-test.S @@ -276,7 +276,7 @@ function barf bl putdec puts ", iteration=" mov x0, x22 - bl putdec + bl putdecn puts "\tExpected [" mov x0, x10 mov x1, x12 -- 2.30.2

2 months

[PATCH v6 0/2] KVM: guest_memfd: use write for population

by Kalyazin, Nikita

[ based on kvm/next ] Implement guest_memfd population via the write syscall. This is useful in non-CoCo use cases where the host can access guest memory. Even though the same can also be achieved via userspace mapping and memcpying from userspace, write provides a more performant option because it does not need to set up page tables and it does not cause a page fault for every page as memcpy would. Note that memcpy cannot be accelerated via MADV_POPULATE_WRITE as it is not supported by guest_memfd and relies on GUP. Populating 512MiB of guest_memfd on an x86 machine: - via memcpy: 436 ms - via write: 202 ms (-54%) The write syscall support is conditional on kvm_gmem_supports_mmap. When in-place shared/private conversion is supported, write should only be allowed on shared pages. v6: - Make write support conditional on mmap support instead of relying on the up-to-date flag to decide whether writing to a page is allowed - James: Remove dependencies on folio_test_large - James: Remove page alignment restriction - James: Formatting fixes v5: - https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com/ - Replace the call to the unexported filemap_remove_folio with zeroing the bytes that could not be copied - Fix checkpatch findings v4: - https://lore.kernel.org/kvm/20250828153049.3922-1-kalyazin@amazon.com - Switch from implementing the write callback to write_iter - Remove conditional compilation v3: - https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com - David/Mike D: Only compile support for the write syscall if CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.
v2: - https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com - Switch from an ioctl to the write syscall to implement population v1: - https://lore.kernel.org/kvm/20241024095429.54052-1-kalyazin@amazon.com Nikita Kalyazin (2): KVM: guest_memfd: add generic population via write KVM: selftests: update guest_memfd write tests .../testing/selftests/kvm/guest_memfd_test.c | 51 ++++++++++++++++--- virt/kvm/guest_memfd.c | 49 ++++++++++++++++++ 2 files changed, 94 insertions(+), 6 deletions(-) base-commit: 6b36119b94d0b2bb8cea9d512017efafd461d6ac -- 2.50.1
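
From user space, populating through write is a plain write loop over the file descriptor. The sketch below uses pwrite() and an ordinary temporary file as a stand-in so it runs without KVM; with the series applied, the descriptor would instead come from the KVM_CREATE_GUEST_MEMFD ioctl on a VM, subject to the mmap-support condition described above. It assumes the descriptor accepts positional writes (plain write() after lseek() would work the same way), and `populate_fd()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>	/* tmpfile()/fileno() for a stand-in descriptor */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy @len bytes from @src into the file behind @fd,
 * starting at @offset, with pwrite(). Short writes and EINTR are retried.
 * Returns 0 on success or -1 with errno set.
 */
static int populate_fd(int fd, const void *src, size_t len, off_t offset)
{
	const char *p = src;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {	/* no progress: avoid spinning forever */
			errno = EIO;
			return -1;
		}
		p += n;
		len -= (size_t)n;
		offset += n;
	}
	return 0;
}
```

Unlike the mmap-plus-memcpy approach, this path never maps the memory into the caller, which is where the cover letter attributes the speedup: no page table setup and no per-page faults.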

2 months

[PATCH v11 0/9] support FEAT_LSUI

by Yeoreum Yun

Since Armv9.6, FEAT_LSUI provides load/store instructions that let the privileged level access user memory without clearing the PSTATE.PAN bit. This patchset supports FEAT_LSUI and applies it to the futex atomic operations and the user_swpX emulation, replacing the ldxr/st{l}xr pair implementation, which has to clear the PSTATE.PAN bit, with the corresponding unprivileged load/store atomic operations, which do not. Patch Sequences ================ Patch #1 adds cpufeature for FEAT_LSUI Patch #2-#3 expose FEAT_LSUI to guest Patch #4 adds Kconfig for FEAT_LSUI Patch #5-#6 support futex atomic-op with FEAT_LSUI Patch #7-#9 support user_swpX emulation with FEAT_LSUI Patch History ============== from v10 to v11: - use cast instruction to emulate deprecated swpb instruction - https://lore.kernel.org/all/20251103163224.818353-1-yeoreum.yun@arm.com/ from v9 to v10: - apply FEAT_LSUI to user_swpX emulation. - add test coverage for LSUI bit in ID_AA64ISAR3_EL1 - rebase to v6.18-rc4 - https://lore.kernel.org/all/20250922102244.2068414-1-yeoreum.yun@arm.com/ from v8 to v9: - refactoring __lsui_cmpxchg64() - rebase to v6.17-rc7 - https://lore.kernel.org/all/20250917110838.917281-1-yeoreum.yun@arm.com/ from v7 to v8: - implements futex_atomic_eor() and futex_atomic_cmpxchg() with casalt with C helper. - Drop the small optimisation on ll/sc futex_atomic_set operation. - modify some commit messages. - https://lore.kernel.org/all/20250816151929.197589-1-yeoreum.yun@arm.com/ from v6 to v7: - wrap FEAT_LSUI with CONFIG_AS_HAS_LSUI in cpufeature - remove unnecessary addition of indentation. - remove unnecessary mte_tco_enable()/disable() on LSUI operation. - https://lore.kernel.org/all/20250811163635.1562145-1-yeoreum.yun@arm.com/ from v5 to v6: - rebase to v6.17-rc1 - https://lore.kernel.org/all/20250722121956.1509403-1-yeoreum.yun@arm.com/ from v4 to v5: - remove futex_ll_sc.h futext_lsui and lsui.h and move them to futex.h - reorganize the patches.
- https://lore.kernel.org/all/20250721083618.2743569-1-yeoreum.yun@arm.com/ from v3 to v4: - rebase to v6.16-rc7 - modify some patch's title. - https://lore.kernel.org/all/20250617183635.1266015-1-yeoreum.yun@arm.com/ from v2 to v3: - expose FEAT_LUSI to guest - add help section for LUSI Kconfig - https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/ from v1 to v2: - remove empty v9.6 menu entry - locate HAS_LUSI in cpucaps in order - https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/ Yeoreum Yun (9): arm64: cpufeature: add FEAT_LSUI KVM: arm64: expose FEAT_LSUI to guest KVM: arm64: kselftest: set_id_regs: add test for FEAT_LSUI arm64: Kconfig: Detect toolchain support for LSUI arm64: futex: refactor futex atomic operation arm64: futex: support futex with FEAT_LSUI arm64: separate common LSUI definitions into lsui.h arm64: armv8_deprecated: convert user_swpX to inline function arm64: armv8_deprecated: apply FEAT_LSUI for swpX emulation. arch/arm64/Kconfig | 5 + arch/arm64/include/asm/futex.h | 291 +++++++++++++++--- arch/arm64/include/asm/lsui.h | 25 ++ arch/arm64/kernel/armv8_deprecated.c | 111 +++++-- arch/arm64/kernel/cpufeature.c | 10 + arch/arm64/kvm/sys_regs.c | 3 +- arch/arm64/tools/cpucaps | 1 + .../testing/selftests/kvm/arm64/set_id_regs.c | 1 + 8 files changed, 381 insertions(+), 66 deletions(-) create mode 100644 arch/arm64/include/asm/lsui.h base-commit: 6146a0f1dfae5d37442a9ddcba012add260bceb0 -- LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
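
The changelog mentions implementing futex_atomic_eor() and futex_atomic_cmpxchg() on top of compare-and-swap (casalt) with a C helper. The general technique of deriving a read-modify-write operation from compare-and-swap is shown below as a user-space C11 sketch. It is not the arm64 kernel code, which operates on user memory with unprivileged instructions, and `atomic_fetch_eor_cas()` is a hypothetical name; C11 already provides atomic_fetch_xor(), so the loop is spelled out only to show the technique.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically perform *uaddr ^= oparg using only compare-and-swap and
 * return the previous value, as the futex atomic ops report oldval.
 */
static uint32_t atomic_fetch_eor_cas(_Atomic uint32_t *uaddr, uint32_t oparg)
{
	uint32_t old = atomic_load_explicit(uaddr, memory_order_relaxed);

	/* On failure, compare_exchange reloads 'old' with the current value. */
	while (!atomic_compare_exchange_weak_explicit(uaddr, &old, old ^ oparg,
						      memory_order_acq_rel,
						      memory_order_relaxed))
		;
	return old;
}
```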

2 months

[PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock

by Bobby Eshleman

This series adds namespace support to vhost-vsock and loopback. It does not add namespaces to any of the other guest transports (virtio-vsock, hyperv, or vmci). The current revision supports two modes: local and global. Local mode is complete isolation of namespaces, while global mode is complete sharing between namespaces of CIDs (the original behavior). The mode is set using /proc/sys/net/vsock/ns_mode. Modes are per-netns and write-once. This allows a system to configure namespaces independently (some may share CIDs, others are completely isolated). This also supports future possible mixed use cases, where there may be namespaces in global mode spinning up VMs while there are mixed mode namespaces that provide services to the VMs, but are not allowed to allocate from the global CID pool (this mode not implemented in this series). If a socket or VM is created when a namespace is global but the namespace changes to local, the socket or VM will continue working normally. That is, the socket or VM assumes the mode behavior of the namespace at the time the socket/VM was created. The original mode is captured in vsock_create() and so occurs at the time of socket(2) and accept(2) for sockets and open(2) on /dev/vhost-vsock for VMs. This prevents a socket/VM connection from suddenly breaking due to a namespace mode change. Any new sockets/VMs created after the mode change will adopt the new mode's behavior. 
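The mode rules above can be summarised as a small state model. The C sketch below is hypothetical: struct netns, struct endpoint, ns_write_mode(), can_reach() and cid_conflicts() are illustrative names, not the af_vsock API, and the reachability rule is inferred from this cover letter and the vmtest.sh test names (for example ns_global_local_same_cid_ok and ns_diff_global_host_connect_to_local_vm_fails) rather than from the patches themselves. It also assumes that a namespace starts in global mode and that only the first write to ns_mode succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of per-netns vsock modes; not the kernel API. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct netns {
	enum ns_mode mode;	/* assumed to default to global */
	bool mode_written;	/* ns_mode is write-once */
};

/* A socket or VM: it snapshots the namespace mode when created. */
struct endpoint {
	const struct netns *net;
	enum ns_mode mode;
	unsigned int cid;
};

/* Only the first write succeeds. */
static bool ns_write_mode(struct netns *net, enum ns_mode mode)
{
	if (net->mode_written)
		return false;
	net->mode = mode;
	net->mode_written = true;
	return true;
}

static struct endpoint endpoint_create(const struct netns *net, unsigned int cid)
{
	assert(net);
	struct endpoint ep = { .net = net, .mode = net->mode, .cid = cid };
	return ep;
}

/* Global endpoints share one CID space; local ones see only their netns. */
static bool can_reach(const struct endpoint *a, const struct endpoint *b)
{
	if (a->mode == NS_MODE_GLOBAL && b->mode == NS_MODE_GLOBAL)
		return true;
	return a->mode == NS_MODE_LOCAL && b->mode == NS_MODE_LOCAL &&
	       a->net == b->net;
}

/* A CID collides exactly when the two endpoints could address each other. */
static bool cid_conflicts(const struct endpoint *a, const struct endpoint *b)
{
	return a->cid == b->cid && can_reach(a, b);
}
```

Because an endpoint copies the mode at creation time, switching its namespace to local later does not change how an existing socket or VM is treated; only endpoints created afterwards see the new mode.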
Additionally, added tests for the new namespace features: tools/testing/selftests/vsock/vmtest.sh 1..30 ok 1 vm_server_host_client ok 2 vm_client_host_server ok 3 vm_loopback ok 4 ns_host_vsock_ns_mode_ok ok 5 ns_host_vsock_ns_mode_write_once_ok ok 6 ns_global_same_cid_fails ok 7 ns_local_same_cid_ok ok 8 ns_global_local_same_cid_ok ok 9 ns_local_global_same_cid_ok ok 10 ns_diff_global_host_connect_to_global_vm_ok ok 11 ns_diff_global_host_connect_to_local_vm_fails ok 12 ns_diff_global_vm_connect_to_global_host_ok ok 13 ns_diff_global_vm_connect_to_local_host_fails ok 14 ns_diff_local_host_connect_to_local_vm_fails ok 15 ns_diff_local_vm_connect_to_local_host_fails ok 16 ns_diff_global_to_local_loopback_local_fails ok 17 ns_diff_local_to_global_loopback_fails ok 18 ns_diff_local_to_local_loopback_fails ok 19 ns_diff_global_to_global_loopback_ok ok 20 ns_same_local_loopback_ok ok 21 ns_same_local_host_connect_to_local_vm_ok ok 22 ns_same_local_vm_connect_to_local_host_ok ok 23 ns_mode_change_connection_continue_vm_ok ok 24 ns_mode_change_connection_continue_host_ok ok 25 ns_mode_change_connection_continue_both_ok ok 26 ns_delete_vm_ok ok 27 ns_delete_host_ok ok 28 ns_delete_both_ok ok 29 ns_loopback_global_global_late_module_load_ok ok 30 ns_loopback_local_local_late_module_load_fails SUMMARY: PASS=30 SKIP=0 FAIL=0 Dependent on series: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements… Thanks again for everyone's help and reviews! Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com> To: Stefano Garzarella <sgarzare(a)redhat.com> To: Shuah Khan <shuah(a)kernel.org> To: David S. Miller <davem(a)davemloft.net> To: Eric Dumazet <edumazet(a)google.com> To: Jakub Kicinski <kuba(a)kernel.org> To: Paolo Abeni <pabeni(a)redhat.com> To: Simon Horman <horms(a)kernel.org> To: Stefan Hajnoczi <stefanha(a)redhat.com> To: Michael S. 
Tsirkin <mst(a)redhat.com> To: Jason Wang <jasowang(a)redhat.com> To: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com> To: Eugenio Pérez <eperezma(a)redhat.com> To: K. Y. Srinivasan <kys(a)microsoft.com> To: Haiyang Zhang <haiyangz(a)microsoft.com> To: Wei Liu <wei.liu(a)kernel.org> To: Dexuan Cui <decui(a)microsoft.com> To: Bryan Tan <bryan-bt.tan(a)broadcom.com> To: Vishnu Dasa <vishnu.dasa(a)broadcom.com> To: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com> Cc: virtualization(a)lists.linux.dev Cc: netdev(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Cc: kvm(a)vger.kernel.org Cc: linux-hyperv(a)vger.kernel.org Cc: berrange(a)redhat.com Changes in v8: - Break generic cleanup/refactoring patches into standalone series, remove those from this series - Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements… - Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com Changes in v7: - fix hv_sock build - break out vmtest patches into distinct, more well-scoped patches - change `orig_net_mode` to `net_mode` - many fixes and style changes in per-patch change sets (see individual patches for specific changes) - optimize `virtio_vsock_skb_cb` layout - update commit messages with more useful descriptions - vsock_loopback: use orig_net_mode instead of current net mode - add tests for edge cases (ns deletion, mode changing, loopback module load ordering) - Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com Changes in v6: - define behavior when mode changes to local while socket/VM is alive - af_vsock: clarify description of CID behavior - af_vsock: use stronger language around CID rules (don't use "may") - af_vsock: improve naming of buf/buffer - af_vsock: improve string length checking on proc writes - vsock_loopback: add space in struct to clarify lock protection - vsock_loopback: do proper
cleanup/unregister on vsock_loopback_exit() - vsock_loopback: use virtio_vsock_skb_net() instead of sock_net() - vsock_loopback: set loopback to NULL after kfree() - vsock_loopback: use pernet_operations and remove callback mechanism - vsock_loopback: add macros for "global" and "local" - vsock_loopback: fix length checking - vmtest.sh: check for namespace support in vmtest.sh - Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com Changes in v5: - /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode - vsock_global_net -> vsock_global_dummy_net - fix netns lookup in vhost_vsock to respect pid namespaces - add callbacks for vsock_loopback to avoid circular dependency - vmtest.sh loads vsock_loopback module - remove vsock_net_mode_can_set() - change vsock_net_write_mode() to return true/false based on success - make vsock_net_mode enum instead of u8 - Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com Changes in v4: - removed RFC tag - implemented loopback support - renamed new tests to better reflect behavior - completed suite of tests with permutations of ns modes and vsock_test as guest/host - simplified socat bridging with unix socket instead of tcp + veth - only use vsock_test for success case, socat for failure case (context in commit message) - lots of cleanup Changes in v3: - add notion of "modes" - add procfs /proc/net/vsock_ns_mode - local and global modes only - no /dev/vhost-vsock-netns - vmtest.sh already merged, so new patch just adds new tests for NS - Link to v2: https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com Changes in v2: - only support vhost-vsock namespaces - all g2h namespaces retain old behavior, only common API changes impacted by vhost-vsock changes - add /dev/vhost-vsock-netns for "opt-in" - leave /dev/vhost-vsock to old behavior - removed netns module param - Link to v1: https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com 
Changes in v1: - added 'netns' module param to vsock.ko to enable the network namespace support (disabled by default) - added 'vsock_net_eq()' to check the "net" assigned to a socket only when 'netns' support is enabled - Link to RFC: https://patchwork.ozlabs.org/cover/1202235/ --- Bobby Eshleman (14): vsock: a per-net vsock NS mode state vsock/virtio: pack struct virtio_vsock_skb_cb vsock: add netns to vsock skb cb vsock: add netns to vsock core vsock/loopback: add netns support vsock/virtio: add netns to virtio transport common vhost/vsock: add netns support selftests/vsock: add namespace helpers to vmtest.sh selftests/vsock: prepare vm management helpers for namespaces selftests/vsock: add tests for proc sys vsock ns_mode selftests/vsock: add namespace tests for CID collisions selftests/vsock: add tests for host <-> vm connectivity with namespaces selftests/vsock: add tests for namespace deletion and mode changes selftests/vsock: add tests for module loading order MAINTAINERS | 1 + drivers/vhost/vsock.c | 48 +- include/linux/virtio_vsock.h | 47 +- include/net/af_vsock.h | 70 ++- include/net/net_namespace.h | 4 + include/net/netns/vsock.h | 22 + net/vmw_vsock/af_vsock.c | 264 +++++++- net/vmw_vsock/virtio_transport.c | 7 +- net/vmw_vsock/virtio_transport_common.c | 21 +- net/vmw_vsock/vsock_loopback.c | 89 ++- tools/testing/selftests/vsock/vmtest.sh | 1044 ++++++++++++++++++++++++++++++- 11 files changed, 1532 insertions(+), 85 deletions(-) --- base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59 change-id: 20250325-vsock-vmtest-b3a21d2102c2 prerequisite-message-id: <20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463(a)meta.com> prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5 prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85 prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449 prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61 prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909 
prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671 prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325 prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698 prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df Best regards, -- Bobby Eshleman <bobbyeshleman(a)meta.com>

2 months

[PATCH v2 0/5] introduce VM_MAYBE_GUARD and make it sticky

by Lorenzo Stoakes

Currently, guard regions are not visible to users except through /proc/$pid/pagemap, with no explicit visibility at the VMA level. This makes the feature less useful, as it isn't entirely apparent which VMAs may have these entries present, especially when performing actions which walk through memory regions such as those performed by CRIU. This series addresses this issue by introducing the VM_MAYBE_GUARD flag which fulfils this role, updating the smaps logic to display an entry for these. The semantics of this flag are that a guard region MAY be present if set (we cannot be sure, as we can't efficiently track whether an MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if not set the VMA definitely does NOT have any guard regions present. It's problematic to establish this flag without further action, because that means that VMAs with guard regions in them become non-mergeable with adjacent VMAs for no especially good reason. To work around this, this series also introduces the concept of 'sticky' VMA flags - that is flags which: a. if set in one VMA and not in another still permit those VMAs to be merged (if otherwise compatible). b. When they are merged, the resultant VMA must have the flag set. The VMA logic is updated to propagate these flags correctly. Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve an issue with file-backed guard regions - previously these established an anon_vma object for file-backed mappings solely to have vma_needs_copy() correctly propagate guard region mappings to child processes. We introduce a new flag alias VM_COPY_ON_FORK (which currently only specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly for this flag and to copy page tables if it is present, which resolves this issue. Additionally, we add the ability for allow-listed VMA flags to be atomically writable with only mmap/VMA read locks held. 
The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure does not cause any races by being allowed to do so. This allows us to maintain guard region installation as a read-locked operation and not endure the overhead of obtaining a write lock here. Finally we introduce extensive VMA userland tests to assert that the sticky VMA logic behaves correctly as well as guard region self tests to assert that smaps visibility is correctly implemented. v2: * Separated out userland VMA tests for sticky behaviour as per Suren. * Added the concept of atomic writable VMA flags as per Pedro and Vlastimil. * Made VM_MAYBE_GUARD an atomic writable flag so we don't have to take a VMA write lock in madvise() as per Pedro and Vlastimil. v1: https://lore.kernel.org/all/cover.1761756437.git.lorenzo.stoakes@oracle.com/ Lorenzo Stoakes (5): mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps mm: add atomic VMA flags, use VM_MAYBE_GUARD as such mm: implement sticky, copy on fork VMA flags tools/testing/vma: add VMA sticky userland tests selftests/mm/guard-regions: add smaps visibility test Documentation/filesystems/proc.rst | 1 + fs/proc/task_mmu.c | 1 + include/linux/mm.h | 58 ++++++++++ include/trace/events/mmflags.h | 1 + mm/madvise.c | 22 ++-- mm/memory.c | 3 + mm/vma.c | 22 ++-- tools/testing/selftests/mm/guard-regions.c | 120 +++++++++++++++++++++ tools/testing/selftests/mm/vm_util.c | 5 + tools/testing/selftests/mm/vm_util.h | 1 + tools/testing/vma/vma.c | 89 +++++++++++++-- tools/testing/vma/vma_internal.h | 35 ++++++ 12 files changed, 330 insertions(+), 28 deletions(-) -- 2.51.0
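The sticky-flag rules (a) and (b) and the VM_COPY_ON_FORK check reduce to a few bitmask operations. The sketch below is a simplified, hypothetical model: the bit values are invented, needs_copy() ignores the other conditions the real vma_needs_copy() considers, and set_maybe_guard() only illustrates why an atomic OR lets several tasks holding a read lock set the flag without losing each other's update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the real ones live in include/linux/mm.h. */
#define VM_READ		(1UL << 0)
#define VM_WRITE	(1UL << 1)
#define VM_MAYBE_GUARD	(1UL << 2)
#define VM_STICKY	VM_MAYBE_GUARD	/* may differ between merge candidates */
#define VM_COPY_ON_FORK	VM_MAYBE_GUARD	/* forces a page-table copy on fork */

/* (a) Only the non-sticky bits have to match for a merge. */
static bool flags_mergeable(unsigned long a, unsigned long b)
{
	return ((a ^ b) & ~VM_STICKY) == 0;
}

/* (b) Non-sticky bits are equal, so OR-ing keeps them and keeps any
 * sticky bit that was set on either side. */
static unsigned long merged_flags(unsigned long a, unsigned long b)
{
	assert(flags_mergeable(a, b));
	return a | b;
}

/* Simplified: copy page tables if there is an anon_vma or a
 * copy-on-fork flag, instead of creating an anon_vma just for this. */
static bool needs_copy(unsigned long flags, bool has_anon_vma)
{
	return has_anon_vma || (flags & VM_COPY_ON_FORK);
}

/* Atomic read-modify-write: concurrent setters cannot lose updates. */
static void set_maybe_guard(_Atomic unsigned long *flags)
{
	atomic_fetch_or_explicit(flags, VM_MAYBE_GUARD, memory_order_relaxed);
}
```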

2 months

[RFC 0/2] xdp: Delegate fast path return decision to page_pool

by Dragos Tatulea

This small series proposes the removal of the BPF_RI_F_RF_NO_DIRECT XDP flag in favour of page_pool's internal page_pool_napi_local() check, which can turn a non-direct recycle into a direct one if the right conditions are met. This was discussed on the mailing list on several occasions [1][2]. The first patch adds additional benchmarking code to the page_pool benchmark. The second patch has the actual change with a proper explanation and measurements. It remains to be debated whether the whole BPF_RI_F_RF_NO_DIRECT mechanism should be deleted or only its use in xdp_return_frame_rx_napi(). There is still the unresolved issue of drivers that don't support page_pool NAPI recycling. This series could be extended to add that support; otherwise, those drivers would end up with slow path recycling for XDP. [1] https://lore.kernel.org/all/8d165026-1477-46cb-94d4-a01e1da40833@kernel.org/ [2] https://lore.kernel.org/all/20250918084823.372000-1-dtatulea@nvidia.com/ Dragos Tatulea (2): page_pool: add benchmarking for napi-based recycling xdp: Delegate fast path return decision to page_pool drivers/net/veth.c | 2 - include/linux/filter.h | 22 ----- include/net/xdp.h | 2 +- kernel/bpf/cpumap.c | 2 - net/bpf/test_run.c | 2 - net/core/filter.c | 2 +- net/core/xdp.c | 24 ++--- .../bench/page_pool/bench_page_pool_simple.c | 92 ++++++++++++++++++- 8 files changed, 104 insertions(+), 44 deletions(-) -- 2.50.1
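As we understand it, page_pool_napi_local() permits a direct recycle only when the caller runs in softirq context on the CPU that owns the pool, either the pool's pinned CPU or the CPU currently running its NAPI instance. The model below restates that decision in plain C. struct pool_model and its fields are hypothetical stand-ins for the real page_pool and napi_struct state, and the exact kernel conditions may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the locality check reads. */
struct pool_model {
	int cpuid;	/* CPU the pool is pinned to, or -1 */
	int napi_owner;	/* CPU currently running the pool's NAPI, or -1 */
};

/* Direct recycling is only safe from the CPU that owns the pool's
 * lockless cache, and only in softirq context. */
static bool napi_local(const struct pool_model *pool, bool in_softirq,
		       int this_cpu)
{
	assert(pool);
	if (!in_softirq)
		return false;
	if (pool->cpuid == this_cpu)
		return true;
	return pool->napi_owner != -1 && pool->napi_owner == this_cpu;
}

/* A caller that allows direct recycling wins; otherwise the pool
 * decides for itself whether the fast path is safe. */
static bool recycle_direct(const struct pool_model *pool, bool allow_direct,
			   bool in_softirq, int this_cpu)
{
	return allow_direct || napi_local(pool, in_softirq, this_cpu);
}
```

In this model, a caller that passes allow_direct = false can still get the fast path when the locality check succeeds, which is the behaviour the series relies on.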

2 months

[PATCH v8 28/28] tracing: selftests: Add pKVM trace remote tests

by Vincent Donnefort

Run the trace remote selftests with the pKVM trace remote "hypervisor". Cc: Shuah Khan <skhan(a)linuxfoundation.org> Cc: linux-kselftest(a)vger.kernel.org Signed-off-by: Vincent Donnefort <vdonnefort(a)google.com> diff --git a/tools/testing/selftests/ftrace/test.d/remotes/pkvm/buffer_size.tc b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/buffer_size.tc new file mode 100644 index 000000000000..2de07e4d72fe --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/buffer_size.tc @@ -0,0 +1,11 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test pkvm hypervisor trace buffer size +# requires: remotes/hypervisor/write_event + +SOURCE_REMOTE_TEST=1 +. $TEST_DIR/remotes/buffer_size.tc + +set -e +setup_remote "hypervisor" +test_buffer_size diff --git a/tools/testing/selftests/ftrace/test.d/remotes/pkvm/reset.tc b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/reset.tc new file mode 100644 index 000000000000..48afc51627e8 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/reset.tc @@ -0,0 +1,11 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test pkvm hypervisor trace buffer reset +# requires: remotes/hypervisor/write_event + +SOURCE_REMOTE_TEST=1 +. $TEST_DIR/remotes/reset.tc + +set -e +setup_remote "hypervisor" +test_reset diff --git a/tools/testing/selftests/ftrace/test.d/remotes/pkvm/trace.tc b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/trace.tc index 49dca7c3861a..00aed1c2e650 100644 --- a/tools/testing/selftests/ftrace/test.d/remotes/pkvm/trace.tc +++ b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/trace.tc @@ -1,9 +1,10 @@ #!/bin/sh # SPDX-License-Identifier: GPL-2.0 -# description: Test pkvm hypervisor tracing pipe +# description: Test pkvm hypervisor non-consuming trace read +# requires: remotes/hypervisor/write_event SOURCE_REMOTE_TEST=1 -. $TEST_DIR/remotes/trace_pipe.tc +. 
$TEST_DIR/remotes/trace.tc set -e setup_remote "hypervisor" diff --git a/tools/testing/selftests/ftrace/test.d/remotes/pkvm/trace_pipe.tc b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/trace_pipe.tc new file mode 100644 index 000000000000..b63339aca380 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/trace_pipe.tc @@ -0,0 +1,11 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test pkvm hypervisor consuming trace read +# requires: remotes/hypervisor/write_event + +SOURCE_REMOTE_TEST=1 +. $TEST_DIR/remotes/trace_pipe.tc + +set -e +setup_remote "hypervisor" +test_trace_pipe diff --git a/tools/testing/selftests/ftrace/test.d/remotes/pkvm/unloading.tc b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/unloading.tc new file mode 100644 index 000000000000..eb1640a927cc --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/pkvm/unloading.tc @@ -0,0 +1,11 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test pkvm hypervisor trace buffer unloading +# requires: remotes/hypervisor/write_event + +SOURCE_REMOTE_TEST=1 +. $TEST_DIR/remotes/unloading.tc + +set -e +setup_remote "hypervisor" +test_unloading -- 2.51.2.1041.gc1ab5b90ca-goog

2 months

[PATCH v8 15/28] tracing: selftests: Add trace remote tests

by Vincent Donnefort

Exercise the tracefs interface for trace remote with a set of tests to check: * loading/unloading (unloading.tc) * reset (reset.tc) * size changes (buffer_size.tc) * consuming read (trace_pipe.tc) * non-consuming read (trace.tc) Cc: Shuah Khan <skhan(a)linuxfoundation.org> Cc: linux-kselftest(a)vger.kernel.org Signed-off-by: Vincent Donnefort <vdonnefort(a)google.com> diff --git a/tools/testing/selftests/ftrace/test.d/remotes/buffer_size.tc b/tools/testing/selftests/ftrace/test.d/remotes/buffer_size.tc new file mode 100644 index 000000000000..1a43280ffa97 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/buffer_size.tc @@ -0,0 +1,25 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test trace remote buffer size +# requires: remotes/test + +. $TEST_DIR/remotes/functions + +test_buffer_size() +{ + echo 0 > tracing_on + assert_unloaded + + echo 4096 > buffer_size_kb + echo 1 > tracing_on + assert_loaded + + echo 0 > tracing_on + echo 7 > buffer_size_kb +} + +if [ -z "$SOURCE_REMOTE_TEST" ]; then + set -e + setup_remote_test + test_buffer_size +fi diff --git a/tools/testing/selftests/ftrace/test.d/remotes/functions b/tools/testing/selftests/ftrace/test.d/remotes/functions new file mode 100644 index 000000000000..97a09d564a34 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/functions @@ -0,0 +1,88 @@ +# SPDX-License-Identifier: GPL-2.0 + +setup_remote() +{ + local name=$1 + + [ -e $TRACING_DIR/remotes/$name/write_event ] || exit_unresolved + + cd remotes/$name/ + echo 0 > tracing_on + clear_trace + echo 7 > buffer_size_kb + echo 0 > events/enable + echo 1 > events/$name/selftest/enable + echo 1 > tracing_on +} + +setup_remote_test() +{ + [ -d $TRACING_DIR/remotes/test/ ] || modprobe remote_test || exit_unresolved + + setup_remote "test" +} + +assert_loaded() +{ + grep -q "(loaded)" buffer_size_kb +} + +assert_unloaded() +{ + grep -q "(unloaded)" buffer_size_kb +} + +dump_trace_pipe() +{ + output=$(mktemp 
$TMPDIR/remote_test.XXXXXX) + cat trace_pipe > $output & + pid=$! + sleep 1 + kill -1 $pid + + echo $output +} + +check_trace() +{ + start_id="$1" + end_id="$2" + file="$3" + + # Ensure the file is not empty + test -n "$(head $file)" + + prev_ts=0 + id=0 + + # Only keep <timestamp> <id> + tmp=$(mktemp $TMPDIR/remote_test.XXXXXX) + sed -e 's/\[[0-9]*\]\s*\([0-9]*.[0-9]*\): [a-z]* id=\([0-9]*\)/\1 \2/' $file > $tmp + + while IFS= read -r line; do + ts=$(echo $line | cut -d ' ' -f 1) + id=$(echo $line | cut -d ' ' -f 2) + + test $(echo "$ts>$prev_ts" | bc) -eq 1 + test $id -eq $start_id + + prev_ts=$ts + start_id=$((start_id + 1)) + done < $tmp + + test $id -eq $end_id + rm $tmp +} + +get_cpu_ids() +{ + sed -n 's/^processor\s*:\s*\([0-9]\+\).*/\1/p' /proc/cpuinfo +} + +get_page_size() { + sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page +} + +get_selftest_event_size() { + sed -ne 's/^.*field:.*;.*size:\([0-9][0-9]*\);.*/\1/p' events/*/selftest/format | awk '{s+=$1} END {print s}' +} diff --git a/tools/testing/selftests/ftrace/test.d/remotes/reset.tc b/tools/testing/selftests/ftrace/test.d/remotes/reset.tc new file mode 100644 index 000000000000..4d176349b2bc --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/reset.tc @@ -0,0 +1,90 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test trace remote reset +# requires: remotes/test + +. $TEST_DIR/remotes/functions + +check_reset() +{ + write_event_path="write_event" + taskset="" + + clear_trace + + # Is the buffer empty? + output=$(dump_trace_pipe) + test $(wc -l $output | cut -d ' ' -f1) -eq 0 + + if $(echo $(pwd) | grep -q "per_cpu/cpu"); then + write_event_path="../../write_event" + cpu_id=$(echo $(pwd) | sed -e 's/.*per_cpu\/cpu//') + taskset="taskset -c $cpu_id" + fi + rm $output + + # Can we properly write a new event?
+ $taskset echo 7890 > $write_event_path + output=$(dump_trace_pipe) + test $(wc -l $output | cut -d ' ' -f1) -eq 1 + grep -q "id=7890" $output + rm $output +} + +test_global_interface() +{ + output=$(mktemp $TMPDIR/remote_test.XXXXXX) + + # Confidence check + echo 123456 > write_event + output=$(dump_trace_pipe) + grep -q "id=123456" $output + rm $output + + # Reset single event + echo 1 > write_event + check_reset + + # Reset lost events + for i in $(seq 1 10000); do + echo 1 > write_event + done + check_reset +} + +test_percpu_interface() +{ + [ "$(get_cpu_ids | wc -l)" -ge 2 ] || return 0 + + for cpu in $(get_cpu_ids); do + taskset -c $cpu echo 1 > write_event + done + + check_non_empty=0 + for cpu in $(get_cpu_ids); do + cd per_cpu/cpu$cpu/ + + if [ $check_non_empty -eq 0 ]; then + check_reset + check_non_empty=1 + else + # Check we have only reset 1 CPU + output=$(dump_trace_pipe) + test $(wc -l $output | cut -d ' ' -f1) -eq 1 + rm $output + fi + cd - + done +} + +test_reset() +{ + test_global_interface + test_percpu_interface +} + +if [ -z "$SOURCE_REMOTE_TEST" ]; then + set -e + setup_remote_test + test_reset +fi diff --git a/tools/testing/selftests/ftrace/test.d/remotes/trace.tc b/tools/testing/selftests/ftrace/test.d/remotes/trace.tc new file mode 100644 index 000000000000..081133ec45ff --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/trace.tc @@ -0,0 +1,127 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test trace remote non-consuming read +# requires: remotes/test + +. $TEST_DIR/remotes/functions + +test_trace() +{ + echo 0 > tracing_on + assert_unloaded + + echo 7 > buffer_size_kb + echo 1 > tracing_on + assert_loaded + + # Simple test: Emit few events and try to read them + for i in $(seq 1 8); do + echo $i > write_event + done + + check_trace 1 8 trace + + # + # Test interaction with consuming read + # + + cat trace_pipe > /dev/null & + pid=$! 
+ + sleep 1 + kill $pid + + test $(wc -l < trace) -eq 0 + + for i in $(seq 16 32); do + echo $i > write_event + done + + check_trace 16 32 trace + + # + # Test interaction with reset + # + + echo 0 > trace + + test $(wc -l < trace) -eq 0 + + for i in $(seq 1 8); do + echo $i > write_event + done + + check_trace 1 8 trace + + # + # Test interaction with lost events + # + + # Ensure the writer is not on the reader page by reloading the buffer + echo 0 > tracing_on + echo 0 > trace + assert_unloaded + echo 1 > tracing_on + assert_loaded + + # Ensure ring-buffer overflow by emitting events from the same CPU + for cpu in $(get_cpu_ids); do + break + done + + events_per_page=$(($(get_page_size) / $(get_selftest_event_size))) # Approx: does not take TS into account + nr_events=$(($events_per_page * 2)) + for i in $(seq 1 $nr_events); do + taskset -c $cpu echo $i > write_event + done + + id=$(sed -n -e '1s/\[[0-9]*\]\s*[0-9]*.[0-9]*: [a-z]* id=\([0-9]*\)/\1/p' trace) + test $id -ne 1 + + check_trace $id $nr_events trace + + # + # Test per-CPU interface + # + echo 0 > trace + + for cpu in $(get_cpu_ids) ; do + taskset -c $cpu echo $cpu > write_event + done + + for cpu in $(get_cpu_ids); do + cd per_cpu/cpu$cpu/ + + check_trace $cpu $cpu trace + + cd - > /dev/null + done + + # + # Test with hotplug + # + + [ "$(get_cpu_ids | wc -l)" -ge 2 ] || return 0 + + echo 0 > trace + + for cpu in $(get_cpu_ids); do + echo 0 > /sys/devices/system/cpu/cpu$cpu/online + break + done + + for i in $(seq 1 8); do + echo $i > write_event + done + + check_trace 1 8 trace + + echo 1 > /sys/devices/system/cpu/cpu$cpu/online +} + +if [ -z "$SOURCE_REMOTE_TEST" ]; then + set -e + + setup_remote_test + test_trace +fi diff --git a/tools/testing/selftests/ftrace/test.d/remotes/trace_pipe.tc b/tools/testing/selftests/ftrace/test.d/remotes/trace_pipe.tc new file mode 100644 index 000000000000..d28eaee10c7c --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/trace_pipe.tc @@ -0,0 +1,127 @@
+#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test trace remote consuming read +# requires: remotes/test + +. $TEST_DIR/remotes/functions + +test_trace_pipe() +{ + echo 0 > tracing_on + assert_unloaded + + # Emit events from the same CPU + for cpu in $(get_cpu_ids); do + break + done + + # + # Simple test: Emit enough events to fill few pages + # + + echo 1024 > buffer_size_kb + echo 1 > tracing_on + assert_loaded + + events_per_page=$(($(get_page_size) / $(get_selftest_event_size))) + nr_events=$(($events_per_page * 4)) + + output=$(mktemp $TMPDIR/remote_test.XXXXXX) + + cat trace_pipe > $output & + pid=$! + + for i in $(seq 1 $nr_events); do + taskset -c $cpu echo $i > write_event + done + + echo 0 > tracing_on + sleep 1 + kill $pid + + check_trace 1 $nr_events $output + + rm $output + + # + # Test interaction with lost events + # + + assert_unloaded + echo 7 > buffer_size_kb + echo 1 > tracing_on + assert_loaded + + nr_events=$((events_per_page * 2)) + for i in $(seq 1 $nr_events); do + taskset -c $cpu echo $i > write_event + done + + output=$(dump_trace_pipe) + + lost_events=$(sed -n -e '1s/CPU:.*\[LOST \([0-9]*\) EVENTS\]/\1/p' $output) + test -n "$lost_events" + + id=$(sed -n -e '2s/\[[0-9]*\]\s*[0-9]*.[0-9]*: [a-z]* id=\([0-9]*\)/\1/p' $output) + test "$id" -eq $(($lost_events + 1)) + + # Drop [LOST EVENTS] line + sed -i '1d' $output + + check_trace $id $nr_events $output + + rm $output + + # + # Test per-CPU interface + # + + echo 0 > trace + echo 1 > tracing_on + + for cpu in $(get_cpu_ids); do + taskset -c $cpu echo $cpu > write_event + done + + for cpu in $(get_cpu_ids); do + cd per_cpu/cpu$cpu/ + output=$(dump_trace_pipe) + + check_trace $cpu $cpu $output + + rm $output + cd - > /dev/null + done + + # + # Test interaction with hotplug + # + + [ "$(get_cpu_ids | wc -l)" -ge 2 ] || return 0 + + echo 0 > trace + + for cpu in $(get_cpu_ids); do + echo 0 > /sys/devices/system/cpu/cpu$cpu/online + break + done + + for i in $(seq 1 8); do + echo
$i > write_event + done + + output=$(dump_trace_pipe) + + check_trace 1 8 $output + + rm $output + + echo 1 > /sys/devices/system/cpu/cpu$cpu/online +} + +if [ -z "$SOURCE_REMOTE_TEST" ]; then + set -e + + setup_remote_test + test_trace_pipe +fi diff --git a/tools/testing/selftests/ftrace/test.d/remotes/unloading.tc b/tools/testing/selftests/ftrace/test.d/remotes/unloading.tc new file mode 100644 index 000000000000..cac2190183f6 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/remotes/unloading.tc @@ -0,0 +1,41 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Test trace remote unloading +# requires: remotes/test + +. $TEST_DIR/remotes/functions + +test_unloading() +{ + # No reader, writing + assert_loaded + + # No reader, no writing + echo 0 > tracing_on + assert_unloaded + + # 1 reader, no writing + cat trace_pipe & + pid=$! + sleep 1 + assert_loaded + kill $pid + assert_unloaded + + # No reader, no writing, events + echo 1 > tracing_on + echo 1 > write_event + echo 0 > tracing_on + assert_loaded + + # Test reset + clear_trace + assert_unloaded +} + +if [ -z "$SOURCE_REMOTE_TEST" ]; then + set -e + + setup_remote_test + test_unloading +fi -- 2.51.2.1041.gc1ab5b90ca-goog
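check_trace() in the functions file is the core validator for these tests. After the sed filter reduces each line to "<timestamp> <id>", it requires a non-empty input, strictly increasing timestamps, and ids that run consecutively from start_id to end_id. The C restatement below is only meant to make that invariant explicit. check_trace_text() is a hypothetical helper that takes the already-filtered text in memory and compares timestamps as doubles rather than with bc, which is adequate for microsecond-resolution values in a sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Validate "<timestamp> <id>" lines, mirroring check_trace():
 * timestamps strictly increase, ids are consecutive from start_id,
 * the last id equals end_id, and the input is not empty. */
static bool check_trace_text(const char *text, long start_id, long end_id)
{
	const char *p;
	double prev_ts = 0.0;	/* like prev_ts=0 in the shell version */
	long expect = start_id, id = 0;
	bool any = false;

	assert(text);
	p = text + strspn(text, " \t\n");
	while (*p) {
		char *end;
		double ts = strtod(p, &end);

		if (end == p)
			return false;	/* malformed timestamp */
		p = end;
		id = strtol(p, &end, 10);
		if (end == p)
			return false;	/* malformed id */
		p = end;
		if (ts <= prev_ts || id != expect)
			return false;
		prev_ts = ts;
		expect++;
		any = true;
		p += strspn(p, " \t\n");	/* move to the next line */
	}
	return any && id == end_id;
}
```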

2 months

[PATCH v2] selftests: harness: Support KCOV.

by Kuniyuki Iwashima

While writing a selftest with kselftest_harness.h, I often want to check which paths are actually exercised. Let's support generating KCOV coverage data. We can specify the output directory via the KCOV_OUTPUT environment variable, and the number of instructions to collect via the KCOV_SLOTS environment variable. # KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 2)) \ ./tools/testing/selftests/net/af_unix/scm_inq Both variables can also be specified as make variables. # make -C tools/testing/selftests/ \ KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 4)) \ kselftest_override_timeout=60 TARGETS=net/af_unix run_tests The coverage data can be decoded simply with addr2line: $ cat kcov/* | sort | uniq | addr2line -e vmlinux | grep unix net/unix/af_unix.c:1056 net/unix/af_unix.c:3138 net/unix/af_unix.c:3834 net/unix/af_unix.c:3838 net/unix/af_unix.c:311 (discriminator 2) ... or more nicely with a script embedded in vock [0]: $ cat kcov/* | sort | uniq > local.log $ python3 ~/kernel/tools/vock/report.py \ --kernel-src ./ --vmlinux ./vmlinux \ --mode local --local-log local.log --filter unix ... ------------------------------- Coverage Report -------------------------------- 📄 net/unix/af_unix.c (276 lines) ... 942 | static int unix_setsockopt(struct socket *sock, int level, int optname, 943 | sockptr_t optval, unsigned int optlen) 944 | { ...
961 | switch (optname) { 962 | case SO_INQ: 963 > if (sk->sk_type != SOCK_STREAM) 964 | return -EINVAL; 965 | 966 > if (val > 1 || val < 0) 967 | return -EINVAL; 968 | 969 > WRITE_ONCE(u->recvmsg_inq, val); 970 | break; Link: https://github.com/kzall0c/vock/blob/f3d97de9954f9df758c0ab287ca7e24e654288… #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu(a)google.com> --- v2: Support TEST() v1: https://lore.kernel.org/linux-kselftest/20251017084022.3721950-1-kuniyu@goo… --- Documentation/dev-tools/kselftest.rst | 41 ++++++ tools/testing/selftests/Makefile | 14 ++- tools/testing/selftests/kselftest_harness.h | 133 +++++++++++++++++++- 3 files changed, 178 insertions(+), 10 deletions(-) diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index 18c2da67fae4..5c2b92ac4a30 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -200,6 +200,47 @@ You can look at the TAP output to see if you ran into the timeout. Test runners which know a test must run under a specific time can then optionally treat these timeouts then as fatal. +KCOV for selftests +================== + +Selftests built with `kselftest_harness.h` natively support generating +KCOV coverage data. See :doc:`KCOV: code coverage for fuzzing </dev-tools/kcov>` +for prerequisites. + +You can specify the output directory with the `KCOV_OUTPUT` environment +variable. Additionally, you can specify the number of instructions to +collect with the `KCOV_SLOTS` environment variable :: + + # KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 2)) \ + ./tools/testing/selftests/net/af_unix/scm_inq + +In the output directory, a coverage file is generated for each test +case in the selftest :: + + $ ls kcov/ + scm_inq.dgram.basic scm_inq.seqpacket.basic scm_inq.stream.basic + +The default value of `KCOV_SLOTS` is `4096`, and `KCOV_SLOTS` multiplied +by `sizeof(unsigned long)` must be multiple of `4096`, so the smallest +value is `512`. 
+ +Both `KCOV_OUTPUT` and `KCOV_SLOTS` can be specified as the variables +on the `make` command line :: + + # make -C tools/testing/selftests/ \ + kselftest_override_timeout=60 \ + KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 4)) \ + TARGETS=net/af_unix run_tests + +The collected data can be decoded with `addr2line` :: + + $ cat kcov/* | sort | uniq | addr2line -e vmlinux | grep unix + net/unix/af_unix.c:1056 + net/unix/af_unix.c:3138 + net/unix/af_unix.c:3834 + net/unix/af_unix.c:3838 + ... + Packaging selftests =================== diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index c46ebdb9b8ef..40e70fb1a347 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -218,12 +218,14 @@ all: done; exit $$ret; run_tests: all - @for TARGET in $(TARGETS); do \ - BUILD_TARGET=$$BUILD/$$TARGET; \ - $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests \ - SRC_PATH=$(shell readlink -e $$(pwd)) \ - OBJ_PATH=$(BUILD) \ - O=$(abs_objtree); \ + @for TARGET in $(TARGETS); do \ + BUILD_TARGET=$$BUILD/$$TARGET; \ + $(MAKE) OUTPUT=$$BUILD_TARGET \ + KCOV_OUTPUT=$(abspath $(KCOV_OUTPUT)) \ + -C $$TARGET run_tests \ + SRC_PATH=$(shell readlink -e $$(pwd)) \ + OBJ_PATH=$(BUILD) \ + O=$(abs_objtree); \ done; hotplug: diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 3f66e862e83e..5b7a01722981 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -56,6 +56,8 @@ #include <asm/types.h> #include <ctype.h> #include <errno.h> +#include <fcntl.h> +#include <linux/kcov.h> #include <linux/unistd.h> #include <poll.h> #include <stdbool.h> @@ -63,7 +65,9 @@ #include <stdio.h> #include <stdlib.h> #include <string.h> +#include <sys/ioctl.h> #include <sys/mman.h> +#include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> @@ -175,9 +179,12 @@ static void test_name(struct __test_metadata 
*_metadata); \ static void wrapper_##test_name( \ struct __test_metadata *_metadata, \ - struct __fixture_variant_metadata __attribute__((unused)) *variant) \ + struct __fixture_variant_metadata __attribute__((unused)) *variant, \ + char *test_full_name) \ { \ + enable_kcov(_metadata); \ test_name(_metadata); \ + disable_kcov(_metadata, test_full_name); \ } \ static struct __test_metadata _##test_name##_object = \ { .name = #test_name, \ @@ -401,7 +408,8 @@ const FIXTURE_VARIANT(fixture_name) *variant); \ static void wrapper_##fixture_name##_##test_name( \ struct __test_metadata *_metadata, \ - struct __fixture_variant_metadata *variant) \ + struct __fixture_variant_metadata *variant, \ + char *test_full_name) \ { \ /* fixture data is alloced, setup, and torn down per call. */ \ FIXTURE_DATA(fixture_name) self_private, *self = NULL; \ @@ -430,7 +438,9 @@ if (_metadata->exit_code) \ _exit(0); \ *_metadata->no_teardown = false; \ + enable_kcov(_metadata); \ fixture_name##_##test_name(_metadata, self, variant->data); \ + disable_kcov(_metadata, test_full_name); \ _metadata->teardown_fn(false, _metadata, self, variant->data); \ _exit(0); \ } else if (child < 0 || child != waitpid(child, &status, 0)) { \ @@ -470,6 +480,8 @@ object->teardown_fn = &wrapper_##fixture_name##_##test_name##_teardown; \ object->termsig = signal; \ object->timeout = tmout; \ + object->kcov_fd = -1; \ + object->kcov_slots = -1; \ _##fixture_name##_##test_name##_object = object; \ __register_test(object); \ } \ @@ -908,7 +920,8 @@ __register_fixture_variant(struct __fixture_metadata *f, struct __test_metadata { const char *name; void (*fn)(struct __test_metadata *, - struct __fixture_variant_metadata *); + struct __fixture_variant_metadata *, + char *test_name); pid_t pid; /* pid of test when being run */ struct __fixture_metadata *fixture; void (*teardown_fn)(bool in_parent, struct __test_metadata *_metadata, @@ -923,6 +936,10 @@ struct __test_metadata { const void *variant; struct 
__test_results *results; struct __test_metadata *prev, *next; + int kcov_fd; + int kcov_slots; + char *kcov_dir; + unsigned long *kcov_mem; }; static inline bool __test_passed(struct __test_metadata *metadata) @@ -1185,6 +1202,114 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } +#define KCOV_SLOTS 4096 + +static void enable_kcov(struct __test_metadata *t) +{ + char *slots; + int err; + + t->kcov_dir = getenv("KCOV_OUTPUT"); + if (!t->kcov_dir || *t->kcov_dir == '\0') + return; + + slots = getenv("KCOV_SLOTS"); + if (slots && *slots != '\0') + sscanf(slots, "%d", &t->kcov_slots); + if (t->kcov_slots <= 0) + t->kcov_slots = KCOV_SLOTS; + + t->kcov_fd = open("/sys/kernel/debug/kcov", O_RDWR); + if (t->kcov_fd < 0) { + ksft_print_msg("ERROR OPENING KCOV FD\n"); + goto err; + } + + err = ioctl(t->kcov_fd, KCOV_INIT_TRACE, t->kcov_slots); + if (err) { + ksft_print_msg("ERROR INITIALISING KCOV\n"); + goto err; + } + + t->kcov_mem = mmap(NULL, sizeof(unsigned long) * t->kcov_slots, + PROT_READ | PROT_WRITE, MAP_SHARED, t->kcov_fd, 0); + if ((void *)t->kcov_mem == MAP_FAILED) { + ksft_print_msg("ERROR ALLOCATING MEMORY FOR KCOV\n"); + goto err; + } + + err = ioctl(t->kcov_fd, KCOV_ENABLE, KCOV_TRACE_PC); + if (err) { + ksft_print_msg("ERROR ENABLING KCOV\n"); + goto err; + } + + __atomic_store_n(&t->kcov_mem[0], 0, __ATOMIC_RELAXED); + return; +err: + t->exit_code = KSFT_FAIL; + _exit(KSFT_FAIL); +} + +static void disable_kcov(struct __test_metadata *t, char *test_name) +{ + int slots, err, dir, fd, i; + + if (t->kcov_fd == -1) + return; + + slots = __atomic_load_n(&t->kcov_mem[0], __ATOMIC_RELAXED); + if (slots == t->kcov_slots - 1) + ksft_print_msg("Set KCOV_SLOTS to a value greater than %d\n", t->kcov_slots); + + err = ioctl(t->kcov_fd, KCOV_DISABLE, 0); + if (err) { + ksft_print_msg("ERROR DISABLING KCOV\n"); + goto out; + } + + err = mkdir(t->kcov_dir, 0755); + if (err == -1 && errno != EEXIST) { + ksft_print_msg("ERROR CREATING '%s'\n", 
t->kcov_dir); + goto out; + } + err = 0; + + dir = open(t->kcov_dir, O_DIRECTORY); + if (dir < 0) { + ksft_print_msg("ERROR OPENING %s\n", t->kcov_dir); + err = dir; + goto out; + } + + fd = openat(dir, test_name, O_RDWR | O_CREAT | O_TRUNC); + + close(dir); + + if (fd == -1) { + ksft_print_msg("ERROR CREATING '%s' at '%s'\n", test_name, t->kcov_dir); + err = fd; + goto out; + } + + for (i = 0; i < slots; i++) { + char buf[64]; + int size; + + size = snprintf(buf, 64, "0x%lx\n", t->kcov_mem[i + 1]); + write(fd, buf, size); + } + +out: + munmap(t->kcov_mem, sizeof(t->kcov_mem[0]) * t->kcov_slots); + close(t->kcov_fd); + + if (err) { + t->exit_code = KSFT_FAIL; + _exit(KSFT_FAIL); + } +} + static void __run_test(struct __fixture_metadata *f, struct __fixture_variant_metadata *variant, struct __test_metadata *t) @@ -1216,7 +1341,7 @@ static void __run_test(struct __fixture_metadata *f, t->exit_code = KSFT_FAIL; } else if (child == 0) { setpgrp(); - t->fn(t, variant); + t->fn(t, variant, test_name); _exit(t->exit_code); } else { t->pid = child; -- 2.51.1.838.g19442a804e-goog
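
In disable_kcov() the first word of the mmap()ed buffer holds the number of recorded PCs, the PCs follow it, and each one is written out as a `0x%lx` line; the documentation hunk also requires `KCOV_SLOTS * sizeof(unsigned long)` to be a multiple of 4096. A minimal userspace sketch of those two rules, operating on an in-memory array instead of a real /sys/kernel/debug/kcov mapping. The helper names `kcov_slots_valid()` and `kcov_format_trace()` are hypothetical and not part of the patch:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the documented rule is that
 * slots * sizeof(unsigned long) must be a multiple of 4096. */
static int kcov_slots_valid(long slots)
{
	return slots > 0 &&
	       (slots * (long)sizeof(unsigned long)) % 4096 == 0;
}

/*
 * Hypothetical helper: area[0] holds the number of recorded PCs and
 * area[1..] hold the PCs. Writes one "0x%lx\n" line per PC into out
 * (the same format disable_kcov() writes to the per-test file),
 * clamping the count to the slots - 1 entries that fit after the
 * counter. Requires slots >= 1.
 * Returns the number of bytes written, or -1 if out is too small.
 */
static int kcov_format_trace(const unsigned long *area, long slots,
			     char *out, size_t outlen)
{
	unsigned long n = area[0];
	size_t used = 0;

	if (outlen > 0)
		out[0] = '\0';
	if (n > (unsigned long)(slots - 1))
		n = (unsigned long)(slots - 1);

	for (unsigned long i = 0; i < n; i++) {
		int len = snprintf(out + used, outlen - used, "0x%lx\n",
				   area[i + 1]);

		if (len < 0 || (size_t)len >= outlen - used)
			return -1;
		used += (size_t)len;
	}
	return (int)used;
}
```

Calling `kcov_format_trace()` on `{ 2, 0x10, 0x20 }` with 8 slots yields the two lines `0x10` and `0x20`. Clamping to `slots - 1` reflects that slot 0 is reserved for the counter, which is also why the patch warns when the count reaches `kcov_slots - 1`. With 8-byte `unsigned long`, 512 is the smallest valid slot count, matching the documentation.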

2 months

[PATCH RESEND 0/5] release of KTAP version 2

by Rae Moar

Hi all!

I wanted to resend this series to reignite the discussion on KTAP version 2. Many of the proposed features are already in use by KUnit; this series would add them to the KTAP documentation. Note that all the features of KTAP v2 are backwards compatible.

Also, today is my last day at Google, so I will be responding from my personal email afterwards.

--

This patch series represents the final release of KTAP version 2. There have been open discussions on version 2 for just over 2 years. This patch series marks the end of KTAP version 2 development and the beginning of KTAP version 3 development.

The largest component of the KTAP version 2 release is the addition of test metadata to the specification. KTAP metadata could include any test information that is pertinent for user interaction before or after the test runs, for example the test file path or the test speed.

Example of KTAP metadata:

  KTAP version 2
  #:ktap_test: main
  #:ktap_arch: uml
  1..1
      KTAP version 2
      #:ktap_test: suite_1
      #:ktap_subsystem: example
      #:ktap_test_file: lib/test.c
      1..2
      ok 1 test_1
      #:ktap_test: test_2
      #:ktap_speed: very_slow
      # test_2 has begun
      #:custom_is_flaky: true
      ok 2 test_2
  # suite_1 has passed
  ok 1 suite_1

The release also includes some formatting fixes and changes to update the specification to version 2.

Frank Rowand (2):
  ktap_v2: change version to 2-rc in KTAP specification
  ktap_v2: change "version 1" to "version 2" in examples

Rae Moar (3):
  ktap_v2: add test metadata
  ktap_v2: formatting fixes to ktap spec
  ktap_v2: change version to 2 in KTAP specification

 Documentation/dev-tools/ktap.rst | 273 +++++++++++++++++++++++++++++--
 1 file changed, 257 insertions(+), 16 deletions(-)

base-commit: 9de5f847ef8fa205f4fd704a381d32ecb5b66da9
--
2.51.2.1041.gc1ab5b90ca-goog
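
Metadata lines in the KTAP example (`#:ktap_test: main`, `#:ktap_speed: very_slow`, `#:custom_is_flaky: true`) share a `#:<key>: <value>` shape, while an ordinary diagnostic such as `# test_2 has begun` has no colon after the `#`. A minimal C sketch of how a consumer might split such a line. The function name `ktap_parse_metadata()` and its rules (skip leading indentation, require `#:`, split at the first `": "`) are assumptions for illustration, not taken from the proposed specification text:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a KTAP metadata line "#:<key>: <value>"
 * into key and value, ignoring leading indentation of nested tests and
 * a trailing newline. Returns 0 on success, -1 if the line is not
 * metadata (e.g. a plain "# ..." diagnostic) or a buffer is too small.
 */
static int ktap_parse_metadata(const char *line, char *key, size_t keylen,
			       char *val, size_t vallen)
{
	const char *sep;
	size_t klen, vlen;

	while (*line == ' ' || *line == '\t')
		line++;
	if (strncmp(line, "#:", 2) != 0)
		return -1;
	line += 2;

	/* Key ends at the first ": "; the rest of the line is the value. */
	sep = strstr(line, ": ");
	if (!sep)
		return -1;

	klen = (size_t)(sep - line);
	vlen = strcspn(sep + 2, "\n");
	if (klen == 0 || klen >= keylen || vlen >= vallen)
		return -1;

	memcpy(key, line, klen);
	key[klen] = '\0';
	memcpy(val, sep + 2, vlen);
	val[vlen] = '\0';
	return 0;
}
```

Splitting at the first `": "` keeps values such as file paths intact even if they contain further colons.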

2 months

[PATCH] vfio: selftests: Store libvfio build outputs in $(OUTPUT)/libvfio

by David Matlack

Store the tools/testing/selftests/vfio/lib outputs (e.g. object files) in $(OUTPUT)/libvfio rather than in $(OUTPUT)/lib. This is in preparation for building the VFIO selftests library into the KVM selftests (see Link below).

Specifically, this will avoid name conflicts between tools/testing/selftests/{vfio,kvm}/lib and also avoid leaving behind empty directories under tools/testing/selftests/kvm after a make clean.

Link: https://lore.kernel.org/kvm/20250912222525.2515416-2-dmatlack@google.com/
Signed-off-by: David Matlack <dmatlack(a)google.com>
---
Note: This patch applies on top of vfio/next.
https://github.com/awilliam/linux-vfio/tree/next

 tools/testing/selftests/vfio/lib/libvfio.mk | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/vfio/lib/libvfio.mk b/tools/testing/selftests/vfio/lib/libvfio.mk
index 5d11c3a89a28..3c0cdac30cb6 100644
--- a/tools/testing/selftests/vfio/lib/libvfio.mk
+++ b/tools/testing/selftests/vfio/lib/libvfio.mk
@@ -1,24 +1,26 @@
 include $(top_srcdir)/scripts/subarch.include
 ARCH ?= $(SUBARCH)
 
-VFIO_DIR := $(selfdir)/vfio
+LIBVFIO_SRCDIR := $(selfdir)/vfio/lib
 
-LIBVFIO_C := lib/vfio_pci_device.c
-LIBVFIO_C += lib/vfio_pci_driver.c
+LIBVFIO_C := vfio_pci_device.c
+LIBVFIO_C += vfio_pci_driver.c
 
 ifeq ($(ARCH:x86_64=x86),x86)
-LIBVFIO_C += lib/drivers/ioat/ioat.c
-LIBVFIO_C += lib/drivers/dsa/dsa.c
+LIBVFIO_C += drivers/ioat/ioat.c
+LIBVFIO_C += drivers/dsa/dsa.c
 endif
 
-LIBVFIO_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBVFIO_C))
+LIBVFIO_OUTPUT := $(OUTPUT)/libvfio
+
+LIBVFIO_O := $(patsubst %.c, $(LIBVFIO_OUTPUT)/%.o, $(LIBVFIO_C))
 
 LIBVFIO_O_DIRS := $(shell dirname $(LIBVFIO_O) | uniq)
 $(shell mkdir -p $(LIBVFIO_O_DIRS))
 
-CFLAGS += -I$(VFIO_DIR)/lib/include
+CFLAGS += -I$(LIBVFIO_SRCDIR)/include
 
-$(LIBVFIO_O): $(OUTPUT)/%.o : $(VFIO_DIR)/%.c
+$(LIBVFIO_O): $(LIBVFIO_OUTPUT)/%.o : $(LIBVFIO_SRCDIR)/%.c
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-EXTRA_CLEAN += $(LIBVFIO_O)
+EXTRA_CLEAN += $(LIBVFIO_OUTPUT)

base-commit: acb59a4bb8ed34e738a4c3463127bf3f6b5e11a9
--
2.51.0.534.gc79095c0ca-goog

2 months

[PATCH net-next v6 0/6] net: devmem: improve cpu cost of RX token management

by Bobby Eshleman

This series improves the CPU cost of RX token management by adding a socket option that configures the socket to avoid the xarray allocator and instead use an niov array and a uref field in niov. The improvement is ~13% CPU utilization per RX user thread.

Using kperf, the following results were observed:

Before:
  Average RX worker idle %: 13.13, flows 4, test runs 11
After:
  Average RX worker idle %: 26.32, flows 4, test runs 11

Two other approaches were tested, but with no improvement. Namely, 1) using a hashmap for tokens and 2) keeping an xarray of atomic counters but using RCU so that the hotpath could be mostly lockless. Neither of these approaches proved better than the simple array in terms of CPU.

The sockopt SO_DEVMEM_AUTORELEASE is added to toggle the optimization. It defaults to 0 (i.e., optimization on).

Note that prior revs reported only a 5% gain. This lower gain was measured with CPU frequency boosting (unknowingly) disabled. A consistent ~13% is measured for both kperf and nccl workloads with CPU frequency boosting on.

To: David S. Miller <davem(a)davemloft.net>
To: Eric Dumazet <edumazet(a)google.com>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Paolo Abeni <pabeni(a)redhat.com>
To: Simon Horman <horms(a)kernel.org>
To: Kuniyuki Iwashima <kuniyu(a)google.com>
To: Willem de Bruijn <willemb(a)google.com>
To: Neal Cardwell <ncardwell(a)google.com>
To: David Ahern <dsahern(a)kernel.org>
To: Mina Almasry <almasrymina(a)google.com>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Andrew Lunn <andrew+netdev(a)lunn.ch>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Stanislav Fomichev <sdf(a)fomichev.me>
Cc: netdev(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)meta.com>

Changes in v6:
- renamed 'net: devmem: use niov array for token management' to refer to optionality of new config
- added documentation and tests
- make autorelease flag per-socket sockopt instead of binding field / sysctl
- many per-patch changes (see Changes sections per-patch)
- Link to v5: https://lore.kernel.org/r/20251023-scratch-bobbyeshleman-devmem-tcp-token-u…

Changes in v5:
- add sysctl to opt-out of performance benefit, back to old token release
- Link to v4: https://lore.kernel.org/all/20250926-scratch-bobbyeshleman-devmem-tcp-token…

Changes in v4:
- rebase to net-next
- Link to v3: https://lore.kernel.org/r/20250926-scratch-bobbyeshleman-devmem-tcp-token-u…

Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory footprint
- fallback to cleaning up references in dmabuf unbind if socket leaked tokens
- drop ethtool patch
- Link to v2: https://lore.kernel.org/r/20250911-scratch-bobbyeshleman-devmem-tcp-token-u…

Changes in v2:
- net: ethtool: prevent user from breaking devmem single-binding rule (Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- remove WARNs on invalid user input (Mina)
- remove extraneous binding ref get (Mina)
- remove WARN for changed binding (Mina)
- always use GFP_ZERO for binding->vec (Mina)
- fix length of alloc for urefs
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- Link to v1: https://lore.kernel.org/r/20250902-scratch-bobbyeshleman-devmem-tcp-token-u…

---
Bobby Eshleman (6):
  net: devmem: rename tx_vec to vec in dmabuf binding
  net: devmem: refactor sock_devmem_dontneed for autorelease split
  net: devmem: prepare for autorelease rx token management
  net: devmem: add SO_DEVMEM_AUTORELEASE for autorelease control
  net: devmem: document SO_DEVMEM_AUTORELEASE socket option
  net: devmem: add tests for SO_DEVMEM_AUTORELEASE socket option

 Documentation/networking/devmem.rst | 70 +++++++++-
 include/net/netmem.h | 1 +
 include/net/sock.h | 13 +-
 include/uapi/asm-generic/socket.h | 2 +
 net/core/devmem.c | 54 +++++---
 net/core/devmem.h | 4 +-
 net/core/sock.c | 152 ++++++++++++++++++----
 net/ipv4/tcp.c | 69 ++++++++--
 net/ipv4/tcp_ipv4.c | 11 +-
 net/ipv4/tcp_minisocks.c | 5 +-
 tools/include/uapi/asm-generic/socket.h | 2 +
 tools/testing/selftests/drivers/net/hw/devmem.py | 115 +++++++++++++++-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 20 ++-
 13 files changed, 453 insertions(+), 65 deletions(-)
---
base-commit: 255d75ef029f33f75fcf5015052b7302486f7ad2
change-id: 20250829-scratch-bobbyeshleman-devmem-tcp-token-upstream-292be174d503

Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>

2 months

[PATCH bpf-next 0/2] selftests/bpf: enforce SO_REUSEADDR in basic test servers

by Alexis Lothoré (eBPF Foundation)

Hello,

This small series is another follow-up to [1], in which I misunderstood Martin's initial feedback (see [2]). I proposed to make tc-tunnel apply SO_REUSEPORT once the server is brought up. This series updates start_server_addr to actually apply Martin's proposal after his clarification [3].

[1] https://lore.kernel.org/bpf/20251031-tc_tunnel_improv-v1-0-0ffe44d27eda@boo…
[2] https://lore.kernel.org/bpf/efa3540a-1f52-46ca-9f49-e631a5e3e48c@linux.dev/
[3] https://lore.kernel.org/bpf/4cbabdf1-af2c-490a-a41a-b40c1539c1cb@linux.dev/

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (2):
  selftests/bpf: systematically add SO_REUSEADDR in start_server_addr
  selftests/bpf: use start_server_str rather than start_reuseport_server in tc_tunnel

 tools/testing/selftests/bpf/network_helpers.c | 9 +++++++-
 .../selftests/bpf/prog_tests/test_tc_tunnel.c | 27 ++++++++++++----------
 2 files changed, 23 insertions(+), 13 deletions(-)
---
base-commit: de0745f7cc98146c70a020bc3a1b73c7f3405282
change-id: 20251104-start-server-soreuseaddr-e442446e2d37

Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
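
The core of the first patch is setting SO_REUSEADDR on every server socket before bind(). A minimal sketch of that ordering for an IPv4 TCP listener on loopback; the helper `start_reuseaddr_server()` is hypothetical and is not the selftests' start_server_addr(), whose exact signature is not shown here. A common motivation for SO_REUSEADDR is letting a restarted server rebind a port that still has connections in TIME_WAIT:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical sketch: create a TCP listener on 127.0.0.1:port
 * (port 0 lets the kernel pick one) with SO_REUSEADDR enabled.
 * Returns the listening fd, or -1 on error.
 */
static int start_reuseaddr_server(unsigned short port)
{
	struct sockaddr_in addr;
	int one = 1;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* The option must be set before bind() to affect the bind. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
		goto err;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto err;
	if (listen(fd, 1) < 0)
		goto err;
	return fd;
err:
	close(fd);
	return -1;
}
```

Unlike SO_REUSEPORT, which lets several sockets share one port for load balancing, SO_REUSEADDR here only relaxes the bind-time address check.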

2 months

[PATCH] selftests/tracing: Add basic test for trace_marker_raw file

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org>

Commit 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space") made an update that fixed both trace_marker and trace_marker_raw. But the small difference made to trace_marker_raw had a blatant bug in it that any basic testing would have uncovered. Unfortunately, the selftests have tests for trace_marker but nothing for trace_marker_raw, which allowed the bug to get upstream.

Add basic selftests to test trace_marker_raw so that this doesn't happen again.

Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
 .../ftrace/test.d/00basic/trace_marker_raw.tc | 107 ++++++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc

diff --git a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc new file mode 100644 index 000000000000..7daf7292209e --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc @@ -0,0 +1,107 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Basic tests on writing to trace_marker_raw +# requires: trace_marker_raw +# flags: instance + +is_little_endian() { + if lscpu | grep -q 'Little Endian'; then + echo 1; + else + echo 0; + fi +} + +little=`is_little_endian` + +make_str() { + id=$1 + cnt=$2 + + if [ $little -eq 1 ]; then + val=`printf "\\%03o\\%03o\\%03o\\%03o" \ + $(($id & 0xff)) \ + $((($id >> 8) & 0xff)) \ + $((($id >> 16) & 0xff)) \ + $((($id >> 24) & 0xff))` + else + val=`printf "\\%03o\\%03o\\%03o\\%03o" \ + $((($id >> 24) & 0xff)) \ + $((($id >> 16) & 0xff)) \ + $((($id >> 8) & 0xff)) \ + $(($id & 0xff))` + fi + + data=`printf -- 'X%.0s' $(seq $cnt)` + + printf "${val}${data}" +} + +write_buffer() { + id=$1 + size=$2 + + # write the string into the raw marker + make_str $id $size > trace_marker_raw +} + + +test_multiple_writes() { + + # Write a bunch of data where
the id is the count of + # data to write + for i in `seq 1 10` `seq 101 110` `seq 1001 1010`; do + write_buffer $i $i + done + + # add a little buffer + echo stop > trace_marker + + # Check to make sure the number of entries is the id (rounded up by 4) + awk '/.*: # [0-9a-f]* / { + print; + cnt = -1; + for (i = 0; i < NF; i++) { + # The counter is after the "#" marker + if ( $i == "#" ) { + i++; + cnt = strtonum("0x" $i); + num = NF - (i + 1); + # The number of items is always rounded up by 4 + cnt2 = int((cnt + 3) / 4) * 4; + if (cnt2 != num) { + exit 1; + } + break; + } + } + } + // { if (NR > 30) { exit 0; } } ' trace_pipe; +} + + +get_buffer_data_size() { + sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page +} + +test_buffer() { + + # The id must be four bytes, test that 3 bytes fails a write + if echo -n abc > ./trace_marker_raw ; then + echo "Too small of write expected to fail but did not" + exit_fail + fi + + size=`get_buffer_data_size` + echo size = $size + + # Now add a little more than what it can handle + + if write_buffer 0xdeadbeef $size ; then + echo "Too big of write expected to fail but did not" + exit_fail + fi +} + +test_buffer +test_multiple_writes -- 2.51.0
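
The shell helper make_str emits a 4-byte id in the CPU's native byte order (hence the lscpu endianness probe) followed by cnt filler bytes, and the awk check expects the number of data bytes shown in the trace to be cnt rounded up to a multiple of 4. A rough C equivalent of those two pieces, with hypothetical names `build_raw_marker()` and `round_up4()`; actually writing the buffer to trace_marker_raw is left out:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper mirroring make_str: the first 4 bytes are the id
 * in native byte order (memcpy of a uint32_t gives that without any
 * endianness probe), followed by cnt 'X' bytes.
 * Returns the payload length, or 0 if buf is too small.
 */
static size_t build_raw_marker(uint32_t id, size_t cnt, char *buf, size_t buflen)
{
	if (buflen < sizeof(id) + cnt)
		return 0;
	memcpy(buf, &id, sizeof(id));
	memset(buf + sizeof(id), 'X', cnt);
	return sizeof(id) + cnt;
}

/* Data length as the test's awk script expects it: rounded up to 4. */
static size_t round_up4(size_t n)
{
	return (n + 3) / 4 * 4;
}
```

For example, write_buffer 101 101 corresponds to `build_raw_marker(101, 101, ...)`, a 105-byte payload whose data part the awk check expects to appear as `round_up4(101) == 104` bytes.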

2 months

[PATCH] selftests/dma: fix invalid array access in printf

by Zhang Chujun

The printf statement attempts to print the DMA direction string using the syntax 'dir[directions]', which is an invalid array access. The variable 'dir' is an integer, and 'directions' is a char pointer array. This incorrect syntax should be 'directions[dir]', using 'dir' as the index into the 'directions' array. Fix this by correcting the array access from 'dir[directions]' to 'directions[dir]'. Signed-off-by: Zhang Chujun <zhangchujun(a)cmss.chinamobile.com> diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c index b12f1f9babf8..b925756373ce 100644 --- a/tools/testing/selftests/dma/dma_map_benchmark.c +++ b/tools/testing/selftests/dma/dma_map_benchmark.c @@ -118,7 +118,7 @@ int main(int argc, char **argv) } printf("dma mapping benchmark: threads:%d seconds:%d node:%d dir:%s granule: %d\n", - threads, seconds, node, dir[directions], granule); + threads, seconds, node, directions[dir], granule); printf("average map latency(us):%.1f standard deviation:%.1f\n", map.avg_map_100ns/10.0, map.map_stddev/10.0); printf("average unmap latency(us):%.1f standard deviation:%.1f\n", -- 2.50.1.windows.1
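
For context on the expression being changed: the C standard defines the subscript `E1[E2]` as `(*((E1)+(E2)))`, and one operand may be the pointer while the other is the integer, so `dir[directions]` compiles and selects the same element as `directions[dir]`; the latter is the conventional, readable spelling. A small sketch demonstrating the equivalence, using a placeholder table rather than the benchmark's real one:

```c
#include <stddef.h>

/* Placeholder strings; the real dma_map_benchmark table may differ. */
static const char *const directions[] = { "dir0", "dir1", "dir2" };

/* Conventional form: array first, index second. */
static const char *lookup_conventional(int dir)
{
	return directions[dir];
}

/* Swapped form: E1[E2] is *((E1) + (E2)), and the addition commutes,
 * so this designates the same element. */
static const char *lookup_swapped(int dir)
{
	return dir[directions];
}
```

Both functions return the same pointer for every valid index.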

2 months

[PATCH v6 0/3] drivers/base: Introduce revocable

by Tzung-Bi Shih

The series is separated from [1] to show its independence and to make it easier to compare potential use cases. This is the revocable core part. Use cases are in other series.

The 1st patch introduces the revocable, which is an implementation of ideas from the talk [2].

The 2nd and 3rd patches add test cases for revocable in Kunit and selftest.

[1] https://lore.kernel.org/chrome-platform/20251016054204.1523139-1-tzungbi@ke…
[2] https://lpc.events/event/17/contributions/1627/

v6:
- Rebase onto next-20251106.
- Separate revocable core and use cases.

v5: https://lore.kernel.org/chrome-platform/20251016054204.1523139-1-tzungbi@ke…
- Rebase onto next-20251015.
- Add more context about the PoC.
- Support multiple revocable providers in the PoC.

v4: https://lore.kernel.org/chrome-platform/20250923075302.591026-1-tzungbi@ker…
- Rebase onto next-20250922.
- Remove the 5th patch from v3.
- Add fops replacement PoC in 5th - 7th patches.

v3: https://lore.kernel.org/chrome-platform/20250912081718.3827390-1-tzungbi@ke…
- Rebase onto https://lore.kernel.org/chrome-platform/20250828083601.856083-1-tzungbi@ker… and next-20250912.
- The 4th patch changed accordingly.

v2: https://lore.kernel.org/chrome-platform/20250820081645.847919-1-tzungbi@ker…
- Rename "ref_proxy" -> "revocable".
- Add test cases in Kunit and selftest.

v1: https://lore.kernel.org/chrome-platform/20250814091020.1302888-1-tzungbi@ke…

Tzung-Bi Shih (3):
  revocable: Revocable resource management
  revocable: Add Kunit test cases
  selftests: revocable: Add kselftest cases

 .../driver-api/driver-model/index.rst | 1 +
 .../driver-api/driver-model/revocable.rst | 112 +++++++++
 MAINTAINERS | 9 +
 drivers/base/Kconfig | 8 +
 drivers/base/Makefile | 5 +-
 drivers/base/revocable.c | 234 ++++++++++++++++++
 drivers/base/revocable_test.c | 139 +++++++++++
 include/linux/revocable.h | 69 ++++++
 tools/testing/selftests/Makefile | 1 +
 .../selftests/drivers/base/revocable/Makefile | 7 +
 .../drivers/base/revocable/revocable_test.c | 136 ++++++++++
 .../drivers/base/revocable/test-revocable.sh | 39 +++
 .../base/revocable/test_modules/Makefile | 10 +
 .../revocable/test_modules/revocable_test.c | 195 +++++++++++++++
 14 files changed, 964 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/driver-api/driver-model/revocable.rst
 create mode 100644 drivers/base/revocable.c
 create mode 100644 drivers/base/revocable_test.c
 create mode 100644 include/linux/revocable.h
 create mode 100644 tools/testing/selftests/drivers/base/revocable/Makefile
 create mode 100644 tools/testing/selftests/drivers/base/revocable/revocable_test.c
 create mode 100755 tools/testing/selftests/drivers/base/revocable/test-revocable.sh
 create mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/Makefile
 create mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/revocable_test.c
--
2.48.1

2 months

[PATCH net] bonding: fix NULL pointer dereference in actor_port_prio setting

by Hangbin Liu

Liang reported an issue where setting a slave’s actor_port_prio to predefined values such as 0, 255, or 65535 would cause a system crash.

The problem occurs because in bond_opt_parse(), when the provided value matches a predefined table entry, the function returns that table entry, which does not contain slave information. Later, in bond_option_actor_port_prio_set(), calling bond_slave_get_rtnl() leads to a NULL pointer dereference.

Since actor_port_prio is defined as a u16 and initialized to the default value of 255 in ad_initialize_port(), there is no need for the bond_actor_port_prio_tbl. Using the BOND_OPTFLAG_RAWVAL flag is sufficient.

Fixes: 6b6dc81ee7e8 ("bonding: add support for per-port LACP actor priority")
Reported-by: Liang Li <liali(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
BTW, the logic in bond_opt_parse() may also need an update now that we have f2b3b28ce523 ("bonding: add slave_dev field for bond_opt_value"), as we may need range checking on slave options in the future. But this should be a separate patch, and it is not as urgent as this one.
--- drivers/net/bonding/bond_options.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 495a87f2ea7c..384499c869b8 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -225,13 +225,6 @@ static const struct bond_opt_value bond_ad_actor_sys_prio_tbl[] = { { NULL, -1, 0}, }; -static const struct bond_opt_value bond_actor_port_prio_tbl[] = { - { "minval", 0, BOND_VALFLAG_MIN}, - { "maxval", 65535, BOND_VALFLAG_MAX}, - { "default", 255, BOND_VALFLAG_DEFAULT}, - { NULL, -1, 0}, -}; - static const struct bond_opt_value bond_ad_user_port_key_tbl[] = { { "minval", 0, BOND_VALFLAG_MIN | BOND_VALFLAG_DEFAULT}, { "maxval", 1023, BOND_VALFLAG_MAX}, @@ -497,7 +490,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = { .id = BOND_OPT_ACTOR_PORT_PRIO, .name = "actor_port_prio", .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)), - .values = bond_actor_port_prio_tbl, + .flags = BOND_OPTFLAG_RAWVAL, .set = bond_option_actor_port_prio_set, }, [BOND_OPT_AD_ACTOR_SYSTEM] = { -- 2.50.1
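
A simplified, hypothetical model of the failure described above: when the parser finds the input value in a static table, it returns the shared table entry, so the per-call `slave_dev` context is lost and a later dereference sees NULL. The struct and function below are stand-ins, not the real `struct bond_opt_value` or bond_opt_parse(), and the real parser's min/max range checks are omitted:

```c
#include <stddef.h>

/* Hypothetical stand-in for a parsed option value. */
struct opt_value {
	const char *string;
	unsigned long long value;
	void *slave_dev;	/* per-call context, filled in by the caller */
};

/* Static table like the removed bond_actor_port_prio_tbl: no slave_dev. */
static const struct opt_value prio_tbl[] = {
	{ "minval", 0, NULL },
	{ "maxval", 65535, NULL },
	{ "default", 255, NULL },
};

/* On a table match, return the shared static entry (slave_dev lost);
 * otherwise return the caller's value, which still carries slave_dev. */
static const struct opt_value *parse(const struct opt_value *newval)
{
	for (size_t i = 0; i < sizeof(prio_tbl) / sizeof(prio_tbl[0]); i++)
		if (prio_tbl[i].value == newval->value)
			return &prio_tbl[i];
	return newval;
}
```

In this model, 0, 255, and 65535 come back without a slave pointer while any other value keeps it, which matches the reported crash pattern; dropping the table in favor of a raw value avoids the lookup entirely.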

2 months

[PATCH net v2] selftests/vsock: avoid false-positives when checking dmesg

by Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman(a)meta.com> Sometimes VMs will have some intermittent dmesg warnings that are unrelated to vsock. Change the dmesg parsing to filter on strings containing 'vsock' to avoid false positive failures that are unrelated to vsock. The downside is that it is possible for some vsock related warnings to not contain the substring 'vsock', so those will be missed. Fixes: a4a65c6fe08b ("selftests/vsock: add initial vmtest.sh for vsock") Reviewed-by: Simon Horman <horms(a)kernel.org> Signed-off-by: Bobby Eshleman <bobbyeshleman(a)meta.com> --- Previously was part of the series: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements… --- Changes in v2: - use consistent quoting for vsock string - Link to v1: https://lore.kernel.org/r/20251104-vsock-vmtest-dmesg-fix-v1-1-80c8db3f5dfe… --- tools/testing/selftests/vsock/vmtest.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index edacebfc1632..8ceeb8a7894f 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -389,9 +389,9 @@ run_test() { local rc host_oops_cnt_before=$(dmesg | grep -c -i 'Oops') - host_warn_cnt_before=$(dmesg --level=warn | wc -l) + host_warn_cnt_before=$(dmesg --level=warn | grep -c -i 'vsock') vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops') - vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l) + vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock') name=$(echo "${1}" | awk '{ print $1 }') eval test_"${name}" @@ -403,7 +403,7 @@ run_test() { rc=$KSFT_FAIL fi - host_warn_cnt_after=$(dmesg --level=warn | wc -l) + host_warn_cnt_after=$(dmesg --level=warn | grep -c -i 'vsock') if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then echo "FAIL: kernel warning detected on host" | log_host "${name}" rc=$KSFT_FAIL @@ -415,7 +415,7 @@ run_test() { rc=$KSFT_FAIL fi - 
vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l) + vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock') if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then echo "FAIL: kernel warning detected on vm" | log_host "${name}" rc=$KSFT_FAIL --- base-commit: 89aec171d9d1ab168e43fcf9754b82e4c0aef9b9 change-id: 20251104-vsock-vmtest-dmesg-fix-b2c59e1d9c38 Best regards, -- Bobby Eshleman <bobbyeshleman(a)meta.com>
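
The change swaps `wc -l` (count every warning line) for `grep -c -i 'vsock'` (count only lines mentioning vsock, in any case). A rough C equivalent of that counting, with hypothetical helper names, to make the trade-off concrete: unrelated warnings no longer count, but a vsock-related warning that never spells out "vsock" is missed too:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for needle within one line [s, s + len). */
static int line_has_ci(const char *s, size_t len, const char *needle)
{
	size_t n = strlen(needle);

	if (n == 0)
		return 1;
	for (size_t i = 0; i + n <= len; i++) {
		size_t j = 0;

		while (j < n && tolower((unsigned char)s[i + j]) ==
				tolower((unsigned char)needle[j]))
			j++;
		if (j == n)
			return 1;
	}
	return 0;
}

/* Rough equivalent of `grep -c -i needle`: count matching lines. */
static int count_matching_lines(const char *text, const char *needle)
{
	int count = 0;

	while (*text) {
		const char *nl = strchr(text, '\n');
		size_t len = nl ? (size_t)(nl - text) : strlen(text);

		count += line_has_ci(text, len, needle);
		text += len;
		if (*text == '\n')
			text++;
	}
	return count;
}
```

As in the script, a failure is reported only when the count after the test exceeds the count before it.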

2 months

[PATCH] selftests: timers: nanosleep: Add tests for return of remaining time

by Thomas Weißschuh

If interrupted by a signal, clock_nanosleep() stores the remaining time in the structure pointed to by the rmtp parameter. So far, this functionality has not been tested by the timer selftests.

Extend the nanosleep selftest to cover this feature.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
 tools/testing/selftests/timers/nanosleep.c | 55 ++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/tools/testing/selftests/timers/nanosleep.c b/tools/testing/selftests/timers/nanosleep.c index 252c6308c5698f9094b8bdc39c284077b5d55531..10badae13ebeb8d596839d5aab1a5161526eeaa9 100644 --- a/tools/testing/selftests/timers/nanosleep.c +++ b/tools/testing/selftests/timers/nanosleep.c @@ -116,6 +116,56 @@ int nanosleep_test(int clockid, long long ns) return 0; } +static void dummy_event_handler(int val) +{ + /* No action needed */ +} + +static int nanosleep_test_remaining(int clockid) +{ + struct timespec rqtp = {}, rmtp = {}; + struct itimerspec itimer = {}; + struct sigaction sa = {}; + timer_t timer; + int ret; + + sa.sa_handler = dummy_event_handler; + ret = sigaction(SIGALRM, &sa, NULL); + if (ret) + return -1; + + ret = timer_create(clockid, NULL, &timer); + if (ret) + return -1; + + itimer.it_value.tv_nsec = NSEC_PER_SEC / 4; + ret = timer_settime(timer, 0, &itimer, NULL); + if (ret) + return -1; + + rqtp.tv_nsec = NSEC_PER_SEC / 2; + ret = clock_nanosleep(clockid, 0, &rqtp, &rmtp); + if (ret != EINTR) + return -1; + + ret = timer_delete(timer); + if (ret) + return -1; + + sa.sa_handler = SIG_DFL; + ret = sigaction(SIGALRM, &sa, NULL); + if (ret) + return -1; + + if (!in_order((struct timespec) {}, rmtp)) + return -1; + + if (!in_order(rmtp, rqtp)) + return -1; + + return 0; +} + int main(int argc, char **argv) { long long length; @@ -150,6 +200,11 @@ int main(int argc, char **argv) } length *= 100; } + ret = nanosleep_test_remaining(clockid); + if (ret < 0) { + ksft_test_result_fail("%-31s\n", clockstring(clockid)); +
ksft_exit_fail(); + } ksft_test_result_pass("%-31s\n", clockstring(clockid)); next: ret = 0; --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20251014-nanosleep-rtmp-selftest-6c14f1374f6f Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

2 months

[PATCH] selftests: net: local_termination: Wait for interfaces to come up

by A. Sverdlin

From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>

It seems that most of the tests prepare the interfaces once before the test run (setup_prepare()), rely on setup_wait() to wait for link and only then run the test(s).

local_termination brings the physical interfaces down and up during the test run but never waits for them to come up. If auto-negotiation takes a few seconds, the first test packets are lost, which leads to false-negative test results.

Use setup_wait_dev() after the corresponding simple_if_init() on the physical interfaces to make sure auto-negotiation has completed and test packets will not be lost because of the race against link establishment. The wait has to be done in each individual test because the interfaces have to be brought up first and only then can we wait for link (not individually, because they are expected to be looped in pairs).

Fixes: 90b9566aa5cd3f ("selftests: forwarding: add a test for local_termination.sh")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
---
 .../selftests/net/forwarding/local_termination.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/local_termination.sh b/tools/testing/selftests/net/forwarding/local_termination.sh
index ecd34f364125c..369c8b2c1f4a2 100755
--- a/tools/testing/selftests/net/forwarding/local_termination.sh
+++ b/tools/testing/selftests/net/forwarding/local_termination.sh
@@ -430,6 +430,8 @@ standalone()
 	h1_create
 	h2_create
 	macvlan_create $h2
+	setup_wait_dev $h1
+	setup_wait_dev $h2
 
 	run_test $h1 $h2 $skip_ptp $no_unicast_flt "$h2"
 
@@ -448,6 +450,8 @@ test_bridge()
 	bridge_create $vlan_filtering
 	simple_if_init br0 $H2_IPV4/24 $H2_IPV6/64
 	macvlan_create br0
+	setup_wait_dev $h1
+	setup_wait_dev $h2
 
 	run_test $h1 br0 $skip_ptp $no_unicast_flt \
 		"vlan_filtering=$vlan_filtering bridge"
@@ -480,6 +484,8 @@ test_vlan()
 	h1_vlan_create
 	h2_vlan_create
 	macvlan_create $h2.100
+	setup_wait_dev $h1
+	setup_wait_dev $h2
 
 	run_test $h1.100 $h2.100 $skip_ptp $no_unicast_flt "VLAN upper"
 
@@ -505,6 +511,8 @@ vlan_over_bridged_port()
 	h2_vlan_create
 	bridge_create $vlan_filtering
 	macvlan_create $h2.100
+	setup_wait_dev $h1
+	setup_wait_dev $h2
 
 	run_test $h1.100 $h2.100 $skip_ptp $no_unicast_flt \
 		"VLAN over vlan_filtering=$vlan_filtering bridged port"
@@ -536,6 +544,8 @@ vlan_over_bridge()
 	simple_if_init br0
 	vlan_create br0 100 vbr0 $H2_IPV4/24 $H2_IPV6/64
 	macvlan_create br0.100
+	setup_wait_dev $h1
+	setup_wait_dev $h2
 
 	if [ $vlan_filtering = 1 ]; then
 		bridge vlan add dev $h2 vid 100 master
-- 
2.51.1

2 months

[PATCH v4 00/35] sparc64: vdso: Switch to the generic vDSO library

by Thomas Weißschuh

The generic vDSO provides a lot of common functionality shared between different architectures. SPARC is the last architecture not using it, preventing some necessary code cleanup. Make use of the generic infrastructure.

Follow-up to and replacement for Arnd's SPARC vDSO removal patches: https://lore.kernel.org/lkml/20250707144726.4008707-1-arnd@kernel.org/

SPARC64 cannot map .bss into userspace, so the vDSO datapages are switched over to be allocated dynamically. This requires changes to the s390 and random subsystem vDSO initialization as preparation. The random subsystem changes in turn require some cleanup of the vDSO headers to avoid ending up with an ugly #ifdef mess.

Tested on a Niagara T4 and QEMU.

This has a semantic conflict with my series "vdso: Reject absolute relocations during build" [0]. The last patch of that series expects all users of the generic vDSO library to use the vdsocheck tool. This is not the case (yet) for SPARC64. I do have the patches for the integration; the specifics will depend on which series is applied first.

Based on v6.18-rc1.

[0] https://lore.kernel.org/lkml/20250812-vdso-absolute-reloc-v4-0-61a8b615e5ec…

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v4:
- Rebase on v6.18-rc1.
- Keep inclusion of asm/clocksource.h from linux/clocksource.h
- Reword description of "s390/time: Set up vDSO datapage later"
- Link to v3: https://lore.kernel.org/r/20250917-vdso-sparc64-generic-2-v3-0-3679b1bc8ee8…

Changes in v3:
- Allocate vDSO data pages dynamically (and lots of preparations for that)
- Drop clock_getres()
- Fix 32bit clock_gettime() syscall fallback
- Link to v2: https://lore.kernel.org/r/20250815-vdso-sparc64-generic-2-v2-0-b5ff80672347…

Changes in v2:
- Rebase on v6.17-rc1
- Drop RFC state
- Fix typo in commit message
- Drop duplicate 'select GENERIC_TIME_VSYSCALL'
- Merge "sparc64: time: Remove architecture-specific clocksource data" into the main conversion patch. It violated the check in __clocksource_register_scale()
- Link to v1: https://lore.kernel.org/r/20250724-vdso-sparc64-generic-2-v1-0-e376a3bd24d1…

---
Arnd Bergmann (1):
  clocksource: remove ARCH_CLOCKSOURCE_DATA

Thomas Weißschuh (34):
  selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
  arm64: vDSO: getrandom: Explicitly include asm/alternative.h
  arm64: vDSO: gettimeofday: Explicitly include vdso/clocksource.h
  arm64: vDSO: compat_gettimeofday: Add explicit includes
  ARM: vdso: gettimeofday: Add explicit includes
  powerpc/vdso/gettimeofday: Explicitly include vdso/time32.h
  powerpc/vdso: Explicitly include asm/cputable.h and asm/feature-fixups.h
  LoongArch: vDSO: Explicitly include asm/vdso/vdso.h
  MIPS: vdso: Add include guard to asm/vdso/vdso.h
  MIPS: vdso: Explicitly include asm/vdso/vdso.h
  random: vDSO: Add explicit includes
  vdso/gettimeofday: Add explicit includes
  vdso/helpers: Explicitly include vdso/processor.h
  vdso/datapage: Remove inclusion of gettimeofday.h
  vdso/datapage: Trim down unnecessary includes
  random: vDSO: trim vDSO includes
  random: vDSO: remove ifdeffery
  random: vDSO: split out datapage update into helper functions
  random: vDSO: only access vDSO datapage after random_init()
  s390/time: Set up vDSO datapage later
  vdso/datastore: Reduce scope of some variables in vvar_fault()
  vdso/datastore: Drop inclusion of linux/mmap_lock.h
  vdso/datastore: Map pages through struct page
  vdso/datastore: Allocate data pages dynamically
  sparc64: vdso: Link with -z noexecstack
  sparc64: vdso: Remove obsolete "fake section table" reservation
  sparc64: vdso: Replace code patching with runtime conditional
  sparc64: vdso: Move hardware counter read into header
  sparc64: vdso: Move syscall fallbacks into header
  sparc64: vdso: Introduce vdso/processor.h
  sparc64: vdso: Switch to the generic vDSO library
  sparc64: vdso2c: Drop sym_vvar_start handling
  sparc64: vdso2c: Remove symbol handling
  sparc64: vdso: Implement clock_gettime64()

 arch/arm/include/asm/vdso/gettimeofday.h | 2 +
 arch/arm64/include/asm/vdso/compat_gettimeofday.h | 3 +
 arch/arm64/include/asm/vdso/gettimeofday.h | 2 +
 arch/arm64/kernel/vdso/vgetrandom.c | 2 +
 arch/loongarch/kernel/process.c | 1 +
 arch/loongarch/kernel/vdso.c | 1 +
 arch/mips/include/asm/vdso/vdso.h | 5 +
 arch/mips/kernel/vdso.c | 1 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 1 +
 arch/powerpc/include/asm/vdso/processor.h | 3 +
 arch/s390/kernel/time.c | 4 +-
 arch/sparc/Kconfig | 3 +-
 arch/sparc/include/asm/clocksource.h | 9 -
 arch/sparc/include/asm/processor.h | 3 +
 arch/sparc/include/asm/processor_32.h | 2 -
 arch/sparc/include/asm/processor_64.h | 25 --
 arch/sparc/include/asm/vdso.h | 2 -
 arch/sparc/include/asm/vdso/clocksource.h | 10 +
 arch/sparc/include/asm/vdso/gettimeofday.h | 184 ++++++++++
 arch/sparc/include/asm/vdso/processor.h | 41 +++
 arch/sparc/include/asm/vdso/vsyscall.h | 10 +
 arch/sparc/include/asm/vvar.h | 75 ----
 arch/sparc/kernel/Makefile | 1 -
 arch/sparc/kernel/time_64.c | 6 +-
 arch/sparc/kernel/vdso.c | 69 ----
 arch/sparc/vdso/Makefile | 8 +-
 arch/sparc/vdso/vclock_gettime.c | 380 ++-------------------
 arch/sparc/vdso/vdso-layout.lds.S | 26 +-
 arch/sparc/vdso/vdso.lds.S | 2 -
 arch/sparc/vdso/vdso2c.c | 24 --
 arch/sparc/vdso/vdso2c.h | 45 +--
 arch/sparc/vdso/vdso32/vdso32.lds.S | 4 +-
 arch/sparc/vdso/vma.c | 274 +--------------
 drivers/char/random.c | 71 ++--
 include/linux/clocksource.h | 6 +-
 include/linux/vdso_datastore.h | 6 +
 include/vdso/datapage.h | 23 +-
 include/vdso/helpers.h | 1 +
 init/main.c | 2 +
 kernel/time/Kconfig | 4 -
 lib/vdso/datastore.c | 73 ++--
 lib/vdso/getrandom.c | 3 +
 lib/vdso/gettimeofday.c | 17 +
 .../testing/selftests/vDSO/vdso_test_correctness.c | 8 +-
 44 files changed, 448 insertions(+), 994 deletions(-)
---
base-commit: 28b1ac5ccd8d4900a8f53f0e6e84d517a7ccc71f
change-id: 20250722-vdso-sparc64-generic-2-25f2e058e92c

Best regards,
-- 
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

2 months

[PATCH v3 0/2] Print map ID on successful creation

by Harshit Mogalapalli

Hi all,

I looked at an issue from the bpftool repository (https://github.com/libbpf/bpftool/issues/121), and this patch series adds the requested enhancement.

Summary: Currently, when map creation succeeds, nothing is printed on the terminal. Printing the ID of each successfully created map notifies the user and can be used in CI/CD. The first patch adds the printing logic and the second patch adds a simple selftest for it.

Thank you very much.

V1 --> V2:
PATCH 1 updated [Thanks Yonghong for suggesting a better way of error handling with a new label for close(fd); instead of calling it multiple times]

V2 --> V3: Thanks to Quentin.
PATCH1: drop \n in p_err statement
PATCH2: Remove messages in cases of successful ID printing. Also remove the message with a "FAIL:" prefix to make it more consistent.

Regards,
Harshit

Harshit Mogalapalli (2):
  bpftool: Print map ID upon creation and support JSON output
  selftests/bpf: Add test for bpftool map ID printing

 tools/bpf/bpftool/map.c | 21 +++++++++---
 .../testing/selftests/bpf/test_bpftool_map.sh | 32 +++++++++++++++++++
 2 files changed, 49 insertions(+), 4 deletions(-)

-- 
2.50.1

2 months

[PATCH] selftests/seccomp: Fix pointer type mismatch in uprobe function declarations

by WangYuli

From: WangYuli <wangyl5933(a)chinaunicom.cn>

Add the __nocf_check attribute to probed_uretprobe on x86_64 to match probed_uprobe's function signature.

[ Fixes the following error with gcc-15: ]

CC seccomp_bpf
seccomp_bpf.c: In function ‘UPROBE_setup’:
seccomp_bpf.c:5175:74: error: pointer type mismatch in conditional expression [-Wincompatible-pointer-types]
 5175 |         offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
      |                                                                          ^
seccomp_bpf.c:5175:57: note: first expression has type ‘int (*)(void)’
 5175 |         offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
      |                                                         ^~~~~~~~~~~~~~~~
seccomp_bpf.c:5175:76: note: second expression has type ‘int (__attribute__((nocf_check)) *)(void)’
 5175 |         offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
      |                                                                            ^~~~~~~~~~~~~

Signed-off-by: WangYuli <wangyl5933(a)chinaunicom.cn>
Signed-off-by: WangYuli <wangyuli(a)aosc.io>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 874f17763536..19df80d18619 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -5057,17 +5057,23 @@ __naked __nocf_check noinline int probed_uprobe(void)
 }
 #pragma GCC diagnostic pop
 
-#else
+int __nocf_check noinline probed_uretprobe(void)
+{
+	return 1;
+}
+
+#else /* !__x86_64__ */
+
 noinline int probed_uprobe(void)
 {
 	return 1;
 }
-#endif
 
 noinline int probed_uretprobe(void)
 {
 	return 1;
 }
+#endif /* __x86_64__ */
 
 static int parse_uint_from_file(const char *file, const char *fmt)
 {
-- 
2.51.0

2 months

[PATCH][next] selftests/bpf: Fix spelling mistake "clien" -> "client"

by Colin Ian King

There are spelling mistakes in ASSERT_OK_PTR messages. Fix them.

Signed-off-by: Colin Ian King <coking(a)nvidia.com>
---
 tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
index deea90aaefad..f74fb50e3f9f 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
@@ -516,13 +516,13 @@ static void subtest_cleanup(struct subtest_cfg *cfg)
 	struct nstoken *nstoken;
 
 	nstoken = open_netns(CLIENT_NS);
-	if (ASSERT_OK_PTR(nstoken, "open clien ns")) {
+	if (ASSERT_OK_PTR(nstoken, "open client ns")) {
 		SYS_NOFAIL("tc qdisc delete dev veth1 parent ffff:fff1");
 		SYS_NOFAIL("ip a flush veth1");
 		close_netns(nstoken);
 	}
 	nstoken = open_netns(SERVER_NS);
-	if (ASSERT_OK_PTR(nstoken, "open clien ns")) {
+	if (ASSERT_OK_PTR(nstoken, "open client ns")) {
 		SYS_NOFAIL("tc qdisc delete dev veth2 parent ffff:fff1");
 		SYS_NOFAIL("ip a flush veth2");
 		if (!cfg->expect_kern_decap_failure)
-- 
2.51.0

2 months

[PATCH] selftest/alsa: correct grammar in conf_get_bool error string

by Zhang Chujun

The phrase "an bool" is grammatically incorrect; it should be "a bool".

Signed-off-by: Zhang Chujun <zhangchujun(a)cmss.chinamobile.com>

diff --git a/tools/testing/selftests/alsa/conf.c b/tools/testing/selftests/alsa/conf.c
index 5b7c83fe87b3..317212078e36 100644
--- a/tools/testing/selftests/alsa/conf.c
+++ b/tools/testing/selftests/alsa/conf.c
@@ -448,7 +448,7 @@ int conf_get_bool(snd_config_t *root, const char *key1, const char *key2, int de
 		ksft_exit_fail_msg("key '%s'.'%s' search error: %s\n", key1, key2, snd_strerror(ret));
 	ret = snd_config_get_bool(cfg);
 	if (ret < 0)
-		ksft_exit_fail_msg("key '%s'.'%s' is not an bool\n", key1, key2);
+		ksft_exit_fail_msg("key '%s'.'%s' is not a bool\n", key1, key2);
 	return !!ret;
 }
 
-- 
2.50.1.windows.1

2 months

[PATCH v2] selftests/user_events: Fix type cast for write_index packed member in perf_test

by Ankit Khushwaha

Accessing 'reg.write_index' directly triggers a -Waddress-of-packed-member warning due to potential unaligned pointer access:

perf_test.c:239:38: warning: taking address of packed member 'write_index' of class or structure 'user_reg' may result in an unaligned pointer value [-Waddress-of-packed-member]
  239 |         ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
      |                                             ^~~~~~~~~~~~~~~

Since write(2) works with any alignment, cast '&reg.write_index' explicitly to 'void *' to suppress this warning.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
Changelog:
v2:
- typecast '&reg.write_index' to 'void *' & remove use of memcpy as suggested by Andrew.
v1: https://lore.kernel.org/linux-kselftest/20251027113439.36059-1-ankitkhushwa…
---
 tools/testing/selftests/user_events/perf_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/user_events/perf_test.c b/tools/testing/selftests/user_events/perf_test.c
index 201459d8094d..cafec0e52eb3 100644
--- a/tools/testing/selftests/user_events/perf_test.c
+++ b/tools/testing/selftests/user_events/perf_test.c
@@ -236,7 +236,7 @@ TEST_F(user, perf_empty_events) {
 	ASSERT_EQ(1 << reg.enable_bit, self->check);
 
 	/* Ensure write shows up at correct offset */
-	ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
+	ASSERT_NE(-1, write(self->data_fd, (void *)&reg.write_index,
 					sizeof(reg.write_index)));
 	val = (void *)(((char *)perf_page) + perf_page->data_offset);
 	ASSERT_EQ(PERF_RECORD_SAMPLE, *val);
-- 
2.51.0

2 months

/Re,

by Harry Schofield ESQ

Re: Good day, Hope you are well, my first email returned undelivered, please can I provide you with more information through this email?. Best regards, Harry Schofield

2 months

[PATCH 0/3] introduce VM_MAYBE_GUARD and make it sticky

by Lorenzo Stoakes

Currently, guard regions are not visible to users except through /proc/$pid/pagemap, with no explicit visibility at the VMA level.

This makes the feature less useful, as it isn't entirely apparent which VMAs may have these entries present, especially when performing actions which walk through memory regions such as those performed by CRIU.

This series addresses this issue by introducing the VM_MAYBE_GUARD flag which fulfils this role, updating the smaps logic to display an entry for these.

The semantics of this flag are that a guard region MAY be present if set (we cannot be sure, as we can't efficiently track whether an MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if not set the VMA definitely does NOT have any guard regions present.

It's problematic to establish this flag without further action, because that means that VMAs with guard regions in them become non-mergeable with adjacent VMAs for no especially good reason.

To work around this, this series also introduces the concept of 'sticky' VMA flags - that is flags which:

a. if set in one VMA and not in another still permit those VMAs to be merged (if otherwise compatible).

b. When they are merged, the resultant VMA must have the flag set.

The VMA logic is updated to propagate these flags correctly.

Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve an issue with file-backed guard regions - previously these established an anon_vma object for file-backed mappings solely to have vma_needs_copy() correctly propagate guard region mappings to child processes.

We introduce a new flag alias VM_COPY_ON_FORK (which currently only specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly for this flag and to copy page tables if it is present, which resolves this issue.

Finally we introduce extensive VMA userland tests to assert that the sticky VMA logic behaves correctly as well as guard region self tests to assert that smaps visibility is correctly implemented.

Lorenzo Stoakes (3):
  mm: introduce VM_MAYBE_GUARD and make visible for guard regions
  mm: implement sticky, copy on fork VMA flags
  selftests/mm/guard-regions: add smaps visibility test

 Documentation/filesystems/proc.rst | 1 +
 fs/proc/task_mmu.c | 1 +
 include/linux/mm.h | 33 ++++++
 include/trace/events/mmflags.h | 1 +
 mm/madvise.c | 22 ++--
 mm/memory.c | 3 +
 mm/vma.c | 22 ++--
 tools/testing/selftests/mm/guard-regions.c | 120 +++++++++++++++++++++
 tools/testing/selftests/mm/vm_util.c | 5 +
 tools/testing/selftests/mm/vm_util.h | 1 +
 tools/testing/vma/vma.c | 89 +++++++++++++--
 tools/testing/vma/vma_internal.h | 33 ++++++
 12 files changed, 303 insertions(+), 28 deletions(-)

-- 
2.51.0

2 months

[bpf-next] selftests/bpf: refactor snprintf_btf test to use bpf_strncmp

by Hoyeon Lee

The netif_receive_skb BPF program used in the snprintf_btf test still uses a custom __strncmp. This is unnecessary as the bpf_strncmp helper is available and provides the same functionality.

This commit refactors the test to use the bpf_strncmp helper, removing the redundant custom implementation.

Signed-off-by: Hoyeon Lee <hoyeon.lee(a)suse.com>
---
 .../selftests/bpf/progs/netif_receive_skb.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/netif_receive_skb.c b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
index 9e067dcbf607..186b8c82b9e6 100644
--- a/tools/testing/selftests/bpf/progs/netif_receive_skb.c
+++ b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
@@ -31,19 +31,6 @@ struct {
 	__type(value, char[STRSIZE]);
 } strdata SEC(".maps");
 
-static int __strncmp(const void *m1, const void *m2, size_t len)
-{
-	const unsigned char *s1 = m1;
-	const unsigned char *s2 = m2;
-	int i, delta = 0;
-
-	for (i = 0; i < len; i++) {
-		delta = s1[i] - s2[i];
-		if (delta || s1[i] == 0 || s2[i] == 0)
-			break;
-	}
-	return delta;
-}
 
 #if __has_builtin(__builtin_btf_type_id)
 #define TEST_BTF(_str, _type, _flags, _expected, ...)			\
@@ -69,7 +56,7 @@ static int __strncmp(const void *m1, const void *m2, size_t len)
 					 &_ptr, sizeof(_ptr), _hflags);	\
 		if (ret)						\
 			break;						\
-		_cmp = __strncmp(_str, _expectedval, EXPECTED_STRSIZE);	\
+		_cmp = bpf_strncmp(_str, EXPECTED_STRSIZE, _expectedval); \
 		if (_cmp != 0) {					\
 			bpf_printk("(%d) got %s", _cmp, _str);		\
 			bpf_printk("(%d) expected %s", _cmp,		\
-- 
2.51.1

2 months

[PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap

by Jiayuan Chen

Overall, we encountered a warning [1] that can be triggered by running the selftest I provided.

sockmap works by replacing the sk_data_ready, recvmsg and sendmsg operations and implementing fast socket-level forwarding logic:

1. Users can obtain file descriptors through userspace socket()/accept() interfaces, then call the BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs) to replace handlers when TCP connections enter the ESTABLISHED state (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB).

However, when combined with MPTCP, an issue arises: MPTCP creates subflow sk's and performs TCP handshakes, so the BPF program obtains subflow sk's and may incorrectly replace their sk_prot. We need to reject such operations. In patch 1, we set psock_update_sk_prot to NULL in the subflow's custom sk_prot.

Additionally, if the server's listening socket has MPTCP enabled and the client's TCP also uses MPTCP, we should allow the combination of subflow and sockmap. This is because the latest Golang programs have enabled MPTCP for listening sockets by default [2]. For programs already using sockmap, upgrading Golang should not cause sockmap functionality to fail. Patch 2 prevents the WARNING from occurring.

[1] truncated warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
 <TASK>
 do_accept+0xeb/0x190
 ? __x64_sys_pselect6+0x61/0x80
 ? _raw_spin_unlock+0x12/0x30
 ? alloc_fd+0x11e/0x190
 __sys_accept4+0x8c/0x100
 __x64_sys_accept+0x1f/0x30
 x64_sys_call+0x202f/0x20f0
 do_syscall_64+0x72/0x9a0
 ? switch_fpu_return+0x60/0xf0
 ? irqentry_exit_to_user_mode+0xdb/0x1e0
 ? irqentry_exit+0x3f/0x50
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

[2]: https://go-review.googlesource.com/c/go/+/607715

---
v3 -> v4: Addressed questions from Matthieu and Paolo, explained sockmap's operational mechanism, and finalized the changes
v2 -> v3: Adopted Jakub Sitnicki's suggestions - atomic retrieval of sk_family is required
v1 -> v2: Had initial discussion with Matthieu on sockmap and MPTCP technical details

v3: https://lore.kernel.org/bpf/20251023125450.105859-1-jiayuan.chen@linux.dev/
v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/…
v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linu…

Jiayuan Chen (3):
  mptcp: disallow MPTCP subflows from sockmap
  net,mptcp: fix proto fallback detection with BPF
  selftests/bpf: Add mptcp test with sockmap

 net/mptcp/protocol.c | 6 +-
 net/mptcp/subflow.c | 8 +
 .../testing/selftests/bpf/prog_tests/mptcp.c | 150 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c | 43 +++++
 4 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c

base-commit: 89aec171d9d1ab168e43fcf9754b82e4c0aef9b9
-- 
2.43.0

2 months

[PATCH net-next v2 00/12] selftests/vsock: refactor and improve vmtest infrastructure

by Bobby Eshleman

Hey all,

This patch series refactors the vsock selftest VM infrastructure to improve test run times, improve logging, and prepare for future tests which make heavy usage of these refactored functions and have new requirements such as simultaneous QEMU processes.

These patches were broken off from this prior series: https://lore.kernel.org/all/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.co…

---
Changes in v2:
- remove "Fixes" for some patches because they do not fix bugs in kselftest runs (some fix bugs only when using bash args that kselftest does not use or otherwise prepare functions for new usage)
- broke out one fixes patch for "net"
- per-patch changes
- add patch for shellcheck declaration to disable false positives
- Link to v1: https://lore.kernel.org/r/20251022-vsock-selftests-fixes-and-improvements-v…

---
Bobby Eshleman (12):
  selftests/vsock: improve logging in vmtest.sh
  selftests/vsock: make wait_for_listener() work even if pipefail is on
  selftests/vsock: reuse logic for vsock_test through wrapper functions
  selftests/vsock: avoid multi-VM pidfile collisions with QEMU
  selftests/vsock: do not unconditionally die if qemu fails
  selftests/vsock: speed up tests by reducing the QEMU pidfile timeout
  selftests/vsock: add check_result() for pass/fail counting
  selftests/vsock: identify and execute tests that can re-use VM
  selftests/vsock: add BUILD=0 definition
  selftests/vsock: add 1.37 to tested virtme-ng versions
  selftests/vsock: add vsock_loopback module loading
  selftests/vsock: disable shellcheck SC2317 and SC2119

 tools/testing/selftests/vsock/vmtest.sh | 332 +++++++++++++++++++++-----------
 1 file changed, 216 insertions(+), 116 deletions(-)
---
base-commit: 255d75ef029f33f75fcf5015052b7302486f7ad2
change-id: 20251021-vsock-selftests-fixes-and-improvements-057440ffb2fa

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)meta.com>

2 months, 1 week

[PATCH net-next v2 0/5] psp: track stats from core and provide a driver stats api

by Daniel Zahka

This series introduces stats counters for psp. Device key rotations and so-called 'stale-events' are common to all drivers and are tracked by the core.

A driver-facing api is provided for reporting stats required by the "Implementation Requirements" section of the PSP Architecture Specification. Drivers must implement these stats.

Lastly, implementations of the driver stats api for mlx5 and netdevsim are included.

Here is the output of running the psp selftest suite and then printing out stats with the ynl cli on a system with a psp-capable CX7:

$ ./ksft-psp-stats/drivers/net/psp.py
TAP version 13
1..28
ok 1 psp.test_case # SKIP Test requires IPv4 connectivity
ok 2 psp.data_basic_send_v0_ip6
ok 3 psp.test_case # SKIP Test requires IPv4 connectivity
ok 4 psp.data_basic_send_v1_ip6
ok 5 psp.test_case # SKIP Test requires IPv4 connectivity
ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 7 psp.test_case # SKIP Test requires IPv4 connectivity
ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 9 psp.test_case # SKIP Test requires IPv4 connectivity
ok 10 psp.data_mss_adjust_ip6
ok 11 psp.dev_list_devices
ok 12 psp.dev_get_device
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate
ok 15 psp.dev_rotate_spi
ok 16 psp.assoc_basic
ok 17 psp.assoc_bad_dev
ok 18 psp.assoc_sk_only_conn
ok 19 psp.assoc_sk_only_mismatch
ok 20 psp.assoc_sk_only_mismatch_tx
ok 21 psp.assoc_sk_only_unconn
ok 22 psp.assoc_version_mismatch
ok 23 psp.assoc_twice
ok 24 psp.data_send_bad_key
ok 25 psp.data_send_disconnect
ok 26 psp.data_stale_key
ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim
ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim
# Totals: pass:19 fail:0 xfail:2 xpass:0 skip:7 error:0
#
# Responder logs (0):
# STDERR:
# Set PSP enable on device 1 to 0x3
# Set PSP enable on device 1 to 0x0

$ cd ynl/
$ ./pyynl/cli.py --spec netlink/specs/psp.yaml --dump get-stats
[{'dev-id': 1,
  'key-rotations': 5,
  'rx-auth-fail': 21,
  'rx-bad': 0,
  'rx-bytes': 11844,
  'rx-error': 0,
  'rx-packets': 94,
  'stale-events': 6,
  'tx-bytes': 1128456,
  'tx-error': 0,
  'tx-packets': 780}]

CHANGES:
v2:
- don't return skb->len from psp_nl_get_stats_dumpit() on success and EMSGSIZE
- use %pe to print PTR_ERR()
v1: https://lore.kernel.org/netdev/20251022193739.1376320-1-daniel.zahka@gmail.…

Daniel Zahka (2):
  selftests: drv-net: psp: add assertions on core-tracked psp dev stats
  netdevsim: implement psp device stats

Jakub Kicinski (3):
  psp: report basic stats from the core
  psp: add stats from psp spec to driver facing api
  net/mlx5e: Add PSP stats support for Rx/Tx flows

 Documentation/netlink/specs/psp.yaml | 95 +++++++
 .../mellanox/mlx5/core/en_accel/psp.c | 239 ++++++++++++++++--
 .../mellanox/mlx5/core/en_accel/psp.h | 18 ++
 .../mellanox/mlx5/core/en_accel/psp_rxtx.c | 1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 5 +
 drivers/net/netdevsim/netdevsim.h | 5 +
 drivers/net/netdevsim/psp.c | 27 ++
 include/net/psp/types.h | 35 +++
 include/uapi/linux/psp.h | 18 ++
 net/psp/psp-nl-gen.c | 19 ++
 net/psp/psp-nl-gen.h | 2 +
 net/psp/psp_main.c | 3 +-
 net/psp/psp_nl.c | 94 +++++++
 net/psp/psp_sock.c | 4 +-
 tools/testing/selftests/drivers/net/psp.py | 13 +
 15 files changed, 561 insertions(+), 17 deletions(-)

-- 
2.47.3

2 months, 1 week

[PATCH 0/8] Initial DMABUF support for iommufd

by Jason Gunthorpe

This series is the start of adding full DMABUF support to iommufd. Currently it is limited to only work with VFIO's DMABUF exporter. It sits on top of Leon's series to add a DMABUF exporter to VFIO:

https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org/

The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but otherwise works the same as it does today for a memfd. The user can select a slice of the FD to map into the ioas and, if the underlying alignment requirements are met, it will be placed in the iommu_domain.

Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR memory from VFIO to an iommu_domain controlled by iommufd. This is used for PCI Peer to Peer support in VMs, and is the last feature that the VFIO type 1 container has that iommufd couldn't do.

The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime control and is a use-after-free security problem.

Instead iommufd relies on revocable DMABUFs. Whenever VFIO thinks there should be no access to the MMIO it can shoot down the mapping in iommufd, which will unmap it from the iommu_domain. There is no automatic remap; this is a safety protocol so the kernel doesn't get stuck. Userspace is expected to know it is doing something that will revoke the dmabuf and map/unmap it around the activity. E.g., when QEMU goes to issue FLR it should do the map/unmap to iommufd.

Since DMABUF is missing some key general features for this use case, it relies on a "private interconnect" between VFIO and iommufd via the vfio_pci_dma_buf_iommufd_map() call. The call confirms the DMABUF has revoke semantics and delivers a phys_addr for the memory suitable for use with iommu_map().

In the medium term there is a desire to expand the supported DMABUFs to include GPU drivers to support DPDK/SPDK type use cases, so future series will work to add a general concept of revoke and a general negotiation of interconnect to remove vfio_pci_dma_buf_iommufd_map().

I also plan another series to modify iommufd's vfio_compat to transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI of type1.

The latest series for interconnect negotiation to exchange a phys_addr is:
https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com

And the discussion for the design of revoke is here:
https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/

This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_dmabuf

The branch has various modifications to Leon's series I've suggested.

Jason Gunthorpe (8):
  iommufd: Add DMABUF to iopt_pages
  iommufd: Do not map/unmap revoked DMABUFs
  iommufd: Allow a DMABUF to be revoked
  iommufd: Allow MMIO pages in a batch
  iommufd: Have pfn_reader process DMABUF iopt_pages
  iommufd: Have iopt_map_file_pages convert the fd to a file
  iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
  iommufd/selftest: Add some tests for the dmabuf flow

 drivers/iommu/iommufd/io_pagetable.c | 74 +++-
 drivers/iommu/iommufd/io_pagetable.h | 53 ++-
 drivers/iommu/iommufd/ioas.c | 8 +-
 drivers/iommu/iommufd/iommufd_private.h | 13 +-
 drivers/iommu/iommufd/iommufd_test.h | 10 +
 drivers/iommu/iommufd/main.c | 10 +
 drivers/iommu/iommufd/pages.c | 407 ++++++++++++++++--
 drivers/iommu/iommufd/selftest.c | 142 ++++++
 tools/testing/selftests/iommu/iommufd.c | 43 ++
 tools/testing/selftests/iommu/iommufd_utils.h | 44 ++
 10 files changed, 741 insertions(+), 63 deletions(-)

base-commit: fc882154e421f82677925d33577226e776bb07a4
-- 
2.43.0

2 months, 1 week


[PATCH nf-next v8 0/3] Add IPIP flowtable SW acceleration

by Lorenzo Bianconi

Introduce SW acceleration for IPIP tunnels in the netfilter flowtable infrastructure. This series introduces basic infrastructure to accelerate other tunnel types (e.g. IP6IP6).

---
Changes in v8:
- Rebase on top of the following series (not yet applied):
  https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=477081
- Link to v7: https://lore.kernel.org/r/20251021-nf-flowtable-ipip-v7-0-a45214896106@kern…

Changes in v7:
- Introduce sw acceleration for tx path of IPIP tunnels
- Rely on exact match during flowtable entry lookup
- Fix typos
- Link to v6: https://lore.kernel.org/r/20250818-nf-flowtable-ipip-v6-0-eda90442739c@kern…

Changes in v6:
- Rebase on top of nf-next main branch
- Link to v5: https://lore.kernel.org/r/20250721-nf-flowtable-ipip-v5-0-0865af9e58c6@kern…

Changes in v5:
- Rely on __ipv4_addr_hash() to compute the hash used as encap ID
- Remove unnecessary pskb_may_pull() in nf_flow_tuple_encap()
- Add nf_flow_ip4_ecanp_pop utility routine
- Link to v4: https://lore.kernel.org/r/20250718-nf-flowtable-ipip-v4-0-f8bb1c18b986@kern…

Changes in v4:
- Use the hash value of the saddr, daddr and protocol of the outer IP header as encapsulation id.
- Link to v3: https://lore.kernel.org/r/20250703-nf-flowtable-ipip-v3-0-880afd319b9f@kern…

Changes in v3:
- Add outer IP header sanity checks
- Target nf-next tree instead of net-next
- Link to v2: https://lore.kernel.org/r/20250627-nf-flowtable-ipip-v2-0-c713003ce75b@kern…

Changes in v2:
- Introduce IPIP flowtable selftest
- Link to v1: https://lore.kernel.org/r/20250623-nf-flowtable-ipip-v1-1-2853596e3941@kern…

---
Lorenzo Bianconi (3):
  net: netfilter: Add IPIP flowtable rx sw acceleration
  net: netfilter: Add IPIP flowtable tx sw acceleration
  selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest

 include/linux/netdevice.h | 16 +++
 include/net/netfilter/nf_flow_table.h | 22 ++++
 net/ipv4/ipip.c | 29 +++++
 net/netfilter/nf_flow_table_core.c | 3 +
 net/netfilter/nf_flow_table_ip.c | 117 ++++++++++++++++++++-
 net/netfilter/nf_flow_table_path.c | 86 +++++++++++++--
 .../selftests/net/netfilter/nft_flowtable.sh | 40 +++++++
 7 files changed, 298 insertions(+), 15 deletions(-)
---
base-commit: 32e4b1bf1bbfe63e52e2fff7ade0aaeb805defe3
change-id: 20250623-nf-flowtable-ipip-1b3d7b08d067

Best regards,
-- 
Lorenzo Bianconi <lorenzo(a)kernel.org>

2 months, 1 week


[PATCH net] selftests/vsock: avoid false-positives when checking dmesg

by Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman(a)meta.com>

Sometimes VMs will have some intermittent dmesg warnings that are unrelated to vsock. Change the dmesg parsing to filter on strings containing 'vsock' to avoid false-positive failures that are unrelated to vsock. The downside is that some vsock-related warnings may not contain the substring 'vsock', so those will be missed.

Fixes: a4a65c6fe08b ("selftests/vsock: add initial vmtest.sh for vsock")
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)meta.com>
---
Previously was part of the series:
https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements…
---
 tools/testing/selftests/vsock/vmtest.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index edacebfc1632..e1732f236d14 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -389,9 +389,9 @@ run_test() {
 	local rc
 
 	host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
-	host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+	host_warn_cnt_before=$(dmesg --level=warn | grep -c -i 'vsock')
 	vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
-	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock')
 
 	name=$(echo "${1}" | awk '{ print $1 }')
 	eval test_"${name}"
@@ -403,7 +403,7 @@ run_test() {
 		rc=$KSFT_FAIL
 	fi
 
-	host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+	host_warn_cnt_after=$(dmesg --level=warn | grep -c -i vsock)
 	if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
 		echo "FAIL: kernel warning detected on host" | log_host "${name}"
 		rc=$KSFT_FAIL
@@ -415,7 +415,7 @@ run_test() {
 		rc=$KSFT_FAIL
 	fi
 
-	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | grep -c -i vsock)
 	if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
 		echo "FAIL: kernel warning detected on vm" | log_host "${name}"
 		rc=$KSFT_FAIL

---
base-commit: 255d75ef029f33f75fcf5015052b7302486f7ad2
change-id: 20251104-vsock-vmtest-dmesg-fix-b2c59e1d9c38

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)meta.com>
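The before/after comparison that run_test() performs can be sketched in isolation. This is a minimal sketch under stated assumptions, not code from vmtest.sh: the helper names count_vsock_warnings and vsock_warnings_increased are hypothetical, and the demo feeds sample log lines instead of real "dmesg --level=warn" output, since reading dmesg may require privileges. It also shows a grep detail worth knowing: "grep -c" prints 0 when nothing matches but exits with status 1, which would abort a caller running under "set -e".

```bash
#!/bin/bash
# Minimal sketch (hypothetical helpers, not taken from vmtest.sh): compare
# the number of vsock-related warning lines before and after a test.

# Count lines on stdin that mention "vsock", case-insensitively.
# grep -c prints 0 and exits with status 1 when nothing matches; "|| true"
# keeps that status from aborting callers that run under "set -e".
count_vsock_warnings() {
	grep -c -i 'vsock' || true
}

# Succeed (status 0) only if the "after" count exceeds the "before" count.
vsock_warnings_increased() {
	local before=$1
	local after=$2

	[[ ${after} -gt ${before} ]]
}

# Demo on sample log text instead of real "dmesg --level=warn" output.
before=$(printf '%s\n' 'e1000e: link flap detected' | count_vsock_warnings)
after=$(printf '%s\n' 'e1000e: link flap detected' \
	'vhost_vsock: sample warning' 'VSOCK: another sample' | count_vsock_warnings)

echo "before=${before} after=${after}"
if vsock_warnings_increased "${before}" "${after}"; then
	echo "FAIL: new vsock-related warning"
fi
```

The unrelated e1000e line is ignored in both counts, so only the two vsock lines (matched regardless of case) make the comparison fail; as the commit message notes, a vsock problem whose warning text lacks the substring would still go unnoticed.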

2 months, 1 week


[PATCH net v8 0/4] net: netpoll: fix memory leak and add comprehensive selftests

by Breno Leitao

Fix a memory leak in netpoll and introduce netconsole selftests that expose the issue when running with kmemleak detection enabled.

This patchset includes a selftest for netpoll with multiple concurrent users (netconsole + bonding), which simulates the scenario from test[1] that originally demonstrated the issue allegedly fixed by commit efa95b01da18 ("netpoll: fix use after free"), a commit that is now being reverted.

Sending this to the "net" branch because this is a fix, and the selftest might help with validating the backports.

Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v8:
- Sending it again, now that commit 1a8fed52f7be1 ("netdevsim: set the carrier when the device goes up") has landed in net
- Created one namespace for TX and one for RX (Paolo)
- Used additional helpers to create and delete netdevsim (Paolo)
- Link to v7: https://lore.kernel.org/r/20251003-netconsole_torture-v7-0-aa92fcce62a9@deb…

Changes in v7:
- Rebased on top of `net`
- Link to v6: https://lore.kernel.org/r/20251002-netconsole_torture-v6-0-543bf52f6b46@deb…

Changes in v6:
- Expand the tests even more and some small fixups
- Moved the test to bonding selftests
- Link to v5: https://lore.kernel.org/r/20250918-netconsole_torture-v5-0-77e25e0a4eb6@deb…

Changes in v5:
- Set CONFIG_BONDING=m in selftests/drivers/net/config.
- Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb…

Changes in v4:
- Added an additional selftest to test multiple netpoll users in parallel
- Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb…

Changes in v3:
- This patchset is a merge of the fix and the selftest together as recommended by Jakub.

Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages have been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…

---
Breno Leitao (4):
  net: netpoll: fix incorrect refcount handling causing incorrect cleanup
  selftest: netcons: refactor target creation
  selftest: netcons: create a torture test
  selftest: netcons: add test for netconsole over bonded interfaces

 net/core/netpoll.c | 7 +-
 tools/testing/selftests/drivers/net/Makefile | 1 +
 .../testing/selftests/drivers/net/bonding/Makefile | 2 +
 tools/testing/selftests/drivers/net/bonding/config | 4 +
 .../drivers/net/bonding/netcons_over_bonding.sh | 361 +++++++++++++++++++++
 .../selftests/drivers/net/lib/sh/lib_netcons.sh | 82 ++++-
 .../selftests/drivers/net/netcons_torture.sh | 130 ++++++++
 7 files changed, 569 insertions(+), 18 deletions(-)
---
base-commit: e120f46768d98151ece8756ebd688b0e43dc8b29
change-id: 20250902-netconsole_torture-8fc23f0aca99

Best regards,
-- 
Breno Leitao <leitao(a)debian.org>

2 months, 1 week


[PATCH net-next 0/4] mptcp: pm: in-kernel: fullmesh endp nb + bind cases

by Matthieu Baerts (NGI0)

Here is a small optimisation for the in-kernel PM, joined by a small behavioural change to avoid confusion, and followed by a few more tests.

- Patch 1: record the number of fullmesh endpoints, to avoid iterating over all endpoints to check if one is marked as fullmesh.

- Patch 2: when at least one endpoint is marked as fullmesh, only use these endpoints when reacting to an ADD_ADDR, even if there are no endpoints for this IP family: this is less confusing.

- Patch 3: reduce duplicated code to prepare the next patch.

- Patch 4: extra "bind" cases: the listening socket restricts the bind to one IP address, not allowing MP_JOIN to extra IP addresses, except if another listening socket accepts them.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (4):
  mptcp: pm: in-kernel: record fullmesh endp nb
  mptcp: pm: in kernel: only use fullmesh endp if any
  selftests: mptcp: join: do_transfer: reduce code dup
  selftests: mptcp: join: validate extra bind cases

 include/uapi/linux/mptcp.h | 3 +-
 net/mptcp/pm_kernel.c | 36 ++++-
 net/mptcp/protocol.h | 1 +
 net/mptcp/sockopt.c | 2 +
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 10 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 187 +++++++++++++++++++---
 6 files changed, 213 insertions(+), 26 deletions(-)
---
base-commit: 01cc760632b875c4ad0d8fec0b0c01896b8a36d4
change-id: 20251101-net-next-mptcp-fm-endp-nb-bind-cf7ab688d9f1

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

2 months, 1 week


[PATCH net] selftests: netdevsim: Fix ethtool-features.sh fail

by Wang Liang

The test 'ethtool-features.sh' failed with the output below:

TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-features.sh
# Warning: file ethtool-features.sh is not executable
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# ethtool: bad command line argument(s)
# For more information run ethtool -h
# FAILED 10/10 checks
not ok 1 selftests: drivers/net/netdevsim: ethtool-features.sh # exit=1

Similar to commit 18378b0e49d9 ("selftests/damon: Add executable permission to test scripts"), the script 'ethtool-features.sh' has no executable permission, which leads to the warning 'file ethtool-features.sh is not executable'.

Older versions of ethtool (mine is 5.16) do not support the command 'ethtool --json -k enp1s0', which leads to the output 'ethtool: bad command line argument(s)'.

This patch adds executable permission to the script 'ethtool-features.sh' and checks for 'ethtool --json -k' support.

After this patch:

TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-features.sh
# SKIP: No --json -k support in ethtool
ok 1 selftests: drivers/net/netdevsim: ethtool-features.sh

Fixes: 0189270117c3 ("selftests: netdevsim: add a test checking ethtool features")
Signed-off-by: Wang Liang <wangliang74(a)huawei.com>
---
 .../selftests/drivers/net/netdevsim/ethtool-features.sh | 5 +++++
 1 file changed, 5 insertions(+)
 mode change 100644 => 100755 tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh

diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh
old mode 100644
new mode 100755
index bc210dc6ad2d..f771dc6839ea
--- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh
@@ -7,6 +7,11 @@ NSIM_NETDEV=$(make_netdev)
 
 set -o pipefail
 
+if ! ethtool --json -k $NSIM_NETDEV > /dev/null 2>&1; then
+	echo "SKIP: No --json -k support in ethtool"
+	exit $ksft_skip
+fi
+
 FEATS="
 tx-checksum-ip-generic
 tx-scatter-gather
-- 
2.34.1
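The probe-and-skip pattern used by this patch can be wrapped in a reusable helper. This is a minimal sketch with a hypothetical helper name, skip_unless; it is not part of the kselftest library. It sets ksft_skip=4 (the kselftest "skip" exit code) itself rather than assuming another script defined it, because "exit $ksft_skip" with an unset variable becomes a bare "exit", which returns the status of the previous command (0 after an echo).

```bash
#!/bin/bash
# Minimal sketch with a hypothetical helper: skip the test when an optional
# tool feature is unavailable, instead of reporting a failure.

# kselftest exit code meaning "skipped"; defined here so the sketch does
# not depend on it having been set by another script or library.
ksft_skip=4

# Run the probe command given after the reason, discarding its output.
# If the probe fails, print a SKIP line and exit with the skip code.
skip_unless() {
	local reason=$1
	shift

	if ! "$@" > /dev/null 2>&1; then
		echo "SKIP: ${reason}"
		exit "${ksft_skip}"
	fi
}
```

With this helper, the check in the patch would read: skip_unless "No --json -k support in ethtool" ethtool --json -k "$NSIM_NETDEV". Because the capability is detected by running the command against the real device, any other failure of that command (a missing device, for example) is also reported as a skip rather than a failure; that trade-off comes with probing by execution.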

2 months, 1 week


[PATCH v4 nf-next] selftests: netfilter: Add bridge_fastpath.sh

by Eric Woudstra

Add a script to test various scenarios where a bridge is involved in the fastpath. It runs tests in the forward path, and also in a bridged path. The setup is similar to a basic home router with multiple lan ports. It uses 3 pairs of veth-devices. Each or all pairs can be replaced by a pair of real interfaces, interconnected by wire. This is necessary to test the behavior when dealing with dsa ports, foreign (dsa) ports and switchdev userports that support SWITCHDEV_OBJ_ID_PORT_VLAN. See the head of the script for a detailed description. Run without arguments to perform all tests on veth-devices.

Signed-off-by: Eric Woudstra <ericwouds(a)gmail.com>
---
This test script was written first for the proposed bridge-fastpath patch-sets, but its use is more general and it can easily be expanded.

Changes in v4:
- Also only match ct state in rule without fastpath.
- Dropped RFC
- Cosmetics

Changes in v3:
- Removed all warnings reported by shellcheck -x -e SC2317
- Improved del_pppoe(), check if interfaces are removed
- Added is_known_issue() to warn instead of error for known issues
- Link down and (hardware) interfaces to default netns at end of script
- Removed matching ip(v6) address

Changes in v2:
- Moved test-series to functions
- Moved code to set_pair_link() up/down
- Added conntrack zone to bridged traffic
- Test bridge chain prerouting in test without fastpath and bridge chain forward in tests with fastpath

Some example outputs of this last version of patches from different hardware, without and with patches:

ALL VETH:
=========
./bridge_fastpath.sh -t

Setup:
        CLIENT 0
        veth0cl
           |
        veth0rt
          WAN
         ROUTER
     LAN1      LAN2
   veth1rt    veth2rt
      |          |
   veth1cl    veth2cl
   CLIENT 1   CLIENT 2

Without patches:
PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
WARN: unaware bridge, with double q vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN: unaware bridge, with 802.1ad vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN: unaware bridge, with pppoe encap, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN: unaware bridge, with pppoe-in-q encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
WARN: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: all tests passed

With patches:
PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, without encaps, with fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
PASS: unaware bridge, with single vlan encap, with fastpath
PASS: unaware bridge, with double q vlan encaps, without fastpath
PASS: unaware bridge, with double q vlan encaps, with fastpath
PASS: unaware bridge, with 802.1ad vlan encaps, without fastpath
PASS: unaware bridge, with 802.1ad vlan encaps, with fastpath
PASS: unaware bridge, with pppoe encap, without fastpath
PASS: unaware bridge, with pppoe encap, with fastpath
PASS: unaware bridge, with pppoe-in-q encaps, without fastpath
PASS: unaware bridge, with pppoe-in-q encaps, with fastpath
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, without/without vlan encap, with fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, with fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, with fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, with fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
PASS: forward, without vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: all tests passed

BANANAPI-R3 (lan1 & lan2 are dsa):
============
Without patches:
./bridge_fastpath.sh -t -0 enu1u2,lan2 -1 enu1u1,lan1 -2 lan4,eth1

Setup:
        CLIENT 0
         enu1u2
           |
          lan2
          WAN
         ROUTER
     LAN1      LAN2
     lan1      eth1
      |          |
    enu1u1     lan4
   CLIENT 1   CLIENT 2

PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
WARN: unaware bridge, with pppoe encap, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN: unaware bridge, with pppoe-in-q encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
WARN: forward, without vlan-device, without vlan encap, client1, with fastpath: ipv4: counted bytes 2110480 > 2097152
WARN: forward, without vlan-device, without vlan encap, client1, with fastpath: ipv6: counted bytes 2116104 > 2097152
PASS: forward, without vlan-device, without vlan encap, client1, with hw_fastpath
PASS: forward, without vlan-device, without vlan encap, client2, without fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with hw_fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
WARN: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken
WARN: forward, without vlan-device, with vlan encap, client1, with hw_fastpath: ipv4/6: tcp broken
PASS: forward, without vlan-device, with vlan encap, client2, without fastpath
WARN: forward, without vlan-device, with vlan encap, client2, with fastpath: ipv4/6: tcp broken
WARN: forward, without vlan-device, with vlan encap, client2, with hw_fastpath: ipv4/6: tcp broken
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, without vlan encap, client2, without fastpath
WARN: forward, with vlan-device, without vlan encap, client2, with fastpath: ipv4: counted bytes 2122388 > 2097152
WARN: forward, with vlan-device, without vlan encap, client2, with fastpath: ipv6: counted bytes 2129280 > 2097152
WARN: forward, with vlan-device, without vlan encap, client2, with hw_fastpath: ipv4: counted bytes 2110428 > 2097152
WARN: forward, with vlan-device, without vlan encap, client2, with hw_fastpath: ipv6: counted bytes 2140144 > 2097152
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, with vlan encap, client2, without fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with hw_fastpath
PASS: all tests passed

With patches:
PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, without encaps, with fastpath
PASS: unaware bridge, without encaps, with hw_fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
PASS: unaware bridge, with single vlan encap, with fastpath
PASS: unaware bridge, with single vlan encap, with hw_fastpath
PASS: unaware bridge, with pppoe encap, without fastpath
PASS: unaware bridge, with pppoe encap, with fastpath
PASS: unaware bridge, with pppoe encap, with hw_fastpath
PASS: unaware bridge, with pppoe-in-q encaps, without fastpath
PASS: unaware bridge, with pppoe-in-q encaps, with fastpath
PASS: unaware bridge, with pppoe-in-q encaps, with hw_fastpath
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, without/without vlan encap, with fastpath
PASS: aware bridge, without/without vlan encap, with hw_fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, with fastpath
PASS: aware bridge, with/without vlan encap, with hw_fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, with fastpath
PASS: aware bridge, with/with vlan encap, with hw_fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, with fastpath
PASS: aware bridge, without/with vlan encap, with hw_fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with hw_fastpath
PASS: forward, without vlan-device, without vlan encap, client2, without fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with hw_fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
PASS: forward, without vlan-device, with vlan encap, client1, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, with hw_fastpath
PASS: forward, without vlan-device, with vlan encap, client2, without fastpath
PASS: forward, without vlan-device, with vlan encap, client2, with fastpath
PASS: forward, without vlan-device, with vlan encap, client2, with hw_fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, without vlan encap, client2, without fastpath
PASS: forward, with vlan-device, without vlan encap, client2, with fastpath
PASS: forward, with vlan-device, without vlan encap, client2, with hw_fastpath
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, with vlan encap, client2, without fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with hw_fastpath
PASS: all tests passed

 .../testing/selftests/net/netfilter/Makefile | 1 +
 .../net/netfilter/bridge_fastpath.sh | 1050 +++++++++++++++++
 2 files changed, 1051 insertions(+)
 create mode 100755 tools/testing/selftests/net/netfilter/bridge_fastpath.sh

diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index ee2d1a5254f8..a7edc6654040 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -10,6 +10,7 @@ TEST_PROGS := \
 	br_netfilter.sh \
 	br_netfilter_queue.sh \
 	bridge_brouter.sh \
+	bridge_fastpath.sh \
 	conntrack_clash.sh \
 	conntrack_dump_flush.sh \
 	conntrack_icmp_related.sh \
diff --git a/tools/testing/selftests/net/netfilter/bridge_fastpath.sh b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh
new file mode 100755
index 000000000000..d09b704d7bc6
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh
@@ -0,0 +1,1050 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Check if conntrack, nft chain and fastpath is functional in setups
+# where a bridge is in the fastpath.
+#
+# Commandline options make it possible to use real ethernet pairs
+# instead of veth-device pairs. Any, or all, pairs can be tested using
+# real hardware pairs. This can be useful to test dsa-ports,
+# switchdev (dsa) foreign ports and switchdev ports supporting
+# SWITCHDEV_OBJ_ID_PORT_VLAN.
+#
+# First tcp is tested. Conntrack and nft chain are tested using a counter.
+# When there is a fastpath possible between the interfaces then the
+# fastpath is also tested.
+# When there is a hardware offloaded fastpath possible between the
+# interfaces then the hardware offloaded path is also tested.
+#
+# Setup is as a typical router:
+#
+#            nsclientwan
+#                 |
+#               nsrt
+#              |    |
+#      nsclient1    nsclient2
+#
+# Masquerading for ipv4 only.
+#
+# First check if a bridge table forward chain can be setup, skip
+# these tests if this is not possible.
+# Then check if an inet table forward chain can be setup, skip
+# these tests if this is not possible.
+#
+# Different setups of paths are tested that involve a bridge in the
+# fastpath. This can be in the forward-fastpath or in the bridge-fastpath.
+#
+# The first series, in the bridge-fastpath, using a vlan-unaware bridge.
+# Traffic with the following vlan-tags is checked:
+# a. without vlan
+# b. single vlan
+# c. double q vlan (only on veth-devices)
+# d. 802.1ad vlan (only on veth-devices)
+# e. pppoe (when available)
+# f. pppoe-in-q (when available)
+#
+# (for items c to f fastpath can only work when a conntrack zone is set)
+# (double tag testing results in broken tcp traffic on most hardware,
+# in this test setup, use '-a' argument to test it anyway)
+# (pppoe testing takes place if pppd and pppoe-server are installed)
+#
+# The second series, in the bridge-fastpath, using a vlan-aware bridge.
+# Here we test all combinations of ingress/egress with or without single
+# vlan encaps.
+#
+# The third series, in the forward-fastpath, using a vlan-aware bridge,
+# without a vlan-device linked to the master port. We test the same combinations
+# of ingress/egress with or without single vlan encaps.
+#
+# The fourth series, in the forward-fastpath, using a vlan-aware bridge,
+# with a vlan-device linked to the master port. We test the same combinations
+# of ingress/egress with or without single vlan encaps.
+#
+# Note 1: Using dsa userports on both sides of eth-pairs client1 or client2
+# gives erratic and unpredictable results. Use, for example, a usb-eth device
+# on the client side to test a dsa-userport.
+#
+# Note 2: Testing the hardware offloaded fastpath, it is not checked if the
+# packets do not follow the software fastpath instead. A universal way to
+# check this should be added at some point.
+#
+# Note 3: Some interfaces to test on the router side are netns immutable.
+# Use the -d or --defaultnsrouter option so that the interfaces of the router
+# do not have to change netns. The router is built up in the default netns.
+#
+
+source lib.sh
+
+checktool "nft --version" "run test without nft"
+checktool "socat -h" "run test without socat"
+checktool "bridge -V" "run test without bridge"
+
+NR_OF_TESTS=4
+VID1=100
+VID2=101
+BRWAN=brwan
+BRLAN=brlan
+BRCL=brcl
+LINKUP_TIMEOUT=10
+PING_TIMEOUT=10
+SOCAT_TIMEOUT=10
+filesize=$((2 * 1024 * 1024))
+
+filein=$(mktemp)
+file1out=$(mktemp)
+file2out=$(mktemp)
+pppoeserveroptions=$(mktemp)
+pppoeserverpid=$(mktemp)
+
+setup_ns nsclientwan nsclientlan1 nsclientlan2
+
+  WAN=0 ; LAN1=1 ; LAN2=2 ; ADWAN=3 ; ADLAN=4
+nsa=( "$nsclientwan" "$nsclientlan1" "$nsclientlan2" ) # $nsrt $nsrt
+AD4=( '192.168.1.1' '192.168.2.101' '192.168.2.102' '192.168.1.2' '192.168.2.1' )
+AD6=( 'dead:1::1' 'dead:2::101' 'dead:2::102' 'dead:1::2' 'dead:2::1' )
+
+tests_string=$(seq 1 $NR_OF_TESTS)
+
+while [ "${1:-}" != '' ]; do
+	case "$1" in
+	'-0' | '--pairwan')
+		shift
+		vethcl[WAN]="${1%,*}"
+		vethrt[WAN]="${1#*,}"
+		;;
+	'-1' | '--pairlan1')
+		shift
+		vethcl[LAN1]="${1%,*}"
+		vethrt[LAN1]="${1#*,}"
+		;;
+	'-2' | '--pairlan2')
+		shift
+		vethcl[LAN2]="${1%,*}"
+		vethrt[LAN2]="${1#*,}"
+		;;
+	'-s' | '--filesize')
+		shift
+		filesize=$1
+		;;
+	'-p' | '--parts')
+		shift
+		tests_string=$1
+		;;
+	'-4' | '--ipv4')
+		do_ipv4=1
+		;;
+	'-6' | '--ipv6')
+		do_ipv6=1
+		;;
+	'-n' | '--noskip')
+		noskip=1
+		;;
+	'-d' | '--defaultnsrouter')
+		defaultnsrouter=1
+		;;
+	'-f' | '--fixmac')
+		fixmac=1
+		;;
+	'-t' | '--showtree')
+		showtree=1
+		;;
+	*)
+		cat <<-EOF
+		Usage: $(basename "$0") [OPTION]...
+		-0 --pairwan eth0cl,eth0rt pair of real interfaces to use on wan side
+		-1 --pairlan1 eth1cl,eth1rt pair of real interfaces to use on lan1 side
+		-2 --pairlan2 eth2cl,eth2rt pair of real interfaces to use on lan2 side
+		-s --filesize filesize to use for testing in bytes
+		-p --parts partnumbers of tests to run, comma separated
+		-4|-6 --ipv4|--ipv6 test ipv4/6 only
+		-d --defaultnsrouter router in default network namespace, caution!
+		-f --fixmac change mac address when conflict found
+		-n --noskip also perform the normally skipped tests
+		-t --showtree show the tree of used interfaces
+		EOF
+		exit "$ksft_skip"
+		;;
+	esac
+	shift
+done
+
+for i in ${tests_string//','/' '}; do
+	tests[i]="yes"
+done
+
+if [ -n "$defaultnsrouter" ]; then
+	nsrt="nsrt-$(mktemp -u XXXXXX)"
+	touch "/var/run/netns/$nsrt"
+	mount --bind /proc/1/ns/net "/var/run/netns/$nsrt"
+else
+	setup_ns nsrt
+fi
+nsa+=("$nsrt" "$nsrt")
+
+cleanup() {
+	if [ -n "$defaultnsrouter" ]; then
+		umount "/var/run/netns/$nsrt"
+		rm -f "/var/run/netns/$nsrt"
+	fi
+	cleanup_all_ns
+	rm -f "$filein" "$file1out" "$file2out" "$pppoeserveroptions" "$pppoeserverpid"
+}
+
+trap cleanup EXIT
+
+head -c "$filesize" < /dev/urandom > "$filein"
+
+check_mac()
+{
+	local ns=$1
+	local dev=$2
+	local othermacs=$3
+	local mac
+
+	mac=$(ip -net "$ns" -br link show dev "$dev" | \
+		grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}')
+
+	if [[ ! "$othermacs" =~ $mac ]]; then
+		echo "$mac"
+		return 0
+	fi
+	echo "WARN: Conflicting mac address $dev $mac" 1>&2
+
+	[ -z "$fixmac" ] && return 1
+
+	for (( j = 0 ; j < 10 ; j++ )); do
+		mac="${mac::6}$(printf %02x:%02x:%02x:%02x $((RANDOM%256)) \
+			$((RANDOM%256)) $((RANDOM%256)) $((RANDOM%256)))"
+		[[ "$othermacs" =~ $mac ]] && continue
+		echo "$mac"
+		ip -net "$ns" link set dev "$dev" address "$mac" 1>&2
+		return $?
+	done
+	return 1
+}
+
+is_link()
+{
+	local updown=$1
+	local ns=$2
+	local dev=$3
+
+	if ip -net "$ns" link show dev "$dev" "${updown,,}" 2>/dev/null | \
+		grep -q "state ${updown^^}"
+	then
+		return 0
+	fi
+	return 1
+}
+
+set_pair_link()
+{
+	local updown=$1
+	local all="${*:2}"
+	local lret=0
+	local i j
+
+	for i in $all; do
+		ns="${nsa[$i]}"
+		ip -net "$ns" link set "${vethcl[$i]}" "$updown"
+		lret=$((lret | $?))
+		ip -net "$nsrt" link set "${vethrt[$i]}" "$updown"
+		lret=$((lret | $?))
+	done
+	[ $lret -ne 0 ] && return 1
+
+	for j in $(seq 1 $((LINKUP_TIMEOUT * 5 ))); do
+		lret=0
+		for i in $all; do
+			ns="${nsa[$i]}"
+			is_link "$updown" "$ns" "${vethcl[$i]}"
+			lret=$((lret | $?))
+			is_link "$updown" "$nsrt" "${vethrt[$i]}"
+			lret=$((lret | $?))
+		done
+		[ $lret -eq 0 ] && break
+		sleep 0.2
+	done
+	return $lret
+}
+
+wait_ping()
+{
+	local i1=$1
+	local i2=$2
+	local ns1=${nsa[$i1]}
+	local j
+	local lret
+
+	for j in $(seq 1 $((PING_TIMEOUT * 5 ))); do
+		ip netns exec "$ns1" ping -c 1 -w $PING_TIMEOUT -i 0.2 \
+			-q "${AD4[$i2]}" >/dev/null 2>&1
+		lret=$?
+ [ $lret -le 1 ] && return $lret + sleep 0.2 + done + return 1 +} + +add_addr() +{ + local i=$1 + local dev=$2 + local ns=${nsa[$i]} + local ad4=${AD4[$i]} + local ad6=${AD6[$i]} + + ip -net "$ns" addr add "${ad4}/24" dev "$dev" + ip -net "$ns" addr add "${ad6}/64" dev "$dev" nodad + if [[ "$ns" == "nsclientlan"* ]]; then + ip -net "$ns" route add default via "${AD4[$ADLAN]}" + ip -net "$ns" route add default via "${AD6[$ADLAN]}" + elif [[ "$ns" == "nsclientwan"* ]]; then + ip -net "$ns" route add default via "${AD6[$ADWAN]}" + fi + +} + +del_addr() +{ + local i=$1 + local dev=$2 + local ns=${nsa[$i]} + local ad4=${AD4[$i]} + local ad6=${AD6[$i]} + + if [[ "$ns" == "nsclientlan"* ]]; then + ip -net "$ns" route del default via "${AD6[$ADLAN]}" + ip -net "$ns" route del default via "${AD4[$ADLAN]}" + elif [[ "$ns" == "nsclientwan"* ]]; then + ip -net "$ns" route del default via "${AD6[$ADWAN]}" + fi + ip -net "$ns" addr del "${ad6}/64" dev "$dev" nodad + ip -net "$ns" addr del "${ad4}/24" dev "$dev" +} + +set_client() +{ + local i=$1 + local vlan=$2 + local arg=$3 + local ns=${nsa[$i]} + local vdev="${vethcl[$i]}" + local brdev="$BRCL" + local proto="" + local pvidslave="" + + unset_client "$i" + + if [[ "$vlan" == "qq" ]]; then + ip -net "$ns" link add link "$vdev" name "$vdev.$VID1" type vlan id $VID1 + ip -net "$ns" link add link "$vdev.$VID1" name "$vdev.$VID1.$VID2" \ + type vlan id $VID2 + ip -net "$ns" link set "$vdev.$VID1" up + ip -net "$ns" link set "$vdev.$VID1.$VID2" up + add_addr "$i" "$vdev.$VID1.$VID2" + return + fi + + [[ "$vlan" == "none" ]] && pvidslave="pvid untagged" + [[ "$vlan" == "ad" ]] && proto="vlan_protocol 802.1ad" + + # shellcheck disable=SC2086 + ip -net "$ns" link add "$brdev" type bridge vlan_filtering 1 vlan_default_pvid 0 $proto + ip -net "$ns" link set "$vdev" master "$brdev" + ip -net "$ns" link set "$brdev" up + + # shellcheck disable=SC2086 + bridge -net "$ns" vlan add dev "$vdev" vid $VID1 $pvidslave + bridge -net "$ns" vlan 
add dev "$brdev" vid $VID1 pvid untagged self + + if [[ "$vlan" == "ad" ]]; then + ip -net "$ns" link add link "$brdev" name "$brdev.$VID2" type vlan id $VID2 + brdev="$brdev.$VID2" + ip -net "$ns" link set "$brdev" up + fi + + if [[ "$arg" != "noaddress" ]]; then + add_addr "$i" "$brdev" + fi +} + +unset_client() +{ + local i=$1 + local ns=${nsa[$i]} + local vdev="${vethcl[$i]}" + local brdev="$BRCL" + + ip -net "$ns" link del "$brdev" type bridge 2>/dev/null + ip -net "$ns" link del "$vdev.$VID1" 2>/dev/null +} + +add_pppoe() +{ + local i1=$1 + local i2=$2 + local dev1=$3 + local dev2=$4 + local desc=$5 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + + ppp1=0 + while [ -n "$(ip -net "$ns1" link show ppp$ppp1 2>/dev/null)" ] + do ((ppp1++)); done + echo "noauth defaultroute noipdefault unit $ppp1" >"$pppoeserveroptions" + ppp1="ppp$ppp1" + + if ! ip netns exec "$ns1" pppoe-server -k -L "${AD4[$i1]}" -R "${AD4[$i2]}" \ + -I "$dev1" -X "$pppoeserverpid" -O "$pppoeserveroptions" >/dev/null; then + echo "ERROR: $desc: failed to setup pppoe server" 1>&2 + return 1 + fi + + if ! ip netns exec "$ns2" pppd plugin pppoe.so nic-"$dev2" persist holdoff 0 noauth \ + defaultroute noipdefault noaccomp nodeflate noproxyarp nopcomp \ + novj novjccomp linkname "selftest-$$" >/dev/null; then + echo "ERROR: $desc: failed to setup pppoe client" 1>&2 + return 1 + fi + + if ! 
wait_ping "$i1" "$i2"; then + echo "ERROR: $desc: failed to setup functional pppoe connection" 1>&2 + return 1 + fi + + ppp2=$(tail -n 1 < "/run/pppd/ppp-selftest-$$.pid") + + ip -net "$ns1" addr add "${AD6[$i1]}/64" dev "$ppp1" nodad + ip -net "$ns2" addr add "${AD6[$i2]}/64" dev "$ppp2" nodad + + return 0 +} + +del_pppoe() +{ + local i1=$1 + local i2=$2 + local dev1=$3 + local dev2=$4 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + local i serverpid clientpid + + serverpid="$(head -n 1 < "$pppoeserverpid")" + clientpid="$(head -n 1 < "/run/pppd/ppp-selftest-$$.pid")" + + [[ -n "$ppp1" ]] && ip -net "$ns1" addr del "${AD6[$i1]}/64" dev "$ppp1" + [[ -n "$ppp2" ]] && ip -net "$ns2" addr del "${AD6[$i2]}/64" dev "$ppp2" + + for i in $(seq 1 $((PING_TIMEOUT * 5 ))); do + if ip -net "$ns2" link show dev "$ppp2" 1>/dev/null 2>/dev/null; then + kill -9 "$clientpid" 2>/dev/null + elif ip -net "$ns1" link show dev "$ppp1" 1>/dev/null 2>/dev/null; then + kill -SIGTERM "$serverpid" 2>/dev/null + else return 0 + fi + sleep 0.2 + done + echo "ERROR: failed to remove pppoe connection" 1>&2 + return 1 +} + +listener_ready() +{ + local ns=$1 + local ipv=$2 + + ss -N "$ns" --ipv"$ipv" -lnt -o "sport = :8080" | grep -q 8080 +} + +test_tcp() { + local i1=$1 + local i2=$2 + local dofast=$3 + local desc=$4 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + local i=-1 + local lret=0 + local ads="" + local ipv ad a lpid bytes error + + if [ -n "$do_ipv4" ]; then ads="${AD4[$i2]}" + elif [ -n "$do_ipv6" ]; then ads="${AD6[$i2]}" + else ads="${AD4[$i2]} ${AD6[$i2]}" + fi + for ad in $ads; do + ((i++)) + if [[ "$ad" =~ ":" ]] + then ipv="6"; a="[${ad}]" + else ipv="4"; a="${ad}" + fi + + rm -f "$file1out" "$file2out" + + # ip netns exec "$nsrt" nft reset counters >/dev/null + # But on some systems this results in 4GB values in packet and byte count, so: + (echo "flush ruleset"; ip netns exec "$nsrt" nft --stateless list ruleset) | \ + ip netns exec "$nsrt" nft -f - + + timeout 
"$SOCAT_TIMEOUT" ip netns exec "$ns2" socat TCP$ipv-LISTEN:8080,reuseaddr \ + STDIO <"$filein" >"$file2out" 2>/dev/null & + lpid=$! + busywait 1000 listener_ready "$ns2" "$ipv" + + timeout "$SOCAT_TIMEOUT" ip netns exec "$ns1" socat TCP$ipv:"$a":8080 \ + STDIO <"$filein" >"$file1out" 2>/dev/null + + if ! wait $lpid; then + error[i]="tcp broken" + continue + fi + if ! cmp "$filein" "$file1out" >/dev/null 2>&1; then + error[i]="file mismatch to ${ad}" + continue + fi + if ! cmp "$filein" "$file2out" >/dev/null 2>&1; then + error[i]="file mismatch from ${ad}" + continue + fi + + bytes=$(ip netns exec "$nsrt" nft list counter $family filter "check" | \ + grep "packets" | cut -d' ' -f4) + if [ -z "$dofast" ] && [ "$bytes" -lt "$((2 * filesize))" ]; then + + error[i]="established bytes $bytes < $((2 * filesize))" + continue + fi + if [ -n "$dofast" ] && [ "$bytes" -gt "$filesize" ]; then + # Significant reduction of bytes expected + error[i]="counted bytes $bytes > $filesize" + continue + fi + + done + + if [ -n "${error[0]}" ]; then + if [[ "${error[0]}" == "${error[1]}" ]]; then + error[0]="$desc: ipv4/6: ${error[0]}" + error[1]="" + else + error[0]="$desc: ipv4: ${error[0]}" + fi + fi + if [ -n "${error[1]}" ]; then + error[1]="$desc: ipv6: ${error[1]}" + fi + + for i in 0 1; do + if [ -n "${error[i]}" ]; then + if is_known_issue "$desc: ${error[i]}"; then + echo "WARN: ${error[i]}" 1>&2 + lret=$((lret | 1)) + else + echo "ERROR: ${error[i]}" 1>&2 + lret=$((lret | 2)) + fi + fi + done + if [ $lret -eq 0 ]; then + echo "PASS: $desc" + fi + return $(( lret & 2 )) +} + +known_issues=( +'*unaware bridge,*with double q vlan encaps,*without fastpath*established*' # 1 +'*unaware bridge,*with 802.1ad vlan encaps,*without fastpath*established*' # 1 +'*unaware bridge,*with pppoe encap,*without fastpath*established*' # 1 +'*unaware bridge,*with pppoe-in-q encaps,*without fastpath*established*' # 1 +'*forward,*without vlan-device, without vlan encap,*with *fastpath:*counted*' # 2 
+'*forward,*without vlan-device, with vlan encap,*with *fastpath:*tcp broken*' # 3 +'*forward,*with vlan-device, without vlan encap,*with *fastpath:*counted*' # 4 +) + +is_known_issue() { + local err=$1 + for issue in "${known_issues[@]}"; do + # shellcheck disable=SC2053 + [[ "$err" == $issue ]] && return 0 + done + return 1 +} + +test_paths() { + local i1=$1 + local i2=$2 + local desc=$3 + + if ! setup_nftables "$i1" "$i2"; then + echo "ERROR: $desc: cannot setup nftables" 1>&2 + return 1 + fi + if ! test_tcp "$i1" "$i2" "" "$desc without fastpath"; then + return 1 + fi + + if ! setup_fastpath "$i1" "$i2" "" 2>/dev/null; then + return 0 + fi + if ! test_tcp "$i1" "$i2" "fast" "$desc with fastpath"; then + return 1 + fi + + if ! setup_fastpath "$i1" "$i2" "hw" 2>/dev/null; then + return 0 + fi + if ! test_tcp "$i1" "$i2" "fast" "$desc with hw_fastpath"; then + return 1 + fi + + return 0 + +} + +add_masq() +{ + if [[ $family != "bridge" ]]; then + ip netns exec "$nsrt" nft -f - <<-EOF + table ip nat { + chain postrouting { + type nat hook postrouting priority 0; + oifname ${BRWAN} masquerade + } + } + EOF + else + return 0 + fi +} + +add_zone() +{ + local devs=$1 + + if [[ $family == "bridge" ]]; then + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + chain preroutingzones { + type filter hook prerouting priority -300; + iif ${devs} ct zone set 23 + } + } + EOF + fi +} + +setup_nftables() +{ + local devs="{ ${vethrt[$1]} , ${vethrt[$2]} }" + local i1=$1 + local i2=$2 + + ip netns exec "$nsrt" nft flush ruleset + + if ! 
add_masq; then + return 1 + fi + + add_zone "${devs}" 2>/dev/null + + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + counter check { } + chain prerouting { + type filter hook prerouting priority 0; policy accept; + ct state established counter name "check" + } + } + EOF +} + +setup_fastpath() +{ + local devs="{ ${vethrt[$1]} , ${vethrt[$2]} }" + local arg=$3 + local flags="" + + [[ "$arg" == "hw" ]] && flags="flags offload" + + ip netns exec "$nsrt" nft flush ruleset + + if ! add_masq; then + return 1 + fi + + add_zone "${devs}" 2>/dev/null + + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + counter check { } + flowtable f { + hook ingress priority filter + devices = ${devs} + ${flags} + } + chain forward { + type filter hook forward priority 0; policy accept; + counter name "check" + ct state established flow add @f + } + } + EOF +} + +test_unaware_bridge() +{ + local lret=0 + local i + + for i in $LAN1 $LAN2; do + set_client "$i" none + done + + test_paths $LAN1 $LAN2 "unaware bridge, without encaps, " + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" q + done + + test_paths $LAN1 $LAN2 "unaware bridge, with single vlan encap, " + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" qq + done + + # Skip testing double tagged packets on real hardware + if [ -n "$lan_all_veth" ] || [ -n "$noskip" ]; then + + test_paths $LAN1 $LAN2 "unaware bridge, with double q vlan encaps, " + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" ad + done + + test_paths $LAN1 $LAN2 "unaware bridge, with 802.1ad vlan encaps, " + lret=$((lret | $?)) + + fi + # End Skip testing double tagged packets + + if [ -n "$(command -v pppd 2>/dev/null)" ] && + [ -n "$(command -v pppoe-server 2>/dev/null)" ]; then + # Start pppoe + + for i in $LAN1 $LAN2; do + set_client "$i" none noaddress + done + + if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe encap"; then + test_paths $LAN1 $LAN2 
"unaware bridge, with pppoe encap, " + lret=$((lret | $?)) + fi + + del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" q noaddress + done + + if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe-in-q encaps"; then + test_paths $LAN1 $LAN2 "unaware bridge, with pppoe-in-q encaps, " + lret=$((lret | $?)) + fi + + del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" + lret=$((lret | $?)) + + # End pppoe + fi + + for i in $LAN1 $LAN2; do + unset_client "$i" + done + return $lret +} + +test_aware_bridge() +{ + local lret=0 + local i + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged + set_client "$i" none + done + test_paths $LAN1 $LAN2 "aware bridge, without/without vlan encap," + lret=$((lret | $?)) + + i=$LAN1 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 + set_client $i q + + test_paths $LAN1 $LAN2 "aware bridge, with/without vlan encap, " + lret=$((lret | $?)) + + i=$LAN2 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 + set_client $i q + + test_paths $LAN1 $LAN2 "aware bridge, with/with vlan encap, " + lret=$((lret | $?)) + + i=$LAN1 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged + set_client $i none + + test_paths $LAN1 $LAN2 "aware bridge, without/with vlan encap, " + lret=$((lret | $?)) + + i=$LAN1 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + unset_client $i + i=$LAN2 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 + unset_client $i + + return $lret +} + +test_forward_without_vlandev() +{ + local wo=$1 + local lret=0 + local i + + [[ "$wo" == "" ]] && wo="without" + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid 
$VID1 pvid untagged + set_client "$i" none + done + + test_paths $LAN1 $WAN "forward, $wo vlan-device, without vlan encap, client1," + lret=$((lret | $?)) + if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then + test_paths $LAN2 $WAN "forward, $wo vlan-device, without vlan encap, client2," + lret=$((lret | $?)) + fi + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 + set_client "$i" q + done + + test_paths $LAN1 $WAN "forward, $wo vlan-device, with vlan encap, client1," + lret=$((lret | $?)) + if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then + test_paths $LAN2 $WAN "forward, $wo vlan-device, with vlan encap, client2," + lret=$((lret | $?)) + fi + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 + unset_client "$i" + done + return $lret +} + +test_forward_with_vlandev() +{ + test_forward_without_vlandev "with" + return $? +} + +ret=0 +### Start Initial Setup ### + +for i in 4 6; do + ip netns exec "$nsrt" sysctl -q net.ipv$i.conf.all.forwarding=1 +done + +### Use brwan to make sure software fastpath is ### +### direct xmit in other direction also ### + +ip -net "$nsrt" link add $BRWAN type bridge +ret=$((ret | $?)) +ip -net "$nsrt" link set $BRWAN up +ret=$((ret | $?)) +if [ $ret -ne 0 ]; then + echo "SKIP: Can't create bridge" + exit "$ksft_skip" +fi + +# If both lan clients are veth-devices, only test 1 in the forward path +if [ -z "${vethcl[$LAN1]}" ] && [ -z "${vethcl[$LAN2]}" ]; then + lan_all_veth=1 +fi + +for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + if [ -z "${vethcl[$i]}" ]; then + vethcl[i]="veth${i}cl" + vethrt[i]="veth${i}rt" + ip link add "${vethcl[$i]}" netns "$ns" type veth \ + peer name "${vethrt[$i]}" netns "$nsrt" + ret=$((ret | $?)) + else # Use pair of interconnected hardware interfaces + ip link set "${vethrt[$i]}" netns "$nsrt" + ret=$((ret | $?)) + ip link set "${vethcl[$i]}" netns 
"$ns" + ret=$((ret | $?)) + fi +done +if [ $ret -ne 0 ]; then + echo "SKIP: (v)eth pairs cannot be used" + exit "$ksft_skip" +fi + +if [ -n "$showtree" ]; then + cat <<-EOF + Setup: + CLIENT 0 + ${vethcl[$WAN]} + | + ${vethrt[$WAN]} + WAN + ROUTER + LAN1 LAN2 + $(printf "%14.14s" "${vethrt[$LAN1]}") ${vethrt[$LAN2]} + | | + $(printf "%14.14s" "${vethcl[$LAN1]}") ${vethcl[$LAN2]} + CLIENT 1 CLIENT 2 + + EOF +fi + +for n in nsclientwan nsclientlan; do + routerside=""; clientside="" + for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + [[ "$ns" != "$n"* ]] && continue + mac=$(check_mac "$ns" "${vethcl[$i]}" "$routerside $clientside") + ret=$((ret | $?)) + clientside+=" $mac" + mac=$(check_mac "$nsrt" "${vethrt[$i]}" "$clientside") + ret=$((ret | $?)) + routerside+=" $mac" + done +done +if [ $ret -ne 0 ]; then + echo "SKIP: conflicting mac address" + exit "$ksft_skip" +fi + +set_pair_link up $WAN $LAN1 $LAN2 +ret=$((ret | $?)) +if [ $ret -ne 0 ]; then + echo "SKIP: setting (v)eth pairs link up failed" + exit "$ksft_skip" +fi + +i=$WAN +ip -net "$nsrt" link set "${vethrt[$i]}" master $BRWAN +set_client $i none +add_addr $ADWAN "$BRWAN" + +family="bridge" +if ! setup_nftables $LAN1 $LAN2 2>/dev/null; then + echo "INFO: Cannot add nftables table $family" + tests[1]=""; tests[2]="" +fi +family="inet" +if ! 
setup_nftables $WAN $LAN1 2>/dev/null; then + echo "INFO: Cannot add nftables table $family" + tests[3]=""; tests[4]="" +fi + +### End Initial Setup ### + +if [ -n "${tests[1]}" ]; then + # Setup brlan as vlan unaware bridge + family="bridge" + ip -net "$nsrt" link add $BRLAN type bridge + ip -net "$nsrt" link set $BRLAN up + for i in $LAN1 $LAN2; do + ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN + done + test_unaware_bridge + ret=$((ret | $?)) + ip -net "$nsrt" link del $BRLAN type bridge +fi + +if [ -n "${tests[2]}" ] || [ -n "${tests[3]}" ] || [ -n "${tests[4]}" ]; then + # Setup brlan as vlan aware bridge + family="bridge" + + ip -net "$nsrt" link add $BRLAN type bridge vlan_filtering 1 vlan_default_pvid 0 + ip -net "$nsrt" link set $BRLAN up + bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 pvid untagged self + add_addr $ADLAN "$BRLAN" + for i in $LAN1 $LAN2; do + ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN + done + + if [ -n "${tests[2]}" ]; then + test_aware_bridge + ret=$((ret | $?)) + fi + + family="inet" + + if [ -n "${tests[3]}" ]; then + test_forward_without_vlandev + ret=$((ret | $?)) + fi + + if [ -n "${tests[4]}" ]; then + # Setup vlan-device linked to brlan master port + del_addr $ADLAN "$BRLAN" + ip -net "$nsrt" link set $BRLAN down + bridge -net "$nsrt" vlan del dev $BRLAN vid $VID1 pvid untagged self + bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 self + ip -net "$nsrt" link add link $BRLAN name $BRLAN.$VID1 type vlan id $VID1 + ip -net "$nsrt" link set $BRLAN up + ip -net "$nsrt" link set "$BRLAN.$VID1" up + add_addr $ADLAN "$BRLAN.$VID1" + test_forward_with_vlandev + ret=$((ret | $?)) + fi + + ip -net "$nsrt" link del $BRLAN type bridge +fi + +### Finish tests ### + +ip -net "$nsrt" link del $BRWAN type bridge + +for i in $WAN $LAN1 $LAN2; do + unset_client "$i" +done + +set_pair_link down $WAN $LAN1 $LAN2 + +for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + if [[ "${vethcl[$i]:0:4}" != "veth" ]]; then + ip netns exec 
"$ns" ip link set "${vethcl[$i]}" netns 1 + fi + if [[ "${vethrt[$i]:0:4}" != "veth" ]]; then + ip netns exec "$nsrt" ip link set "${vethrt[$i]}" netns 1 + fi +done + +if [ $ret -eq 0 ]; then + echo "PASS: all tests passed" +else + echo "ERROR: bridge fastpath test has failed" +fi + +exit $ret -- 2.50.0

2 months, 1 week

[PATCH v2 7/7] KVM: LoongArch: selftests: Add time counter test

by Bibo Mao

Add a time counter test to verify that the time counter starts from 0
and never decreases afterwards.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../selftests/kvm/lib/loongarch/processor.c | 9 ++++++
 .../selftests/kvm/loongarch/arch_timer.c | 29 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c b/tools/testing/selftests/kvm/lib/loongarch/processor.c
index 436990258068..ac2ffd076bff 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/processor.c
+++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
@@ -3,6 +3,7 @@
 
 #include <assert.h>
 #include <linux/compiler.h>
+#include <asm/kvm.h>
 #include "kvm_util.h"
 #include "processor.h"
 #include "ucall_common.h"
@@ -256,6 +257,11 @@ static void loongarch_set_csr(struct kvm_vcpu *vcpu, uint64_t id, uint64_t val)
 	__vcpu_set_reg(vcpu, csrid, val);
 }
 
+static void loongarch_set_reg(struct kvm_vcpu *vcpu, uint64_t id, uint64_t val)
+{
+	__vcpu_set_reg(vcpu, id, val);
+}
+
 static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
 	int width;
@@ -279,6 +285,9 @@ static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
 	loongarch_set_csr(vcpu, LOONGARCH_CSR_ECFG, 0);
 	loongarch_set_csr(vcpu, LOONGARCH_CSR_TCFG, 0);
 	loongarch_set_csr(vcpu, LOONGARCH_CSR_ASID, 1);
+	/* time count start from 0 */
+	val = 0;
+	loongarch_set_reg(vcpu, KVM_REG_LOONGARCH_COUNTER, val);
 
 	val = 0;
 	width = vm->page_shift - 3;
diff --git a/tools/testing/selftests/kvm/loongarch/arch_timer.c b/tools/testing/selftests/kvm/loongarch/arch_timer.c
index 579132a082cd..f3a25a0163fc 100644
--- a/tools/testing/selftests/kvm/loongarch/arch_timer.c
+++ b/tools/testing/selftests/kvm/loongarch/arch_timer.c
@@ -133,10 +133,39 @@ static void guest_test_emulate_timer(uint32_t cpu)
 	local_irq_enable();
 }
 
+static void guest_time_count_test(uint32_t cpu)
+{
+	uint32_t config_iter;
+	unsigned long start, end, prev, us;
+
+	/* Assuming that test case starts to run in 1 second */
+	start = timer_get_cycles();
+	us = msec_to_cycles(1000);
+	__GUEST_ASSERT(start <= us,
+		"start = 0x%lx, us = 0x%lx.\n",
+		start, us);
+
+	us = msec_to_cycles(test_args.timer_period_ms);
+	for (config_iter = 0; config_iter < test_args.nr_iter; config_iter++) {
+		start = timer_get_cycles();
+		end = start + us;
+		/* test time count growing up always */
+		while (start < end) {
+			prev = start;
+			start = timer_get_cycles();
+			__GUEST_ASSERT(prev <= start,
+				"prev = 0x%lx, start = 0x%lx.\n",
+				prev, start);
+		}
+	}
+}
+
 static void guest_code(void)
 {
 	uint32_t cpu = guest_get_vcpuid();
 
+	/* must run at first */
+	guest_time_count_test(cpu);
 	timer_irq_enable();
 	local_irq_enable();
 	guest_test_oneshot_timer(cpu);
-- 
2.39.3

2 months, 1 week

[PATCH v2 6/7] KVM: LoongArch: selftests: Add SW emulated timer test

by Bibo Mao

This test case sets up a one-shot timer and immediately executes the
idle instruction to give up the CPU. The hypervisor then emulates the
timer with a software hrtimer and wakes up the vCPU when the hrtimer
fires.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../selftests/kvm/loongarch/arch_timer.c | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tools/testing/selftests/kvm/loongarch/arch_timer.c b/tools/testing/selftests/kvm/loongarch/arch_timer.c
index a4a39f24bb7e..579132a082cd 100644
--- a/tools/testing/selftests/kvm/loongarch/arch_timer.c
+++ b/tools/testing/selftests/kvm/loongarch/arch_timer.c
@@ -94,6 +94,45 @@ static void guest_test_period_timer(uint32_t cpu)
 		irq_iter);
 }
 
+static void do_idle(void)
+{
+	unsigned int intid;
+	unsigned long estat;
+
+	__asm__ __volatile__("idle 0" : : : "memory");
+
+	estat = csr_read(LOONGARCH_CSR_ESTAT);
+	intid = !!(estat & BIT(INT_TI));
+
+	/* Make sure pending timer IRQ arrived */
+	GUEST_ASSERT_EQ(intid, 1);
+	csr_write(CSR_TINTCLR_TI, LOONGARCH_CSR_TINTCLR);
+}
+
+static void guest_test_emulate_timer(uint32_t cpu)
+{
+	uint32_t config_iter;
+	uint64_t xcnt_diff_us, us;
+	struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[cpu];
+
+	local_irq_disable();
+	shared_data->nr_iter = 0;
+	us = msecs_to_usecs(test_args.timer_period_ms);
+	for (config_iter = 0; config_iter < test_args.nr_iter; config_iter++) {
+		shared_data->xcnt = timer_get_cycles();
+
+		/* Setup the next interrupt */
+		timer_set_next_cmp_ms(test_args.timer_period_ms, false);
+		do_idle();
+
+		xcnt_diff_us = cycles_to_usec(timer_get_cycles() - shared_data->xcnt);
+		__GUEST_ASSERT(xcnt_diff_us >= us,
+			"xcnt_diff_us = 0x%lx, us = 0x%lx.\n",
+			xcnt_diff_us, us);
+	}
+	local_irq_enable();
+}
+
 static void guest_code(void)
 {
 	uint32_t cpu = guest_get_vcpuid();
@@ -102,6 +141,7 @@ static void guest_code(void)
 	local_irq_enable();
 	guest_test_oneshot_timer(cpu);
 	guest_test_period_timer(cpu);
+	guest_test_emulate_timer(cpu);
 
 	GUEST_DONE();
 }
-- 
2.39.3

2 months, 1 week

[PATCH v2 5/7] KVM: LoongArch: selftests: Add period mode timer test

by Bibo Mao

Add a periodic mode timer test. In periodic mode the timer only needs
to be programmed once; its compare value is reloaded automatically each
time the timer fires.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../kvm/include/loongarch/arch_timer.h | 5 ++++
 .../selftests/kvm/loongarch/arch_timer.c | 28 +++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/loongarch/arch_timer.h b/tools/testing/selftests/kvm/include/loongarch/arch_timer.h
index 94b1cba2744d..b6399e748f72 100644
--- a/tools/testing/selftests/kvm/include/loongarch/arch_timer.h
+++ b/tools/testing/selftests/kvm/include/loongarch/arch_timer.h
@@ -36,6 +36,11 @@ static inline void timer_set_next_cmp_ms(unsigned int msec, bool period)
 	csr_write(val, LOONGARCH_CSR_TCFG);
 }
 
+static inline void disable_timer(void)
+{
+	csr_write(0, LOONGARCH_CSR_TCFG);
+}
+
 static inline unsigned long timer_get_val(void)
 {
 	return csr_read(LOONGARCH_CSR_TVAL);
diff --git a/tools/testing/selftests/kvm/loongarch/arch_timer.c b/tools/testing/selftests/kvm/loongarch/arch_timer.c
index 2a2cebcf3885..a4a39f24bb7e 100644
--- a/tools/testing/selftests/kvm/loongarch/arch_timer.c
+++ b/tools/testing/selftests/kvm/loongarch/arch_timer.c
@@ -23,6 +23,13 @@ static void guest_irq_handler(struct ex_regs *regs)
 	GUEST_ASSERT_EQ(intid, 1);
 
 	cfg = timer_get_cfg();
+	if (cfg & CSR_TCFG_PERIOD) {
+		WRITE_ONCE(shared_data->nr_iter, shared_data->nr_iter - 1);
+		if (shared_data->nr_iter == 0)
+			disable_timer();
+		csr_write(CSR_TINTCLR_TI, LOONGARCH_CSR_TINTCLR);
+		return;
+	}
 
 	/*
 	 * On physical machine, value of LOONGARCH_CSR_TVAL is BIT_ULL(48) - 1
@@ -67,6 +74,26 @@ static void guest_test_oneshot_timer(uint32_t cpu)
 	}
 }
 
+static void guest_test_period_timer(uint32_t cpu)
+{
+	uint32_t irq_iter;
+	uint64_t us;
+	struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[cpu];
+
+	shared_data->nr_iter = test_args.nr_iter;
+	shared_data->xcnt = timer_get_cycles();
+	us = msecs_to_usecs(test_args.timer_period_ms) + test_args.timer_err_margin_us;
+	timer_set_next_cmp_ms(test_args.timer_period_ms, true);
+	/* Setup a timeout for the interrupt to arrive */
+	udelay(us * test_args.nr_iter);
+	irq_iter = READ_ONCE(shared_data->nr_iter);
+	__GUEST_ASSERT(irq_iter == 0,
+		"irq_iter = 0x%x.\n"
+		" Guest period timer interrupt was not triggered within the specified\n"
+		" interval, try to increase the error margin by [-e] option.\n",
+		irq_iter);
+}
+
 static void guest_code(void)
 {
 	uint32_t cpu = guest_get_vcpuid();
@@ -74,6 +101,7 @@ static void guest_code(void)
 	timer_irq_enable();
 	local_irq_enable();
 	guest_test_oneshot_timer(cpu);
+	guest_test_period_timer(cpu);
 
 	GUEST_DONE();
 }
-- 
2.39.3

2 months, 1 week

[PATCH v2 2/7] KVM: LoongArch: selftests: Add exception handler register interface

by Bibo Mao

Add an interface for registering interrupt and exception handlers. When
an exception occurs, the handler registered for it is executed if one
exists; otherwise an error is reported.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../kvm/include/loongarch/processor.h | 14 +++++++++
 .../selftests/kvm/lib/loongarch/processor.c | 29 +++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/loongarch/processor.h b/tools/testing/selftests/kvm/include/loongarch/processor.h
index 374caddfb0db..a18ac7bff303 100644
--- a/tools/testing/selftests/kvm/include/loongarch/processor.h
+++ b/tools/testing/selftests/kvm/include/loongarch/processor.h
@@ -84,6 +84,11 @@
 #define LOONGARCH_CSR_EUEN 0x2
 #define LOONGARCH_CSR_ECFG 0x4
 #define LOONGARCH_CSR_ESTAT 0x5 /* Exception status */
+#define CSR_ESTAT_EXC_SHIFT 16
+#define CSR_ESTAT_EXC_WIDTH 6
+#define CSR_ESTAT_EXC (0x3f << CSR_ESTAT_EXC_SHIFT)
+#define EXCCODE_INT 0 /* Interrupt */
+#define INT_TI 11 /* Timer interrupt*/
 #define LOONGARCH_CSR_ERA 0x6 /* ERA */
 #define LOONGARCH_CSR_BADV 0x7 /* Bad virtual address */
 #define LOONGARCH_CSR_EENTRY 0xc
@@ -133,6 +138,15 @@ struct ex_regs {
 #define PRMD_OFFSET_EXREGS offsetof(struct ex_regs, prmd)
 #define EXREGS_SIZE sizeof(struct ex_regs)
 
+#define VECTOR_NUM 64
+typedef void(*handler_fn)(struct ex_regs *);
+struct handlers {
+	handler_fn exception_handlers[VECTOR_NUM];
+};
+
+void vm_init_descriptor_tables(struct kvm_vm *vm);
+void vm_install_exception_handler(struct kvm_vm *vm, int vector, handler_fn handler);
+
 #else
 #define PC_OFFSET_EXREGS ((EXREGS_GPRS + 0) * 8)
 #define ESTAT_OFFSET_EXREGS ((EXREGS_GPRS + 1) * 8)
diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c b/tools/testing/selftests/kvm/lib/loongarch/processor.c
index 0ac1abcb71cb..be537c5ff74e 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/processor.c
+++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
@@ -11,6 +11,7 @@
 #define LOONGARCH_GUEST_STACK_VADDR_MIN 0x200000
 
 static vm_paddr_t invalid_pgtable[4];
+static vm_vaddr_t exception_handlers;
 
 static uint64_t virt_pte_index(struct kvm_vm *vm, vm_vaddr_t gva, int level)
 {
@@ -184,6 +185,13 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
 void route_exception(struct ex_regs *regs)
 {
 	unsigned long pc, estat, badv;
+	int vector;
+	struct handlers *handlers;
+
+	handlers = (struct handlers *)exception_handlers;
+	vector = (regs->estat & CSR_ESTAT_EXC) >> CSR_ESTAT_EXC_SHIFT;
+	if (handlers && handlers->exception_handlers[vector])
+		return handlers->exception_handlers[vector](regs);
 
 	pc = regs->pc;
 	badv = regs->badv;
@@ -192,6 +200,27 @@ void route_exception(struct ex_regs *regs)
 	while (1) ;
 }
 
+void vm_init_descriptor_tables(struct kvm_vm *vm)
+{
+	void *addr;
+
+	vm->handlers = __vm_vaddr_alloc(vm, sizeof(struct handlers),
+			LOONGARCH_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA);
+
+	addr = addr_gva2hva(vm, vm->handlers);
+	memset(addr, 0, vm->page_size);
+	exception_handlers = vm->handlers;
+	sync_global_to_guest(vm, exception_handlers);
+}
+
+void vm_install_exception_handler(struct kvm_vm *vm, int vector, handler_fn handler)
+{
+	struct handlers *handlers = addr_gva2hva(vm, vm->handlers);
+
+	assert(vector < VECTOR_NUM);
+	handlers->exception_handlers[vector] = handler;
+}
+
 void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
 {
 	int i;
-- 
2.39.3

2 months, 1 week

[PATCH v2 1/7] KVM: LoongArch: selftests: Add system registers save and restore on exception

by Bibo Mao

When the system returns from an exception with the ertn instruction, the
PC is restored from LOONGARCH_CSR_ERA and CSR_CRMD is restored from
LOONGARCH_CSR_PRMD. Save CSR_ERA and CSR_PRMD on the stack on exception
entry and restore them from the stack before returning, so that an
exception handler can modify them in the future.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 tools/testing/selftests/kvm/include/loongarch/processor.h | 5 ++++-
 tools/testing/selftests/kvm/lib/loongarch/exception.S | 6 ++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/loongarch/processor.h b/tools/testing/selftests/kvm/include/loongarch/processor.h
index 6427a3275e6a..374caddfb0db 100644
--- a/tools/testing/selftests/kvm/include/loongarch/processor.h
+++ b/tools/testing/selftests/kvm/include/loongarch/processor.h
@@ -124,18 +124,21 @@ struct ex_regs {
 	unsigned long pc;
 	unsigned long estat;
 	unsigned long badv;
+	unsigned long prmd;
 };
 
 #define PC_OFFSET_EXREGS offsetof(struct ex_regs, pc)
 #define ESTAT_OFFSET_EXREGS offsetof(struct ex_regs, estat)
 #define BADV_OFFSET_EXREGS offsetof(struct ex_regs, badv)
+#define PRMD_OFFSET_EXREGS offsetof(struct ex_regs, prmd)
 #define EXREGS_SIZE sizeof(struct ex_regs)
 
 #else
 #define PC_OFFSET_EXREGS ((EXREGS_GPRS + 0) * 8)
 #define ESTAT_OFFSET_EXREGS ((EXREGS_GPRS + 1) * 8)
 #define BADV_OFFSET_EXREGS ((EXREGS_GPRS + 2) * 8)
-#define EXREGS_SIZE ((EXREGS_GPRS + 3) * 8)
+#define PRMD_OFFSET_EXREGS ((EXREGS_GPRS + 3) * 8)
+#define EXREGS_SIZE ((EXREGS_GPRS + 4) * 8)
 #endif
 
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/lib/loongarch/exception.S b/tools/testing/selftests/kvm/lib/loongarch/exception.S
index 88bfa505c6f5..3f1e4b67c5ae 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/exception.S
+++ b/tools/testing/selftests/kvm/lib/loongarch/exception.S
@@ -51,9 +51,15 @@ handle_exception:
 	st.d	t0, sp, ESTAT_OFFSET_EXREGS
 	csrrd	t0, LOONGARCH_CSR_BADV
 	st.d	t0, sp, BADV_OFFSET_EXREGS
+	csrrd	t0, LOONGARCH_CSR_PRMD
+	st.d	t0, sp, PRMD_OFFSET_EXREGS
 	or	a0, sp, zero
 
 	bl	route_exception
+	ld.d	t0, sp, PC_OFFSET_EXREGS
+	csrwr	t0, LOONGARCH_CSR_ERA
+	ld.d	t0, sp, PRMD_OFFSET_EXREGS
+	csrwr	t0, LOONGARCH_CSR_PRMD
 	restore_gprs	sp
 	csrrd	sp, LOONGARCH_CSR_KS0
 	ertn
-- 
2.39.3

2 months, 1 week

1
0

[PATCH v10 0/9] support FEAT_LSUI

by Yeoreum Yun

Since Armv9.6, FEAT_LSUI provides load/store instructions that let the privileged level access user memory without clearing the PSTATE.PAN bit. This patchset adds support for FEAT_LSUI and applies it to futex atomic operations and user_swpX emulation, replacing the ldxr/st{l}xr-pair implementation, which requires clearing the PSTATE.PAN bit, with the corresponding unprivileged load/store atomic operations, which do not. Patch Sequences ================ Patch #1 adds cpufeature for FEAT_LSUI Patch #2-#3 expose FEAT_LSUI to guest Patch #4 adds Kconfig for FEAT_LSUI Patch #5-#6 support futex atomic-op with FEAT_LSUI Patch #7-#9 support user_swpX emulation with FEAT_LSUI Patch History ============== from v9 to v10: - apply FEAT_LSUI to user_swpX emulation. - add test coverage for LSUI bit in ID_AA64ISAR3_EL1 - rebase to v6.18-rc4 - https://lore.kernel.org/all/20250922102244.2068414-1-yeoreum.yun@arm.com/ from v8 to v9: - refactor __lsui_cmpxchg64() - rebase to v6.17-rc7 - https://lore.kernel.org/all/20250917110838.917281-1-yeoreum.yun@arm.com/ from v7 to v8: - implement futex_atomic_eor() and futex_atomic_cmpxchg() with casalt via a C helper. - Drop the small optimisation on ll/sc futex_atomic_set operation. - modify some commit messages. - https://lore.kernel.org/all/20250816151929.197589-1-yeoreum.yun@arm.com/ from v6 to v7: - wrap FEAT_LSUI with CONFIG_AS_HAS_LSUI in cpufeature - remove unnecessary addition of indentation. - remove unnecessary mte_tco_enable()/disable() on LSUI operation. - https://lore.kernel.org/all/20250811163635.1562145-1-yeoreum.yun@arm.com/ from v5 to v6: - rebase to v6.17-rc1 - https://lore.kernel.org/all/20250722121956.1509403-1-yeoreum.yun@arm.com/ from v4 to v5: - remove futex_ll_sc.h, futex_lsui and lsui.h and move them to futex.h - reorganize the patches. - https://lore.kernel.org/all/20250721083618.2743569-1-yeoreum.yun@arm.com/ from v3 to v4: - rebase to v6.16-rc7 - modify some patch titles.
- https://lore.kernel.org/all/20250617183635.1266015-1-yeoreum.yun@arm.com/ from v2 to v3: - expose FEAT_LSUI to guest - add help section for LSUI Kconfig - https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/ from v1 to v2: - remove empty v9.6 menu entry - place HAS_LSUI in cpucaps in order - https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/ Yeoreum Yun (9): arm64: cpufeature: add FEAT_LSUI KVM: arm64: expose FEAT_LSUI to guest KVM: arm64: kselftest: set_id_regs: add test for FEAT_LSUI arm64: Kconfig: Detect toolchain support for LSUI arm64: futex: refactor futex atomic operation arm64: futex: support futex with FEAT_LSUI arm64: separate common LSUI definitions into lsui.h arm64: armv8_deprecated: convert user_swpX to inline function arm64: armv8_deprecated: apply FEAT_LSUI for swpX emulation. arch/arm64/Kconfig | 5 + arch/arm64/include/asm/futex.h | 291 +++++++++++++++--- arch/arm64/include/asm/lsui.h | 25 ++ arch/arm64/kernel/armv8_deprecated.c | 86 +++++- arch/arm64/kernel/cpufeature.c | 10 + arch/arm64/kvm/sys_regs.c | 3 +- arch/arm64/tools/cpucaps | 1 + .../testing/selftests/kvm/arm64/set_id_regs.c | 1 + 8 files changed, 360 insertions(+), 62 deletions(-) create mode 100644 arch/arm64/include/asm/lsui.h base-commit: 6146a0f1dfae5d37442a9ddcba012add260bceb0 --
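Whether implemented as an LL/SC loop with PSTATE.PAN cleared or with FEAT_LSUI's unprivileged atomics, each futex atomic op must atomically update the user word and report its previous value. The userspace sketch below models only that contract with C11 atomics; it is not the arm64 code and does not exercise LSUI. The op codes mirror FUTEX_OP_SET/ADD/OR/ANDN/XOR from <linux/futex.h>, and futex_atomic_op_model() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Operation codes as defined in <linux/futex.h>. */
enum { OP_SET = 0, OP_ADD = 1, OP_OR = 2, OP_ANDN = 3, OP_XOR = 4 };

/* Atomically apply @op with @oparg to *uaddr and store the old value in *oval.
 * Each case is a single atomic read-modify-write, which is what an
 * LL/SC loop or a single LSUI atomic instruction must provide. */
static int futex_atomic_op_model(int op, int32_t oparg,
				 _Atomic int32_t *uaddr, int32_t *oval)
{
	switch (op) {
	case OP_SET:
		*oval = atomic_exchange(uaddr, oparg);
		break;
	case OP_ADD:
		*oval = atomic_fetch_add(uaddr, oparg);
		break;
	case OP_OR:
		*oval = atomic_fetch_or(uaddr, oparg);
		break;
	case OP_ANDN:
		*oval = atomic_fetch_and(uaddr, ~oparg);
		break;
	case OP_XOR:
		*oval = atomic_fetch_xor(uaddr, oparg);
		break;
	default:
		return -ENOSYS;	/* unknown op */
	}
	return 0;
}
```

ANDN is the one op without a direct C11 counterpart; it is expressed here as a fetch-AND with the inverted argument.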

2 months, 1 week

2
13

[PATCH] kunit: Implement ftrace-based stubbing

by Eddie Phillips

Allow function redirection using ftrace. This is basically equivalent to the static_stub support in the previous patch, but does not require the function being replaced to be modified (save for the addition of KUNIT_STUBBABLE/noinline). This is gated behind the CONFIG_KUNIT_FTRACE_STUBS option, and has a number of dependencies, including ftrace and CONFIG_KALLSYMS_ALL. As a result, it only works on architectures where these are available. You can run the KUnit example tests with the following: $ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit/stubs_example.kunitconfig --arch=x86_64 To the end user, replacing a function is very simple, e.g. KUNIT_STUBBABLE void real_func(int n); void replacement_func(int n); /* in tests */ kunit_activate_ftrace_stub(test, real_func, replacement_func); The implementation is inspired by Steven's snippet here [1]. Some more details: * stubbing is automatically undone at the end of tests * it can also be manually undone with kunit_deactivate_ftrace_stub() * stubbing only applies when current->kunit_test == test * note: currently can't have more than one test running at a time * KUNIT_STUBBABLE marks functions as noinline when CONFIG_KUNIT_FTRACE_STUBS is set * this ensures we can actually stub all calls * KUNIT_STUBBABLE_TRAMPOLINE is a version that evaluates to __always_inline when stubbing is not enabled * This may need to be used with a wrapper function. * See the doc comment for more details.
Sharp edges: * kernel livepatch only works on some arches (not UML) * if you don't use noinline/KUNIT_STUBBABLE, functions might be inlined and thus none of this works: * if it's always inlined, at least the attempt to stub will fail * if it's sometimes inlined, then the stub silently won't work [1] https://lore.kernel.org/lkml/20220224091550.2b7e8784@gandalf.local.home Co-developed-by: Daniel Latypov <dlatypov(a)google.com> Signed-off-by: Eddie Phillips <eddiephillips(a)google.com> --- Link to original: https://lore.kernel.org/all/20220910212804.670622-3-davidgow@google.com/ include/kunit/ftrace_stub.h | 84 ++++++++++++++++ lib/kunit/Kconfig | 11 +++ lib/kunit/Makefile | 4 + lib/kunit/ftrace_stub.c | 146 ++++++++++++++++++++++++++++ lib/kunit/kunit-example-test.c | 29 +++++- lib/kunit/stubs_example.kunitconfig | 10 ++ 6 files changed, 282 insertions(+), 2 deletions(-) create mode 100644 include/kunit/ftrace_stub.h create mode 100644 lib/kunit/ftrace_stub.c create mode 100644 lib/kunit/stubs_example.kunitconfig diff --git a/include/kunit/ftrace_stub.h b/include/kunit/ftrace_stub.h new file mode 100644 index 000000000000..bfd57ea6289c --- /dev/null +++ b/include/kunit/ftrace_stub.h @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _KUNIT_FTRACE_STUB_H +#define _KUNIT_FTRACE_STUB_H + +/** KUNIT_STUBBABLE - marks a function as stubbable when stubbing support is + * enabled. + * + * Stubbing uses ftrace internally, so we can only stub out functions when they + * are not inlined. This macro evaluates to noinline when stubbing support is + * enabled to make it safe. + * + * If you cannot add this annotation to the function, you can instead use + * KUNIT_STUBBABLE_TRAMPOLINE, which is the same, but evaluates to + * __always_inline when stubbing is not enabled. + * + * Consider copy_to_user, which is marked as __always_inline: + * + * .. 
code-block:: c + * static KUNIT_STUBBABLE_TRAMPOLINE unsigned long + * copy_to_user_trampoline(void __user *to, const void *from, unsigned long n) + * { + * return copy_to_user(to, from, n); + * } + * + * Then we simply need to update our code to go through this function instead + * (in the places where we want to stub it out). + */ +#if IS_ENABLED(CONFIG_KUNIT_FTRACE_STUBS) +#define KUNIT_STUBBABLE noinline +#define KUNIT_STUBBABLE_TRAMPOLINE noinline +#else +#define KUNIT_STUBBABLE +#define KUNIT_STUBBABLE_TRAMPOLINE __always_inline +#endif + +struct kunit; + +/** + * kunit_activate_ftrace_stub() - makes all calls to @func go to @replacement during @test. + * @test: The test context object. + * @func: The function to stub out, must be annotated with KUNIT_STUBBABLE. + * @replacement: The function to replace @func with. + * + * All calls to @func will instead call @replacement for the duration of the + * current test. If called from outside the test's thread, the function will + * not be redirected. + * + * The redirection can be disabled again with kunit_deactivate_ftrace_stub(). + * + * Example: + * + * .. 
code-block:: c + * KUNIT_STUBBABLE int real_func(int n) + * { + * pr_info("real_func() called with %d\n", n); + * return 0; + * } + * + * int replacement_func(int n) + * { + * pr_info("replacement_func() called with %d\n", n); + * return 42; + * } + * + * void example_test(struct kunit *test) + * { + * kunit_activate_ftrace_stub(test, real_func, replacement_func); + * KUNIT_EXPECT_EQ(test, real_func(1), 42); + * } + * + */ +#define kunit_activate_ftrace_stub(test, real_fn_addr, replacement_addr) do { \ + typecheck_fn(typeof(&replacement_addr), real_fn_addr); \ + __kunit_activate_ftrace_stub(test, #real_fn_addr, real_fn_addr, replacement_addr); \ +} while (0) + +void __kunit_activate_ftrace_stub(struct kunit *test, + const char *name, + void *real_fn_addr, + void *replacement_addr); + + +void kunit_deactivate_ftrace_stub(struct kunit *test, void *real_fn_addr); +#endif /* _KUNIT_FTRACE_STUB_H */ diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig index 7a6af361d2fc..8a629017b917 100644 --- a/lib/kunit/Kconfig +++ b/lib/kunit/Kconfig @@ -70,6 +70,17 @@ config KUNIT_ALL_TESTS If unsure, say N. +config KUNIT_FTRACE_STUBS + bool "Support for stubbing out functions in KUnit tests with ftrace and kernel livepatch" + depends on FTRACE=y && FUNCTION_TRACER=y && MODULES=y && DEBUG_KERNEL=y && KALLSYMS_ALL=y + help + Builds support for stubbing out functions for the duration of KUnit + test cases or suites using ftrace. + See KUNIT_EXAMPLE_TEST for an example. + + NOTE: this does not work on all architectures (like UML) and + relies on a lot of magic (see the dependencies list).
+ config KUNIT_DEFAULT_ENABLED bool "Default value of kunit.enable" default y diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile index 656f1fa35abc..f04f6ea4d6a8 100644 --- a/lib/kunit/Makefile +++ b/lib/kunit/Makefile @@ -29,3 +29,7 @@ obj-$(CONFIG_KUNIT_TEST) += assert_test.o endif obj-$(CONFIG_KUNIT_EXAMPLE_TEST) += kunit-example-test.o + +ifeq ($(CONFIG_KUNIT_FTRACE_STUBS),y) +kunit-objs += ftrace_stub.o +endif \ No newline at end of file diff --git a/lib/kunit/ftrace_stub.c b/lib/kunit/ftrace_stub.c new file mode 100644 index 000000000000..b19eaa35f5ed --- /dev/null +++ b/lib/kunit/ftrace_stub.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <kunit/ftrace_stub.h> +#include <kunit/test.h> + +#include <linux/typecheck.h> + +#include <linux/ftrace.h> +#include <linux/livepatch.h> +#include <linux/sched.h> + + +struct kunit_ftrace_stub_ctx { + struct kunit *test; + unsigned long real_fn_addr; /* used as a key to lookup the stub */ + unsigned long replacement_addr; + struct ftrace_ops ops; /* a copy of kunit_stub_base_ops with .private set */ +}; + +static void kunit_stub_trampoline(unsigned long ip, unsigned long parent_ip, + struct ftrace_ops *ops, + struct ftrace_regs *fregs) +{ + struct kunit_ftrace_stub_ctx *ctx = ops->private; + int lock_bit; + + if (current->kunit_test != ctx->test) + return; + + lock_bit = ftrace_test_recursion_trylock(ip, parent_ip); + KUNIT_ASSERT_GE(ctx->test, lock_bit, 0); + + ftrace_regs_set_instruction_pointer(fregs, ctx->replacement_addr); + + ftrace_test_recursion_unlock(lock_bit); +} + +static struct ftrace_ops kunit_stub_base_ops = { + .func = &kunit_stub_trampoline, + .flags = FTRACE_OPS_FL_IPMODIFY | +#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS + FTRACE_OPS_FL_SAVE_REGS | +#endif + FTRACE_OPS_FL_DYNAMIC +}; + +static void __kunit_ftrace_stub_resource_free(struct kunit_resource *res) +{ + struct kunit_ftrace_stub_ctx *ctx = res->data; + + unregister_ftrace_function(&ctx->ops); + kfree(ctx); +} + +/* Matching 
function for kunit_find_resource(). match_data is real_fn_addr. */ +static bool __kunit_static_stub_resource_match(struct kunit *test, + struct kunit_resource *res, + void *match_real_fn_addr) +{ + /* This pointer is only valid if res is an ftrace stub resource. */ + struct kunit_ftrace_stub_ctx *ctx = res->data; + + /* Make sure the resource is an ftrace stub resource. */ + if (res->free != &__kunit_ftrace_stub_resource_free) + return false; + + return ctx->real_fn_addr == (unsigned long)match_real_fn_addr; +} + +void kunit_deactivate_ftrace_stub(struct kunit *test, void *real_fn_addr) +{ + struct kunit_resource *res; + + KUNIT_ASSERT_PTR_NE_MSG(test, real_fn_addr, NULL, + "Tried to deactivate a NULL stub."); + + /* Look up the existing stub for this function. */ + res = kunit_find_resource(test, + __kunit_static_stub_resource_match, + real_fn_addr); + + /* Error out if the stub doesn't exist. */ + KUNIT_ASSERT_PTR_NE_MSG(test, res, NULL, + "Tried to deactivate a nonexistent stub."); + + /* Free the stub. We 'put' twice, as we got a reference + * from kunit_find_resource(). The free function will deactivate the + * ftrace stub. + */ + kunit_remove_resource(test, res); + kunit_put_resource(res); +} +EXPORT_SYMBOL_GPL(kunit_deactivate_ftrace_stub); + +void __kunit_activate_ftrace_stub(struct kunit *test, + const char *name, + void *real_fn_addr, + void *replacement_addr) +{ + unsigned long ftrace_ip; + struct kunit_ftrace_stub_ctx *ctx; + int ret; + + ftrace_ip = ftrace_location((unsigned long)real_fn_addr); + if (!ftrace_ip) + KUNIT_FAIL_ASSERTION(test, KUNIT_ASSERTION, + "%s ip is invalid: not a function, or is marked notrace or inline", name); + + /* Allocate the stub context, which contains pointers to the replacement + * function and the test object. It's also registered as a KUnit + * resource which can be looked up by address (to deactivate manually) + * and is destroyed automatically on test exit.
+ */ + ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); + KUNIT_ASSERT_PTR_NE_MSG(test, ctx, NULL, "failed to allocate kunit stub for %s", name); + + ctx->test = test; + ctx->ops = kunit_stub_base_ops; + ctx->ops.private = ctx; + ctx->real_fn_addr = (unsigned long)real_fn_addr; + ctx->replacement_addr = (unsigned long)replacement_addr; + + ret = ftrace_set_filter_ip(&ctx->ops, ftrace_ip, 0, 0); + if (ret) { + kfree(ctx); + KUNIT_FAIL_ASSERTION(test, KUNIT_ASSERTION, + "failed to set filter ip for %s: %d", name, ret); + } + + ret = register_ftrace_function(&ctx->ops); + if (ret) { + kfree(ctx); + if (ret == -EBUSY) + KUNIT_FAIL_ASSERTION( + test, KUNIT_ASSERTION, + "failed to register stub (-EBUSY) for %s, likely due to already stubbing it?", + name); + KUNIT_FAIL_ASSERTION(test, KUNIT_ASSERTION, + "failed to register stub for %s: %d", name, + ret); + } + + kunit_alloc_resource(test, NULL, + __kunit_ftrace_stub_resource_free, + GFP_KERNEL, ctx); +} +EXPORT_SYMBOL_GPL(__kunit_activate_ftrace_stub); diff --git a/lib/kunit/kunit-example-test.c b/lib/kunit/kunit-example-test.c index 9452b163956f..676ad552ae7b 100644 --- a/lib/kunit/kunit-example-test.c +++ b/lib/kunit/kunit-example-test.c @@ -6,8 +6,9 @@ * Author: Brendan Higgins <brendanhiggins(a)google.com> */ -#include <kunit/test.h> +#include <kunit/ftrace_stub.h> #include <kunit/static_stub.h> +#include <kunit/test.h> /* * This is the most fundamental element of KUnit, the test case. A test case @@ -152,7 +153,7 @@ static void example_all_expect_macros_test(struct kunit *test) } /* This is a function we'll replace with static stubs. */ -static int add_one(int i) +static KUNIT_STUBBABLE int add_one(int i) { /* This will trigger the stub if active. */ KUNIT_STATIC_STUB_REDIRECT(add_one, i); @@ -221,6 +222,29 @@ static void example_static_stub_using_fn_ptr_test(struct kunit *test) KUNIT_EXPECT_EQ(test, add_one(1), 2); } +/* + * This test shows the use of dynamic stubs. 
+ */ +static void example_ftrace_stub_test(struct kunit *test) +{ +#if !IS_ENABLED(CONFIG_KUNIT_FTRACE_STUBS) + kunit_skip(test, "KUNIT_FTRACE_STUBS not enabled"); +#else + /* By default, function is not stubbed. */ + KUNIT_EXPECT_EQ(test, add_one(1), 2); + + /* Replace add_one() with subtract_one(). */ + kunit_activate_ftrace_stub(test, add_one, subtract_one); + + /* add_one() is now replaced. */ + KUNIT_EXPECT_EQ(test, add_one(1), 0); + + /* Return add_one() to normal. */ + kunit_deactivate_ftrace_stub(test, add_one); + KUNIT_EXPECT_EQ(test, add_one(1), 2); +#endif +} + static const struct example_param { int value; } example_params_array[] = { @@ -506,6 +530,7 @@ static struct kunit_case example_test_cases[] = { KUNIT_CASE(example_all_expect_macros_test), KUNIT_CASE(example_static_stub_test), KUNIT_CASE(example_static_stub_using_fn_ptr_test), + KUNIT_CASE(example_ftrace_stub_test), KUNIT_CASE(example_priv_test), KUNIT_CASE_PARAM(example_params_test, example_gen_params), KUNIT_CASE_PARAM_WITH_INIT(example_params_test_with_init, kunit_array_gen_params, diff --git a/lib/kunit/stubs_example.kunitconfig b/lib/kunit/stubs_example.kunitconfig new file mode 100644 index 000000000000..20af4da9bc75 --- /dev/null +++ b/lib/kunit/stubs_example.kunitconfig @@ -0,0 +1,10 @@ +CONFIG_KUNIT=y +CONFIG_KUNIT_FTRACE_STUBS=y +CONFIG_KUNIT_EXAMPLE_TEST=y + +# Dependencies +CONFIG_FTRACE=y +CONFIG_FUNCTION_TRACER=y +CONFIG_MODULES=y +CONFIG_DEBUG_KERNEL=y +CONFIG_KALLSYMS_ALL=y -- 2.51.1.851.g4ebd6896fd-goog
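The bookkeeping in ftrace_stub.c keys each redirect on the real function's address and applies it only while the owning test is current (current->kunit_test == ctx->test). The userspace analogy below shows only that keying and per-test scoping. It is not ftrace: the stubbable function consults a table explicitly, which is closer to the static_stub mechanism, and every name in it (stub_lookup(), activate_stub(), current_test, ...) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(int);

/* One active redirect, keyed by the real function like ctx->real_fn_addr. */
struct stub {
	const void *test;	/* owning test, like ctx->test */
	int_fn real;
	int_fn replacement;
};

#define MAX_STUBS 8
static struct stub stubs[MAX_STUBS];
static const void *current_test;	/* stands in for current->kunit_test */

/* Return the replacement if @real is stubbed for the running test, else @real. */
static int_fn stub_lookup(int_fn real)
{
	for (size_t i = 0; i < MAX_STUBS; i++)
		if (stubs[i].real == real && stubs[i].test == current_test)
			return stubs[i].replacement;
	return real;
}

static int activate_stub(const void *test, int_fn real, int_fn replacement)
{
	struct stub *slot = NULL;

	for (size_t i = 0; i < MAX_STUBS; i++) {
		if (stubs[i].real == real)
			return -1;	/* already stubbed, cf. the -EBUSY path */
		if (!stubs[i].real && !slot)
			slot = &stubs[i];
	}
	if (!slot)
		return -1;
	*slot = (struct stub){ .test = test, .real = real, .replacement = replacement };
	return 0;
}

static void deactivate_stub(int_fn real)
{
	for (size_t i = 0; i < MAX_STUBS; i++)
		if (stubs[i].real == real)
			stubs[i] = (struct stub){ 0 };
}

static int subtract_one(int i)
{
	return i - 1;
}

/* Explicit redirect check; the ftrace version needs no such code in the callee. */
static int add_one(int i)
{
	int_fn fn = stub_lookup(add_one);

	if (fn != add_one)
		return fn(i);
	return i + 1;
}
```

Calls made while a different test (or no test) is current fall through to the real implementation, mirroring the early return in kunit_stub_trampoline().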

2 months, 1 week

1
0

[PATCH bpf-next v4 0/2] bpf: Skip bounds adjustment for conditional jumps on same scalar register

by KaFai Wan

This small patchset avoids a verifier bug warning on conditional jumps that compare a register with itself when that register holds a scalar with a range. v4: - improve the code. (Alexei) v3: https://lore.kernel.org/bpf/20251031154107.403054-1-kafai.wan@linux.dev/ - Enhance is_scalar_branch_taken() to handle the scalar case. (Eduard) - Update the selftest to cover all conditional jump opcodes. (Eduard) v2: https://lore.kernel.org/bpf/20251025053017.2308823-1-kafai.wan@linux.dev/ - Enhance is_branch_taken() and is_scalar_branch_taken() to handle branch direction computation for the same register. (Eduard and Alexei) - Update the selftest. v1: https://lore.kernel.org/bpf/20251022164457.1203756-1-kafai.wan@linux.dev/ --- KaFai Wan (2): bpf: Skip bounds adjustment for conditional jumps on same scalar register selftests/bpf: Add test for conditional jumps on same scalar register kernel/bpf/verifier.c | 31 ++++ .../selftests/bpf/progs/verifier_bounds.c | 154 ++++++++++++++++++ 2 files changed, 185 insertions(+) -- 2.43.0
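For a jump of the form `if rX <op> rX`, the outcome of most opcodes does not depend on the register's value at all. Only BPF_JSET does, because `rX & rX` equals `rX`. The sketch below is a simplified model that uses only an unsigned range. It is not the verifier's is_scalar_branch_taken(), which also tracks tnums and signed and 32-bit bounds. The enum names are local stand-ins for the BPF_J* opcodes.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for BPF_JEQ, BPF_JNE, ..., BPF_JSET. */
enum jmp_op { JEQ, JNE, JGT, JGE, JLT, JLE, JSGT, JSGE, JSLT, JSLE, JSET };

/* Outcome of "if rX <op> rX" for a register with unsigned range [umin, umax]:
 * 1 = always taken, 0 = never taken, -1 = depends on the runtime value. */
static int same_reg_branch_taken(enum jmp_op op, uint64_t umin, uint64_t umax)
{
	switch (op) {
	case JEQ: case JGE: case JLE: case JSGE: case JSLE:
		return 1;	/* x == x, x >= x and x <= x hold for every x */
	case JNE: case JGT: case JLT: case JSGT: case JSLT:
		return 0;	/* strict or inequality tests against itself never hold */
	case JSET:
		/* (x & x) != 0 is equivalent to x != 0. */
		if (umin > 0)
			return 1;
		if (umax == 0)
			return 0;
		return -1;
	}
	return -1;
}
```

When the result is known, only one successor of the branch is reachable, so there is nothing to refine on the other path.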

2 months, 1 week

4
6

[PATCH v4 0/5] mm: Refactor and improve VMA count limit code

by Kalesh Singh

Hi all, This series refactors the VMA count limit code to improve clarity, test coverage, and observability. The VMA count limit, controlled by sysctl_max_map_count, is a safeguard that prevents a single process from consuming excessive kernel memory by creating too many memory mappings. The major change since v3 is the first patch in the series, which, instead of attempting to fix overshooting the limit, now documents that this is the intended behavior. As Hugh pointed out, the lenient check (>) in do_mmap() and do_brk_flags() is intentional to allow for potential VMA merges or expansions when the process is at the sysctl_max_map_count limit. The consensus is that this historical behavior is correct but non-obvious. This series now focuses on making that behavior clear and the surrounding code more robust. Based on feedback from Lorenzo and David, this series retains the helper function and the rename of map_count. The refined v4 series is now structured as follows: 1. Documents the lenient VMA count checks with comments to clarify their purpose. 2. Adds a comprehensive selftest to codify the expected behavior at the limit, including the lenient mmap case. 3. Introduces max_vma_count() to abstract the max map count sysctl, making the sysctl static and converting all callers to use the new helper. 4. Renames mm_struct->map_count to the more explicit vma_count for better code clarity. 5. Adds a tracepoint for observability when a process fails to allocate a VMA due to the count limit. Tested on x86_64 and arm64: 1. Build test: allyesconfig for rename 2. Selftests: cd tools/testing/selftests/mm && \ make && \ ./run_vmtests.sh -t max_vma_count 3. vma tests: cd tools/testing/vma && \ make && \ ./vma Link to v3: https://lore.kernel.org/r/20251013235259.589015-1-kaleshsingh@google.com/ Thanks to everyone for the valuable discussion on previous revisions.
-- Kalesh Kalesh Singh (5): mm: Document lenient map_count checks mm/selftests: add max_vma_count tests mm: Introduce max_vma_count() to abstract the max map count sysctl mm: rename mm_struct::map_count to vma_count mm/tracing: introduce trace_mm_insufficient_vma_slots event MAINTAINERS | 2 + fs/binfmt_elf.c | 2 +- fs/coredump.c | 2 +- include/linux/mm.h | 2 - include/linux/mm_types.h | 2 +- include/trace/events/vma.h | 32 + kernel/fork.c | 2 +- mm/debug.c | 2 +- mm/internal.h | 3 + mm/mmap.c | 25 +- mm/mremap.c | 13 +- mm/nommu.c | 8 +- mm/util.c | 1 - mm/vma.c | 42 +- mm/vma_internal.h | 2 + tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + .../selftests/mm/max_vma_count_tests.c | 716 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 5 + tools/testing/vma/vma.c | 32 +- tools/testing/vma/vma_internal.h | 13 +- 21 files changed, 856 insertions(+), 52 deletions(-) create mode 100644 include/trace/events/vma.h create mode 100644 tools/testing/selftests/mm/max_vma_count_tests.c base-commit: b227c04932039bccc21a0a89cd6df50fa57e4716 -- 2.51.1.851.g4ebd6896fd-goog
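The difference between the lenient (>) check described for do_mmap() and do_brk_flags() and a strict check can be shown with two small predicates. This is a sketch with hypothetical function names. The strict variant is included only for contrast and is not quoted from the series. 65530 is the usual default of vm.max_map_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Lenient form, as described for do_mmap()/do_brk_flags(): refuse only when
 * the count is already above the limit, since the new mapping may merge with
 * a neighbour or expand an existing VMA instead of adding one. */
static bool lenient_count_ok(int vma_count, int max_count)
{
	return !(vma_count > max_count);
}

/* Strict form, for contrast: an operation that always adds a VMA would need
 * the count to stay below the limit beforehand. */
static bool strict_count_ok(int vma_count, int max_count)
{
	return vma_count < max_count;
}
```

At exactly the limit, the lenient check still admits the request. If no merge happens, the process ends up one VMA over the limit, which is the overshoot the first patch now documents as intended.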

2 months, 1 week

2
8

[PATCH bpf-next 0/3] selftests/bpf: small improvements on tc_tunnel

by Alexis Lothoré (eBPF Foundation)

Hello, this series is a small follow-up to the recent test_tc_tunnel integration, addressing some small missing details raised during the final review ([1]). It mostly adds some missing checks around network namespace management. [1] https://lore.kernel.org/bpf/1ac9d14e-4250-480c-b863-410be78ac6c6@linux.dev/ Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com> --- Alexis Lothoré (eBPF Foundation) (3): selftests/bpf: skip tc_tunnel subtest if its setup fails selftests/bpf: add checks in tc_tunnel when entering net namespaces selftests/bpf: use start_server_str rather than start_reuseport_server in tc_tunnel .../selftests/bpf/prog_tests/test_tc_tunnel.c | 162 ++++++++++++++------- 1 file changed, 107 insertions(+), 55 deletions(-) --- base-commit: 1e2d874b04ba46a3b9fe6697097aa437641f4339 change-id: 20251030-tc_tunnel_improv-6b9d1c22c6f6 Best regards, -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
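Entering a network namespace involves several calls that can each fail: opening the current namespace so the test can return to it, opening the target, and setns(). The BPF selftests have their own helpers for this. The standalone sketch below, with a hypothetical enter_netns() name, only illustrates checking every step and cleaning up on failure. It assumes namespaces created by `ip netns add`, which live under /var/run/netns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Move the calling thread into the named network namespace.
 * Returns an fd for the original namespace (for leave_netns()), or -1 on
 * failure, in which case the thread stays in its original namespace. */
static int enter_netns(const char *name)
{
	char path[256];
	int orig_fd, ns_fd;

	if (snprintf(path, sizeof(path), "/var/run/netns/%s", name) >= (int)sizeof(path))
		return -1;	/* name too long */

	orig_fd = open("/proc/thread-self/ns/net", O_RDONLY | O_CLOEXEC);
	if (orig_fd < 0)
		return -1;

	ns_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (ns_fd < 0)
		goto err_close_orig;

	if (setns(ns_fd, CLONE_NEWNET) < 0) {
		close(ns_fd);
		goto err_close_orig;
	}
	close(ns_fd);
	return orig_fd;

err_close_orig:
	close(orig_fd);
	return -1;
}

/* Switch back to the namespace saved by enter_netns() and release its fd. */
static inline int leave_netns(int orig_fd)
{
	int ret = setns(orig_fd, CLONE_NEWNET);

	close(orig_fd);
	return ret;
}
```

A test built on this would skip or fail the subtest as soon as enter_netns() returns -1, rather than continuing in the wrong namespace.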

2 months, 1 week

3
5

[PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap

by Jiayuan Chen

Overall, we encountered a warning [1] that can be triggered by running the selftest I provided. MPTCP creates subflows for data transmission between two endpoints. However, BPF can use sockops to perform additional operations when TCP completes the three-way handshake. The issue arose because we used sockmap in sockops, which replaces sk->sk_prot and some handlers. Since subflows also have their own specialized handlers, this creates a conflict and leads to traffic failure. Therefore, we need to reject operations targeting subflows. This patchset simply prevents the combination of subflows and sockmap without changing any functionality. A complete integration of MPTCP and sockmap would require more effort, for example, we would need to retrieve the parent socket from subflows in sockmap and implement handlers like read_skb. If maintainers don't object, we can further improve this in subsequent work. [1] truncated warning: [ 18.234652] ------------[ cut here ]------------ [ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380 [ 18.234726] Modules linked in: [ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380 [ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202 [ 18.234800] PKRU: 55555554 [ 18.234806] Call Trace: [ 18.234810] <TASK> [ 18.234837] do_accept+0xeb/0x190 [ 18.234861] ? __x64_sys_pselect6+0x61/0x80 [ 18.234898] ? _raw_spin_unlock+0x12/0x30 [ 18.234915] ? alloc_fd+0x11e/0x190 [ 18.234925] __sys_accept4+0x8c/0x100 [ 18.234930] __x64_sys_accept+0x1f/0x30 [ 18.234933] x64_sys_call+0x202f/0x20f0 [ 18.234966] do_syscall_64+0x72/0x9a0 [ 18.234979] ? switch_fpu_return+0x60/0xf0 [ 18.234993] ? irqentry_exit_to_user_mode+0xdb/0x1e0 [ 18.235002] ? irqentry_exit+0x3f/0x50 [ 18.235005] ? clear_bhb_loop+0x50/0xa0 [ 18.235022] ? clear_bhb_loop+0x50/0xa0 [ 18.235025] ? 
clear_bhb_loop+0x50/0xa0 [ 18.235028] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 18.235066] </TASK> [ 18.235109] ---[ end trace 0000000000000000 ]--- --- v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/… Some advice suggested by Jakub Sitnicki v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linu… Some advice from Matthieu Baerts. Jiayuan Chen (3): net,mptcp: fix proto fallback detection with BPF sockmap bpf,sockmap: disallow MPTCP sockets from sockmap selftests/bpf: Add mptcp test with sockmap net/core/sock_map.c | 27 ++++ net/mptcp/protocol.c | 9 +- .../testing/selftests/bpf/prog_tests/mptcp.c | 150 ++++++++++++++++++ .../selftests/bpf/progs/mptcp_sockmap.c | 43 +++++ 4 files changed, 227 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c -- 2.43.0

2 months, 1 week

3
16

[PATCH] selftests/timers: Skip some posix_timers tests on kernels < 6.13

by Wake Liu

Several tests in the posix_timers selftest fail on kernels older than 6.13. These tests check for timer behavior related to SIG_IGN, which was refactored in the 6.13 kernel cycle, notably by commit caf77435dd8a ("signal: Handle ignored signals in do_sigaction(action != SIG_IGN)"). To ensure the selftests pass on older, stable kernels, gate the affected tests with a ksft_min_kernel_version(6, 13) check. Signed-off-by: Wake Liu <wakel(a)google.com> --- tools/testing/selftests/timers/posix_timers.c | 21 +++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c index f0eceb0faf34..f228e51f8b58 100644 --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -256,6 +256,11 @@ static void *ignore_thread(void *arg) static void check_sig_ign(int thread) { + if (!ksft_min_kernel_version(6, 13)) { + // see caf77435dd8a + ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n"); + return; + } struct tmrsig tsig = { }; struct itimerspec its; unsigned int tid = 0; @@ -342,6 +347,10 @@ static void check_sig_ign(int thread) static void check_rearm(void) { + if (!ksft_min_kernel_version(6, 13)) { + ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n"); + return; + } struct tmrsig tsig = { }; struct itimerspec its; struct sigaction sa; @@ -398,6 +407,10 @@ static void check_rearm(void) static void check_delete(void) { + if (!ksft_min_kernel_version(6, 13)) { + ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n"); + return; + } struct tmrsig tsig = { }; struct itimerspec its; struct sigaction sa; @@ -455,6 +468,10 @@ static inline int64_t calcdiff_ns(struct timespec t1, struct timespec t2) static void check_sigev_none(int which, const char *name) { + if (!ksft_min_kernel_version(6, 13)) { + ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n"); + return; + } struct 
timespec start, now; struct itimerspec its; struct sigevent sev; @@ -493,6 +510,10 @@ static void check_sigev_none(int which, const char *name) static void check_gettime(int which, const char *name) { + if (!ksft_min_kernel_version(6, 13)) { + ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n"); + return; + } struct itimerspec its, prev; struct timespec start, now; struct sigevent sev; -- 2.50.1.703.g449372360f-goog
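The gate relies on ksft_min_kernel_version(), which roughly parses the leading "major.minor" of the uname(2) release string and compares it with a minimum. The sketch below, with a hypothetical release_at_least() name, shows that comparison on a caller-supplied string so it can be checked without a particular running kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* True if @release (e.g. "6.12.3-android") is at least want_major.want_minor.
 * Unparsable strings are treated as too old. */
static bool release_at_least(const char *release, int want_major, int want_minor)
{
	int major, minor;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return false;
	return major > want_major || (major == want_major && minor >= want_minor);
}
```

In a real test, the string would come from the release field filled in by uname(2).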

2 months, 1 week

2
2

[PATCH bpf-next 0/4] selftests/bpf: convert test_tc_edt.sh into test_progs

by Alexis Lothoré (eBPF Foundation)

Hello, this is yet another conversion series, this time tackling test_tc_edt.sh. This one was at the bottom of our list because it relies on bandwidth measurement (which increases the risk of flakiness in CI), but here is an attempt anyway, as it also showcases a nice example of BPF-based rate shaping. The converted test roughly follows the original script logic, with two veths in two namespaces, a TCP connection between a client and a server, and the client pushing as much data as possible during a fixed period. We then compute the effective data rate, shaped by the eBPF program, by reading the RX interface stats, and compare it to the target rate. The test passes if the measured rate is within a defined error margin. There are two knobs driving the robustness of the test in CI: - the test duration (the longer it is, the more precise the measured rate) - the tolerated error margin The original test was configured with a 20s duration and a 1% error margin. The new test is configured with a 2s duration and a 2% error margin, to: - keep the duration tolerable in CI - while keeping enough margin for rate measurement fluctuations depending on the CI machines' load This has been run multiple times locally to ensure that those values are sane, and once in CI before sending the series, but I suggest letting it run in CI for a few days to see how it really behaves.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com> --- Alexis Lothoré (eBPF Foundation) (4): selftests/bpf: rename test_tc_edt.bpf.c section to expose program type selftests/bpf: integrate test_tc_edt into test_progs selftests/bpf: remove test_tc_edt.sh selftests/bpf: do not hardcode target rate in test_tc_edt BPF program tools/testing/selftests/bpf/Makefile | 2 - .../testing/selftests/bpf/prog_tests/test_tc_edt.c | 274 +++++++++++++++++++++ tools/testing/selftests/bpf/progs/test_tc_edt.c | 9 +- tools/testing/selftests/bpf/test_tc_edt.sh | 100 -------- 4 files changed, 279 insertions(+), 106 deletions(-) --- base-commit: 1e2d874b04ba46a3b9fe6697097aa437641f4339 change-id: 20251030-tc_edt-3ea8e8d3d14e Best regards, -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
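The pass/fail criterion described above reduces to two steps: convert an RX byte-counter delta into bits per second over the test duration, then check the relative error against the margin. The sketch below uses hypothetical function names and numbers. Where the counter comes from (for example /sys/class/net/<dev>/statistics/rx_bytes) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective rate in bits per second from an RX byte-counter delta. */
static double effective_rate_bps(uint64_t rx_bytes_delta, double duration_s)
{
	return (double)rx_bytes_delta * 8.0 / duration_s;
}

/* True if the measured rate is within ±margin (e.g. 0.02 for 2%) of target. */
static bool rate_within_margin(double measured_bps, double target_bps, double margin)
{
	double diff = measured_bps - target_bps;

	if (diff < 0)
		diff = -diff;
	return diff <= margin * target_bps;
}
```

A shorter duration makes startup effects (such as TCP ramp-up) a larger share of the byte count, which is why reducing it from 20s to 2s calls for widening the margin.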

2 months, 1 week

4
7

[PATCH bpf-next v3 0/2] bpf: Skip bounds adjustment for conditional jumps on same scalar register

by KaFai Wan

This small patchset avoids a verifier bug warning on conditional jumps that compare a register with itself when that register holds a scalar with a range. v3: - Enhance is_scalar_branch_taken() to handle the scalar case. (Eduard) - Update the selftest to cover all conditional jump opcodes. (Eduard) v2: https://lore.kernel.org/bpf/20251025053017.2308823-1-kafai.wan@linux.dev/ - Enhance is_branch_taken() and is_scalar_branch_taken() to handle branch direction computation for the same register. (Eduard and Alexei) - Update the selftest. v1: https://lore.kernel.org/bpf/20251022164457.1203756-1-kafai.wan@linux.dev/ --- KaFai Wan (2): bpf: Skip bounds adjustment for conditional jumps on same scalar register selftests/bpf: Add test for conditional jumps on same scalar register kernel/bpf/verifier.c | 33 ++++ .../selftests/bpf/progs/verifier_bounds.c | 154 ++++++++++++++++++ 2 files changed, 187 insertions(+) -- 2.43.0

2 months, 1 week

2
4

[PATCH 00/12] tools/nolibc: always use 64-bit ino_t, off_t and time-related types

by Thomas Weißschuh

nolibc currently uses 32-bit types for various APIs. These are problematic as their reduced value range can lead to truncated values. Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> --- Thomas Weißschuh (12): tools/nolibc: use 64-bit ino_t tools/nolibc: handle 64-bit off_t for llseek tools/nolibc: prefer the llseek syscall tools/nolibc: use 64-bit off_t tools/nolibc: remove now superfluous overflow check in llseek tools/nolibc: remove more __nolibc_enosys() fallbacks tools/nolibc: prefer explicit 64-bit time-related system calls tools/nolibc: gettimeofday(): avoid libgcc 64-bit divisions tools/nolibc: use a custom struct timespec tools/nolibc: always use 64-bit time types selftests/nolibc: test compatibility of timespec and __kernel_timespec tools/nolibc: remove time conversions tools/include/nolibc/arch-s390.h | 3 + tools/include/nolibc/poll.h | 12 ++-- tools/include/nolibc/std.h | 6 +- tools/include/nolibc/sys.h | 21 +++--- tools/include/nolibc/sys/time.h | 2 +- tools/include/nolibc/sys/timerfd.h | 20 +----- tools/include/nolibc/time.h | 96 ++++++---------------------- tools/include/nolibc/types.h | 9 ++- tools/testing/selftests/nolibc/nolibc-test.c | 18 ++++++ 9 files changed, 68 insertions(+), 119 deletions(-) --- base-commit: 90ee85c0e1e4b5804ceebbd731653e10ef3849a6 change-id: 20251001-nolibc-uapi-types-1c072d10fcc7 Best regards, -- Thomas Weißschuh <linux(a)weissschuh.net>
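Two details explain the series: a 32-bit off_t cannot represent offsets past 4 GiB (or 2 GiB when signed), and on 32-bit architectures the llseek syscall takes a 64-bit offset as separate high and low 32-bit halves. The sketch below, with hypothetical helper names, shows the split and the truncation that 64-bit types avoid.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit offset into the offset_high/offset_low pair passed to llseek. */
static void split_offset(int64_t off, uint32_t *high, uint32_t *low)
{
	*high = (uint32_t)((uint64_t)off >> 32);
	*low = (uint32_t)((uint64_t)off & 0xffffffffu);
}

/* Recombine the halves, as the kernel does: (high << 32) | low. */
static int64_t join_offset(uint32_t high, uint32_t low)
{
	return (int64_t)(((uint64_t)high << 32) | low);
}
```

For a 5 GiB offset (0x140000000), the halves are 1 and 0x40000000. Storing the value in a 32-bit type would keep only 0x40000000, which is the kind of silent truncation that a 64-bit off_t prevents.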

2 months, 1 week

NEW PO 83199 Saturday, November 1, 2025 at 08:22:32 PM

by Procurement 05471

Hi Linux-kselftest, Please provide a quote for your products: Include: 1.Pricing (per unit) 2.Delivery cost & timeline 3.Quote expiry date Deadline: October Thanks! Danny Peddinti PathnSitu Trading

2 months, 1 week

[PATCH v2 0/2] Print map ID on successful creation

by Harshit Mogalapalli

Hi all,

I have looked at an issue from the bpftool repository, https://github.com/libbpf/bpftool/issues/121, and this series tries to add that enhancement.

Summary: currently, when a map is created successfully, no message is printed on the terminal. Printing the map ID on successful creation notifies the user and can be used in CI/CD.

The first patch adds the printing logic and the second patch adds a simple selftest for it. These two patches do not fully resolve the GitHub issue, as other BPF objects might need similar additions. I would appreciate any input on this. Thank you very much.

V1 --> V2:
PATCH 1 updated [Thanks Yonghong for suggesting a better way of error handling: a new label for close(fd); instead of calling it multiple times]

Regards,
Harshit

Harshit Mogalapalli (2):
  bpftool: Print map ID upon creation and support JSON output
  selftests/bpf: Add test for bpftool map ID printing

 tools/bpf/bpftool/map.c | 21 ++++++++---
 .../testing/selftests/bpf/test_bpftool_map.sh | 36 +++++++++++++++++++
 2 files changed, 53 insertions(+), 4 deletions(-)

-- 
2.50.1

2 months, 1 week

[PATCH 00/22] mm/damon/tests: fix memory bugs in kunit tests

by SeongJae Park

DAMON kunit tests were initially written assuming that they would be run in well-controlled environments that are therefore tolerant of transient test failures and of bugs in the test code itself. The user-mode Linux based manual run of the tests is one example of such an environment. The test code was also written to add test coverage as fast as possible, rather than to be safe and reliable.

As a result, the tests have a number of bugs, including real memory leaks, theoretical unhandled memory allocation failures, and unused memory allocations. The unhandled allocation failures are unlikely in the real world, since those allocations are too small to fail. In theory, though, they can happen and cause invalid memory accesses.

It is arguable whether bugs in test code can really harm users, but bugs are bugs and need to be fixed. Fix the bugs one by one. Also Cc stable@ for the fixes of the memory leaks and unhandled memory allocation failures. The unused memory allocations are only a matter of memory efficiency, so stable@ is not Cc-ed for those.

The first patch fixes memory leaks in the test code for the DAMON core layer. The following fifteen, three, and one patches respectively fix unhandled memory allocation failures in the test code for the DAMON core layer, the virtual address space DAMON operation set, and the DAMON sysfs interface, one patch per test function. The final two patches remove memory allocations that are correctly deallocated at the end but are not actually used by any code.

SeongJae Park (22):
  mm/damon/tests/core-kunit: fix memory leak in damon_test_set_filters_default_reject()
  mm/damon/tests/core-kunit: handle allocation failures in damon_test_regions()
  mm/damon/tests/core-kunit: handle memory failure from damon_test_target()
  mm/damon/tests/core-kunit: handle memory alloc failure from damon_test_aggregate()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_split_at()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_merge_two()
  mm/damon/tests/core-kunit: handle alloc failures on dasmon_test_merge_regions_of()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_split_regions_of()
  mm/damon/tests/core-kunit: handle alloc failures in damon_test_ops_registration()
  mm/damon/tests/core-kunit: handle alloc failures in damon_test_set_regions()
  mm/damon/tests/core-kunit: handle alloc failures in damon_test_update_monitoring_result()
  mm/damon/tests/core-kunit: handle alloc failure on damon_test_set_attrs()
  mm/damon/tests/core-kunit: handle alloc failres in damon_test_new_filter()
  mm/damon/tests/core-kunit: handle alloc failure on damos_test_commit_filter()
  mm/damon/tests/core-kunit: handle alloc failures on damos_test_filter_out()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_set_filters_default_reject()
  mm/damon/tests/vaddr-kunit: handle alloc failures on damon_do_test_apply_three_regions()
  mm/damon/tests/vaddr-kunit: handle alloc failures in damon_test_split_evenly_fail()
  mm/damon/tests/vaddr-kunit: handle alloc failures on damon_test_split_evenly_succ()
  mm/damon/tests/sysfs-kunit: handle alloc failures on damon_sysfs_test_add_targets()
  mm/damon/tests/core-kunit: remove unnecessary damon_ctx variable on damon_test_split_at()
  mm/damon/tests/core-kunit: remove unused ctx in damon_test_split_regions_of()

 mm/damon/tests/core-kunit.h | 125 ++++++++++++++++++++++++++++++++---
 mm/damon/tests/sysfs-kunit.h | 25 +++++++
 mm/damon/tests/vaddr-kunit.h | 26 +++++++-
 3 files changed, 163 insertions(+), 13 deletions(-)

base-commit: 75f0c76bb8c01fdea838a601dc3326b11177c0d8
-- 
2.47.3
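The fix pattern described above (check every allocation, and release whatever was already allocated before bailing out) can be sketched in userspace C. A KUnit test would typically report a skip via kunit_skip() rather than return a code; here, the failure-injecting allocator (test_alloc(), fail_at) and the live-allocation counter are hypothetical stand-ins, used only to show that no failure path leaks or dereferences NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator hook: the fail_at-th allocation (1-based)
 * returns NULL; fail_at == 0 disables injection. live_allocs tracks
 * outstanding allocations so that leaks are visible. */
static int fail_at;
static int alloc_count;
static int live_allocs;

static void *test_alloc(size_t size)
{
	void *p;

	if (++alloc_count == fail_at)
		return NULL;
	p = malloc(size);
	if (p)
		live_allocs++;
	return p;
}

static void test_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

enum { TEST_OK, TEST_SKIP, TEST_FAIL };

/* A test body needing three objects. On any allocation failure, the
 * objects allocated so far are freed in reverse order and the test
 * reports a skip. */
static int three_object_test(void)
{
	int *a, *b, *c;
	int ret = TEST_SKIP;

	a = test_alloc(sizeof(*a));
	if (!a)
		goto out;
	b = test_alloc(sizeof(*b));
	if (!b)
		goto free_a;
	c = test_alloc(sizeof(*c));
	if (!c)
		goto free_b;

	*a = 1;
	*b = 2;
	*c = 3;
	ret = (*a + *b + *c == 6) ? TEST_OK : TEST_FAIL;

	test_free(c);
free_b:
	test_free(b);
free_a:
	test_free(a);
out:
	return ret;
}
```

Injecting a failure at each allocation in turn is a cheap way to confirm that every error path leaves live_allocs at zero.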

2 months, 1 week

[PATCH] selftests/user_events: Avoid taking address of packed member in perf_test

by Ankit Khushwaha

Taking the address of 'reg.write_index' directly triggers a -Waddress-of-packed-member warning due to a potentially unaligned pointer access:

  perf_test.c:239:38: warning: taking address of packed member 'write_index' of class or structure 'user_reg' may result in an unaligned pointer value [-Waddress-of-packed-member]
    239 |         ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
        |                                             ^~~~~~~~~~~~~~~

Use memcpy() instead to safely copy the value and avoid unaligned pointer access across architectures.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/user_events/perf_test.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/user_events/perf_test.c b/tools/testing/selftests/user_events/perf_test.c
index 201459d8094d..e4385f4aa231 100644
--- a/tools/testing/selftests/user_events/perf_test.c
+++ b/tools/testing/selftests/user_events/perf_test.c
@@ -201,6 +201,7 @@ TEST_F(user, perf_empty_events) {
 	struct perf_event_mmap_page *perf_page;
 	int page_size = sysconf(_SC_PAGESIZE);
 	int id, fd;
+	__u32 write_index;
 	__u32 *val;
 
 	reg.size = sizeof(reg);
@@ -236,7 +237,8 @@ TEST_F(user, perf_empty_events) {
 	ASSERT_EQ(1 << reg.enable_bit, self->check);
 
 	/* Ensure write shows up at correct offset */
-	ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
+	memcpy(&write_index, &reg.write_index, sizeof(reg.write_index));
+	ASSERT_NE(-1, write(self->data_fd, &write_index,
 					sizeof(reg.write_index)));
 	val = (void *)(((char *)perf_page) + perf_page->data_offset);
 	ASSERT_EQ(PERF_RECORD_SAMPLE, *val);
-- 
2.51.0
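The same fix, sketched outside the selftest harness. The struct below is a hypothetical stand-in for struct user_reg (its layout is an assumption), declared packed so that a 32-bit member sits at an unaligned offset. Copying the member into an aligned local with memcpy(), which has no alignment requirement on its arguments, means no unaligned uint32_t pointer is ever handed to write().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical packed struct: 'write_index' lands at offset 1, so
 * &r.write_index would be an unaligned uint32_t pointer. */
struct packed_reg {
	uint8_t flags;
	uint32_t write_index;
} __attribute__((packed));

/* Write the member's value to fd via an aligned local copy.
 * Returns whatever write() returns. */
static ssize_t write_index_to_fd(int fd, const struct packed_reg *r)
{
	uint32_t write_index;

	memcpy(&write_index, &r->write_index, sizeof(write_index));
	return write(fd, &write_index, sizeof(write_index));
}
```

A pipe is enough to check the behaviour: the four bytes read back equal the member's value.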

2 months, 1 week

[PATCH net v2] selftests: netdevsim: Fix ethtool-coalesce.sh fail by installing ethtool-common.sh

by Wang Liang

The script "ethtool-common.sh" is not installed in INSTALL_PATH, which triggers some errors when I try to run the test 'drivers/net/netdevsim/ethtool-coalesce.sh':

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
  # ./ethtool-coalesce.sh: line 25: make_netdev: command not found
  # ethtool: bad command line argument(s)
  # ./ethtool-coalesce.sh: line 124: check: command not found
  # ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
  # FAILED /0 checks
  not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1

Install this file to avoid this error. After this patch:

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # PASSED all 22 checks
  ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh

Fixes: fbb8531e58bd ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74(a)huawei.com>
---
 tools/testing/selftests/drivers/net/netdevsim/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/Makefile b/tools/testing/selftests/drivers/net/netdevsim/Makefile
index daf51113c827..df10c7243511 100644
--- a/tools/testing/selftests/drivers/net/netdevsim/Makefile
+++ b/tools/testing/selftests/drivers/net/netdevsim/Makefile
@@ -20,4 +20,8 @@ TEST_PROGS := \
 	udp_tunnel_nic.sh \
 # end of TEST_PROGS
 
+TEST_FILES := \
+	ethtool-common.sh
+# end of TEST_FILES
+
 include ../../../lib.mk
-- 
2.34.1

2 months, 1 week

[PATCH net] selftests/net: use destination options instead of hop-by-hop

by Anubhav Singh

The GRO self-test, gro.c, currently constructs IPv6 packets containing a Hop-by-Hop Options header (IPPROTO_HOPOPTS) to ensure the GRO path correctly handles IPv6 extension headers. However, network elements may be configured to drop packets with the Hop-by-Hop Options header (HBH). This causes the self-test to fail in environments where such network elements are present.

To improve the robustness and reliability of this test in diverse network environments, switch from using IPPROTO_HOPOPTS to IPPROTO_DSTOPTS (Destination Options). The Destination Options header is less likely to be dropped by intermediate routers and still serves the core purpose of the test: validating GRO's handling of an IPv6 extension header.

This change ensures the test can execute successfully without being incorrectly failed by network policies outside the kernel's control.

Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh(a)google.com>
---
 tools/testing/selftests/net/gro.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index 2b1d9f2b3e9e..d8c29fe39c1d 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -754,11 +754,11 @@ static void send_ipv6_exthdr(int fd, struct sockaddr_ll *daddr, char *ext_data1,
 	static char exthdr_pck[sizeof(buf) + MIN_EXTHDR_SIZE];
 
 	create_packet(buf, 0, 0, PAYLOAD_LEN, 0);
-	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_HOPOPTS, ext_data1);
+	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_DSTOPTS, ext_data1);
 	write_packet(fd, exthdr_pck, total_hdr_len + PAYLOAD_LEN + MIN_EXTHDR_SIZE, daddr);
 
 	create_packet(buf, PAYLOAD_LEN * 1, 0, PAYLOAD_LEN, 0);
-	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_HOPOPTS, ext_data2);
+	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_DSTOPTS, ext_data2);
 	write_packet(fd, exthdr_pck, total_hdr_len + PAYLOAD_LEN + MIN_EXTHDR_SIZE,
 		     daddr);
 }
-- 
2.51.1.851.g4ebd6896fd-goog
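For reference, RFC 8200 gives the Destination Options header the same wire format as the Hop-by-Hop Options header (Next Header, Hdr Ext Len in 8-octet units excluding the first 8 octets, then options), which is why the swap keeps the test's purpose intact. The sketch below builds the smallest such header (8 bytes, padded with a PadN option) and chains it after an IPv6 header. The function name build_min_dstopts() is hypothetical; this is not gro.c's add_ipv6_exthdr().

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <stdint.h>
#include <string.h>

#define EXTHDR_MIN_LEN 8 /* smallest IPv6 extension header: 8 octets */

/*
 * Hypothetical helper: write a minimal Destination Options header into
 * 'out' (at least EXTHDR_MIN_LEN bytes) and chain it after 'ip6h'.
 * Returns the number of header bytes written.
 */
static size_t build_min_dstopts(struct ip6_hdr *ip6h, uint8_t *out)
{
	struct ip6_dest *dst = (struct ip6_dest *)out;

	/* The new header takes over the previous upper-layer protocol. */
	dst->ip6d_nxt = ip6h->ip6_nxt;
	/* Length in 8-octet units, not counting the first 8 octets. */
	dst->ip6d_len = 0;

	/* PadN option filling the remaining 6 bytes:
	 * type 1, data length 4, then 4 zero bytes. */
	out[2] = 1;
	out[3] = 4;
	memset(out + 4, 0, 4);

	ip6h->ip6_nxt = IPPROTO_DSTOPTS;
	/* The IPv6 payload now also covers the extension header. */
	ip6h->ip6_plen = htons(ntohs(ip6h->ip6_plen) + EXTHDR_MIN_LEN);
	return EXTHDR_MIN_LEN;
}
```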

2 months, 1 week

[PATCH net] selftests/net: fix out-of-order delivery of FIN in gro:tcp test

by Anubhav Singh

Because gro_sender sends data packets and FIN packets in very quick succession, they are received almost simultaneously by gro_receiver. FIN packets are sometimes processed before the data packets, leading to intermittent (~1/100) test failures.

This change adds a delay of 100ms before sending FIN packets in the gro:tcp test to avoid the out-of-order delivery. The same mitigation already exists for the gro:ip test.

Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh(a)google.com>
---
 tools/testing/selftests/net/gro.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index 2b1d9f2b3e9e..3fa63bd85dea 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -989,6 +989,7 @@ static void check_recv_pkts(int fd, int *correct_payload,
 
 static void gro_sender(void)
 {
+	const int fin_delay_us = 100 * 1000;
 	static char fin_pkt[MAX_HDR_LEN];
 	struct sockaddr_ll daddr = {};
 	int txfd = -1;
@@ -1032,15 +1033,22 @@ static void gro_sender(void)
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 	} else if (strcmp(testname, "tcp") == 0) {
 		send_changed_checksum(txfd, &daddr);
+		/* Adding sleep before sending FIN so that it is not
+		 * received prior to other packets.
+		 */
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 
 		send_changed_seq(txfd, &daddr);
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 
 		send_changed_ts(txfd, &daddr);
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 
 		send_diff_opt(txfd, &daddr);
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 	} else if (strcmp(testname, "ip") == 0) {
 		send_changed_ECN(txfd, &daddr);
-- 
2.51.1.851.g4ebd6896fd-goog

2 months, 1 week

[RFC PATCH 00/21] VFIO live update support

by Vipin Sharma

Hello,

This series adds live update support to the VFIO PCI subsystem on top of the Live Update Orchestrator (LUO) [1]. This series can also be found on GitHub: https://github.com/shvipin/linux vfio/liveupdate/rfc-v1

The goal of live update in the VFIO subsystem is to preserve VFIO PCI devices while the host kernel goes through a live update. A preserved device can continue to work and perform DMA, and does not get reset while the host under live update is rebooted via kexec.

This series registers VFIO with LUO, implements the LUO callbacks, skips DMA clearing, skips device reset, and preserves and restores a device's virtual config during live update. I have added a selftest towards the end of this series, vfio_pci_liveupdate_test, which sets certain properties of a VFIO PCI device, performs a live update, and then validates that those properties are still the same on the device.

The overall flow for a VFIO device going through a live update will be something like:

1. Userspace passes a VFIO cdev FD along with a token to LUO for preservation.
2. LUO passes the FD to the VFIO subsystem to verify whether the FD can be preserved. If yes, it increases the refcount on the FD.
3. Eventually, userspace tells LUO to prepare for live update, which results in LUO calling the prepare() callback of each of its registered file handlers with the FD it should be preparing.
4. The VFIO subsystem saves certain properties which would otherwise be lost or be hard to recover from the device.
5. VFIO saves the needed data to KHO and provides LUO with the physical address of the data preserved by KHO.
6. Userspace sends the FREEZE event to freeze the system. LUO forwards this to each of its registered subsystems.
7. VFIO disables the interrupts configured on the device during the freeze call.
8. Userspace performs kexec.
9. During the kexec reboot, all PCI devices generally get their Bus Master Enable bit disabled. In the live update case, preserved VFIO devices are skipped.
10. During boot, the usual device enumeration happens and LUO also initializes itself.
11. Userspace uses the same token value (step 1) and asks LUO to return the VFIO FD corresponding to the token.
12. LUO asks VFIO to return the VFIO cdev FD corresponding to the token. It passes VFIO the physical address that VFIO returned to it in step 5.
13. VFIO restores the KHO data and reads the BDF value it saved. It iterates through all of the VFIO devices in its VFIO cdev class and finds the device with that BDF.
14. VFIO creates an anonymous inode and file corresponding to the VFIO PCI device and returns it to LUO, and LUO returns it to userspace.
15. The FD returned to userspace now works exactly as if userspace had opened a VFIO device from the /dev/vfio/device/* location.
16. Userspace makes the usual bind iommufd and attach page table calls.
17. During bind, when the VFIO device is internally opened for the first time:
    - VFIO skips Bus Master Disable.
    - VFIO skips device reset.
    - Instead of initializing vconfig from scratch, VFIO uses the vconfig stored in KHO, and the same for a few other fields.

This is what the current series implements and validates through the selftest. There are other things which are not implemented yet, and some of them also depend on other subsystems. For example:

1. Once a device has been prepared, VFIO should not allow any changes to its state from userspace, for example changing PCI config values, resetting the device, etc.
2. Device IOVA is not preserved in this series. This work is done separately in IOMMUFD live update preservation [2].
3. During PCI device enumeration, the PCI subsystem writes to PCI config space and attaches the device to its original driver, if present. This work is being done in PCI preservation [3].
4. Enabling the PCI device, currently done in the VFIO subsystem, should be handled in the PCI subsystem. Currently, this patch series hasn't changed that behavior.
5. If the live update gets canceled, interrupts which were disabled in freeze need to be reconfigured.
6. In finish, if a device is not restored, it is not yet clear how to know whether the KHO folio has been restored or not.
7. The VFIO cdev is restored in an anonymous file system. This should instead be done on devtmpfs.

For reviewers, the patches in this series are grouped as follows:

Patches 1-4
-----------
Feel free to ignore these if you are only interested in VFIO. They are only for the live update selftests. I had to make some changes on top of the LUO v4 series to create a library out of them which can be used in other selftests (vfio), and to fix some build issues.

Patches 5-9
-----------
Add basic live update support in VFIO: register with LUO, save the device BDF in KHO during prepare, and return the VFIO cdev FD during restore. They don't save or skip anything else.

Patches 10-17
-------------
Add support for skipping certain operations and preserving certain data needed to restore a device.

Patches 18-21
-------------
- Integrate the VFIO selftest with the live update selftest library.
- Add a basic vfio_pci_liveupdate_test which validates that the Bus Master Enable bit is preserved and the virtual config is restored properly.

Testing
-------
I have tested on QEMU with a test PCI device and also on bare metal with an Intel DSA device. Make sure the IDXD driver is not built into your kernel if testing with an Intel DSA device. Basically, whichever device you use, it should not get auto-bound to any other driver.

Important config options which should be enabled to test this series:
- CONFIG_KEXEC_FILE
- CONFIG_LIVEUPDATE
- CONFIG_KEXEC_HANDOVER

Besides these, the usual VFIO, VFIO_PCI, IOMMU and other dependencies should be enabled.

To build the test, provide KHDR_INCLUDES to your make command if your headers are out-of-tree:

  KHDR_INCLUDES="-isystem ../../../../build/usr/include" make

vfio_pci_liveupdate_test needs to be executed manually. It needs to be executed two times: once before the live update and once after.

  ./run.sh -d 0000:00:04.0 vfio_pci_liveupdate_test

Next Steps
----------
1. Looking forward to feedback on this series.
   - What other things should we save?
   - Which things should not be saved?
   - Any missing locks or incorrect locking in the series.
   - Any optimizations.
2. Integration with the IOMMUFD and PCI series for a complete workflow where a device continues DMA while undergoing a live update.

I will be going on paternity leave soon, so my responses will be intermittent. David Matlack (dmatlack(a)google.com) has graciously offered to work on this series and continue upstream engagement on this feature until I am back. Thank you, David!

[1] LUO-v4: https://lore.kernel.org/linux-mm/20250929010321.3462457-1-pasha.tatashin@so…
[2] IOMMUFD: https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@googl…
[3] PCI: https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel…

Vipin Sharma (21):
  selftests/liveupdate: Build tests from the selftests/liveupdate directory
  selftests/liveupdate: Create library of core live update ioctls
  selftests/liveupdate: Move do_kexec.sh script to liveupdate/lib
  selftests/liveupdate: Move LUO ioctls calls to liveupdate library
  vfio/pci: Register VFIO live update file handler to Live Update Orchestrator
  vfio/pci: Accept live update preservation request for VFIO cdev
  vfio/pci: Store VFIO PCI device preservation data in KHO for live update
  vfio/pci: Retrieve preserved VFIO device for Live Update Orechestrator
  vfio/pci: Add Live Update finish callback implementation
  PCI: Add option to skip Bus Master Enable reset during kexec
  vfio/pci: Skip clearing bus master on live update device during kexec
  vfio/pci: Skip clearing bus master on live update restored device
  vfio/pci: Preserve VFIO PCI config space through live update
  vfio/pci: Skip device reset on live update restored device.
  PCI: Make PCI saved state and capability structs public
  vfio/pci: Save and restore the PCI state of the VFIO device
  vfio/pci: Disable interrupts before going live update kexec
  vfio: selftests: Build liveupdate library in VFIO selftests
  vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD
  vfio: selftests: Add VFIO live update test
  vfio: selftests: Validate vconfig preservation of VFIO PCI device during live update

 drivers/pci/pci-driver.c | 6 +-
 drivers/pci/pci.c | 5 -
 drivers/pci/pci.h | 7 -
 drivers/vfio/pci/Makefile | 1 +
 drivers/vfio/pci/vfio_pci_config.c | 17 +
 drivers/vfio/pci/vfio_pci_core.c | 31 +-
 drivers/vfio/pci/vfio_pci_liveupdate.c | 461 ++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h | 17 +
 drivers/vfio/vfio_main.c | 20 +-
 include/linux/pci.h | 15 +
 include/linux/vfio.h | 8 +
 include/linux/vfio_pci_core.h | 1 +
 tools/testing/selftests/liveupdate/.gitignore | 7 +-
 tools/testing/selftests/liveupdate/Makefile | 31 +-
 .../liveupdate/{ => lib}/do_kexec.sh | 0
 .../liveupdate/lib/include/liveupdate_util.h | 27 +
 .../selftests/liveupdate/lib/libliveupdate.mk | 18 +
 .../liveupdate/lib/liveupdate_util.c | 106 ++++
 .../selftests/liveupdate/luo_multi_file.c | 2 -
 .../selftests/liveupdate/luo_multi_kexec.c | 2 -
 .../selftests/liveupdate/luo_multi_session.c | 2 -
 .../selftests/liveupdate/luo_test_utils.c | 73 +--
 .../selftests/liveupdate/luo_test_utils.h | 10 +-
 .../selftests/liveupdate/luo_unreclaimed.c | 1 -
 tools/testing/selftests/vfio/Makefile | 15 +-
 .../selftests/vfio/lib/include/vfio_util.h | 1 +
 .../selftests/vfio/lib/vfio_pci_device.c | 33 +-
 .../selftests/vfio/vfio_pci_liveupdate_test.c | 116 +++++
 28 files changed, 900 insertions(+), 133 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c
 rename tools/testing/selftests/liveupdate/{ => lib}/do_kexec.sh (100%)
 create mode 100644 tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
 create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk
 create mode 100644 tools/testing/selftests/liveupdate/lib/liveupdate_util.c
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c

base-commit: e48be01cadc981362646dc3a87d57316421590a5
-- 
2.51.0.858.gf9c4a03a3a-goog
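The preserve, prepare, freeze, kexec, retrieve and finish flow in the cover letter can be modelled as a small per-device state machine. This is a hypothetical sketch: the state and event names are illustrative and do not correspond to LUO's or VFIO's actual callbacks or ioctls, and treating cancel as a return to the preserved state is an assumption. It encodes two points the letter lists as desired behaviour: state changes should be refused once a device is prepared, and a cancel after freeze must re-enable interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device live update states. */
enum lu_state {
	LU_NONE,      /* not handed to LUO */
	LU_PRESERVED, /* FD given to LUO with a token */
	LU_PREPARED,  /* device data saved to KHO */
	LU_FROZEN,    /* interrupts disabled, ready for kexec */
	LU_REBOOTED,  /* new kernel booted, bus master left enabled */
	LU_RESTORED,  /* FD retrieved by token */
	LU_FINISHED,
};

enum lu_event {
	EV_PRESERVE, EV_PREPARE, EV_FREEZE, EV_KEXEC,
	EV_RETRIEVE, EV_FINISH, EV_CANCEL,
};

struct lu_dev {
	enum lu_state state;
	bool irqs_enabled;
};

/* Userspace changes (config writes, reset) are refused in this window. */
static bool lu_state_locked(const struct lu_dev *d)
{
	return d->state == LU_PREPARED || d->state == LU_FROZEN;
}

/* Apply one event; returns false and leaves the device unchanged if the
 * event is not valid in the current state. */
static bool lu_step(struct lu_dev *d, enum lu_event ev)
{
	static const enum lu_state need[] = {
		[EV_PRESERVE] = LU_NONE,     [EV_PREPARE] = LU_PRESERVED,
		[EV_FREEZE] = LU_PREPARED,   [EV_KEXEC] = LU_FROZEN,
		[EV_RETRIEVE] = LU_REBOOTED, [EV_FINISH] = LU_RESTORED,
	};
	static const enum lu_state next[] = {
		[EV_PRESERVE] = LU_PRESERVED, [EV_PREPARE] = LU_PREPARED,
		[EV_FREEZE] = LU_FROZEN,      [EV_KEXEC] = LU_REBOOTED,
		[EV_RETRIEVE] = LU_RESTORED,  [EV_FINISH] = LU_FINISHED,
	};

	if (ev == EV_CANCEL) {
		if (!lu_state_locked(d))
			return false;
		d->irqs_enabled = true;	/* undo the freeze-time disable */
		d->state = LU_PRESERVED;
		return true;
	}
	if (d->state != need[ev])
		return false;
	if (ev == EV_FREEZE)
		d->irqs_enabled = false;
	d->state = next[ev];
	return true;
}
```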

2 months, 1 week

[PATCH] gpio-selftests: replace fixed sleep with polling+timeout

by zntsproj

Replace the hard-coded sleep 0.1 with a polling loop with a timeout to check the sysfs GPIO value. This avoids timing-dependent flaky failures in CI and on slower machines.
---
 .../testing/selftests/gpio/gpio-aggregator.sh | 59 +++++++++++++++----
 1 file changed, 46 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/gpio/gpio-aggregator.sh b/tools/testing/selftests/gpio/gpio-aggregator.sh
index 9b6f80ad9..1e81e62e9 100755
--- a/tools/testing/selftests/gpio/gpio-aggregator.sh
+++ b/tools/testing/selftests/gpio/gpio-aggregator.sh
@@ -671,26 +671,59 @@ teardown_4() {
 	agg_configfs_cleanup
 }
 
+# helper: wait for sysfs file to become a given value (timeout in seconds)
+wait_for_sysfs_value() {
+    file="$1"
+    expected="$2"
+    timeout="${3:-2}" # seconds
+    interval="0.01" # seconds per poll
+    max=$((timeout * 100))
+    i=0
+
+    while [ "$i" -lt "$max" ]; do
+        if [ "$(cat "$file")" = "$expected" ]; then
+            return 0
+        fi
+        sleep "$interval"
+        i=$((i + 1))
+    done
+
+    return 1
+}
+
 echo "4.1. Forwarding set values"
 setup_4
 OFFSET=0
 for SETTING in $SETTINGS; do
-	CHIP=$(echo "$SETTING" | cut -d: -f1)
-	BANK=$(echo "$SETTING" | cut -d: -f2)
-	LINE=$(echo "$SETTING" | cut -d: -f3)
-	DEVNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/dev_name")
-	CHIPNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/$BANK/chip_name")
-	VAL_PATH="/sys/devices/platform/$DEVNAME/$CHIPNAME/sim_gpio${LINE}/value"
-	test $(cat $VAL_PATH) = "0" || fail "incorrect value read from sysfs"
-	$BASE_DIR/gpio-mockup-cdev -s 1 "/dev/$(agg_configfs_chip_name agg0)" "$OFFSET" &
-	mock_pid=$!
-	sleep 0.1 # FIXME Any better way?
-	test "$(cat $VAL_PATH)" = "1" || fail "incorrect value read from sysfs"
-	kill "$mock_pid"
-	OFFSET=$(expr $OFFSET + 1)
+    CHIP=$(echo "$SETTING" | cut -d: -f1)
+    BANK=$(echo "$SETTING" | cut -d: -f2)
+    LINE=$(echo "$SETTING" | cut -d: -f3)
+    DEVNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/dev_name")
+    CHIPNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/$BANK/chip_name")
+    VAL_PATH="/sys/devices/platform/$DEVNAME/$CHIPNAME/sim_gpio${LINE}/value"
+
+    test "$(cat "$VAL_PATH")" = "0" || fail "incorrect value read from sysfs"
+
+    $BASE_DIR/gpio-mockup-cdev -s 1 "/dev/$(agg_configfs_chip_name agg0)" "$OFFSET" &
+    mock_pid=$!
+
+    # wait up to 2s for value to flip to "1"
+    if ! wait_for_sysfs_value "$VAL_PATH" "1" 2; then
+        kill "$mock_pid" 2>/dev/null || true
+        wait "$mock_pid" 2>/dev/null || true
+        fail "timeout waiting for $VAL_PATH to become 1"
+    fi
+
+    test "$(cat "$VAL_PATH")" = "1" || fail "incorrect value read from sysfs"
+
+    kill "$mock_pid" 2>/dev/null || true
+    wait "$mock_pid" 2>/dev/null || true
+
+    OFFSET=$((OFFSET + 1))
 done
 teardown_4
+
 
 echo "4.2. Forwarding set config"
 setup_4
 OFFSET=0
-- 
2.51.2
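The same poll-until-match-or-timeout idea, sketched in C for comparison with the shell helper. The function name wait_for_file_value() and its millisecond parameters are hypothetical. The deadline uses CLOCK_MONOTONIC so that wall-clock changes cannot stretch or shorten the timeout, and the trailing newline that sysfs attributes normally end with is stripped, as the shell's $(cat ...) does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Current time on the monotonic clock, in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll 'path' every 'interval_ms' until its first line equals 'expected'
 * (trailing newline ignored) or 'timeout_ms' elapses.
 * Returns 0 on a match, -1 on timeout.
 */
static int wait_for_file_value(const char *path, const char *expected,
			       long timeout_ms, long interval_ms)
{
	long long deadline = now_ms() + timeout_ms;
	struct timespec nap = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	char buf[64];

	for (;;) {
		FILE *f = fopen(path, "r");

		if (f) {
			if (fgets(buf, sizeof(buf), f)) {
				buf[strcspn(buf, "\n")] = '\0';
				if (strcmp(buf, expected) == 0) {
					fclose(f);
					return 0;
				}
			}
			fclose(f);
		}
		if (now_ms() >= deadline)
			return -1;
		nanosleep(&nap, NULL);
	}
}
```

Checking the deadline after each read, rather than counting iterations as the shell helper does, keeps the timeout accurate even when individual reads are slow.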

2 months, 1 week

[PATCH v2] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes. The tests cover:

1. EOF returned when a SOCK_STREAM peer closes normally.
2. ECONNRESET returned when a SOCK_STREAM peer closes with unread data.
3. SOCK_DGRAM sockets not returning ECONNRESET on peer close.

This follows up on review feedback suggesting a selftest to clarify Linux’s semantics.

Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com>
---
Changelog:

Changes made from v1:
- Patch prefix updated to selftests: af_unix:.
- All mentions of “UNIX” changed to AF_UNIX.
- Removed BSD references from comments.
- Shared setup refactored using FIXTURE_VARIANT().
- Cleanup moved to FIXTURE_TEARDOWN() to always run.
- Tests consolidated to reduce duplication: EOF, ECONNRESET, SOCK_DGRAM peer close.
- Corrected ASSERT usage and initialization style.
- Makefile updated for new directory af_unix.

 tools/testing/selftests/net/af_unix/Makefile | 1 +
 .../selftests/net/af_unix/unix_connreset.c | 161 ++++++++++++++++++
 2 files changed, 162 insertions(+)
 create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c

diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile
index de805cbbdf69..5826a8372451 100644
--- a/tools/testing/selftests/net/af_unix/Makefile
+++ b/tools/testing/selftests/net/af_unix/Makefile
@@ -7,6 +7,7 @@ TEST_GEN_PROGS := \
 	scm_pidfd \
 	scm_rights \
 	unix_connect \
+	unix_connreset \
 # end of TEST_GEN_PROGS
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c
new file mode 100644
index 000000000000..c65ec997d77d
--- /dev/null
+++ b/tools/testing/selftests/net/af_unix/unix_connreset.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Selftest for AF_UNIX socket close and ECONNRESET behaviour.
+ *
+ * This test verifies that:
+ *  1. SOCK_STREAM sockets return EOF when peer closes normally.
+ *  2. SOCK_STREAM sockets return ECONNRESET if peer closes with unread data.
+ *  3. SOCK_DGRAM sockets do not return ECONNRESET when peer closes.
+ *
+ * These tests document the intended Linux behaviour.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include "../../kselftest_harness.h"
+
+#define SOCK_PATH "/tmp/af_unix_connreset.sock"
+
+static void remove_socket_file(void)
+{
+	unlink(SOCK_PATH);
+}
+
+FIXTURE(unix_sock)
+{
+	int server;
+	int client;
+	int child;
+};
+
+FIXTURE_VARIANT(unix_sock)
+{
+	int socket_type;
+	const char *name;
+};
+
+/* Define variants: stream and datagram */
+FIXTURE_VARIANT_ADD(unix_sock, stream) {
+	.socket_type = SOCK_STREAM,
+	.name = "SOCK_STREAM",
+};
+
+FIXTURE_VARIANT_ADD(unix_sock, dgram) {
+	.socket_type = SOCK_DGRAM,
+	.name = "SOCK_DGRAM",
+};
+
+FIXTURE_SETUP(unix_sock)
+{
+	struct sockaddr_un addr = {};
+	int err;
+
+	addr.sun_family = AF_UNIX;
+	strcpy(addr.sun_path, SOCK_PATH);
+
+	self->server = socket(AF_UNIX, variant->socket_type, 0);
+	ASSERT_LT(-1, self->server);
+
+	err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr));
+	ASSERT_EQ(0, err);
+
+	if (variant->socket_type == SOCK_STREAM) {
+		err = listen(self->server, 1);
+		ASSERT_EQ(0, err);
+
+		self->client = socket(AF_UNIX, SOCK_STREAM, 0);
+		ASSERT_LT(-1, self->client);
+
+		err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr));
+		ASSERT_EQ(0, err);
+
+		self->child = accept(self->server, NULL, NULL);
+		ASSERT_LT(-1, self->child);
+	} else {
+		/* Datagram: bind and connect only */
+		self->client = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+		ASSERT_LT(-1, self->client);
+
+		err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr));
+		ASSERT_EQ(0, err);
+	}
+}
+
+FIXTURE_TEARDOWN(unix_sock)
+{
+	if (variant->socket_type == SOCK_STREAM)
+		close(self->child);
+
+	close(self->client);
+	close(self->server);
+	remove_socket_file();
+}
+
+/* Test 1: peer closes normally */
+TEST_F(unix_sock, eof)
+{
+	char buf[16] = {};
+	ssize_t n;
+
+	if (variant->socket_type != SOCK_STREAM)
+		SKIP(return, "This test only applies to SOCK_STREAM");
+
+	/* Peer closes normally */
+	close(self->child);
+
+	n = recv(self->client, buf, sizeof(buf), 0);
+	TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno));
+	if (n == -1)
+		ASSERT_EQ(ECONNRESET, errno);
+
+	if (n != -1)
+		ASSERT_EQ(0, n);
+}
+
+/* Test 2: peer closes with unread data */
+TEST_F(unix_sock, reset_unread)
+{
+	char buf[16] = {};
+	ssize_t n;
+
+	if (variant->socket_type != SOCK_STREAM)
+		SKIP(return, "This test only applies to SOCK_STREAM");
+
+	/* Send data that will remain unread by client */
+	send(self->client, "hello", 5, 0);
+	close(self->child);
+
+	n = recv(self->client, buf, sizeof(buf), 0);
+	TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno));
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(ECONNRESET, errno);
+}
+
+/* Test 3: SOCK_DGRAM peer close */
+TEST_F(unix_sock, dgram_reset)
+{
+	char buf[16] = {};
+	ssize_t n;
+
+	if (variant->socket_type != SOCK_DGRAM)
+		SKIP(return, "This test only applies to SOCK_DGRAM");
+
+	send(self->client, "hello", 5, 0);
+	close(self->server);
+
+	n = recv(self->client, buf, sizeof(buf), 0);
+	TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno));
+	/* Expect EAGAIN because there is no datagram and peer is closed. */
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EAGAIN, errno);
+}
+
+TEST_HARNESS_MAIN
+
-- 
2.43.0
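A compact way to observe the two SOCK_STREAM cases is a socketpair(), which avoids the bound /tmp path entirely. The helper name recv_after_peer_close() is hypothetical. It relies on the Linux behaviour the tests document: closing an AF_UNIX stream socket that still has unread data in its receive queue sets ECONNRESET on the peer, while a clean close yields EOF (recv() returns 0). Other operating systems may behave differently.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected AF_UNIX stream pair, optionally leave unread data
 * in sv[1]'s receive queue, close sv[1], then recv() on sv[0].
 * Returns recv()'s result, or -errno if a call failed.
 */
static long recv_after_peer_close(int leave_unread)
{
	int sv[2];
	char buf[16];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -errno;

	/* Data sent from sv[0] sits unread in sv[1]'s receive queue. */
	if (leave_unread && send(sv[0], "hello", 5, 0) != 5) {
		close(sv[0]);
		close(sv[1]);
		return -EIO;
	}

	close(sv[1]); /* the peer goes away */

	n = recv(sv[0], buf, sizeof(buf), 0);
	if (n < 0)
		n = -errno;
	close(sv[0]);
	return n;
}
```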

2 months, 1 week

[PATCH v22 00/28] riscv control-flow integrity for usermode

by Deepak Gupta

v22: Fix a build error caused by -march=zicfiss being accepted by gcc-13 and above even though the compiler does not actually do any codegen for, or recognize the instructions of, zicfiss. The change in v22 adds a dependency on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.

v21: Fixed build errors.

Basics and overview
===================

Software with larger attack surfaces (e.g. network-facing apps like databases, browsers, or apps relying on browser runtimes) suffers from memory corruption issues which attackers can use to bend the control flow of the program and eventually gain control (by making their payload executable). Attackers perform such attacks by leveraging call sites which rely on indirect calls, or return sites which rely on obtaining the return address from stack memory.

To mitigate such attacks, the risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad`, else the cpu raises a software check exception (a new cpu exception cause code on risc-v). Similarly, for return flow, the risc-v extension zicfiss extends the architecture with:

- an `sspush` instruction to push the return address on a shadow stack
- an `sspopchk` instruction to pop the return address from the shadow stack and compare it with the input operand (i.e. the return address on the stack)
- `sspopchk` raising a software check exception if the comparison above is a mismatch
- a protection mechanism by which the shadow stack is not writeable via regular store instructions

More information and details can be found in the extensions' GitHub repo [1]. The equivalent of the landing pad (zicfilp) on x86 is the `ENDBRANCH` instruction in Intel CET [3], and on arm it is branch target identification (BTI) [4]. Similarly, x86's Intel CET has a shadow stack [5] and arm64 has the guarded control stack (GCS) [6], both very similar to risc-v's zicfiss shadow stack. x86 and arm64 support for user-mode shadow stack is already in mainline.
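To make the sspush/sspopchk description concrete, here is a purely software model of a shadow stack in C. It is a hypothetical illustration of the check, not how the hardware or the kernel implements zicfiss: a call records its return address on the shadow stack, and a return compares the popped value with the address read from the ordinary (corruptible) stack, with a false result standing in for the software check exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64

/* Hypothetical model: 'shadow' is only modified by ss_push() and
 * ss_popchk(), mirroring that regular stores cannot write the real
 * shadow stack. */
struct shadow_stack {
	uintptr_t shadow[SS_DEPTH];
	size_t top;
};

/* Model of sspush: record the return address. */
static bool ss_push(struct shadow_stack *ss, uintptr_t ra)
{
	if (ss->top == SS_DEPTH)
		return false; /* overflow, treated as a fault in this model */
	ss->shadow[ss->top++] = ra;
	return true;
}

/* Model of sspopchk: pop and compare with the return address taken from
 * the regular stack. false models the software check exception. */
static bool ss_popchk(struct shadow_stack *ss, uintptr_t ra_from_stack)
{
	if (ss->top == 0)
		return false;
	return ss->shadow[--ss->top] == ra_from_stack;
}
```

An attacker who overwrites the return address on the regular stack changes ra_from_stack but not the shadow copy, so the mismatch is caught at return.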
Kernel awareness for user control flow integrity
================================================

This series picks up Samuel Holland's envcfg changes [2] as well, so if those are applied independently, they should be removed from this series.

Enabling:
In order to maintain compatibility and not break anything in user mode, the kernel doesn't enable the control flow integrity cpu extensions on a binary by default. Instead it exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (the loader) to determine whether all objects in its address space are compiled with shadow stack and landing pad support, and to enable the feature accordingly. Additionally, if a subsequent `dlopen` happens on a library, user mode can decide again to either disable the feature (if the incoming library is not compiled with support) OR terminate the task (if the user mode policy strictly requires all objects in the address space to be compiled with the control flow integrity cpu feature). The prctl to enable shadow stack results in allocating a shadow stack from virtual memory and activating it for the user address space. x86 and arm64 follow the same direction for similar reasons.

clone/fork:
On clone and fork, the cfi state of a task is inherited by the child. The shadow stack is part of virtual memory and is writeable memory from the kernel's perspective (writeable via a restricted set of instructions, aka shadow stack instructions). Thus the kernel changes ensure that this memory is converted to read-only when fork/clone happens, and COWed when a fault is taken due to sspush, sspopchk or ssamoswap. If `CLONE_VM` is specified and shadow stack is to be enabled, the kernel automatically allocates a shadow stack for that clone call.

map_shadow_stack:
x86 introduced the `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space.
It is useful for allocating shadow stacks for different contexts managed by a single thread (green threads or contexts). risc-v implements this system call as well.

signal management:
If shadow stack is enabled for a task, the kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that the original execution can be resumed. Even though the resume context is prepared by the kernel, it lives in user space memory and is subject to memory corruption; an attacker can use corruption bugs in this race window to perform an arbitrary sigreturn and eventually bypass the cfi mechanism. Another issue is how to ensure that the cfi related state in the sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.

In order to mitigate control-flow hijacking, the kernel prepares a token and places it on the shadow stack before signal delivery, and places the address of the token in the sigcontext structure. During sigreturn, the kernel obtains the address of the token from the sigcontext structure, reads the token from the shadow stack and validates it, and only then allows sigreturn to succeed.

The compatibility issue is solved by adopting the dynamic sigcontext management introduced for the vector extension. This series refactors the code a little to make future sigcontext management easier (as proposed by Andy Chiu from SiFive).

config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks up the kernel support for user control flow integrity. This option is presented only if the toolchain has shadow stack and landing pad support, and it is guarded by toolchain support on purpose: eventually the vDSO also needs to be compiled with shadow stack and landing pad support. vDSO compile patches are not included as of now because the landing pad labeling scheme is yet to settle for the usermode runtime.
To get more information on kernel interactions with respect to zicfilp and zicfiss, the patch series adds documentation for `zicfilp` and `zicfiss` in the following:

Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst

How to test this series
=======================

Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)

Qemu
----
Get the latest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)

Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic

Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.

$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)

Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true

References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/

To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org

changelog
---------
v22:
- CONFIG_RISCV_USER_CFI was "n" by default. With dual vdso support it is "y" by default (if the toolchain supports it).
  Fixing a build error due to "-march=zicfiss" being only partially supported in gcc-13: gcc-13 recognizes the flag but does not actually do any codegen for, or recognize the instructions of, zicfiss. The change in v22 makes CONFIG_RISCV_USER_CFI depend on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.
- picked up tags and some cosmetic changes in the commit message for the dual vdso patch.
v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h
  Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h
  vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.
v20:
- rebased on v6.18-rc1.
- Added support for two vDSOs. If `CONFIG_RISCV_USER_CFI` is selected, two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). The kernel exposes the RVA23 vDSO if the hardware/cpu implements zimop, else it exposes the existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"
v19:
- riscv_nousercfi was `int`. Changed it to unsigned long. Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restoring ssp on return to usermode was being done before `riscv_v_context_nesting_end` on the trap exit path. If kernel shadow stack were enabled, this would result in the kernel operating on the user shadow stack and panicking (as I found in my testing of the kcfi patch series). So fixed that.
v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in the sstatus image in pt_regs
- vdso was missing the shadow stack elf note for object files. Added that. An additional asm file for vdso needed the elf marker flag. The toolchain should complain if `-fcf-protection=full` is used and the marker is missing for an object generated from an asm file. Asked toolchain folks to fix this, although there is no reason to gate the merge on that.
- Split up compile options for march and fcf-protection in vdso Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu
  Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI
v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling"
  https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/
v16:
- If FWFT is not implemented or returns an error for shadow stack activation, then no_usercfi is set to disable shadow stack, although this should be picked up by extension validation and activation. Fixed this bug for both zicfilp and zicfiss. Thanks to Charlie Jenkins for reporting this.
- If the toolchain doesn't support cfi, the cfi kselftest shouldn't build. Suggested by Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested keeping it off until there is more hardware availability with the RVA23 profile and zimop/zcmop implemented; otherwise this would start breaking people's workflows.
- Includes the fix for the FWFT definitions error in asm-offsets.c when "!RV64 and !SBI".
v15:
- Toolchain has been updated to include the `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile vDSO and selftest with the `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. Fixed that.
v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches
  Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of the thread_info block so that the current layout of thread_info member fields within a single cacheline is not disturbed.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr use riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows disabling zicfilp and zicfiss independently. Updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems I had accidentally squashed the arch agnostic indirect branch tracking prctl and the riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. Fixed that. Changed `lpad 1` to `lpad 0`.
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where the VM_SHADOW_STACK flag is used anyway. Dropping this patch to expedite merging in the riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add a single vdso object with cfi enabled. But a vdso object with the zero labeled landing pad scheme is the least common denominator and should work with all objects, zero labeled as well as function-signature labeled.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches. They are in palmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
  Instead using `deactivate_mm` flow to clean up.
  see here for more context
  https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes the compile issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
  "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
  Any future evolution of this interface should accordingly define how arg should be set up.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack.
- Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using FWFT
  (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
  (Note: I had an issue in my workflow due to which the version number wasn't picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on a per-thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
  logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
- dt-bindings
  As suggested, split into separate commit. Fixed the messaging that spec is in public review
- arch_is_shadow_stack change
  arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
  zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
  As suggested, added object and binary filenames to .gitignore
  Selftest binary anyway needs to be compiled with a cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking, and implements them on riscv.

---
Changes in v22:
- Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@ri…

Changes in v21:
- Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@ri…

Changes in v20:
- Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@ri…

Changes in v19:
- Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri…

Changes in v18:
- Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri…

Changes in v17:
- Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri…

Changes in v16:
- Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri…

Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri…

Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri…

Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri…

Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…

Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…

---
Andy Chiu (1):
      riscv: signal: abstract header saving for setup_sigcontext

Deepak Gupta (26):
      mm: VM_SHADOW_STACK definition for riscv
      dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
      riscv: zicfiss / zicfilp enumeration
      riscv: zicfiss / zicfilp extension csr and bit definitions
      riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
      riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
      riscv/mm: manufacture shadow stack pte
      riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs
      riscv/mm: write protect and shadow stack
      riscv/mm: Implement map_shadow_stack() syscall
      riscv/shstk: If needed allocate a new shadow stack on clone
      riscv: Implements arch agnostic shadow stack prctls
      prctl: arch-agnostic prctl for indirect branch tracking
      riscv: Implements arch agnostic indirect branch tracking prctls
      riscv/traps: Introduce software check exception and uprobe handling
      riscv/signal: save and restore of shadow stack for signal
      riscv/kernel: update __show_regs to print shadow stack register
      riscv/ptrace: riscv cfi status and state via ptrace and in core files
      riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
      riscv: kernel command line option to opt out of user cfi
      riscv: enable kernel access to shadow stack memory via FWFT sbi call
      arch/riscv: dual vdso creation logic and select vdso based on hw
      riscv: create a config for shadow stack and landing pad instr support
      riscv: Documentation for landing pad / indirect branch tracking
      riscv: Documentation for shadow stack on riscv
      kselftest/riscv: kselftest for user mode cfi

Jim Shu (1):
      arch/riscv: compile vdso with landing pad and shadow stack note

 Documentation/admin-guide/kernel-parameters.txt | 8 +
 Documentation/arch/riscv/index.rst | 2 +
 Documentation/arch/riscv/zicfilp.rst | 115 +++++
 Documentation/arch/riscv/zicfiss.rst | 179 +++++++
 .../devicetree/bindings/riscv/extensions.yaml | 14 +
 arch/riscv/Kconfig | 22 +
 arch/riscv/Makefile | 8 +-
 arch/riscv/configs/hardening.config | 4 +
 arch/riscv/include/asm/asm-prototypes.h | 1 +
 arch/riscv/include/asm/assembler.h | 44 ++
 arch/riscv/include/asm/cpufeature.h | 12 +
 arch/riscv/include/asm/csr.h | 16 +
 arch/riscv/include/asm/entry-common.h | 2 +
 arch/riscv/include/asm/hwcap.h | 2 +
 arch/riscv/include/asm/mman.h | 26 +
 arch/riscv/include/asm/mmu_context.h | 7 +
 arch/riscv/include/asm/pgtable.h | 30 +-
 arch/riscv/include/asm/processor.h | 1 +
 arch/riscv/include/asm/thread_info.h | 3 +
 arch/riscv/include/asm/usercfi.h | 95 ++++
 arch/riscv/include/asm/vdso.h | 13 +-
 arch/riscv/include/asm/vector.h | 3 +
 arch/riscv/include/uapi/asm/hwprobe.h | 2 +
 arch/riscv/include/uapi/asm/ptrace.h | 34 ++
 arch/riscv/include/uapi/asm/sigcontext.h | 1 +
 arch/riscv/kernel/Makefile | 2 +
 arch/riscv/kernel/asm-offsets.c | 10 +
 arch/riscv/kernel/cpufeature.c | 27 +
 arch/riscv/kernel/entry.S | 38 ++
 arch/riscv/kernel/head.S | 27 +
 arch/riscv/kernel/process.c | 27 +-
 arch/riscv/kernel/ptrace.c | 95 ++++
 arch/riscv/kernel/signal.c | 148 +++++-
 arch/riscv/kernel/sys_hwprobe.c | 2 +
 arch/riscv/kernel/sys_riscv.c | 10 +
 arch/riscv/kernel/traps.c | 54 ++
 arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++
 arch/riscv/kernel/vdso.c | 7 +
 arch/riscv/kernel/vdso/Makefile | 40 +-
 arch/riscv/kernel/vdso/flush_icache.S | 4 +
 arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +-
 arch/riscv/kernel/vdso/getcpu.S | 4 +
 arch/riscv/kernel/vdso/note.S | 3 +
 arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
 arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
 arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +-
 arch/riscv/kernel/vdso_cfi/Makefile | 25 +
 arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 +
 arch/riscv/mm/init.c | 2 +-
 arch/riscv/mm/pgtable.c | 16 +
 include/linux/cpu.h | 4 +
 include/linux/mm.h | 7 +
 include/uapi/linux/elf.h | 2 +
 include/uapi/linux/prctl.h | 27 +
 kernel/sys.c | 30 ++
 tools/testing/selftests/riscv/Makefile | 2 +-
 tools/testing/selftests/riscv/cfi/.gitignore | 3 +
 tools/testing/selftests/riscv/cfi/Makefile | 16 +
 tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++
 tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
 tools/testing/selftests/riscv/cfi/shadowstack.h | 27 +
 62 files changed, 2475 insertions(+), 41 deletions(-)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2

--
- debug

2 months, 1 week


[PATCH bpf-next v7 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework

by Bastien Curutchet (eBPF Foundation)

Hi all,

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs are defined in xskxceiver.c. Since this script is used to test real hardware, the goal here is to leave it as it is, and only integrate the tests that run on veth peers into the test_progs framework.

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test.
PATCH 8 to 13 handle all errors by releasing resources instead of calling exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI.
PATCH 15 integrates the CI tests into the test_progs framework.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v7:
- Restore 'test_ns' prefix to allow parallel execution.
- PATCH 11: fix potential uninitialized variable spotted by AI.
- PATCH 12: fix potential resource leak spotted by AI.
- Link to v6: https://lore.kernel.org/r/20251029-xsk-v6-0-5a63a64dff98@bootlin.com

Changes in v6:
- Set up the veth peer once for each mode instead of once for each subtest
- Rename the 'flaky' table to 'skip-ci' and move the automatically skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/20251016-xsk-v5-0-662c95eb8005@bootlin.com

Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/20250924-xsk-v4-0-20e57537b876@bootlin.com

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes testapp_stats_rx_dropped(), the second one fixes testapp_xdp_shared_umem(). The unnecessary frees (in testapp_stats_rx_full() and testapp_stats_fill_empty()) are removed.
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf: Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xskxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1, 7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com

---
Bastien Curutchet (eBPF Foundation) (15):
      selftests/bpf: test_xsk: Split xskxceiver
      selftests/bpf: test_xsk: Initialize bitmap before use
      selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
      selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
      selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
      selftests/bpf: test_xsk: Wrap test clean-up in functions
      selftests/bpf: test_xsk: Release resources when swap fails
      selftests/bpf: test_xsk: Add return value to init_iface()
      selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
      selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
      selftests/bpf: test_xsk: Don't exit immediately when workers fail
      selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
      selftests/bpf: test_xsk: Don't exit immediately on allocation failures
      selftests/bpf: test_xsk: Isolate non-CI tests
      selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework

 tools/testing/selftests/bpf/Makefile | 11 +-
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2596 ++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/test_xsk.h | 298 +++
 tools/testing/selftests/bpf/prog_tests/xsk.c | 151 ++
 tools/testing/selftests/bpf/xskxceiver.c | 2696 +-------------------
 tools/testing/selftests/bpf/xskxceiver.h | 156 --
 6 files changed, 3184 insertions(+), 2724 deletions(-)
---
base-commit: 1e2d874b04ba46a3b9fe6697097aa437641f4339
change-id: 20250218-xsk-0cf90e975d14

Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>

2 months, 1 week


[PATCH v7 00/15] Consolidate iommu page table implementations (AMD)

by Jason Gunthorpe

[Kevin has done a great job getting through reviews on all these, and Vasant/Ankit have been looking at it on AMD systems. I think we are close to being done now!]

Currently each of the iommu page table formats duplicates all of the logic to maintain the page table and perform map/unmap/etc operations. There are several different versions of the algorithms between all the different formats. The io-pgtable system provides an interface to help isolate the page table code from the iommu driver, but doesn't provide tools to implement the common algorithms.

This makes it very hard to improve the state of the pagetable code under the iommu domains, as any proposed improvement needs to alter a large number of different driver code paths. Combined with a lack of software based testing, this makes improvement in this area very hard.

iommufd wants several new page table operations:
- More efficient map/unmap operations, using iommufd's batching logic
- unmap that returns the physical addresses into a batch as it progresses
- cut that allows splitting areas so large pages can have holes poked in them dynamically (i.e. guestmemfd hitless shared/private transitions)
- More aggressive freeing of table memory to avoid waste
- Fragmenting large pages so that dirty tracking can be more granular
- Reassembling large pages so that VMs can run at full IO performance in migration/dirty tracking error flows
- KHO integration for kernel live upgrade

Together these are algorithmically complex enough to be a very significant task to go and implement in all the page table formats we support. Just the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86 PAE / AMDv1 / VT-d SS / RISCV).

Instead of doing the duplicated work, this series takes the first step to consolidate the algorithms into one place. In spirit it is similar to the work Christoph did a few years back to pull the redundant get_user_pages() implementations out of the arch code into core MM.
This unlocked a great deal of improvement in that space in the following years. I would like to see the same benefit in iommu as well.

My first RFC showed a bigger picture with almost all formats and more algorithms. This series reorganizes that to be narrowly focused on just enough to convert the AMD driver to use the new mechanism.

kunit tests are provided that allow good testing of the algorithms and all formats on x86; nothing is arch specific.

AMD is one of the simpler options, as the HW is quite uniform with few different options/bugs while still requiring the complicated contiguous pages support. The HW also has a very simple range based invalidation approach that is easy to implement.

The AMD v1 and AMD v2 page table formats are implemented bit for bit identical to the current code, tested using a compare kunit test that checks against the io-pgtable version (on github, see below).

Updating the AMD driver to replace the io-pgtable layer with the new stuff is fairly straightforward now. The layering is fixed up in the new version so that all the invalidation goes through function pointers.

Several small fixing patches have come out of this as I've been fixing the problems that the test suite uncovers in the current code, and implementing the fixed version in iommupt.

On performance, there is quite a wide variety of implementation designs across all the drivers.
Looking at some key performance numbers across the main formats:

iommu_map():
   pgsz  ,avg new,old ns, min new,old ns , min % (+ve is better)
     2^12,   53,66      ,   51,63        ,  19.19 (AMDv1)
 256*2^12,  386,1909    ,  367,1795      ,  79.79
 256*2^21,  362,1633    ,  355,1556      ,  77.77

     2^12,   56,62      ,   52,59        ,  11.11 (AMDv2)
 256*2^12,  405,1355    ,  357,1292      ,  72.72
 256*2^21,  393,1160    ,  358,1114      ,  67.67

     2^12,   55,65      ,   53,62        ,  14.14 (VT-d second stage)
 256*2^12,  391,518     ,  332,512       ,  35.35
 256*2^21,  383,635     ,  336,624       ,  46.46

     2^12,   57,65      ,   55,63        ,  12.12 (ARM 64 bit)
 256*2^12,  380,389     ,  361,369       ,   2.02
 256*2^21,  358,419     ,  345,400       ,  13.13

iommu_unmap():
   pgsz  ,avg new,old ns, min new,old ns , min % (+ve is better)
     2^12,   69,88      ,   65,85        ,  23.23 (AMDv1)
 256*2^12,  353,6498    ,  331,6029      ,  94.94
 256*2^21,  373,6014    ,  360,5706      ,  93.93

     2^12,   71,72      ,   66,69        ,   4.04 (AMDv2)
 256*2^12,  228,891     ,  206,871       ,  76.76
 256*2^21,  254,721     ,  245,711       ,  65.65

     2^12,   69,87      ,   65,82        ,  20.20 (VT-d second stage)
 256*2^12,  210,321     ,  200,315       ,  36.36
 256*2^21,  255,349     ,  238,342       ,  30.30

     2^12,   72,77      ,   68,74        ,   8.08 (ARM 64 bit)
 256*2^12,  521,357     ,  447,346       , -29.29
 256*2^21,  489,358     ,  433,345       , -25.25

* Above numbers include additional patches to remove the iommu_pgsize() overheads. gcc 13.3.0, i7-12700

This version provides fairly consistent performance across formats. ARM unmap performance is quite different because this version supports contiguous pages and uses a very different algorithm for unmapping. Though why it is so much worse compared to AMDv1 I haven't figured out yet.

The per-format commits include a more detailed chart.
There is a second branch:
https://github.com/jgunthorpe/linux/commits/iommu_pt_all
containing supporting work and future steps:
 - ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats
 - RISCV format and RISCV conversion
   https://github.com/jgunthorpe/linux/commits/iommu_pt_riscv
 - Support for a DMA incoherent HW page table walker
 - VT-d second stage format and VT-d conversion
   https://github.com/jgunthorpe/linux/commits/iommu_pt_vtd
 - DART v1 & v2 format
 - Draft of an iommufd 'cut' operation to break down huge pages
 - A compare test that checks the iommupt formats against the iopgtable interface, including updating AMD to have a working iopgtable and patches to make VT-d have an iopgtable for testing.
 - A performance test to micro-benchmark map and unmap against iopgtable

My strategy is to go one by one for the drivers:
 - AMD driver conversion
 - RISCV page table and driver
 - Intel VT-d driver and VTDSS page table
 - Flushing improvements for RISCV
 - ARM SMMUv3

And concurrently work on the algorithm side:
 - debugfs content dump, like VT-d has
 - Cut support
 - Increase/Decrease page size support
 - map/unmap batching
 - KHO

As we make more algorithm improvements, the value of converting the drivers increases.
This is on github:
https://github.com/jgunthorpe/linux/commits/iommu_pt

v7:
 - Rebase to v6.18-rc2
 - Improve comments and documentation
 - Add a few missed __sme_sets() for AMD CC
 - Rename pt_iommu_flush_ops -> pt_iommu_driver_ops
          VT-D -> VT-d
          pt_clear_entry -> pt_clear_entries
          pt_entry_write_is_dirty -> pt_entry_is_write_dirty
          pt_entry_set_write_clean -> pt_entry_make_write_clean
 - Tidy some of the map flow into a new function do_map()
 - Fix ffz64()
v6: https://patch.msgid.link/r/0-v6-0fb54a1d9850+36b-iommu_pt_jgg@nvidia.com
 - Improve comments and documentation
 - Rename pt_entry_oa_full -> pt_entry_oa_exact
          pt_has_system_page -> pt_has_system_page_size
          pt_max_output_address_lg2 -> pt_max_oa_lg2
          log2_f*() -> vaf* / oaf* / f*_t
          pt_item_fully_covered -> pt_entry_fully_covered
 - Fix missed constant propagation causing division
 - Consolidate debugging checks to pt_check_install_leaf_args()
 - Change collect->ignore_mapped to check_mapped
 - Shuffle some hunks around to more appropriate patches
 - Two new mini kunit tests
v5: https://patch.msgid.link/r/0-v5-116c4948af3d+68091-iommu_pt_jgg@nvidia.com
 - Text grammar updates and kdoc fixes
v4: https://patch.msgid.link/r/0-v4-0d6a6726a372+18959-iommu_pt_jgg@nvidia.com
 - Rebase on v6.16-rc3
 - Integrate the HATS/HATDis changes
 - Remove 'default n' from kconfig
 - Remove unused 'PT_FIXED_TOP_LEVEL'
 - Improve comments and documentation
 - Fix some compile warnings from kbuild robots
v3: https://patch.msgid.link/r/0-v3-a93aab628dbc+521-iommu_pt_jgg@nvidia.com
 - Rebase on v6.16-rc2
 - s/PT_ENTRY_WORD_SIZE/PT_ITEM_WORD_SIZE/s to follow the language better
 - Comment and documentation updates
 - Add PT_TOP_PHYS_MASK to help manage alignment restrictions on the top pointer
 - Add missed force_aperture = true
 - Make pt_iommu_deinit() take care of the not-yet-inited error case internally as AMD/RISCV/VTD all shared this logic
 - Change gather_range() into gather_range_pages() so it also deals with the page list.
This makes the following cache flushing series simpler - Fix missed update of unmap->unmapped in some error cases - Change clear_contig() to order the gather more logically - Remove goto from the error handling in __map_range_leaf() - s/log2_/oalog2_/ in places where the argument is an oaddr_t - Pass the pts to pt_table_install64/32() - Do not use SIGN_EXTEND for the AMDv2 page table because of Vasant's information on how PASID 0 works. v2: https://patch.msgid.link/r/0-v2-5c26bde5c22d+58b-iommu_pt_jgg@nvidia.com - AMD driver only, many code changes RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/ Cc: Michael Roth <michael.roth(a)amd.com> Cc: Alexey Kardashevskiy <aik(a)amd.com> Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com> Cc: James Gowans <jgowans(a)amazon.com> Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com> Alejandro Jimenez (1): iommu/amd: Use the generic iommu page table Jason Gunthorpe (14): genpt: Generic Page Table base API genpt: Add Documentation/ files iommupt: Add the basic structure of the iommu implementation iommupt: Add the AMD IOMMU v1 page table format iommupt: Add iova_to_phys op iommupt: Add unmap_pages op iommupt: Add map_pages op iommupt: Add read_and_clear_dirty op iommupt: Add a kunit test for Generic Page Table iommupt: Add a mock pagetable format for iommufd selftest to use iommufd: Change the selftest to use iommupt instead of xarray iommupt: Add the x86 64 bit page table format iommu/amd: Remove AMD io_pgtable support iommupt: Add a kunit test for the IOMMU implementation .clang-format | 1 + Documentation/driver-api/generic_pt.rst | 142 ++ Documentation/driver-api/index.rst | 1 + drivers/iommu/Kconfig | 2 + drivers/iommu/Makefile | 1 + drivers/iommu/amd/Kconfig | 5 +- drivers/iommu/amd/Makefile | 2 +- drivers/iommu/amd/amd_iommu.h | 1 - drivers/iommu/amd/amd_iommu_types.h | 110 +- drivers/iommu/amd/io_pgtable.c | 577 -------- drivers/iommu/amd/io_pgtable_v2.c | 370 ------ drivers/iommu/amd/iommu.c | 538 
++++---- drivers/iommu/generic_pt/.kunitconfig | 13 + drivers/iommu/generic_pt/Kconfig | 68 + drivers/iommu/generic_pt/fmt/Makefile | 26 + drivers/iommu/generic_pt/fmt/amdv1.h | 415 ++++++ drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 + drivers/iommu/generic_pt/fmt/defs_x86_64.h | 21 + drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 15 + drivers/iommu/generic_pt/fmt/iommu_mock.c | 10 + drivers/iommu/generic_pt/fmt/iommu_template.h | 48 + drivers/iommu/generic_pt/fmt/iommu_x86_64.c | 11 + drivers/iommu/generic_pt/fmt/x86_64.h | 259 ++++ drivers/iommu/generic_pt/iommu_pt.h | 1162 +++++++++++++++++ drivers/iommu/generic_pt/kunit_generic_pt.h | 713 ++++++++++ drivers/iommu/generic_pt/kunit_iommu.h | 183 +++ drivers/iommu/generic_pt/kunit_iommu_pt.h | 487 +++++++ drivers/iommu/generic_pt/pt_common.h | 358 +++++ drivers/iommu/generic_pt/pt_defs.h | 329 +++++ drivers/iommu/generic_pt/pt_fmt_defaults.h | 233 ++++ drivers/iommu/generic_pt/pt_iter.h | 636 +++++++++ drivers/iommu/generic_pt/pt_log2.h | 122 ++ drivers/iommu/io-pgtable.c | 4 - drivers/iommu/iommufd/Kconfig | 1 + drivers/iommu/iommufd/iommufd_test.h | 11 +- drivers/iommu/iommufd/selftest.c | 438 +++---- include/linux/generic_pt/common.h | 167 +++ include/linux/generic_pt/iommu.h | 271 ++++ include/linux/io-pgtable.h | 2 - include/linux/irqchip/riscv-imsic.h | 3 +- tools/testing/selftests/iommu/iommufd.c | 60 +- tools/testing/selftests/iommu/iommufd_utils.h | 12 + 42 files changed, 6237 insertions(+), 1612 deletions(-) create mode 100644 Documentation/driver-api/generic_pt.rst delete mode 100644 drivers/iommu/amd/io_pgtable.c delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c create mode 100644 drivers/iommu/generic_pt/.kunitconfig create mode 100644 drivers/iommu/generic_pt/Kconfig create mode 100644 drivers/iommu/generic_pt/fmt/Makefile create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h create mode 100644 
drivers/iommu/generic_pt/fmt/defs_x86_64.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h create mode 100644 drivers/iommu/generic_pt/iommu_pt.h create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h create mode 100644 drivers/iommu/generic_pt/pt_common.h create mode 100644 drivers/iommu/generic_pt/pt_defs.h create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h create mode 100644 drivers/iommu/generic_pt/pt_iter.h create mode 100644 drivers/iommu/generic_pt/pt_log2.h create mode 100644 include/linux/generic_pt/common.h create mode 100644 include/linux/generic_pt/iommu.h base-commit: bf3db0366052dcdf7dea89a07929b690aac59b15 -- 2.43.0

2 months, 1 week

[PATCH v3] selftests/run_kselftest.sh: exit with error if tests fail

by Brendan Jackman

Parsing KTAP is quite an inconvenience, but most of the time the thing you really want to know is "did anything fail?". Let's give the user this information without them needing to parse anything.

Because of the use of subshells and namespaces, this needs to be communicated via a file. Just write arbitrary data into the file and treat non-empty content as a signal that something failed.

In case any user depends on the current behaviour, such as running this from a script with `set -e` and parsing the result for failures afterwards, add a flag they can set to get the old behaviour, namely --no-error-on-fail.

Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
Changes in v3:
- Fixed quoting
- Link to v2: https://lore.kernel.org/r/20251014-b4-ksft-error-on-fail-v2-1-b3e2657237b8@…

Changes in v2:
- Fixed bug in report_failure()
- Made error-on-fail the default
- Link to v1: https://lore.kernel.org/r/20251007-b4-ksft-error-on-fail-v1-1-71bf058f5662@…
---
 tools/testing/selftests/kselftest/runner.sh | 14 ++++++++++----
 tools/testing/selftests/run_kselftest.sh    | 14 ++++++++++++++
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 2c3c58e65a419f5ee8d7dc51a37671237a07fa0b..3a62039fa6217f3453423ff011575d0a1eb8c275 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -44,6 +44,12 @@ tap_timeout()
 	fi
 }
 
+report_failure()
+{
+	echo "not ok $*"
+	echo "$*" >> "$kselftest_failures_file"
+}
+
 run_one()
 {
 	DIR="$1"
@@ -105,7 +111,7 @@ run_one()
 	echo "# $TEST_HDR_MSG"
 	if [ ! -e "$TEST" ]; then
 		echo "# Warning: file $TEST is missing!"
-		echo "not ok $test_num $TEST_HDR_MSG"
+		report_failure "$test_num $TEST_HDR_MSG"
 	else
 		if [ -x /usr/bin/stdbuf ]; then
 			stdbuf="/usr/bin/stdbuf --output=L "
@@ -123,7 +129,7 @@ run_one()
 				interpreter=$(head -n 1 "$TEST" | cut -c 3-)
 				cmd="$stdbuf $interpreter ./$BASENAME_TEST"
 			else
-				echo "not ok $test_num $TEST_HDR_MSG"
+				report_failure "$test_num $TEST_HDR_MSG"
 				return
 			fi
 		fi
@@ -137,9 +143,9 @@ run_one()
 				echo "ok $test_num $TEST_HDR_MSG # SKIP"
 			elif [ $rc -eq $timeout_rc ]; then \
 				echo "#"
-				echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT $kselftest_timeout seconds"
+				report_failure "$test_num $TEST_HDR_MSG # TIMEOUT $kselftest_timeout seconds"
 			else
-				echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
+				report_failure "$test_num $TEST_HDR_MSG # exit=$rc"
 			fi)
 		cd - >/dev/null
 	fi
diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 0443beacf3621ae36cb12ffd57f696ddef3526b5..d4be97498b32e975c63a1167d3060bdeba674c8c 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -33,6 +33,7 @@ Usage: $0 [OPTIONS]
   -c | --collection COLLECTION	Run all tests from COLLECTION
   -l | --list			List the available collection:test entries
   -d | --dry-run		Don't actually run any tests
+  -f | --no-error-on-fail	Don't exit with an error just because tests failed
   -n | --netns			Run each test in namespace
   -h | --help			Show this usage info
   -o | --override-timeout	Number of seconds after which we timeout
@@ -44,6 +45,7 @@ COLLECTIONS=""
 TESTS=""
 dryrun=""
 kselftest_override_timeout=""
+ERROR_ON_FAIL=true
 while true; do
 	case "$1" in
 		-s | --summary)
@@ -65,6 +67,9 @@ while true; do
 		-d | --dry-run)
 			dryrun="echo"
 			shift ;;
+		-f | --no-error-on-fail)
+			ERROR_ON_FAIL=false
+			shift ;;
 		-n | --netns)
 			RUN_IN_NETNS=1
 			shift ;;
@@ -105,9 +110,18 @@ if [ -n "$TESTS" ]; then
 	available="$(echo "$valid" | sed -e 's/ /\n/g')"
 fi
 
+kselftest_failures_file="$(mktemp --tmpdir kselftest-failures-XXXXXX)"
+export kselftest_failures_file
+
 collections=$(echo "$available" | cut -d: -f1 | sort | uniq)
 for collection in $collections ; do
 	[ -w /dev/kmsg ] && echo "kselftest: Running tests in $collection" >> /dev/kmsg
 	tests=$(echo "$available" | grep "^$collection:" | cut -d: -f2)
 	($dryrun cd "$collection" && $dryrun run_many $tests)
 done
+
+failures="$(cat "$kselftest_failures_file")"
+rm "$kselftest_failures_file"
+if "$ERROR_ON_FAIL" && [ "$failures" ]; then
+	exit 1
+fi

---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20251007-b4-ksft-error-on-fail-0c2cb3246041

Best regards,
-- 
Brendan Jackman <jackmanb(a)google.com>
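The patch relies on one fact: a subshell, or a process in another namespace, cannot set variables in the parent shell. Failures are therefore recorded by appending to a file, and the top-level script only checks whether that file is non-empty. The sketch below is a rough C analogue of the same pattern, with forked children playing the role of the subshells. It is not part of the patch. run_all(), report_failure() here and the /tmp path template are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: append one line naming the failed test. */
static void report_failure(const char *failures_file, const char *test)
{
	FILE *f = fopen(failures_file, "a");

	if (f) {
		fprintf(f, "%s\n", test);
		fclose(f);
	}
}

/*
 * Run each "test" in a forked child; results[i] != 0 means test i fails.
 * Children cannot change the parent's variables, so they record failures
 * in a shared file. Returns 1 if anything failed, 0 otherwise.
 */
static int run_all(const int *results, int n)
{
	char path[] = "/tmp/kselftest-failures-XXXXXX";
	int fd = mkstemp(path);

	assert(fd >= 0);
	close(fd);

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		assert(pid >= 0);
		if (pid == 0) {
			if (results[i]) {
				char name[32];

				snprintf(name, sizeof(name), "test%d", i);
				report_failure(path, name);
			}
			_exit(0);
		}
		waitpid(pid, NULL, 0);
	}

	/* Non-empty file => at least one failure was reported. */
	struct stat st;
	int failed = stat(path, &st) == 0 && st.st_size > 0;

	unlink(path);
	return failed;
}
```

As in the shell version, the content of the file does not matter; only whether anything was written to it.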

2 months, 1 week

[PATCH 0/6] KVM: LoongArch: selftests: Add timer test case

by Bibo Mao

This patch set adds a timer test case for LoongArch systems, based on the common arch_timer test case. It includes tests of the time counter function, one-shot/period mode interrupts, and the software emulated timer function.

Bibo Mao (6):
  KVM: LoongArch: selftests: Add system registers save and restore on exception
  KVM: LoongArch: selftests: Add exception handler register interface
  KVM: LoongArch: selftests: Add basic interfaces
  KVM: LoongArch: selftests: Add timer test case with one-shot mode
  KVM: LoongArch: selftests: Add period mode timer and time counter test
  KVM: LoongArch: selftests: Add SW emulated timer test

 tools/testing/selftests/kvm/Makefile.kvm      |  10 +-
 .../kvm/include/loongarch/arch_timer.h        |  84 ++++++++
 .../kvm/include/loongarch/processor.h         |  81 +++++++-
 .../selftests/kvm/lib/loongarch/exception.S   |   6 +
 .../selftests/kvm/lib/loongarch/processor.c   |  38 +++-
 .../selftests/kvm/loongarch/arch_timer.c      | 187 ++++++++++++++++++
 6 files changed, 400 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/arch_timer.h
 create mode 100644 tools/testing/selftests/kvm/loongarch/arch_timer.c

base-commit: e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6
-- 
2.39.3

2 months, 1 week

[GIT PULL] kselftest fixes update for Linux 6.18-rc4

by Shuah Khan

Hi Linus,

Please pull the following kselftest fixes update for Linux 6.18-rc4.

Fixes build warning in cachestat found during clang build and adds tmpshmcstat to .gitignore.

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 3a8660878839faadb4f1a6dd72c3179c1df56787:

  Linux 6.18-rc1 (2025-10-12 13:42:36 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.18-rc4

for you to fetch changes up to 920aa3a7705a061cb3004572d8b7932b54463dbf:

  selftests: cachestat: Fix warning on declaration under label (2025-10-22 09:23:18 -0600)

----------------------------------------------------------------
linux_kselftest-fixes-6.18-rc4

Fixes build warning in cachestat found during clang build and adds tmpshmcstat to .gitignore.

----------------------------------------------------------------
Madhur Kumar (1):
      selftests/cachestat: add tmpshmcstat file to .gitignore

Sidharth Seela (1):
      selftests: cachestat: Fix warning on declaration under label

 tools/testing/selftests/cachestat/.gitignore       | 1 +
 tools/testing/selftests/cachestat/test_cachestat.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------

2 months, 1 week

[GIT PULL] kunit fixes update for Linux 6.18-rc4

by Shuah Khan

Hi Linus,

Please pull the following kunit fixes update for Linux 6.18-rc4.

Fixes log overwrite in param_tests and fixes incorrect cast of priv pointer in test_dev_action(). Updates email address for Rae Moar in MAINTAINERS KUnit entry.

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 3a8660878839faadb4f1a6dd72c3179c1df56787:

  Linux 6.18-rc1 (2025-10-12 13:42:36 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-fixes-6.18-rc4

for you to fetch changes up to f3903ec76ae6afcdba0347681d1dda005fb145cd:

  MAINTAINERS: Update KUnit email address for Rae Moar (2025-10-29 14:57:54 -0600)

----------------------------------------------------------------
linux_kselftest-kunit-fixes-6.18-rc4

Fixes log overwrite in param_tests and fixes incorrect cast of priv pointer in test_dev_action(). Updates email address for Rae Moar in MAINTAINERS KUnit entry.

----------------------------------------------------------------
Carlos Llamas (1):
      kunit: prevent log overwrite in param_tests

Florian Schmaus (1):
      kunit: test_dev_action: Correctly cast 'priv' pointer to long*

Rae Moar (1):
      MAINTAINERS: Update KUnit email address for Rae Moar

 .mailmap               | 1 +
 MAINTAINERS            | 2 +-
 lib/kunit/kunit-test.c | 2 +-
 lib/kunit/test.c       | 3 ++-
 4 files changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------

2 months, 1 week

[PATCH bpf 0/2] use rqspinlock for bpf lru map

by Menglong Dong

Convert the raw_spinlock in the bpf LRU map to rqspinlock to fix the possible deadlock in [1]. Meanwhile, add a test case for the deadlock.

Link: https://lore.kernel.org/bpf/CAEf4BzbTJCUx0D=zjx6+5m5iiGhwLzaP94hnw36ZMDHAf4…

Menglong Dong (2):
  bpf: use rqspinlock for lru map
  selftests/bpf: test map deadlock caused by NMI

 kernel/bpf/bpf_lru_list.c                     |  47 +++---
 kernel/bpf/bpf_lru_list.h                     |   5 +-
 .../selftests/bpf/prog_tests/map_deadlock.c   | 134 ++++++++++++++++++
 .../selftests/bpf/progs/map_deadlock.c        |  52 +++++++
 4 files changed, 217 insertions(+), 21 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_deadlock.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_deadlock.c

-- 
2.51.2
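rqspinlock is the BPF subsystem's resilient queued spinlock. Instead of spinning forever, an acquisition can fail and return an error, which the caller must then handle. The toy user-space lock below illustrates only that contract, using a plain timeout. It is an assumption-laden sketch: toy_lock, toy_map_update() and the 1 ms budget are invented here, and it does not model the real rqspinlock's queueing or deadlock detection. In the usage, a second acquisition while the lock is already held stands in for an NMI arriving on the CPU that holds it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Toy test-and-set lock with a bounded wait. NOT the kernel's rqspinlock. */
struct toy_lock {
	atomic_bool locked;
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_init(&l->locked, false);
}

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin for at most timeout_ns; return 0 on success, -ETIMEDOUT otherwise. */
static int toy_lock_acquire(struct toy_lock *l, long long timeout_ns)
{
	long long deadline = now_ns() + timeout_ns;

	for (;;) {
		bool expected = false;

		if (atomic_compare_exchange_weak_explicit(&l->locked, &expected, true,
							  memory_order_acquire,
							  memory_order_relaxed))
			return 0;
		if (now_ns() >= deadline)
			return -ETIMEDOUT;
	}
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* A "map update" that fails cleanly instead of hanging when the lock is stuck. */
static int toy_map_update(struct toy_lock *l, int *slot, int val)
{
	int err = toy_lock_acquire(l, 1000000); /* 1 ms budget */

	if (err)
		return err;
	*slot = val;
	toy_lock_release(l);
	return 0;
}
```

With a raw spinlock, the nested attempt would spin forever. Here the caller sees -ETIMEDOUT and can fail the update instead.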

2 months, 1 week

[PATCH net-next v3] selftests: drv-net: replace the nsim ring test with a drv-net one

by Jakub Kicinski

We are trying to move away from netdevsim-only tests and towards tests which can be run both against netdevsim and real drivers.

Replace the simple bash script we have for checking ethtool -g/-G on netdevsim with a Python test tweaking those params as well as channel count.

The new test is not exactly equivalent to the netdevsim one, but real drivers don't often support random ring sizes, let alone modifying max values via debugfs.

Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
v3:
 - let ring sizes fall all the way down to 0
v2: https://lore.kernel.org/20251027192131.2053792-1-kuba@kernel.org
 - add the new test to Makefile and remove the old one
   turns out NIPA checking for Makefile presence was busted
v1: https://lore.kernel.org/20251024215552.1249838-1-kuba@kernel.org

CC: andrew(a)lunn.ch
CC: shuah(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
---
 tools/testing/selftests/drivers/net/Makefile       |   1 +
 .../selftests/drivers/net/netdevsim/Makefile       |   1 -
 .../drivers/net/netdevsim/ethtool-ring.sh          |  85 ---------
 .../selftests/drivers/net/ring_reconfig.py         | 167 ++++++++++++++++++
 4 files changed, 168 insertions(+), 86 deletions(-)
 delete mode 100755 tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh
 create mode 100755 tools/testing/selftests/drivers/net/ring_reconfig.py

diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 6e41635bd55a..68e0bb603a9d 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -22,6 +22,7 @@ TEST_PROGS := \
 	ping.py \
 	psp.py \
 	queues.py \
+	ring_reconfig.py \
 	shaper.py \
 	stats.py \
 	xdp.py \
diff --git a/tools/testing/selftests/drivers/net/netdevsim/Makefile b/tools/testing/selftests/drivers/net/netdevsim/Makefile
index daf51113c827..833abd8e6fdc 100644
--- a/tools/testing/selftests/drivers/net/netdevsim/Makefile
+++ b/tools/testing/selftests/drivers/net/netdevsim/Makefile
@@ -8,7 +8,6 @@ TEST_PROGS := \
 	ethtool-features.sh \
 	ethtool-fec.sh \
 	ethtool-pause.sh \
-	ethtool-ring.sh \
 	fib.sh \
 	fib_notifications.sh \
 	hw_stats_l3.sh \
diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh
deleted file mode 100755
index c969559ffa7a..000000000000
--- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0-only
-
-source ethtool-common.sh
-
-function get_value {
-	local query="${SETTINGS_MAP[$1]}"
-
-	echo $(ethtool -g $NSIM_NETDEV | \
-		tail -n +$CURR_SETT_LINE | \
-		awk -F':' -v pattern="$query:" '$0 ~ pattern {gsub(/[\t ]/, "", $2); print $2}')
-}
-
-function update_current_settings {
-	for key in ${!SETTINGS_MAP[@]}; do
-		CURRENT_SETTINGS[$key]=$(get_value $key)
-	done
-	echo ${CURRENT_SETTINGS[@]}
-}
-
-if ! ethtool -h | grep -q set-ring >/dev/null; then
-	echo "SKIP: No --set-ring support in ethtool"
-	exit 4
-fi
-
-NSIM_NETDEV=$(make_netdev)
-
-set -o pipefail
-
-declare -A SETTINGS_MAP=(
-	["rx"]="RX"
-	["rx-mini"]="RX Mini"
-	["rx-jumbo"]="RX Jumbo"
-	["tx"]="TX"
-)
-
-declare -A EXPECTED_SETTINGS=(
-	["rx"]=""
-	["rx-mini"]=""
-	["rx-jumbo"]=""
-	["tx"]=""
-)
-
-declare -A CURRENT_SETTINGS=(
-	["rx"]=""
-	["rx-mini"]=""
-	["rx-jumbo"]=""
-	["tx"]=""
-)
-
-MAX_VALUE=$((RANDOM % $((2**32-1))))
-RING_MAX_LIST=$(ls $NSIM_DEV_DFS/ethtool/ring/)
-
-for ring_max_entry in $RING_MAX_LIST; do
-	echo $MAX_VALUE > $NSIM_DEV_DFS/ethtool/ring/$ring_max_entry
-done
-
-CURR_SETT_LINE=$(ethtool -g $NSIM_NETDEV | grep -i -m1 -n 'Current hardware settings' | cut -f1 -d:)
-
-# populate the expected settings map
-for key in ${!SETTINGS_MAP[@]}; do
-	EXPECTED_SETTINGS[$key]=$(get_value $key)
-done
-
-# test
-for key in ${!SETTINGS_MAP[@]}; do
-	value=$((RANDOM % $MAX_VALUE))
-
-	ethtool -G $NSIM_NETDEV "$key" "$value"
-
-	EXPECTED_SETTINGS[$key]="$value"
-	expected=${EXPECTED_SETTINGS[@]}
-	current=$(update_current_settings)
-
-	check $? "$current" "$expected"
-	set +x
-done
-
-if [ $num_errors -eq 0 ]; then
-	echo "PASSED all $((num_passes)) checks"
-	exit 0
-else
-	echo "FAILED $num_errors/$((num_errors+num_passes)) checks"
-	exit 1
-fi
diff --git a/tools/testing/selftests/drivers/net/ring_reconfig.py b/tools/testing/selftests/drivers/net/ring_reconfig.py
new file mode 100755
index 000000000000..f9530a8b0856
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/ring_reconfig.py
@@ -0,0 +1,167 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Test channel and ring size configuration via ethtool (-L / -G).
+"""
+
+from lib.py import ksft_run, ksft_exit, ksft_pr
+from lib.py import ksft_eq
+from lib.py import NetDrvEpEnv, EthtoolFamily, GenerateTraffic
+from lib.py import defer, NlError
+
+
+def channels(cfg) -> None:
+    """
+    Twiddle channel counts in various combinations of parameters.
+    We're only looking for driver adhering to the requested config
+    if the config is accepted and crashes.
+    """
+    ehdr = {'header':{'dev-index': cfg.ifindex}}
+    chans = cfg.eth.channels_get(ehdr)
+
+    all_keys = ["rx", "tx", "combined"]
+    mixes = [{"combined"}, {"rx", "tx"}, {"rx", "combined"}, {"tx", "combined"},
+             {"rx", "tx", "combined"},]
+
+    # Get the set of keys that device actually supports
+    restore = {}
+    supported = set()
+    for key in all_keys:
+        if key + "-max" in chans:
+            supported.add(key)
+            restore |= {key + "-count": chans[key + "-count"]}
+
+    defer(cfg.eth.channels_set, ehdr | restore)
+
+    def test_config(config):
+        try:
+            cfg.eth.channels_set(ehdr | config)
+            get = cfg.eth.channels_get(ehdr)
+            for k, v in config.items():
+                ksft_eq(get.get(k, 0), v)
+        except NlError as e:
+            failed.append(mix)
+            ksft_pr("Can't set", config, e)
+        else:
+            ksft_pr("Okay", config)
+
+    failed = []
+    for mix in mixes:
+        if not mix.issubset(supported):
+            continue
+
+        # Set all the values in the mix to 1, other supported to 0
+        config = {}
+        for key in all_keys:
+            config[key + "-count"] = 1 if key in mix else 0
+        test_config(config)
+
+    for mix in mixes:
+        if not mix.issubset(supported):
+            continue
+        if mix in failed:
+            continue
+
+        # Set all the values in the mix to max, other supported to 0
+        config = {}
+        for key in all_keys:
+            config[key + "-count"] = chans[key + '-max'] if key in mix else 0
+        test_config(config)
+
+
+def _configure_min_ring_cnt(cfg) -> None:
+    """ Try to configure a single Rx/Tx ring. """
+    ehdr = {'header':{'dev-index': cfg.ifindex}}
+    chans = cfg.eth.channels_get(ehdr)
+
+    all_keys = ["rx-count", "tx-count", "combined-count"]
+    restore = {}
+    config = {}
+    for key in all_keys:
+        if key in chans:
+            restore[key] = chans[key]
+            config[key] = 0
+
+    if chans.get('combined-count', 0) > 1:
+        config['combined-count'] = 1
+    elif chans.get('rx-count', 0) > 1 and chans.get('tx-count', 0) > 1:
+        config['tx-count'] = 1
+        config['rx-count'] = 1
+    else:
+        # looks like we're already on 1 channel
+        return
+
+    cfg.eth.channels_set(ehdr | config)
+    defer(cfg.eth.channels_set, ehdr | restore)
+
+
+def ringparam(cfg) -> None:
+    """
+    Tweak the ringparam configuration. Try to run some traffic over min
+    ring size to make sure it actually functions.
+    """
+    ehdr = {'header':{'dev-index': cfg.ifindex}}
+    rings = cfg.eth.rings_get(ehdr)
+
+    restore = {}
+    maxes = {}
+    params = set()
+    for key in rings.keys():
+        if 'max' in key:
+            param = key[:-4]
+            maxes[param] = rings[key]
+            params.add(param)
+            restore[param] = rings[param]
+
+    defer(cfg.eth.rings_set, ehdr | restore)
+
+    # Speed up the reconfig by configuring just one ring
+    _configure_min_ring_cnt(cfg)
+
+    # Try to reach min on all settings
+    for param in params:
+        val = rings[param]
+        while True:
+            try:
+                cfg.eth.rings_set({'header':{'dev-index': cfg.ifindex},
+                                   param: val // 2})
+                if val == 0:
+                    break
+                val //= 2
+            except NlError:
+                break
+
+        get = cfg.eth.rings_get(ehdr)
+        ksft_eq(get[param], val)
+
+        ksft_pr(f"Reached min for '{param}' at {val} (max {rings[param]})")
+
+    GenerateTraffic(cfg).wait_pkts_and_stop(10000)
+
+    # Try max across all params, if the driver supports large rings
+    # this may OOM so we ignore errors
+    try:
+        ksft_pr("Applying max settings")
+        config = {p: maxes[p] for p in params}
+        cfg.eth.rings_set(ehdr | config)
+    except NlError as e:
+        ksft_pr("Can't set max params", config, e)
+    else:
+        GenerateTraffic(cfg).wait_pkts_and_stop(10000)
+
+
+def main() -> None:
+    """ Ksft boiler plate main """
+
+    with NetDrvEpEnv(__file__) as cfg:
+        cfg.eth = EthtoolFamily()
+
+        ksft_run([channels,
+                  ringparam],
+                 args=(cfg, ))
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
-- 
2.51.0
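ringparam() looks for the smallest ring size a driver accepts. It repeatedly requests half the current value until the driver rejects the request or the value reaches 0. The sketch below expresses the same search in C against a hypothetical driver model. struct mock_ring and mock_set_ring() are invented for illustration, and a rejected request stands in for NlError.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver model: accepts ring sizes in [min, max]. */
struct mock_ring {
	uint32_t min, max, cur;
};

static int mock_set_ring(struct mock_ring *r, uint32_t val)
{
	if (val < r->min || val > r->max)
		return -1; /* stand-in for the driver rejecting the config */
	r->cur = val;
	return 0;
}

/*
 * Same shape as the halving loop in ringparam(): request val / 2 until the
 * driver refuses or val has already reached 0. Returns the last accepted value.
 */
static uint32_t reach_min(struct mock_ring *r)
{
	uint32_t val = r->cur;

	assert(r->min <= r->max);
	for (;;) {
		if (mock_set_ring(r, val / 2))
			break;	/* driver refused val / 2, so val is the floor */
		if (val == 0)
			break;	/* already at 0 */
		val /= 2;
	}
	return val;
}
```

Because the loop only halves, it stops at the last accepted value in the sequence val, val/2, val/4, and so on. For example, with a starting size of 1000 and a true minimum of 100 it settles at 125, not 100. In every case the returned value equals the size the driver currently holds, which is what the `ksft_eq(get[param], val)` check relies on.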

2 months, 1 week

[PATCH bpf-next v3 0/4] selftests/bpf: convert test_tc_tunnel.sh to test_progs

by Alexis Lothoré (eBPF Foundation)

Hello,
this is the v3 of test_tc_tunnel conversion into test_progs framework. This new revision:
- fixes a few issues spotted by the bot reviewer
- removes any test ensuring connection failure (and so depending on a timeout) to keep the execution time reasonable

test_tc_tunnel.sh tests a variety of tunnels based on BPF: packets are encapsulated by a BPF program on the client egress. We then check that those packets can be decapsulated on server ingress side, either thanks to kernel-based or BPF-based decapsulation. Those tests are run thanks to two veths in two dedicated namespaces.

- patches 1 and 2 are preparatory patches
- patch 3 introduces tc_tunnel test into test_progs
- patch 4 gets rid of the test_tc_tunnel.sh script

The new test has been executed both in some x86 local qemu machine, as well as in CI:

  # ./test_progs -a tc_tunnel
  #454/1 tc_tunnel/ipip_none:OK
  #454/2 tc_tunnel/ipip6_none:OK
  #454/3 tc_tunnel/ip6tnl_none:OK
  #454/4 tc_tunnel/sit_none:OK
  #454/5 tc_tunnel/vxlan_eth:OK
  #454/6 tc_tunnel/ip6vxlan_eth:OK
  #454/7 tc_tunnel/gre_none:OK
  #454/8 tc_tunnel/gre_eth:OK
  #454/9 tc_tunnel/gre_mpls:OK
  #454/10 tc_tunnel/ip6gre_none:OK
  #454/11 tc_tunnel/ip6gre_eth:OK
  #454/12 tc_tunnel/ip6gre_mpls:OK
  #454/13 tc_tunnel/udp_none:OK
  #454/14 tc_tunnel/udp_eth:OK
  #454/15 tc_tunnel/udp_mpls:OK
  #454/16 tc_tunnel/ip6udp_none:OK
  #454/17 tc_tunnel/ip6udp_eth:OK
  #454/18 tc_tunnel/ip6udp_mpls:OK
  #454 tc_tunnel:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v3:
- remove systematic "connection must fail" test part of each subtest
- also remove kernel-based decap test for subtests supposed to fail on kernel side
- fix potential fd leak if connection structure allocation fails
- fix wrong early return in run_test
- Link to v2: https://lore.kernel.org/r/20251022-tc_tunnel-v2-0-a44a0bd52902@bootlin.com

Changes in v2:
- declare a single tc_prog_attach helper rather than multiple, intermediate helpers
- move the new helper to network_helpers.c rather than a dedicated file
- do not rename existing tc_helpers.c/h pair (drop patch)
- keep only the minimal set of needed NS switches
- Link to v1: https://lore.kernel.org/r/20251017-tc_tunnel-v1-0-2d86808d86b2@bootlin.com

---
Alexis Lothoré (eBPF Foundation) (4):
      selftests/bpf: add tc helpers
      selftests/bpf: make test_tc_tunnel.bpf.c compatible with big endian platforms
      selftests/bpf: integrate test_tc_tunnel.sh tests into test_progs
      selftests/bpf: remove test_tc_tunnel.sh

 tools/testing/selftests/bpf/Makefile               |   1 -
 tools/testing/selftests/bpf/network_helpers.c      |  45 ++
 tools/testing/selftests/bpf/network_helpers.h      |  16 +
 .../selftests/bpf/prog_tests/test_tc_tunnel.c      | 674 +++++++++++++++++++++
 .../testing/selftests/bpf/prog_tests/test_tunnel.c | 107 +---
 tools/testing/selftests/bpf/progs/test_tc_tunnel.c |  95 ++-
 tools/testing/selftests/bpf/test_tc_tunnel.sh      | 320 ----------
 7 files changed, 790 insertions(+), 468 deletions(-)
---
base-commit: ecdeefe65eaeb82a1262e20401ba750b8c9e0b97
change-id: 20250811-tc_tunnel-c61342683f18

Best regards,
-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2 months, 1 week

[PATCH] KVM: arm64: selftests: Filter ZCR_EL2 in get-reg-list

by Mark Brown

get-reg-list includes ZCR_EL2 in the list of EL2 registers that it looks for when NV is enabled but does not have any feature gate for this register, meaning that testing any combination of features that includes EL2 but does not include SVE will result in a test failure due to a missing register being reported:

| The following lines are missing registers:
|
| 	ARM64_SYS_REG(3, 4, 1, 2, 0),

Add ZCR_EL2 to feat_id_regs so that the test knows not to expect to see it without SVE being enabled.

Fixes: 3a90b6f27964 ("KVM: arm64: selftests: get-reg-list: Add base EL2 registers")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/selftests/kvm/arm64/get-reg-list.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/kvm/arm64/get-reg-list.c b/tools/testing/selftests/kvm/arm64/get-reg-list.c
index c9b84eeaab6b..7ae26ce875ad 100644
--- a/tools/testing/selftests/kvm/arm64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/arm64/get-reg-list.c
@@ -68,6 +68,7 @@ static struct feature_id_reg feat_id_regs[] = {
 	REG_FEAT(VNCR_EL2, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY),
 	REG_FEAT(CNTHV_CTL_EL2, ID_AA64MMFR1_EL1, VH, IMP),
 	REG_FEAT(CNTHV_CVAL_EL2,ID_AA64MMFR1_EL1, VH, IMP),
+	REG_FEAT(ZCR_EL2, ID_AA64PFR0_EL1, SVE, IMP),
 };
 
 bool filter_reg(__u64 reg)

---
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
change-id: 20251023-kvm-arm64-get-reg-list-zcr-el2-c43090e11f23

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>
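The fix works because get-reg-list expects a feature-gated register only when the matching ID register field shows that the feature is implemented. Below is a simplified model of that check. It assumes ZCR_EL2 is gated on the ID_AA64PFR0_EL1.SVE field (bits [35:32]) being at least 1 (IMP). struct feat_gate and reg_expected() are hypothetical and much simpler than the selftest's real REG_FEAT()/filter_reg() machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified feature gate: register `reg` is only expected
 * when the 4-bit ID register field starting at bit `shift` is >= `min`.
 */
struct feat_gate {
	const char *reg;
	unsigned int shift;
	uint64_t min;
};

/* Assumption: ID_AA64PFR0_EL1.SVE occupies bits [35:32]; IMP == 1. */
static const struct feat_gate pfr0_gates[] = {
	{ "ZCR_EL2", 32, 1 },
};

static uint64_t id_field(uint64_t id_val, unsigned int shift)
{
	return (id_val >> shift) & 0xf;
}

/* True if `reg` should appear in the register list for this ID_AA64PFR0_EL1. */
static bool reg_expected(const char *reg, uint64_t id_aa64pfr0)
{
	assert(reg != NULL);
	for (size_t i = 0; i < sizeof(pfr0_gates) / sizeof(pfr0_gates[0]); i++) {
		if (strcmp(pfr0_gates[i].reg, reg) == 0)
			return id_field(id_aa64pfr0, pfr0_gates[i].shift) >=
			       pfr0_gates[i].min;
	}
	return true; /* registers without a gate are always expected */
}
```

Without the table entry, ZCR_EL2 would fall through to "always expected", which matches the failure mode reported above for configurations without SVE.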

2 months, 1 week

[PATCH] KVM: arm64: selftests: Add SCTLR2_EL2 to get-reg-list

by Mark Brown

We recently added support for SCTLR2_EL2 to the kernel but did not add it to get-reg-list, resulting in it reporting the missing register when it is available. Add it.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/selftests/kvm/arm64/get-reg-list.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/kvm/arm64/get-reg-list.c b/tools/testing/selftests/kvm/arm64/get-reg-list.c
index c9b84eeaab6b..2abef0a86d46 100644
--- a/tools/testing/selftests/kvm/arm64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/arm64/get-reg-list.c
@@ -63,6 +63,7 @@ static struct feature_id_reg feat_id_regs[] = {
 	REG_FEAT(HDFGWTR2_EL2, ID_AA64MMFR0_EL1, FGT, FGT2),
 	REG_FEAT(ZCR_EL2, ID_AA64PFR0_EL1, SVE, IMP),
 	REG_FEAT(SCTLR2_EL1, ID_AA64MMFR3_EL1, SCTLRX, IMP),
+	REG_FEAT(SCTLR2_EL2, ID_AA64MMFR3_EL1, SCTLRX, IMP),
 	REG_FEAT(VDISR_EL2, ID_AA64PFR0_EL1, RAS, IMP),
 	REG_FEAT(VSESR_EL2, ID_AA64PFR0_EL1, RAS, IMP),
 	REG_FEAT(VNCR_EL2, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY),
@@ -718,6 +719,7 @@ static __u64 el2_regs[] = {
 	SYS_REG(VMPIDR_EL2),
 	SYS_REG(SCTLR_EL2),
 	SYS_REG(ACTLR_EL2),
+	SYS_REG(SCTLR2_EL2),
 	SYS_REG(HCR_EL2),
 	SYS_REG(MDCR_EL2),
 	SYS_REG(CPTR_EL2),

---
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
change-id: 20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-222e463e8aaf

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>

2 months, 1 week

[PATCH v2] KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stress

by Maximilian Dittgen

Since GITS_TYPER.PTA == 0, the ITS MAPC command demands a CPU ID, rather than a physical redistributor address, for its RDbase command argument.

As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates over CPU IDs in the range [0, nr_cpus), passing them as the RDbase vcpu_id argument to its_send_mapc_cmd().

However, its_encode_target() in the its_send_mapc_cmd() selftest handler expects RDbase arguments to be formatted with a 16-bit offset, as shown by the 16-bit right shift of target_addr in its implementation:

	its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16)

At the moment, all CPU IDs passed into its_send_mapc_cmd() have no offset, and therefore become 0x0 after the bit shift. Thus, when vgic_its_cmd_handle_mapc() receives the ITS command in vgic-its.c, it always interprets the RDbase target CPU as CPU 0. All interrupts sent to collections will be processed by vCPU 0, which defeats the purpose of this multi-vCPU test.

Fix by creating a procnum_to_rdbase() helper function, which left-shifts the vCPU parameter received by its_send_mapc_cmd() by 16 bits before passing it to its_encode_target() for encoding.

Signed-off-by: Maximilian Dittgen <mdittgen(a)amazon.de>
---
v2: Refactor the vcpu_id left shift into the procnum_to_rdbase() helper. Rename and rewrite the commit to reflect the root cause of the bug, which was improper RDbase formatting, not that MAPC expects a physical address as the RDbase parameter.

To validate the patch, I added the following debug code at the top of vgic_its_cmd_handle_mapc:

	u64 raw_cmd2 = le64_to_cpu(its_cmd[2]);
	u32 target_addr = its_cmd_get_target_addr(its_cmd);
	kvm_info("MAPC: coll_id=%d, raw_cmd[2]=0x%llx, parsed_target=%u\n",
		 coll_id, raw_cmd2, target_addr);
	vcpu = kvm_get_vcpu_by_id(kvm, its_cmd_get_target_addr(its_cmd));
	kvm_info("MAPC: coll_id=%d, vcpu_id=%d\n", coll_id,
		 vcpu ? vcpu->vcpu_id : -1);

I then ran `./vgic_lpi_stress -v 3` to trigger the stress selftest with 3 vCPUs.

Before the patch, the debug logs read:

kvm [20832]: MAPC: coll_id=0, raw_cmd[2]=0x8000000000000000, parsed_target=0
kvm [20832]: MAPC: coll_id=0, vcpu_id=0
kvm [20832]: MAPC: coll_id=1, raw_cmd[2]=0x8000000000000001, parsed_target=0
kvm [20832]: MAPC: coll_id=1, vcpu_id=0
kvm [20832]: MAPC: coll_id=2, raw_cmd[2]=0x8000000000000002, parsed_target=0
kvm [20832]: MAPC: coll_id=2, vcpu_id=0

Note that the last digit of raw_cmd[2] reflects the collection ID, but the target field (bits 51:16) reads 0. The handler parses out vCPU 0 for all 3 MAPC calls.

After the patch, the debug logs read:

kvm [20019]: MAPC: coll_id=0, raw_cmd[2]=0x8000000000000000, parsed_target=0
kvm [20019]: MAPC: coll_id=0, vcpu_id=0
kvm [20019]: MAPC: coll_id=1, raw_cmd[2]=0x8000000000010001, parsed_target=1
kvm [20019]: MAPC: coll_id=1, vcpu_id=1
kvm [20019]: MAPC: coll_id=2, raw_cmd[2]=0x8000000000020002, parsed_target=2
kvm [20019]: MAPC: coll_id=2, vcpu_id=2

Note that the target vCPU and target collection are both visible in raw_cmd[2]. The handler parses out the correct vCPU for all 3 MAPC calls.
---
 tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
index 09f270545646..0e2f8ed90f30 100644
--- a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
+++ b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
@@ -15,6 +15,8 @@
 #include "gic_v3.h"
 #include "processor.h"
 
+#define GITS_COLLECTION_TARGET_SHIFT 16
+
 static u64 its_read_u64(unsigned long offset)
 {
 	return readq_relaxed(GITS_BASE_GVA + offset);
@@ -163,6 +165,11 @@ static void its_encode_collection(struct its_cmd_block *cmd, u16 col)
 	its_mask_encode(&cmd->raw_cmd[2], col, 15, 0);
 }
 
+static u64 procnum_to_rdbase(u32 vcpu_id)
+{
+	return vcpu_id << GITS_COLLECTION_TARGET_SHIFT;
+}
+
 #define GITS_CMDQ_POLL_ITERATIONS	0
 
 static void its_send_cmd(void *cmdq_base, struct its_cmd_block *cmd)
@@ -217,7 +224,7 @@ void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool val
 
 	its_encode_cmd(&cmd, GITS_CMD_MAPC);
 	its_encode_collection(&cmd, collection_id);
-	its_encode_target(&cmd, vcpu_id);
+	its_encode_target(&cmd, procnum_to_rdbase(vcpu_id));
 	its_encode_valid(&cmd, valid);
 
 	its_send_cmd(cmdq_base, &cmd);
-- 
2.50.1 (Apple Git-155)

Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

2 months, 1 week

[PATCH 0/3] KVM: selftests: arm64: Improve diagnostics from set_id_regs

by Mark Brown

While debugging issues related to aarch64-only systems, I ran into speedbumps due to the lack of detail in the results reported when the guest register read and reset value preservation tests were run: they generated an immediately fatal assert without indicating which register was being tested. Update these tests to report a result per register, making it much easier to see what the problem being reported is.

A similar, though less severe, issue exists with the validation of the individual bitfields in registers due to the use of immediately fatal asserts. Update those asserts to be standard kselftest reports.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
      KVM: selftests: arm64: Report set_id_reg reads of test registers as tests
      KVM: selftests: arm64: Report register reset tests individually
      KVM: selftests: arm64: Make set_id_regs bitfield validatity checks non-fatal

 tools/testing/selftests/kvm/arm64/set_id_regs.c | 108 ++++++++++++++++++------
 1 file changed, 82 insertions(+), 26 deletions(-)
---
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
change-id: 20251028-kvm-arm64-set-id-regs-aarch64-ebb77969401c

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>

2 months, 1 week

[PATCH bpf-next v3 0/3] bpf: Add overwrite mode for BPF ring buffer

by Xu Kuohai

When the BPF ring buffer is full, a new event cannot be recorded until one or more old events are consumed to make enough space for it. In cases such as fault diagnostics, where recent events are more useful than older ones, this mechanism may lead to critical events being lost.

So add an overwrite mode for the BPF ring buffer to address this. In this mode, the new event overwrites the oldest event when the buffer is full.

v3:
- remove half-round wakeup, drop unnecessary min in ringbuf_avail_data_sz(), switch to smp_load_acquire, update tests and fix typos, etc (Andrii)
- rebase and re-collect performance data

v2: https://lore.kernel.org/bpf/20250905150641.2078838-1-xukuohai@huaweicloud.c…
- remove libbpf changes (Andrii)
- update overwrite benchmark

v1: https://lore.kernel.org/bpf/20250804022101.2171981-1-xukuohai@huaweicloud.c…

Xu Kuohai (3):
  bpf: Add overwrite mode for BPF ring buffer
  selftests/bpf: Add overwrite mode test for BPF ring buffer
  selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer

 include/uapi/linux/bpf.h | 4 +
 kernel/bpf/ringbuf.c | 109 +++++++++++++++---
 tools/include/uapi/linux/bpf.h | 4 +
 tools/testing/selftests/bpf/Makefile | 3 +-
 .../selftests/bpf/benchs/bench_ringbufs.c | 66 ++++++++++-
 .../bpf/benchs/run_bench_ringbufs.sh | 4 +
 .../selftests/bpf/prog_tests/ringbuf.c | 64 ++++++++++
 .../selftests/bpf/progs/ringbuf_bench.c | 11 ++
 .../bpf/progs/test_ringbuf_overwrite.c | 98 ++++++++++++++++
 9 files changed, 337 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_overwrite.c

-- 
2.43.0

2 months, 1 week

[PATCH bpf-next v6 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework

by Bastien Curutchet (eBPF Foundation)

Hi all,

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs are defined in xskxceiver.c. Since this script is used to test real hardware, the goal here is to leave it as it is, and only integrate the tests that run on veth peers into the test_progs framework.

I've looked into what could improve the speed in the CI:
- some tests are skipped when run on veth peers in a VM (because they rely on huge page allocation or HW rings). This skipping logic still takes some time and can be easily avoided.
- the TEARDOWN test is quite long (several seconds on its own) because it runs the same test 10 times in a row to ensure the teardown process works properly.

With these tests fully skipped in the CI and the veth setup done only once for each mode (DRV / SKB), the execution time is reduced to about 5 seconds on my setup.

```
$ tools/testing/selftests/bpf/vmtest.sh -d $HOME/ebpf/output-regular/ -- time ./test_progs -t xsk
[...]
real 0m 5.04s
user 0m 0.38s
sys 0m 1.61s
```

It still feels a bit long, but there are 24 tests run in both DRV and SKB modes, which means around 100ms for each one. I'm not sure I can make it much faster without randomizing the tests so that not all of them run in every CI execution.

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test.
PATCH 8 to 13 handle all errors to release resources instead of calling exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI.
PATCH 15 integrates the CI tests into the test_progs framework.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v6:
- Set up the veth peer once for each mode instead of once for each subtest
- Rename the 'flaky' table to 'skip-ci' and move the automatically skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/20251016-xsk-v5-0-662c95eb8005@bootlin.com

Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/20250924-xsk-v4-0-20e57537b876@bootlin.com

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes testapp_stats_rx_dropped(), the second one fixes testapp_xdp_shared_umem(). The unnecessary frees (in testapp_stats_rx_full() and testapp_stats_fill_empty()) are removed.
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf: Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xskxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1, 7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com

---
Bastien Curutchet (eBPF Foundation) (15):
      selftests/bpf: test_xsk: Split xskxceiver
      selftests/bpf: test_xsk: Initialize bitmap before use
      selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
      selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
      selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
      selftests/bpf: test_xsk: Wrap test clean-up in functions
      selftests/bpf: test_xsk: Release resources when swap fails
      selftests/bpf: test_xsk: Add return value to init_iface()
      selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
      selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
      selftests/bpf: test_xsk: Don't exit immediately when workers fail
      selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
      selftests/bpf: test_xsk: Don't exit immediately on allocation failures
      selftests/bpf: test_xsk: Isolate non-CI tests
      selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework

 tools/testing/selftests/bpf/Makefile | 11 +-
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2595 ++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/test_xsk.h | 298 +++
 tools/testing/selftests/bpf/prog_tests/xsk.c | 151 ++
 tools/testing/selftests/bpf/xskxceiver.c | 2696 +-------------------
 tools/testing/selftests/bpf/xskxceiver.h | 156 --
 6 files changed, 3183 insertions(+), 2724 deletions(-)
---
base-commit: 4481a8590725400f37d3015f0ee0d53a2cdc1bd6
change-id: 20250218-xsk-0cf90e975d14

Best regards,
-- 
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>

2 months, 1 week

[PATCH 0/1] selftests: net: use BASH for bareudp testing

by Po-Hsu Lin

The bareudp.sh script uses /bin/sh, and it loads another BASH script, lib.sh, at the very beginning. But on some operating systems like Ubuntu, /bin/sh actually points to DASH, so BASH commands end up being run by DASH, which consequently leads to syntax errors.

This patch fixes syntax failures on systems where /bin/sh is not BASH by explicitly using BASH for bareudp.sh.

Po-Hsu Lin (1):
  selftests: net: use BASH for bareudp testing

 tools/testing/selftests/net/bareudp.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.34.1

2 months, 2 weeks

[PATCH v2] selftest: net: fix socklen_t type mismatch in sctp_collision test

by Ankit Khushwaha

Socket APIs like recvfrom(), accept(), and getsockname() expect a socklen_t * argument, but the tests were using int variables. This causes -Wpointer-sign warnings on platforms where socklen_t is unsigned.

Change the variable type from int to socklen_t to resolve the warning and ensure type safety across platforms.

Warning fixed:

sctp_collision.c:62:70: warning: passing 'int *' to parameter of type 'socklen_t *' (aka 'unsigned int *') converts between pointers to integer types with different sign [-Wpointer-sign]
   62 | ret = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&daddr, &len);
      | ^~~~
/usr/include/sys/socket.h:165:27: note: passing argument to parameter '__addr_len' here
  165 | socklen_t *__restrict __addr_len);
      | ^

Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/net/netfilter/sctp_collision.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/netfilter/sctp_collision.c b/tools/testing/selftests/net/netfilter/sctp_collision.c
index 21bb1cfd8a85..91df996367e9 100644
--- a/tools/testing/selftests/net/netfilter/sctp_collision.c
+++ b/tools/testing/selftests/net/netfilter/sctp_collision.c
@@ -9,7 +9,8 @@ int main(int argc, char *argv[])
 {
 	struct sockaddr_in saddr = {}, daddr = {};
-	int sd, ret, len = sizeof(daddr);
+	socklen_t len = sizeof(daddr);
 	struct timeval tv = {25, 0};
 	char buf[] = "hello";
+	int sd, ret;
-- 
2.51.0

2 months, 2 weeks

[PATCH v3 0/3] KHO: kfence + KHO memory corruption fix

by Pasha Tatashin

This series fixes a memory corruption bug in KHO that occurs when KFENCE is enabled.

The root cause is that KHO metadata, allocated via kzalloc(), can be randomly serviced by kfence_alloc(). When a kernel boots via KHO, the early memblock allocator is restricted to a "scratch area". This forces the KFENCE pool to be allocated within this scratch area, creating a conflict. If KHO metadata is subsequently placed in this pool, it gets corrupted during the next kexec operation.

Patch 1/3 introduces a debug-only feature (CONFIG_KEXEC_HANDOVER_DEBUG) that adds checks to detect and fail any operation that attempts to place KHO metadata or preserved memory within the scratch area. This serves as a validation and diagnostic tool to confirm the problem without affecting production builds.

Patch 2/3 increases the bitmap size to PAGE_SIZE, so that the buddy allocator can be used.

Patch 3/3 provides the fix by modifying KHO to allocate its metadata directly from the buddy allocator instead of slab. This bypasses the KFENCE interception entirely.

Pasha Tatashin (3):
  liveupdate: kho: warn and fail on metadata or preserved memory in scratch area
  liveupdate: kho: Increase metadata bitmap size to PAGE_SIZE
  liveupdate: kho: allocate metadata directly from the buddy allocator

 include/linux/gfp.h | 3 ++
 kernel/Kconfig.kexec | 9 ++++
 kernel/Makefile | 1 +
 kernel/kexec_handover.c | 72 ++++++++++++++++++++------------
 kernel/kexec_handover_debug.c | 25 +++++++++++
 kernel/kexec_handover_internal.h | 16 +++++++
 6 files changed, 100 insertions(+), 26 deletions(-)
 create mode 100644 kernel/kexec_handover_debug.c
 create mode 100644 kernel/kexec_handover_internal.h

base-commit: 6548d364a3e850326831799d7e3ea2d7bb97ba08
-- 
2.51.0.869.ge66316f041-goog

2 months, 2 weeks

[RFC bpf-next 0/2] Print map ID on successful creation

by Harshit Mogalapalli

Hi all,

I have tried looking at an issue from the bpftool repository: https://github.com/libbpf/bpftool/issues/121 and this RFC tries to add that enhancement.

Summary: Currently, when a map creation is successful, there is no message on the terminal. Printing IDs on successful creation of maps can help notify the user and can be used in CI/CD.

The first patch adds the logic for printing, and the second patch adds a simple selftest for it.

The GitHub issue is not fully solved with these two patches, as there are other BPF objects that might need similar additions. Would appreciate any inputs on this.

Thank you very much.

Regards,
Harshit

Harshit Mogalapalli (2):
  bpftool: Print map ID upon creation and support JSON output
  selftests/bpf: Add test for bpftool map ID printing

 tools/bpf/bpftool/map.c | 24 +++++++++++---
 .../testing/selftests/bpf/test_bpftool_map.sh | 32 +++++++++++++++++++
 2 files changed, 52 insertions(+), 4 deletions(-)

-- 
2.50.1

2 months, 2 weeks

[PATCH] selftests/seccomp: Fixed type mismatch warning

by Alessandro Zanni

Explicitly cast the argument passed to the get_uprobe_offset() function, either probed_uretprobe or probed_uprobe, to void *.

The warning fixed is as follows:

  CC seccomp_bpf
seccomp_bpf.c: In function ‘UPROBE_setup’:
seccomp_bpf.c:5175:74: warning: pointer type mismatch in conditional expression
 5175 | offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
      |

Command to test it:

make -C tools/testing/selftests TARGETS=seccomp

Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 874f17763536..cd745a8a5b7e 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -5172,7 +5172,8 @@ FIXTURE_SETUP(UPROBE)
 		ASSERT_GE(bit, 0);
 	}
 
-	offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
+	offset = get_uprobe_offset(variant->uretprobe
+			? (void *)probed_uretprobe : (void *)probed_uprobe);
 	ASSERT_GE(offset, 0);
 
 	if (variant->uretprobe)
-- 
2.43.0

2 months, 2 weeks

[PATCH net] selftests: netdevsim: Fix ethtool-coalesce.sh fail by installing ethtool-common.sh

by Wang Liang

The script "ethtool-common.sh" is not installed in INSTALL_PATH, which triggers errors when I try to run the test 'drivers/net/netdevsim/ethtool-coalesce.sh':

TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
# ./ethtool-coalesce.sh: line 25: make_netdev: command not found
# ethtool: bad command line argument(s)
# ./ethtool-coalesce.sh: line 124: check: command not found
# ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
# FAILED /0 checks
not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1

Install this file to avoid this error. After this patch:

TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# PASSED all 22 checks
ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh

Fixes: fbb8531e58bd ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74(a)huawei.com>
---
 tools/testing/selftests/drivers/net/netdevsim/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/Makefile b/tools/testing/selftests/drivers/net/netdevsim/Makefile
index daf51113c827..653141a654a0 100644
--- a/tools/testing/selftests/drivers/net/netdevsim/Makefile
+++ b/tools/testing/selftests/drivers/net/netdevsim/Makefile
@@ -20,4 +20,6 @@ TEST_PROGS := \
 	udp_tunnel_nic.sh \
 # end of TEST_PROGS
 
+TEST_FILES := ethtool-common.sh
+
 include ../../../lib.mk
-- 
2.34.1

2 months, 2 weeks

[PATCH net-next 00/12] selftests/vsock: refactor and improve vmtest infrastructure

by Bobby Eshleman

Hey all,

This patch series refactors the vsock selftest VM infrastructure to improve test run times, reduce false positives, improve logging generally, and fix several bugs. It also prepares for future tests which make heavy use of these refactored functions and have new requirements such as simultaneous QEMU processes.

These patches were broken off from this prior series:
https://lore.kernel.org/all/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.co…

To: Stefano Garzarella <sgarzare(a)redhat.com>
To: Shuah Khan <shuah(a)kernel.org>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Bobby Eshleman <bobbyeshleman(a)gmail.com>
Cc: virtualization(a)lists.linux.dev
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)meta.com>
---
Bobby Eshleman (12):
      selftests/vsock: improve logging in vmtest.sh
      selftests/vsock: make wait_for_listener() work even if pipefail is on
      selftests/vsock: reuse logic for vsock_test through wrapper functions
      selftests/vsock: avoid multi-VM pidfile collisions with QEMU
      selftests/vsock: do not unconditionally die if qemu fails
      selftests/vsock: speed up tests by reducing the QEMU pidfile timeout
      selftests/vsock: add check_result() for pass/fail counting
      selftests/vsock: identify and execute tests that can re-use VM
      selftests/vsock: add BUILD=0 definition
      selftests/vsock: avoid false-positives when checking dmesg
      selftests/vsock: add 1.37 to tested virtme-ng versions
      selftests/vsock: add vsock_loopback module loading

 tools/testing/selftests/vsock/vmtest.sh | 345 +++++++++++++++++++++-----------
 1 file changed, 227 insertions(+), 118 deletions(-)
---
base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
change-id: 20251021-vsock-selftests-fixes-and-improvements-057440ffb2fa

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)meta.com>

2 months, 2 weeks

[PATCH v3 00/12] Start porting UML to nolibc

by Benjamin Berg

From: Benjamin Berg <benjamin.berg(a)intel.com>

This patchset is an attempt to start a nolibc port of UML. The goal is to port UML to use nolibc in smaller chunks to make the switch more manageable.

Using nolibc has the advantage that it is a smaller runtime and it allows us to be in full control over all memory mappings that are done. Another libc, on the other hand, might map memory unaware of UML, causing collisions with the UML memory layout. Such mappings could even happen before UML has fully initialized (e.g. rseq being mapped into the physical or vmalloc memory areas).

There are three parts to this patchset:
 * Two patches to use tools/include headers instead of kernel headers for userspace files.
 * A few nolibc fixes and a new NOLIBC_NO_RUNTIME compile flag for it.
 * Finally, nolibc build support for UML and switching two files while adding the appropriate support in nolibc itself.

v1 of this patchset was https://lore.kernel.org/all/20250915071115.1429196-1-benjamin@sipsolutions.…
v2: https://lore.kernel.org/all/20250919153420.727385-1-benjamin@sipsolutions.n…

Changes in v3:
- sys_ptrace is now not a variadic function
- improve printf %m implementation
- keep perror as function available with NOLIBC_IGNORE_ERRNO
- change syscall guard and fix i386 build

Changes in v2:
- add sys/uio.h and sys/ptrace.h to nolibc
- Use NOLIBC_NO_RUNTIME to disable nolibc startup code
- Fix out-of-tree build
- various small improvements and cleanups

Benjamin

Benjamin Berg (12):
  tools compiler.h: fix __used definition
  um: use tools/include for user files
  tools/nolibc/stdio: let perror work when NOLIBC_IGNORE_ERRNO is set
  tools/nolibc/dirent: avoid errno in readdir_r
  tools/nolibc: implement %m if errno is not defined
  tools/nolibc: use __fallthrough__ rather than fallthrough
  tools/nolibc: add option to disable runtime
  um: add infrastructure to build files using nolibc
  um: use nolibc for the --showconfig implementation
  tools/nolibc: add uio.h with readv and writev
  tools/nolibc: add ptrace support
  um: switch ptrace FP register access to nolibc

 arch/um/Makefile | 38 ++++++++++++---
 arch/um/include/shared/init.h | 2 +-
 arch/um/include/shared/os.h | 2 +
 arch/um/include/shared/user.h | 6 ---
 arch/um/kernel/Makefile | 2 +-
 arch/um/kernel/skas/stub.c | 1 +
 arch/um/kernel/skas/stub_exe.c | 4 +-
 arch/um/os-Linux/skas/process.c | 6 +--
 arch/um/os-Linux/start_up.c | 4 +-
 arch/um/scripts/Makefile.rules | 10 +++-
 arch/x86/um/Makefile | 6 ++-
 arch/x86/um/os-Linux/Makefile | 5 +-
 arch/x86/um/os-Linux/registers.c | 20 ++------
 arch/x86/um/user-offsets.c | 1 -
 tools/include/linux/compiler.h | 2 +-
 tools/include/nolibc/Makefile | 2 +
 tools/include/nolibc/arch-arm.h | 2 +
 tools/include/nolibc/arch-arm64.h | 2 +
 tools/include/nolibc/arch-loongarch.h | 2 +
 tools/include/nolibc/arch-m68k.h | 2 +
 tools/include/nolibc/arch-mips.h | 2 +
 tools/include/nolibc/arch-powerpc.h | 2 +
 tools/include/nolibc/arch-riscv.h | 2 +
 tools/include/nolibc/arch-s390.h | 2 +
 tools/include/nolibc/arch-sh.h | 2 +
 tools/include/nolibc/arch-sparc.h | 2 +
 tools/include/nolibc/arch-x86.h | 4 ++
 tools/include/nolibc/compiler.h | 4 +-
 tools/include/nolibc/crt.h | 3 ++
 tools/include/nolibc/dirent.h | 6 +--
 tools/include/nolibc/nolibc.h | 2 +
 tools/include/nolibc/stackprotector.h | 2 +
 tools/include/nolibc/stdio.h | 10 +++-
 tools/include/nolibc/stdlib.h | 2 +
 tools/include/nolibc/sys.h | 3 +-
 tools/include/nolibc/sys/auxv.h | 3 ++
 tools/include/nolibc/sys/ptrace.h | 44 ++++++++++++++++++
 tools/include/nolibc/sys/uio.h | 49 ++++++++++++++++++++
 tools/testing/selftests/nolibc/nolibc-test.c | 11 +++++
 39 files changed, 221 insertions(+), 53 deletions(-)
 create mode 100644 tools/include/nolibc/sys/ptrace.h
 create mode 100644 tools/include/nolibc/sys/uio.h

-- 
2.51.0

2 months, 2 weeks

[PATCH] MAINTAINERS: Update KUnit email address for Rae Moar

by Rae Moar

Update Rae's email address for the KUnit entry. Also add an entry to .mailmap to map the former Google email to the current Gmail address.

Signed-off-by: Rae Moar <rmoar(a)google.com>
---
I am leaving Google and am going through and cleaning up my @google.com address in the relevant places. Note that Friday, November 7 2025 is my last day at Google, after which I will lose access to this email account, so any future updates or comments after Friday will come from my @gmail.com account.

 .mailmap | 1 +
 MAINTAINERS | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index d2edd256b19d..2fcf7e4a5cfd 100644
--- a/.mailmap
+++ b/.mailmap
@@ -642,6 +642,7 @@ Qais Yousef <qyousef(a)layalina.io> <qais.yousef(a)arm.com>
 Quentin Monnet <qmo(a)kernel.org> <quentin.monnet(a)netronome.com>
 Quentin Monnet <qmo(a)kernel.org> <quentin(a)isovalent.com>
 Quentin Perret <qperret(a)qperret.net> <quentin.perret(a)arm.com>
+Rae Moar <raemoar63(a)gmail.com> <rmoar(a)google.com>
 Rafael J. Wysocki <rjw(a)rjwysocki.net> <rjw(a)sisk.pl>
 Rajeev Nandan <quic_rajeevny(a)quicinc.com> <rajeevny(a)codeaurora.org>
 Rajendra Nayak <quic_rjendra(a)quicinc.com> <rnayak(a)codeaurora.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index 46126ce2f968..eefcff990987 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13601,7 +13601,7 @@ F:	fs/smb/server/
 KERNEL UNIT TESTING FRAMEWORK (KUnit)
 M:	Brendan Higgins <brendan.higgins(a)linux.dev>
 M:	David Gow <davidgow(a)google.com>
-R:	Rae Moar <rmoar(a)google.com>
+R:	Rae Moar <raemoar63(a)gmail.com>
 L:	linux-kselftest(a)vger.kernel.org
 L:	kunit-dev(a)googlegroups.com
 S:	Maintained

base-commit: 9de5f847ef8fa205f4fd704a381d32ecb5b66da9
-- 
2.51.1.851.g4ebd6896fd-goog

2 months, 2 weeks

[PATCH v5 net-next 00/14] AccECN protocol case handling series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>

Hello,

Please find the v5 AccECN case handling patch series, which covers handling of several exceptional cases in the Accurate ECN spec (RFC9768), adds new identifiers to be used by CC modules, adds ecn_delta into rate_sample, keeps the ACE counter for computation, etc.

This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

Best regards,
Chia-Yu

---
v5:
- Move previous #11 in v4 to a later patch after discussion with the RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav(a)nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet(a)google.com>)
- Add Fixes: tag into #7 (Paolo Abeni <pabeni(a)redhat.com>)
- Update the commit message of #8 and the if-condition check (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #13 (Paolo Abeni <pabeni(a)redhat.com>)

v4:
- Add previous #13 in v2 back after discussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.

v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni(a)redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into an individual flag and add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni(a)redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni(a)redhat.com>)
- Move patch #3 in v2 to a later Prague patch series and remove patch #13 in v2. (Paolo Abeni <pabeni(a)redhat.com>)

---
Chia-Yu Chang (12):
  net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  selftests/net: gro: add self-test for TCP CWR flag
  tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: move increment of num_retrans
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
  tcp: accecn: enable AccECN

Ilpo Järvinen (2):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN

 Documentation/networking/ip-sysctl.rst | 4 +-
 .../networking/net_cachelines/tcp_sock.rst | 1 +
 include/linux/skbuff.h | 13 ++-
 include/linux/tcp.h | 4 +-
 include/net/inet_ecn.h | 20 +++-
 include/net/tcp.h | 32 ++++++-
 include/net/tcp_ecn.h | 92 ++++++++++++++-----
 net/ipv4/sysctl_net_ipv4.c | 4 +-
 net/ipv4/tcp.c | 2 +
 net/ipv4/tcp_cong.c | 10 +-
 net/ipv4/tcp_input.c | 37 +++++++-
 net/ipv4/tcp_minisocks.c | 40 +++++---
 net/ipv4/tcp_offload.c | 3 +-
 net/ipv4/tcp_output.c | 42 ++++++---
 tools/testing/selftests/net/gro.c | 80 +++++++++++-----
 15 files changed, 294 insertions(+), 90 deletions(-)

-- 
2.34.1

2 months, 2 weeks

[PATCH net-next v2] selftests: drv-net: replace the nsim ring test with a drv-net one

by Jakub Kicinski

We are trying to move away from netdevsim-only tests and towards tests which can be run both against netdevsim and real drivers. Replace the simple bash script we have for checking ethtool -g/-G on netdevsim with a Python test tweaking those params as well as channel count. The new test is not exactly equivalent to the netdevsim one, but real drivers don't often support random ring sizes, let alone modifying max values via debugfs. Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- v2: - add the new test to the Makefile and remove the old one; it turns out NIPA's check for Makefile presence was busted v1: https://lore.kernel.org/20251024215552.1249838-1-kuba@kernel.org CC: andrew(a)lunn.ch CC: shuah(a)kernel.org CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/netdevsim/Makefile | 1 - .../drivers/net/netdevsim/ethtool-ring.sh | 85 --------- .../selftests/drivers/net/ring_reconfig.py | 167 ++++++++++++++++++ 4 files changed, 168 insertions(+), 86 deletions(-) delete mode 100755 tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh create mode 100755 tools/testing/selftests/drivers/net/ring_reconfig.py diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile index 6e41635bd55a..68e0bb603a9d 100644 --- a/tools/testing/selftests/drivers/net/Makefile +++ b/tools/testing/selftests/drivers/net/Makefile @@ -22,6 +22,7 @@ TEST_PROGS := \ ping.py \ psp.py \ queues.py \ + ring_reconfig.py \ shaper.py \ stats.py \ xdp.py \ diff --git a/tools/testing/selftests/drivers/net/netdevsim/Makefile b/tools/testing/selftests/drivers/net/netdevsim/Makefile index daf51113c827..833abd8e6fdc 100644 --- a/tools/testing/selftests/drivers/net/netdevsim/Makefile +++ b/tools/testing/selftests/drivers/net/netdevsim/Makefile @@ -8,7 +8,6 @@ TEST_PROGS := \ ethtool-features.sh \ ethtool-fec.sh \ ethtool-pause.sh \ - ethtool-ring.sh \ fib.sh \ fib_notifications.sh \ 
hw_stats_l3.sh \ diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh deleted file mode 100755 index c969559ffa7a..000000000000 --- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh +++ /dev/null @@ -1,85 +0,0 @@ -#!/bin/bash -# SPDX-License-Identifier: GPL-2.0-only - -source ethtool-common.sh - -function get_value { - local query="${SETTINGS_MAP[$1]}" - - echo $(ethtool -g $NSIM_NETDEV | \ - tail -n +$CURR_SETT_LINE | \ - awk -F':' -v pattern="$query:" '$0 ~ pattern {gsub(/[\t ]/, "", $2); print $2}') -} - -function update_current_settings { - for key in ${!SETTINGS_MAP[@]}; do - CURRENT_SETTINGS[$key]=$(get_value $key) - done - echo ${CURRENT_SETTINGS[@]} -} - -if ! ethtool -h | grep -q set-ring >/dev/null; then - echo "SKIP: No --set-ring support in ethtool" - exit 4 -fi - -NSIM_NETDEV=$(make_netdev) - -set -o pipefail - -declare -A SETTINGS_MAP=( - ["rx"]="RX" - ["rx-mini"]="RX Mini" - ["rx-jumbo"]="RX Jumbo" - ["tx"]="TX" -) - -declare -A EXPECTED_SETTINGS=( - ["rx"]="" - ["rx-mini"]="" - ["rx-jumbo"]="" - ["tx"]="" -) - -declare -A CURRENT_SETTINGS=( - ["rx"]="" - ["rx-mini"]="" - ["rx-jumbo"]="" - ["tx"]="" -) - -MAX_VALUE=$((RANDOM % $((2**32-1)))) -RING_MAX_LIST=$(ls $NSIM_DEV_DFS/ethtool/ring/) - -for ring_max_entry in $RING_MAX_LIST; do - echo $MAX_VALUE > $NSIM_DEV_DFS/ethtool/ring/$ring_max_entry -done - -CURR_SETT_LINE=$(ethtool -g $NSIM_NETDEV | grep -i -m1 -n 'Current hardware settings' | cut -f1 -d:) - -# populate the expected settings map -for key in ${!SETTINGS_MAP[@]}; do - EXPECTED_SETTINGS[$key]=$(get_value $key) -done - -# test -for key in ${!SETTINGS_MAP[@]}; do - value=$((RANDOM % $MAX_VALUE)) - - ethtool -G $NSIM_NETDEV "$key" "$value" - - EXPECTED_SETTINGS[$key]="$value" - expected=${EXPECTED_SETTINGS[@]} - current=$(update_current_settings) - - check $? 
"$current" "$expected" - set +x -done - -if [ $num_errors -eq 0 ]; then - echo "PASSED all $((num_passes)) checks" - exit 0 -else - echo "FAILED $num_errors/$((num_errors+num_passes)) checks" - exit 1 -fi diff --git a/tools/testing/selftests/drivers/net/ring_reconfig.py b/tools/testing/selftests/drivers/net/ring_reconfig.py new file mode 100755 index 000000000000..2251efe63014 --- /dev/null +++ b/tools/testing/selftests/drivers/net/ring_reconfig.py @@ -0,0 +1,167 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +""" +Test channel and ring size configuration via ethtool (-L / -G). +""" + +from lib.py import ksft_run, ksft_exit, ksft_pr +from lib.py import ksft_eq +from lib.py import NetDrvEpEnv, EthtoolFamily, GenerateTraffic +from lib.py import defer, NlError + + +def channels(cfg) -> None: + """ + Twiddle channel counts in various combinations of parameters. + We're only looking for driver adhering to the requested config + if the config is accepted and crashes. + """ + ehdr = {'header':{'dev-index': cfg.ifindex}} + chans = cfg.eth.channels_get(ehdr) + + all_keys = ["rx", "tx", "combined"] + mixes = [{"combined"}, {"rx", "tx"}, {"rx", "combined"}, {"tx", "combined"}, + {"rx", "tx", "combined"},] + + # Get the set of keys that device actually supports + restore = {} + supported = set() + for key in all_keys: + if key + "-max" in chans: + supported.add(key) + restore |= {key + "-count": chans[key + "-count"]} + + defer(cfg.eth.channels_set, ehdr | restore) + + def test_config(config): + try: + cfg.eth.channels_set(ehdr | config) + get = cfg.eth.channels_get(ehdr) + for k, v in config.items(): + ksft_eq(get.get(k, 0), v) + except NlError as e: + failed.append(mix) + ksft_pr("Can't set", config, e) + else: + ksft_pr("Okay", config) + + failed = [] + for mix in mixes: + if not mix.issubset(supported): + continue + + # Set all the values in the mix to 1, other supported to 0 + config = {} + for key in all_keys: + config[key + "-count"] = 1 if key in mix 
else 0 + test_config(config) + + for mix in mixes: + if not mix.issubset(supported): + continue + if mix in failed: + continue + + # Set all the values in the mix to max, other supported to 0 + config = {} + for key in all_keys: + config[key + "-count"] = chans[key + '-max'] if key in mix else 0 + test_config(config) + + +def _configure_min_ring_cnt(cfg) -> None: + """ Try to configure a single Rx/Tx ring. """ + ehdr = {'header':{'dev-index': cfg.ifindex}} + chans = cfg.eth.channels_get(ehdr) + + all_keys = ["rx-count", "tx-count", "combined-count"] + restore = {} + config = {} + for key in all_keys: + if key in chans: + restore[key] = chans[key] + config[key] = 0 + + if chans.get('combined-count', 0) > 1: + config['combined-count'] = 1 + elif chans.get('rx-count', 0) > 1 and chans.get('tx-count', 0) > 1: + config['tx-count'] = 1 + config['rx-count'] = 1 + else: + # looks like we're already on 1 channel + return + + cfg.eth.channels_set(ehdr | config) + defer(cfg.eth.channels_set, ehdr | restore) + + +def ringparam(cfg) -> None: + """ + Tweak the ringparam configuration. Try to run some traffic over min + ring size to make sure it actually functions. 
+ """ + ehdr = {'header':{'dev-index': cfg.ifindex}} + rings = cfg.eth.rings_get(ehdr) + + restore = {} + maxes = {} + params = set() + for key in rings.keys(): + if 'max' in key: + param = key[:-4] + maxes[param] = rings[key] + params.add(param) + restore[param] = rings[param] + + defer(cfg.eth.rings_set, ehdr | restore) + + # Speed up the reconfig by configuring just one ring + _configure_min_ring_cnt(cfg) + + # Try to reach min on all settings + for param in params: + val = rings[param] + while True: + try: + cfg.eth.rings_set({'header':{'dev-index': cfg.ifindex}, + param: val // 2}) + val //= 2 + if val <= 1: + break + except NlError: + break + + get = cfg.eth.rings_get(ehdr) + ksft_eq(get[param], val) + + ksft_pr(f"Reached min for '{param}' at {val} (max {rings[param]})") + + GenerateTraffic(cfg).wait_pkts_and_stop(50000) + + # Try max across all params, if the driver supports large rings + # this may OOM so we ignore errors + try: + ksft_pr("Applying max settings") + config = {p: maxes[p] for p in params} + cfg.eth.rings_set(ehdr | config) + except NlError as e: + ksft_pr("Can't set max params", config, e) + else: + GenerateTraffic(cfg).wait_pkts_and_stop(50000) + + +def main() -> None: + """ Ksft boiler plate main """ + + with NetDrvEpEnv(__file__) as cfg: + cfg.eth = EthtoolFamily() + + ksft_run([channels, + ringparam], + args=(cfg, )) + ksft_exit() + + +if __name__ == "__main__": + main() -- 2.51.0
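The ringparam() test reaches the minimum ring size by repeatedly halving the current value until the driver rejects the request or the value drops to 1. The same search is sketched in C below, with a hypothetical setter callback standing in for the ethtool netlink rings_set call and a hypothetical mock "driver" that accepts sizes of 64 and above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical setter: returns true if the driver accepted the ring size. */
typedef bool (*ring_set_fn)(unsigned int size, void *ctx);

/* Halve the ring size while the setter accepts it and return the last
 * accepted size, mirroring the "try val // 2, stop on error" loop. */
static unsigned int find_min_ring(unsigned int start, ring_set_fn set,
				  void *ctx)
{
	unsigned int val = start;

	while (val > 1) {
		if (!set(val / 2, ctx))
			break;		/* driver rejected the smaller size */
		val /= 2;
	}
	return val;
}

/* Mock driver for illustration only: accepts any ring size of 64 or more. */
static bool mock_min64(unsigned int size, void *ctx)
{
	(void)ctx;
	return size >= 64;
}
```

With the mock, starting from 1024 the search settles at 64, while a start of 100 stays at 100 because 50 is rejected. The Python test additionally reads the value back with rings_get() and checks it with ksft_eq(), which the sketch leaves out.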

2 months, 2 weeks

[PATCH bpf-next v2 0/2] bpf: Fix tnum_overlap to check for zero mask intersection

by KaFai Wan

This small patchset avoids a verifier bug warning when tnum_overlap() is called with a zero mask intersection.

v2:
- fix runtime error

v1: https://lore.kernel.org/all/20251026163806.3300636-1-kafai.wan@linux.dev/
---
KaFai Wan (2):
  bpf: Fix tnum_overlap to check for zero mask intersection
  selftests/bpf: Range analysis test case for JEQ

 kernel/bpf/tnum.c | 2 ++
 .../selftests/bpf/progs/verifier_bounds.c | 23 +++++++++++++++++++
 2 files changed, 25 insertions(+)

--
2.43.0
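For readers new to tnums: the verifier tracks a value as a (value, mask) pair, where set mask bits are unknown and the remaining bits are given by value. As a general illustration only (this is not the patch's diff, which the cover letter does not show), two tnums can describe a common concrete value exactly when they agree on every bit known in both. The struct and function names below are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the verifier's struct tnum: bits set in
 * mask are unknown, the other bits are given by value. */
struct tnum_sketch {
	uint64_t value;
	uint64_t mask;
};

/* True if at least one concrete 64-bit number is consistent with both. */
static bool tnum_sketch_overlap(struct tnum_sketch a, struct tnum_sketch b)
{
	uint64_t known_in_both = ~(a.mask | b.mask);

	/* Any disagreement on a bit known in both rules out overlap. */
	return ((a.value ^ b.value) & known_in_both) == 0;
}
```

Bits unknown in either operand are ignored, so a fully unknown tnum (mask of all ones) overlaps with anything.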

2 months, 2 weeks

[PATCH v11 00/21] TDX KVM selftests

by Sagi Shahar

This is v11 of the TDX selftests. This series is based on v6.17-rc7. Changes from v10 [1]: - Rebased on top of v6.17-rc4. - Addressed minor comments from v10. - Removed code for setting up X86_CR4_OSXMMEXCPT, which is not needed for now. - Added a call to vm_tdx_load_common_boot_parameters() in "KVM: selftests: Call TDX init when creating a new TDX vm", which was accidentally dropped between v9 and v10 due to code refactoring. [1] https://lore.kernel.org/lkml/20250904065453.639610-1-sagis@google.com/#r Ackerley Tng (2): KVM: selftests: Add helpers to init TDX memory and finalize VM KVM: selftests: Add ucall support for TDX Erdem Aktas (2): KVM: selftests: Add TDX boot code KVM: selftests: Add support for TDX TDCALL from guest Isaku Yamahata (2): KVM: selftests: Update kvm_init_vm_address_properties() for TDX KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs' attribute configuration Sagi Shahar (15): KVM: selftests: Allocate pgd in virt_map() as necessary KVM: selftests: Expose functions to get default sregs values KVM: selftests: Expose function to allocate guest vCPU stack KVM: selftests: Expose segment definitons to assembly files KVM: selftests: Add kbuild definitons KVM: selftests: Define structs to pass parameters to TDX boot code KVM: selftests: Set up TDX boot code region KVM: selftests: Set up TDX boot parameters region KVM: selftests: Add helper to initialize TDX VM KVM: selftests: Call TDX init when creating a new TDX vm KVM: selftests: Setup memory regions for TDX on vm creation KVM: selftests: Call KVM_TDX_INIT_VCPU when creating a new TDX vcpu KVM: selftests: Set entry point for TDX guest code KVM: selftests: Add wrapper for TDX MMIO from guest KVM: selftests: Add TDX lifecycle test tools/include/linux/kbuild.h | 18 + tools/testing/selftests/kvm/Makefile.kvm | 32 ++ .../selftests/kvm/include/ucall_common.h | 1 + .../selftests/kvm/include/x86/processor.h | 35 ++ .../selftests/kvm/include/x86/processor_asm.h | 12 + 
.../selftests/kvm/include/x86/tdx/td_boot.h | 74 ++++ .../kvm/include/x86/tdx/td_boot_asm.h | 16 + .../selftests/kvm/include/x86/tdx/tdcall.h | 34 ++ .../selftests/kvm/include/x86/tdx/tdx.h | 14 + .../selftests/kvm/include/x86/tdx/tdx_util.h | 86 +++++ .../testing/selftests/kvm/include/x86/ucall.h | 6 - tools/testing/selftests/kvm/lib/kvm_util.c | 10 +- .../testing/selftests/kvm/lib/x86/processor.c | 93 +++-- .../selftests/kvm/lib/x86/tdx/td_boot.S | 60 +++ .../kvm/lib/x86/tdx/td_boot_offsets.c | 21 ++ .../selftests/kvm/lib/x86/tdx/tdcall.S | 93 +++++ .../kvm/lib/x86/tdx/tdcall_offsets.c | 16 + tools/testing/selftests/kvm/lib/x86/tdx/tdx.c | 23 ++ .../selftests/kvm/lib/x86/tdx/tdx_util.c | 348 ++++++++++++++++++ tools/testing/selftests/kvm/lib/x86/ucall.c | 46 ++- tools/testing/selftests/kvm/x86/tdx_vm_test.c | 31 ++ 21 files changed, 1029 insertions(+), 40 deletions(-) create mode 100644 tools/include/linux/kbuild.h create mode 100644 tools/testing/selftests/kvm/include/x86/processor_asm.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot_offsets.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall_offsets.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c -- 2.51.0.536.g15c5d4f767-goog

2 months, 2 weeks

[PATCH] selftest: net: fix socklen_t type mismatch in sctp_collision test

by Ankit Khushwaha

Socket APIs like recvfrom(), accept(), and getsockname() expect a socklen_t * argument, but the test was using int variables. This causes -Wpointer-sign warnings on platforms where socklen_t is unsigned. Change the variable type from int to socklen_t to resolve the warning and ensure type safety across platforms. Warning fixed: sctp_collision.c:62:70: warning: passing 'int *' to parameter of type 'socklen_t *' (aka 'unsigned int *') converts between pointers to integer types with different sign [-Wpointer-sign] 62 | ret = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&daddr, &len); | ^~~~ /usr/include/sys/socket.h:165:27: note: passing argument to parameter '__addr_len' here 165 | socklen_t *__restrict __addr_len); | ^ Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com> --- tools/testing/selftests/net/netfilter/sctp_collision.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/netfilter/sctp_collision.c b/tools/testing/selftests/net/netfilter/sctp_collision.c index 21bb1cfd8a85..91df996367e9 100644 --- a/tools/testing/selftests/net/netfilter/sctp_collision.c +++ b/tools/testing/selftests/net/netfilter/sctp_collision.c @@ -9,7 +9,8 @@ int main(int argc, char *argv[]) { struct sockaddr_in saddr = {}, daddr = {}; - int sd, ret, len = sizeof(daddr); + int sd, ret; + socklen_t len = sizeof(daddr); struct timeval tv = {25, 0}; char buf[] = "hello"; -- 2.51.0
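A minimal sketch of the pattern the patch applies: declare the address-length variable as socklen_t and pass its address to the socket call, so no signedness conversion happens. The UDP loopback socket and the function name are assumptions for illustration; the sctp_collision test itself uses its own sockets and recvfrom().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket to an ephemeral loopback port
 * and return the address length getsockname() wrote back, or -1. */
static int local_addr_len(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);	/* socklen_t, not int */
	int sd, ret = -1;

	sd = socket(AF_INET, SOCK_DGRAM, 0);
	if (sd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* let the kernel pick a port */

	if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(sd, (struct sockaddr *)&addr, &len) == 0)
		ret = (int)len;

	close(sd);
	return ret;
}
```

Because `len` already has the type the API expects, `&len` compiles cleanly under -Wpointer-sign.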

2 months, 2 weeks

[PATCH v4] selftests/bpf: Change variable types for -Wsign-compare

by Mehdi Ben Hadj Khelifa

This is a follow-up patch for commit 495d2d8133fd ("selftests/bpf: Attempt to build BPF programs with -Wsign-compare") from Alexei Starovoitov [1], working toward enabling the -Wsign-compare C compilation flag for clang, since -Wall does not include it and BPF programs are built with clang. This helps catch problematic comparisons in future tests, as the commit message explains:

"
int i = -1;
unsigned int j = 1;
if (i < j) // this is false.

long i = -1;
unsigned int j = 1;
if (i < j) // this is true.

C standard for reference:

- If either operand is unsigned long the other shall be converted to unsigned long.
- Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.
- Otherwise, if either operand is long, the other shall be converted to long.
- Otherwise, if either operand is unsigned, the other shall be converted to unsigned.

Unfortunately clang's -Wsign-compare is very noisy. It complains about (s32)a == (u32)b which is safe and doen't have surprising behavior.
"

This specific patch suppresses the following warnings when -Wsign-compare is enabled:

1 warning generated.
progs/bpf_iter_bpf_percpu_array_map.c:35:16: warning: comparison of integers of different signs: 'int' and 'const volatile __u32' (aka 'const volatile unsigned int') [-Wsign-compare] 35 | for (i = 0; i < num_cpus; i++) { | ~ ^ ~~~~~~~~
1 warning generated.
progs/bpf_qdisc_fifo.c:93:2: warning: comparison of integers of different signs: 'int' and '__u32' (aka 'unsigned int') [-Wsign-compare] 93 | bpf_for(i, 0, sch->q.qlen) { | ^ ~ ~~~~~~~~~~~

Note that many more similar changes are still needed before the -Wsign-compare flag can be enabled, since -Werror is enabled and the remaining warnings would cause compilation of the BPF selftests to fail.

[1].
Link:https://github.com/torvalds/linux/commit/495d2d8133fd1407519170a5238f4… Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa(a)gmail.com> --- Changelog: Changes from v3: -Downsized the patch as suggested by vivek yadav[2]. -Changed the commit message as suggested by Daniel Borkmann[3]. Link:https://lore.kernel.org/all/20250925103559.14876-1-mehdi.benhadjkhelif… Changes from v2: -Split up the patch into a patch series as suggested by vivek -Include only changes to variable types with no casting by my mentor david -Removed the -Wsign-compare in Makefile to avoid compilation errors until adding casting for rest of comparisons. Link:https://lore.kernel.org/bpf/20250924195731.6374-1-mehdi.benhadjkhelifa… Changes from v1: - Fix failed CI builds caused by missing .c and .h files in my patch when working against mainline. Link:https://lore.kernel.org/bpf/20250924162408.815137-1-mehdi.benhadjkheli… [2]:https://lore.kernel.org/all/CABPSWR7_w3mxr74wCDEF=MYYuG2F_vMJeD-dqotc8MD… [3]:https://lore.kernel.org/all/5ad26663-a3cc-4bf4-9d6f-8213ac8e8ce6@iogearb… .../testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c | 2 +- tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c index 9fdea8cd4c6f..0baf00463f35 100644 --- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c +++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c @@ -24,7 +24,7 @@ int dump_bpf_percpu_array_map(struct bpf_iter__bpf_map_elem *ctx) __u32 *key = ctx->key; void *pptr = ctx->value; __u32 step; - int i; + __u32 i; if (key == (void *)0 || pptr == (void *)0) return 0; diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c index 1de2be3e370b..7a639dcb23a9 100644 --- 
a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c @@ -88,7 +88,7 @@ void BPF_PROG(bpf_fifo_reset, struct Qdisc *sch) { struct bpf_list_node *node; struct skb_node *skbn; - int i; + __u32 i; bpf_for(i, 0, sch->q.qlen) { struct sk_buff *skb = NULL; -- 2.51.1.dirty
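The conversion rules quoted in the commit message can be checked directly. The following is a minimal sketch with hypothetical function names; compiling it with -Wsign-compare produces exactly the kind of warning this patch series is working to eliminate.

```c
#include <stdbool.h>

/* int vs unsigned int: same rank, so i is converted to unsigned int.
 * -1 becomes UINT_MAX and the comparison is false. */
static bool int_lt_uint(int i, unsigned int j)
{
	return i < j;
}

/* long vs unsigned int: where long can represent every unsigned int
 * (e.g. LP64), j is converted to long and -1 < 1 is true. Where long
 * and unsigned int have the same width (e.g. ILP32), both operands
 * become unsigned long and the result is false again. */
static bool long_lt_uint(long i, unsigned int j)
{
	return i < j;
}
```

The result of the second comparison therefore depends on the data model, which is one reason the patch changes the loop counters to `__u32` instead of relying on implicit conversions.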

2 months, 2 weeks

[PATCH bpf-next v2 0/2] bpf: Skip bounds adjustment for conditional jumps on same register

by KaFai Wan

This small patchset avoids a verifier bug warning on conditional jumps that compare a register with itself when that register holds a scalar with a range.

v2:
- Enhance is_branch_taken() and is_scalar_branch_taken() to compute the branch direction when both operands are the same register. (Eduard and Alexei)
- Update the selftest.

v1: https://lore.kernel.org/bpf/20251022164457.1203756-1-kafai.wan@linux.dev/
---
KaFai Wan (2):
  bpf: Skip bounds adjustment for conditional jumps on same register
  selftests/bpf: Add test for BPF_JGT on same register

 kernel/bpf/verifier.c | 32 +++++++++++++++++++
 .../selftests/bpf/progs/verifier_bounds.c | 18 +++++++++++
 2 files changed, 50 insertions(+)

--
2.43.0
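Why comparing a register with itself has a fixed direction can be illustrated with a small table of outcomes. This is not the verifier's is_branch_taken() code; the enum and function are hypothetical, and JSET is deliberately left out because `rX & rX` is non-zero only when rX itself is non-zero, so its outcome still depends on the register's range.

```c
/* Hypothetical subset of comparison ops, named after BPF jumps. */
enum cmp_op {
	CMP_JEQ, CMP_JNE, CMP_JGT, CMP_JGE, CMP_JLT, CMP_JLE,
	CMP_JSGT, CMP_JSGE, CMP_JSLT, CMP_JSLE
};

/* Outcome of "if rX <op> rX": 1 = always taken, 0 = never taken.
 * Signedness does not matter because both operands are identical. */
static int same_reg_branch_taken(enum cmp_op op)
{
	switch (op) {
	case CMP_JEQ:
	case CMP_JGE:
	case CMP_JLE:
	case CMP_JSGE:
	case CMP_JSLE:
		return 1;	/* x == x, x >= x and x <= x always hold */
	default:
		return 0;	/* x != x, x > x and x < x never hold */
	}
}
```

Since the direction is known up front, there is no reason to narrow the register's bounds on either branch, which matches the idea behind the first patch's title.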

2 months, 2 weeks

[PATCH] selftests: harness: Support KCOV.

by Kuniyuki Iwashima

While writing a selftest with kselftest_harness.h, I often want to check which paths are actually exercised. Let's support generating KCOV coverage data.

We can specify the output directory via the KCOV_OUTPUT environment variable, and the number of instructions to collect via the KCOV_SLOTS environment variable.

  # KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 2)) \
    ./tools/testing/selftests/net/af_unix/scm_inq

Both variables can also be specified as make variables.

  # make -C tools/testing/selftests/ \
    KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 4)) \
    kselftest_override_timeout=60 TARGETS=net/af_unix run_tests

The coverage data can be decoded simply with addr2line:

  $ cat kcov/* | sort | uniq | addr2line -e vmlinux | grep unix
  net/unix/af_unix.c:1056
  net/unix/af_unix.c:3138
  net/unix/af_unix.c:3834
  net/unix/af_unix.c:3838
  net/unix/af_unix.c:311 (discriminator 2)
  ...

or more nicely with a script embedded in vock [0]:

  $ cat kcov/* | sort | uniq > local.log
  $ python3 ~/kernel/tools/vock/report.py \
    --kernel-src ./ --vmlinux ./vmlinux \
    --mode local --local-log local.log --filter unix
  ...
  ------------------------------- Coverage Report --------------------------------
  📄 net/unix/af_unix.c (276 lines)
  ...
  942 | static int unix_setsockopt(struct socket *sock, int level, int optname,
  943 | sockptr_t optval, unsigned int optlen)
  944 | {
  ...
961 | switch (optname) { 962 | case SO_INQ: 963 > if (sk->sk_type != SOCK_STREAM) 964 | return -EINVAL; 965 | 966 > if (val > 1 || val < 0) 967 | return -EINVAL; 968 | 969 > WRITE_ONCE(u->recvmsg_inq, val); 970 | break; Link: https://github.com/kzall0c/vock/blob/f3d97de9954f9df758c0ab287ca7e24e654288… #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu(a)google.com> --- Documentation/dev-tools/kselftest.rst | 41 +++++++ tools/testing/selftests/Makefile | 14 ++- tools/testing/selftests/kselftest_harness.h | 128 +++++++++++++++++++- 3 files changed, 174 insertions(+), 9 deletions(-) diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index 18c2da67fae42..5c2b92ac4a300 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -200,6 +200,47 @@ You can look at the TAP output to see if you ran into the timeout. Test runners which know a test must run under a specific time can then optionally treat these timeouts then as fatal. +KCOV for selftests +================== + +Selftests built with `kselftest_harness.h` natively support generating +KCOV coverage data. See :doc:`KCOV: code coverage for fuzzing </dev-tools/kcov>` +for prerequisites. + +You can specify the output directory with the `KCOV_OUTPUT` environment +variable. Additionally, you can specify the number of instructions to +collect with the `KCOV_SLOTS` environment variable :: + + # KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 2)) \ + ./tools/testing/selftests/net/af_unix/scm_inq + +In the output directory, a coverage file is generated for each test +case in the selftest :: + + $ ls kcov/ + scm_inq.dgram.basic scm_inq.seqpacket.basic scm_inq.stream.basic + +The default value of `KCOV_SLOTS` is `4096`, and `KCOV_SLOTS` multiplied +by `sizeof(unsigned long)` must be multiple of `4096`, so the smallest +value is `512`. 
+ +Both `KCOV_OUTPUT` and `KCOV_SLOTS` can be specified as the variables +on the `make` command line :: + + # make -C tools/testing/selftests/ \ + kselftest_override_timeout=60 \ + KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 4)) \ + TARGETS=net/af_unix run_tests + +The collected data can be decoded with `addr2line` :: + + $ cat kcov/* | sort | uniq | addr2line -e vmlinux | grep unix + net/unix/af_unix.c:1056 + net/unix/af_unix.c:3138 + net/unix/af_unix.c:3834 + net/unix/af_unix.c:3838 + ... + Packaging selftests =================== diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index c46ebdb9b8ef7..40e70fb1a3478 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -218,12 +218,14 @@ all: done; exit $$ret; run_tests: all - @for TARGET in $(TARGETS); do \ - BUILD_TARGET=$$BUILD/$$TARGET; \ - $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests \ - SRC_PATH=$(shell readlink -e $$(pwd)) \ - OBJ_PATH=$(BUILD) \ - O=$(abs_objtree); \ + @for TARGET in $(TARGETS); do \ + BUILD_TARGET=$$BUILD/$$TARGET; \ + $(MAKE) OUTPUT=$$BUILD_TARGET \ + KCOV_OUTPUT=$(abspath $(KCOV_OUTPUT)) \ + -C $$TARGET run_tests \ + SRC_PATH=$(shell readlink -e $$(pwd)) \ + OBJ_PATH=$(BUILD) \ + O=$(abs_objtree); \ done; hotplug: diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 3f66e862e83eb..cba8020853b5d 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -56,6 +56,8 @@ #include <asm/types.h> #include <ctype.h> #include <errno.h> +#include <fcntl.h> +#include <linux/kcov.h> #include <linux/unistd.h> #include <poll.h> #include <stdbool.h> @@ -63,7 +65,9 @@ #include <stdio.h> #include <stdlib.h> #include <string.h> +#include <sys/ioctl.h> #include <sys/mman.h> +#include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> @@ -401,7 +405,8 @@ const FIXTURE_VARIANT(fixture_name) *variant); \ 
static void wrapper_##fixture_name##_##test_name( \ struct __test_metadata *_metadata, \ - struct __fixture_variant_metadata *variant) \ + struct __fixture_variant_metadata *variant, \ + char *test_full_name) \ { \ /* fixture data is alloced, setup, and torn down per call. */ \ FIXTURE_DATA(fixture_name) self_private, *self = NULL; \ @@ -430,7 +435,9 @@ if (_metadata->exit_code) \ _exit(0); \ *_metadata->no_teardown = false; \ + enable_kcov(_metadata); \ fixture_name##_##test_name(_metadata, self, variant->data); \ + disable_kcov(_metadata, test_full_name); \ _metadata->teardown_fn(false, _metadata, self, variant->data); \ _exit(0); \ } else if (child < 0 || child != waitpid(child, &status, 0)) { \ @@ -470,6 +477,8 @@ object->teardown_fn = &wrapper_##fixture_name##_##test_name##_teardown; \ object->termsig = signal; \ object->timeout = tmout; \ + object->kcov_fd = -1; \ + object->kcov_slots = -1; \ _##fixture_name##_##test_name##_object = object; \ __register_test(object); \ } \ @@ -908,7 +917,8 @@ __register_fixture_variant(struct __fixture_metadata *f, struct __test_metadata { const char *name; void (*fn)(struct __test_metadata *, - struct __fixture_variant_metadata *); + struct __fixture_variant_metadata *, + char *test_name); pid_t pid; /* pid of test when being run */ struct __fixture_metadata *fixture; void (*teardown_fn)(bool in_parent, struct __test_metadata *_metadata, @@ -923,6 +933,10 @@ struct __test_metadata { const void *variant; struct __test_results *results; struct __test_metadata *prev, *next; + int kcov_fd; + int kcov_slots; + char *kcov_dir; + unsigned long *kcov_mem; }; static inline bool __test_passed(struct __test_metadata *metadata) @@ -1185,6 +1199,114 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } +#define KCOV_SLOTS 4096 + +static void enable_kcov(struct __test_metadata *t) +{ + char *slots; + int err; + + t->kcov_dir = getenv("KCOV_OUTPUT"); + if (!t->kcov_dir || *t->kcov_dir == '\0') + return; + + slots = 
getenv("KCOV_SLOTS"); + if (slots && *slots != '\0') + sscanf(slots, "%d", &t->kcov_slots); + if (t->kcov_slots <= 0) + t->kcov_slots = KCOV_SLOTS; + + t->kcov_fd = open("/sys/kernel/debug/kcov", O_RDWR); + if (t->kcov_fd < 0) { + ksft_print_msg("ERROR OPENING KCOV FD\n"); + goto err; + } + + err = ioctl(t->kcov_fd, KCOV_INIT_TRACE, t->kcov_slots); + if (err) { + ksft_print_msg("ERROR INITIALISING KCOV\n"); + goto err; + } + + t->kcov_mem = mmap(NULL, sizeof(unsigned long) * t->kcov_slots, + PROT_READ | PROT_WRITE, MAP_SHARED, t->kcov_fd, 0); + if ((void *)t->kcov_mem == MAP_FAILED) { + ksft_print_msg("ERROR ALLOCATING MEMORY FOR KCOV\n"); + goto err; + } + + err = ioctl(t->kcov_fd, KCOV_ENABLE, KCOV_TRACE_PC); + if (err) { + ksft_print_msg("ERROR ENABLING KCOV\n"); + goto err; + } + + __atomic_store_n(&t->kcov_mem[0], 0, __ATOMIC_RELAXED); + return; +err: + t->exit_code = KSFT_FAIL; + _exit(KSFT_FAIL); +} + +static void disable_kcov(struct __test_metadata *t, char *test_name) +{ + int slots, err, dir, fd, i; + + if (t->kcov_fd == -1) + return; + + slots = __atomic_load_n(&t->kcov_mem[0], __ATOMIC_RELAXED); + if (slots == t->kcov_slots - 1) + ksft_print_msg("Set KCOV_SLOTS to a value greater than %d\n", t->kcov_slots); + + err = ioctl(t->kcov_fd, KCOV_DISABLE, 0); + if (err) { + ksft_print_msg("ERROR DISABLING KCOV\n"); + goto out; + } + + err = mkdir(t->kcov_dir, 0755); + if (err == -1 && errno != EEXIST) { + ksft_print_msg("ERROR CREATING '%s'\n", t->kcov_dir); + goto out; + } + err = 0; + + dir = open(t->kcov_dir, O_DIRECTORY); + if (dir < 0) { + ksft_print_msg("ERROR OPENING %s\n", t->kcov_dir); + err = dir; + goto out; + } + + fd = openat(dir, test_name, O_RDWR | O_CREAT | O_TRUNC); + + close(dir); + + if (fd == -1) { + ksft_print_msg("ERROR CREATING '%s' at '%s'\n", test_name, t->kcov_dir); + err = fd; + goto out; + } + + for (i = 0; i < slots; i++) { + char buf[64]; + int size; + + size = snprintf(buf, 64, "0x%lx\n", t->kcov_mem[i + 1]); + write(fd, buf, 
size); + } + +out: + munmap(t->kcov_mem, sizeof(t->kcov_mem[0]) * t->kcov_slots); + close(t->kcov_fd); + + if (err) { + t->exit_code = KSFT_FAIL; + _exit(KSFT_FAIL); + } +} + static void __run_test(struct __fixture_metadata *f, struct __fixture_variant_metadata *variant, struct __test_metadata *t) @@ -1216,7 +1338,7 @@ static void __run_test(struct __fixture_metadata *f, t->exit_code = KSFT_FAIL; } else if (child == 0) { setpgrp(); - t->fn(t, variant); + t->fn(t, variant, test_name); _exit(t->exit_code); } else { t->pid = child; -- 2.51.0.858.gf9c4a03a3a-goog
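In the harness code above, word 0 of the shared KCOV buffer holds the number of recorded PCs and the PCs start at word 1, which is why disable_kcov() warns when the count reaches KCOV_SLOTS - 1. The dump step and the documented KCOV_SLOTS constraint can be sketched as pure functions. The names are hypothetical, the buffer here is an ordinary array rather than an mmap()ed KCOV area, and the clamp to slots - 1 is a defensive addition that the patch does not have.

```c
#include <stddef.h>
#include <stdio.h>

/* Per the patch's documentation, KCOV_SLOTS * sizeof(unsigned long)
 * must be a multiple of 4096. */
static int kcov_slots_valid(long slots)
{
	return slots > 0 && (slots * (long)sizeof(unsigned long)) % 4096 == 0;
}

/* Format the PCs of a KCOV-style area as "0x%lx\n" lines into out.
 * area[0] is the PC count and PCs start at area[1]. Returns the number
 * of complete lines written. */
static size_t kcov_format(const unsigned long *area, size_t slots,
			  char *out, size_t outlen)
{
	size_t n, i, used = 0;

	if (slots == 0 || outlen == 0)
		return 0;

	n = area[0];
	if (n > slots - 1)
		n = slots - 1;	/* defensive clamp: at most slots - 1 PCs fit */

	out[0] = '\0';
	for (i = 0; i < n; i++) {
		int len = snprintf(out + used, outlen - used, "0x%lx\n",
				   area[i + 1]);

		if (len < 0 || (size_t)len >= outlen - used) {
			out[used] = '\0';	/* drop the partial line */
			break;
		}
		used += (size_t)len;
	}
	return i;
}
```

Keeping the formatting separate from the file handling makes the output format easy to check without a KCOV-enabled kernel.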

2 months, 2 weeks

[PATCH net-next v8 1/2] net/tls: support setting the maximum payload size

by Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>

During a handshake, an endpoint may specify a maximum record size limit. Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the maximum record size. This means that outgoing records from the kernel can exceed a lower limit negotiated during the handshake. In such a case, the receiving TLS endpoint must send a fatal "record_overflow" alert [1], and thus the record is discarded.

Upcoming Western Digital NVMe-TCP hardware controllers implement TLS support. For these devices, supporting TLS record size negotiation is necessary because the maximum TLS record size supported by the controller is less than the default 16KB currently used by the kernel. Currently, there is no way to inform the kernel of such a limit.

This patch adds support for a new setsockopt() option, `TLS_TX_MAX_PAYLOAD_LEN`, that sets the maximum plaintext fragment size. Once it is set, outgoing records are no larger than the specified size. This option can be used to enforce the negotiated record size limit.

[1] https://www.rfc-editor.org/rfc/rfc8449

Signed-off-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com> --- V7 -> V8: - Fixup HTML doc indentation - Drop the getsockopt() change in V7 where ContentType was included in the max payload length --- Documentation/networking/tls.rst | 20 ++++++++++ include/net/tls.h | 3 ++ include/uapi/linux/tls.h | 2 + net/tls/tls_device.c | 2 +- net/tls/tls_main.c | 64 ++++++++++++++++++++++++++++++++ net/tls/tls_sw.c | 2 +- 6 files changed, 91 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst index 36cc7afc2527..980c442d7161 100644 --- a/Documentation/networking/tls.rst +++ b/Documentation/networking/tls.rst @@ -280,6 +280,26 @@ If the record decrypted turns out to had been padded or is not a data record it will be decrypted again into a kernel buffer without zero copy. Such events are counted in the ``TlsDecryptRetry`` statistic.
+TLS_TX_MAX_PAYLOAD_LEN +~~~~~~~~~~~~~~~~~~~~~~ + +Specifies the maximum size of the plaintext payload for transmitted TLS records. + +When this option is set, the kernel enforces the specified limit on all outgoing +TLS records. No plaintext fragment will exceed this size. This option can be used +to implement the TLS Record Size Limit extension [1]. + +* For TLS 1.2, the value corresponds directly to the record size limit. +* For TLS 1.3, the value should be set to record_size_limit - 1, since + the record size limit includes one additional byte for the ContentType + field. + +The valid range for this option is 64 to 16384 bytes for TLS 1.2, and 63 to +16384 bytes for TLS 1.3. The lower minimum for TLS 1.3 accounts for the +extra byte used by the ContentType field. + +[1] https://datatracker.ietf.org/doc/html/rfc8449 + Statistics ========== diff --git a/include/net/tls.h b/include/net/tls.h index 857340338b69..f2af113728aa 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -53,6 +53,8 @@ struct tls_rec; /* Maximum data size carried in a TLS record */ #define TLS_MAX_PAYLOAD_SIZE ((size_t)1 << 14) +/* Minimum record size limit as per RFC8449 */ +#define TLS_MIN_RECORD_SIZE_LIM ((size_t)1 << 6) #define TLS_HEADER_SIZE 5 #define TLS_NONCE_OFFSET TLS_HEADER_SIZE @@ -226,6 +228,7 @@ struct tls_context { u8 rx_conf:3; u8 zerocopy_sendfile:1; u8 rx_no_pad:1; + u16 tx_max_payload_len; int (*push_pending_record)(struct sock *sk, int flags); void (*sk_write_space)(struct sock *sk); diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h index b66a800389cc..b8b9c42f848c 100644 --- a/include/uapi/linux/tls.h +++ b/include/uapi/linux/tls.h @@ -41,6 +41,7 @@ #define TLS_RX 2 /* Set receive parameters */ #define TLS_TX_ZEROCOPY_RO 3 /* TX zerocopy (only sendfile now) */ #define TLS_RX_EXPECT_NO_PAD 4 /* Attempt opportunistic zero-copy */ +#define TLS_TX_MAX_PAYLOAD_LEN 5 /* Maximum plaintext size */ /* Supported versions */ #define TLS_VERSION_MINOR(ver) ((ver) 
& 0xFF) @@ -194,6 +195,7 @@ enum { TLS_INFO_RXCONF, TLS_INFO_ZC_RO_TX, TLS_INFO_RX_NO_PAD, + TLS_INFO_TX_MAX_PAYLOAD_LEN, __TLS_INFO_MAX, }; #define TLS_INFO_MAX (__TLS_INFO_MAX - 1) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index caa2b5d24622..4d29b390aed9 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -462,7 +462,7 @@ static int tls_push_data(struct sock *sk, /* TLS_HEADER_SIZE is not counted as part of the TLS record, and * we need to leave room for an authentication tag. */ - max_open_record_len = TLS_MAX_PAYLOAD_SIZE + + max_open_record_len = tls_ctx->tx_max_payload_len + prot->prepend_size; do { rc = tls_do_allocation(sk, ctx, pfrag, prot->prepend_size); diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 39a2ab47fe72..56ce0bc8317b 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -541,6 +541,28 @@ static int do_tls_getsockopt_no_pad(struct sock *sk, char __user *optval, return 0; } +static int do_tls_getsockopt_tx_payload_len(struct sock *sk, char __user *optval, + int __user *optlen) +{ + struct tls_context *ctx = tls_get_ctx(sk); + u16 payload_len = ctx->tx_max_payload_len; + int len; + + if (get_user(len, optlen)) + return -EFAULT; + + if (len < sizeof(payload_len)) + return -EINVAL; + + if (put_user(sizeof(payload_len), optlen)) + return -EFAULT; + + if (copy_to_user(optval, &payload_len, sizeof(payload_len))) + return -EFAULT; + + return 0; +} + static int do_tls_getsockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen) { @@ -560,6 +582,9 @@ static int do_tls_getsockopt(struct sock *sk, int optname, case TLS_RX_EXPECT_NO_PAD: rc = do_tls_getsockopt_no_pad(sk, optval, optlen); break; + case TLS_TX_MAX_PAYLOAD_LEN: + rc = do_tls_getsockopt_tx_payload_len(sk, optval, optlen); + break; default: rc = -ENOPROTOOPT; break; @@ -809,6 +834,32 @@ static int do_tls_setsockopt_no_pad(struct sock *sk, sockptr_t optval, return rc; } +static int do_tls_setsockopt_tx_payload_len(struct sock *sk, 
sockptr_t optval, + unsigned int optlen) +{ + struct tls_context *ctx = tls_get_ctx(sk); + struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx); + u16 value; + bool tls_13 = ctx->prot_info.version == TLS_1_3_VERSION; + + if (sw_ctx && sw_ctx->open_rec) + return -EBUSY; + + if (sockptr_is_null(optval) || optlen != sizeof(value)) + return -EINVAL; + + if (copy_from_sockptr(&value, optval, sizeof(value))) + return -EFAULT; + + if (value < TLS_MIN_RECORD_SIZE_LIM - (tls_13 ? 1 : 0) || + value > TLS_MAX_PAYLOAD_SIZE) + return -EINVAL; + + ctx->tx_max_payload_len = value; + + return 0; +} + static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval, unsigned int optlen) { @@ -830,6 +881,11 @@ static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval, case TLS_RX_EXPECT_NO_PAD: rc = do_tls_setsockopt_no_pad(sk, optval, optlen); break; + case TLS_TX_MAX_PAYLOAD_LEN: + lock_sock(sk); + rc = do_tls_setsockopt_tx_payload_len(sk, optval, optlen); + release_sock(sk); + break; default: rc = -ENOPROTOOPT; break; @@ -1019,6 +1075,7 @@ static int tls_init(struct sock *sk) ctx->tx_conf = TLS_BASE; ctx->rx_conf = TLS_BASE; + ctx->tx_max_payload_len = TLS_MAX_PAYLOAD_SIZE; update_sk_prot(sk, ctx); out: write_unlock_bh(&sk->sk_callback_lock); @@ -1108,6 +1165,12 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin) goto nla_failure; } + err = nla_put_u16(skb, TLS_INFO_TX_MAX_PAYLOAD_LEN, + ctx->tx_max_payload_len); + + if (err) + goto nla_failure; + rcu_read_unlock(); nla_nest_end(skb, start); return 0; @@ -1129,6 +1192,7 @@ static size_t tls_get_info_size(const struct sock *sk, bool net_admin) nla_total_size(sizeof(u16)) + /* TLS_INFO_TXCONF */ nla_total_size(0) + /* TLS_INFO_ZC_RO_TX */ nla_total_size(0) + /* TLS_INFO_RX_NO_PAD */ + nla_total_size(sizeof(u16)) + /* TLS_INFO_TX_MAX_PAYLOAD_LEN */ 0; return size; diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index d17135369980..9937d4c810f2 100644 --- a/net/tls/tls_sw.c 
+++ b/net/tls/tls_sw.c @@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, orig_size = msg_pl->sg.size; full_record = false; try_to_copy = msg_data_left(msg); - record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size; + record_room = tls_ctx->tx_max_payload_len - msg_pl->sg.size; if (try_to_copy >= record_room) { try_to_copy = record_room; full_record = true; -- 2.51.0
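As a worked illustration of the value rules in the documentation hunk and in do_tls_setsockopt_tx_payload_len(), the sketch below converts a peer's RFC 8449 record_size_limit into a TLS_TX_MAX_PAYLOAD_LEN value and repeats the kernel's range check in userspace. It assumes the constants from the patch (64 and 16384). The helper names are hypothetical, and clamping larger advertised limits to 16384 is an assumption of this sketch, not something the patch does. The resulting 16-bit value would then be passed with setsockopt(fd, SOL_TLS, TLS_TX_MAX_PAYLOAD_LEN, &val, sizeof(val)) on a kTLS socket.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants from include/net/tls.h in the patch above. */
#define TLS_MAX_PAYLOAD_SIZE    (1u << 14) /* 16384 */
#define TLS_MIN_RECORD_SIZE_LIM (1u << 6)  /* 64, RFC 8449 minimum */

/* Hypothetical helper: map a peer's RFC 8449 record_size_limit to a
 * TLS_TX_MAX_PAYLOAD_LEN value. Returns 0 for limits below the RFC minimum. */
static uint16_t tx_payload_len_for_limit(uint16_t record_size_limit, bool tls13)
{
	uint32_t len;

	if (record_size_limit < TLS_MIN_RECORD_SIZE_LIM)
		return 0;

	/* In TLS 1.3 the limit also covers the 1-byte inner ContentType. */
	len = tls13 ? record_size_limit - 1u : record_size_limit;

	/* Assumption of this sketch: never request more than the protocol max. */
	if (len > TLS_MAX_PAYLOAD_SIZE)
		len = TLS_MAX_PAYLOAD_SIZE;
	return (uint16_t)len;
}

/* Same range check as do_tls_setsockopt_tx_payload_len(). */
static bool tx_payload_len_valid(uint16_t value, bool tls13)
{
	uint32_t min = TLS_MIN_RECORD_SIZE_LIM - (tls13 ? 1u : 0u);

	return value >= min && value <= TLS_MAX_PAYLOAD_SIZE;
}
```

For example, a TLS 1.3 peer advertising record_size_limit = 4096 yields 4095 bytes of plaintext per record, while a TLS 1.2 peer advertising the same limit yields 4096.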

2 months, 2 weeks

4
10

[PATCH bpf-next v1] selftests/bpf: Guard addr_space_cast code with __BPF_FEATURE_ADDR_SPACE_CAST

by Jiayuan Chen

When compiling the BPF selftests with Clang versions that do not support the addr_space_cast builtin, the build fails with assembly errors in "verifier_ldsx.c" [1]. The root cause is that the inline assembly using addr_space_cast is being processed by a compiler that lacks this feature. To resolve this, wrap the affected code sections (specifically the arena_ldsx_* test functions) with #if defined(__BPF_FEATURE_ADDR_SPACE_CAST). This ensures the code is only compiled when Clang supports the necessary feature, preventing build failures on older or incompatible compiler versions. This change maintains test coverage for systems with support while allowing the tests to build successfully in all environments. [1]: root:tools/testing/selftests/bpf$ make CLNG-BPF [test_progs] verifier_ldsx.bpf.o progs/verifier_ldsx.c:322:2: error: invalid operand for instruction 322 | "r1 = %[arena] ll;" | ^ <inline asm>:1:52: note: instantiated into assembly here 1 | r1 = arena ll;r0 = 0xdeadbeef;r0 = addr_space_cast(r0,... | ^ Fixes: f61654912404 ("selftests: bpf: Add tests for signed loads from arena") Signed-off-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> --- tools/testing/selftests/bpf/progs/verifier_ldsx.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/verifier_ldsx.c b/tools/testing/selftests/bpf/progs/verifier_ldsx.c index c8494b682c31..cefa02e417d3 100644 --- a/tools/testing/selftests/bpf/progs/verifier_ldsx.c +++ b/tools/testing/selftests/bpf/progs/verifier_ldsx.c @@ -263,6 +263,7 @@ __naked void ldsx_ctx_8(void) : __clobber_all); } +#if defined(__BPF_FEATURE_ADDR_SPACE_CAST) SEC("syscall") __description("Arena LDSX Disasm") __success @@ -425,6 +426,7 @@ __naked void arena_ldsx_s32(void *ctx) : __clobber_all ); } +#endif /* to retain debug info for BTF generation */ void kfunc_root(void) -- 2.43.0

2 months, 2 weeks

4
9

[PATCH net v2 0/2] netconsole: Fix userdata race condition

by Gustavo Luiz Duarte

This series fixes a race condition in netconsole's userdata handling where concurrent message transmission could read partially updated userdata fields, resulting in corrupted netconsole output. The first patch fixes the issue by ensuring update_userdata() holds the target_list_lock while updating both extradata_complete and userdata_length, preventing readers from seeing inconsistent state. The second patch adds a selftest that reproduces the race condition by continuously sending messages while rapidly changing userdata values, detecting any torn reads in the output. This targets the net tree as it fixes a bug introduced in commit df03f830d099 ("net: netconsole: cache userdata formatted string in netconsole_target"). Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com> Changes in v2: - Added testcase to Makefile. - Reordered fix and testcase to avoid failure in CI. - testcase: delay cleanup until child processes are killed, plus shellcheck fixes. - Link to v1: https://lore.kernel.org/all/20251020-netconsole-fix-race-v1-0-b775be30ee8a@… --- Gustavo Luiz Duarte (2): netconsole: Fix race condition in between reader and writer of userdata selftests: netconsole: Add race condition test for userdata corruption drivers/net/netconsole.c | 5 ++ tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/netcons_race_userdata.sh | 87 ++++++++++++++++++++++ 3 files changed, 93 insertions(+) --- base-commit: d63f0391d6c7b75e1a847e1a26349fa8cad0004d change-id: 20251020-netconsole-fix-race-f465f37b57ea Best regards, -- Gustavo Duarte <gustavold(a)meta.com>
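The fix described in this cover letter (holding target_list_lock while updating both extradata_complete and userdata_length) follows a common pattern: every field that readers consume together is updated under the same lock the readers take. Below is a minimal userspace sketch of that pattern, assuming a pthread mutex as a stand-in for the kernel's target_list_lock. struct nc_target, its fields, the helper names and the " key=value\n" format are hypothetical and do not reflect netconsole's actual layout.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a netconsole target. */
struct nc_target {
	pthread_mutex_t lock;   /* stand-in for target_list_lock */
	char userdata[128];     /* cached, preformatted userdata */
	size_t userdata_len;    /* must always describe userdata */
};

/* Writer: format outside the lock, then publish bytes and length together. */
static void nc_update_userdata(struct nc_target *t, const char *key,
			       const char *val)
{
	char tmp[sizeof(t->userdata)];
	int n = snprintf(tmp, sizeof(tmp), " %s=%s\n", key, val);

	if (n < 0)
		return;
	if ((size_t)n >= sizeof(tmp))
		n = (int)sizeof(tmp) - 1;   /* output was truncated */

	pthread_mutex_lock(&t->lock);
	memcpy(t->userdata, tmp, (size_t)n + 1);   /* include the NUL */
	t->userdata_len = (size_t)n;
	pthread_mutex_unlock(&t->lock);
}

/* Reader: copy a consistent snapshot under the same lock. outsz must be > 0. */
static size_t nc_read_userdata(struct nc_target *t, char *out, size_t outsz)
{
	size_t len;

	pthread_mutex_lock(&t->lock);
	len = t->userdata_len < outsz - 1 ? t->userdata_len : outsz - 1;
	memcpy(out, t->userdata, len);
	pthread_mutex_unlock(&t->lock);
	out[len] = '\0';
	return len;
}
```

Formatting into a local buffer before taking the lock keeps the critical section short, and the reader can emit its snapshot after unlocking without risk of seeing a new length paired with old bytes.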

2 months, 2 weeks

2
6

[PATCH bpf-next 0/2] bpf: Fix tnum_overlap to check for zero mask first

by KaFai Wan

This small patchset avoids a verifier bug warning when tnum_overlap() is called with a zero mask. --- KaFai Wan (2): bpf: Fix tnum_overlap to check for zero mask first selftests/bpf: Range analysis test case for JEQ kernel/bpf/tnum.c | 2 ++ .../selftests/bpf/progs/verifier_bounds.c | 23 +++++++++++++++++++ 2 files changed, 25 insertions(+) -- 2.43.0
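For context on what an overlap check between tnums decides: a tnum tracks a value with some bits known (value) and some unknown (mask), and two tnums can hold a common concrete value only if they agree on every bit that both of them know. The sketch below shows that check with a hypothetical tnum_can_overlap(); it is not the kernel's tnum_overlap() and does not reproduce the patch. A tnum with a zero mask is a single constant, the case named in the patch title, and when both masks are zero the check reduces to plain value equality.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tracked number: bits set in mask are unknown; every other bit is known
 * and equal to the matching bit of value (invariant: value & mask == 0). */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Hypothetical helper: true if some concrete x is represented by both a
 * and b, i.e. a and b agree on every bit that is known in both. */
static bool tnum_can_overlap(struct tnum a, struct tnum b)
{
	uint64_t known_in_both = ~(a.mask | b.mask);

	return ((a.value ^ b.value) & known_in_both) == 0;
}
```

For example, the tnum {value = 4, mask = 3} stands for {4, 5, 6, 7}: it overlaps the constant 5 but not the constant 8.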

2 months, 2 weeks

1
3

[PATCH v2] selftest: net: fix variable sized type issue not at the end of a struct

by Ankit Khushwaha

Some network selftests define a field of variable-sized type in the middle of a struct, causing -Wgnu-variable-sized-type-not-at-end warnings. warning: timestamping.c:285:18: warning: field 'cm' with variable sized type 'struct cmsghdr' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end] 285 | struct cmsghdr cm; | ^ ipsec.c:835:5: warning: field 'u' with variable sized type 'union (unnamed union at ipsec.c:831:3)' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end] 835 | } u; | ^ This patch moves these fields to the end of their structs to fix the warnings. Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com> --- Changelog: v2: https://lore.kernel.org/linux-kselftest/20251027050856.30270-1-ankitkhushwa… - fixed typos in the commit msg. --- tools/testing/selftests/net/ipsec.c | 2 +- tools/testing/selftests/net/timestamping.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c index 0ccf484b1d9d..36083c8f884f 100644 --- a/tools/testing/selftests/net/ipsec.c +++ b/tools/testing/selftests/net/ipsec.c @@ -828,12 +828,12 @@ static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz, struct xfrm_desc *desc) { struct { + char buf[XFRM_ALGO_KEY_BUF_SIZE]; union { struct xfrm_algo alg; struct xfrm_algo_aead aead; struct xfrm_algo_auth auth; } u; - char buf[XFRM_ALGO_KEY_BUF_SIZE]; } alg = {}; size_t alen, elen, clen, aelen; unsigned short type; diff --git a/tools/testing/selftests/net/timestamping.c b/tools/testing/selftests/net/timestamping.c index 044bc0e9ed81..ad2be2143698 100644 --- a/tools/testing/selftests/net/timestamping.c +++ b/tools/testing/selftests/net/timestamping.c @@ -282,8 +282,8 @@ static void recvpacket(int sock, int recvmsg_flags, struct iovec entry; struct sockaddr_in from_addr; struct { - struct cmsghdr cm; char control[512]; + struct cmsghdr cm; } control; int res; --
2.51.0

2 months, 2 weeks

3
2

next-20251027: backlight.c:59:39: error: implicit declaration of function 'of_find_node_by_name'; did you mean 'bus_find_device_by_name'?

by Naresh Kamboju

The following powerpc ppc6xx_defconfig build regressions were noticed on the Linux next-20251027 tag with gcc-14 and gcc-8. * powerpc, build - gcc-14-ppc6xx_defconfig - gcc-8-ppc6xx_defconfig First seen on next-20251027 Good: next-20251024 Bad: next-20251027 Regression Analysis: - New regression? yes - Reproducibility? yes Build regression: next-20251027: backlight.c:59:39: error: implicit declaration of function 'of_find_node_by_name'; did you mean 'bus_find_device_by_name'? Build regression: next-20251027: include/linux/math.h:167:43: error: first argument to '__builtin_choose_expr' not a constant Build regression: next-20251027: via-pmu-backlight.c:22:20: error: 'FB_BACKLIGHT_LEVELS' undeclared here (not in a function) Build regression: next-20251027: minmax.h:71:17: error: first argument to '__builtin_choose_expr' not a constant Build regression: next-20251027: compiler.h:168:17: error: '__UNIQUE_ID_x__286' undeclared (first use in this function); did you mean '__UNIQUE_ID_y__287'? Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> ## Build error arch/powerpc/platforms/powermac/backlight.c: In function 'pmac_has_backlight_type': arch/powerpc/platforms/powermac/backlight.c:59:39: error: implicit declaration of function 'of_find_node_by_name'; did you mean 'bus_find_device_by_name'?
[-Wimplicit-function-declaration] 59 | struct device_node* bk_node = of_find_node_by_name(NULL, "backlight"); | ^~~~~~~~~~~~~~~~~~~~ | bus_find_device_by_name arch/powerpc/platforms/powermac/backlight.c:59:39: error: initialization of 'struct device_node *' from 'int' makes pointer from integer without a cast [-Wint-conversion] arch/powerpc/platforms/powermac/backlight.c:60:17: error: implicit declaration of function 'of_property_match_string' [-Wimplicit-function-declaration] 60 | int i = of_property_match_string(bk_node, "backlight-control", type); | ^~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/powermac/backlight.c:62:9: error: implicit declaration of function 'of_node_put' [-Wimplicit-function-declaration] 62 | of_node_put(bk_node); | ^~~~~~~~~~~ drivers/macintosh/via-pmu-backlight.c:22:20: error: 'FB_BACKLIGHT_LEVELS' undeclared here (not in a function) 22 | static u8 bl_curve[FB_BACKLIGHT_LEVELS]; | ^~~~~~~~~~~~~~~~~~~ In file included from <command-line>: drivers/macintosh/via-pmu-backlight.c: In function 'pmu_backlight_curve_lookup': include/linux/compiler.h:168:17: error: '__UNIQUE_ID_x__286' undeclared (first use in this function); did you mean '__UNIQUE_ID_y__287'? 
168 | __PASTE(__UNIQUE_ID_, \ | ^~~~~~~~~~~~ drivers/macintosh/via-pmu-backlight.c:45:23: note: in expansion of macro 'max' 45 | max = max((int)bl_curve[i], max); | ^~~ include/linux/minmax.h:71:17: error: first argument to '__builtin_choose_expr' not a constant 71 | (typeof(__builtin_choose_expr(sizeof(ux) > 4, 1LL, 1L)))(ux) >= 0) | ^~~~~~~~~~~~~~~~~~~~~ include/linux/compiler_types.h:577:23: note: in definition of macro '__compiletime_assert' 577 | if (!(condition)) \ | ^~~~~~~~~ drivers/macintosh/via-pmu-backlight.c:45:23: note: in expansion of macro 'max' 45 | max = max((int)bl_curve[i], max); | ^~~ include/linux/minmax.h:71:17: error: first argument to '__builtin_choose_expr' not a constant 71 | (typeof(__builtin_choose_expr(sizeof(ux) > 4, 1LL, 1L)))(ux) >= 0) | ^~~~~~~~~~~~~~~~~~~~~ include/linux/compiler_types.h:577:23: note: in definition of macro '__compiletime_assert' 577 | if (!(condition)) \ | ^~~~~~~~~ include/linux/minmax.h:112:25: note: in expansion of macro '__careful_cmp' 112 | #define max(x, y) __careful_cmp(max, x, y) | ^~~~~~~~~~~~~ drivers/macintosh/via-pmu-backlight.c:45:23: note: in expansion of macro 'max' 45 | max = max((int)bl_curve[i], max); | ^~~ In file included from include/linux/kernel.h:27, from arch/powerpc/include/asm/page.h:11, from arch/powerpc/include/asm/thread_info.h:13, from include/linux/thread_info.h:60, from arch/powerpc/include/asm/ptrace.h:342, from drivers/macintosh/via-pmu-backlight.c:11: include/linux/math.h:162:17: error: first argument to '__builtin_choose_expr' not a constant 162 | __builtin_choose_expr( \ | ^~~~~~~~~~~~~~~~~~~~~ drivers/macintosh/via-pmu-backlight.c: In function 'pmu_backlight_get_level_brightness': drivers/macintosh/via-pmu-backlight.c:63:38: error: 'FB_BACKLIGHT_MAX' undeclared (first use in this function); did you mean 'BACKLIGHT_RAW'? 
63 | pmulevel = bl_curve[level] * FB_BACKLIGHT_MAX / MAX_PMU_LEVEL; | ^~~~~~~~~~~~~~~~ | BACKLIGHT_RAW drivers/macintosh/via-pmu-backlight.c:58:51: warning: parameter 'level' set but not used [-Wunused-but-set-parameter] 58 | static int pmu_backlight_get_level_brightness(int level) | ~~~~^~~~~ drivers/macintosh/via-pmu-backlight.c: In function 'pmu_backlight_init': drivers/macintosh/via-pmu-backlight.c:144:17: error: implicit declaration of function 'of_machine_is_compatible' [-Wimplicit-function-declaration] 144 | of_machine_is_compatible("AAPL,3400/2400") || | ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/macintosh/via-pmu-backlight.c: At top level: drivers/macintosh/via-pmu-backlight.c:22:11: warning: 'bl_curve' defined but not used [-Wunused-variable] 22 | static u8 bl_curve[FB_BACKLIGHT_LEVELS]; | ^~~~~~~~ make[5]: *** [scripts/Makefile.build:287: drivers/macintosh/via-pmu-backlight.o] Error 1 make[5]: Target 'drivers/macintosh/' not remade because of errors. ## Source * Kernel version: 6.18.0-rc2-next-20251027 * Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git * Git describe: next-20251027 * Git commit: 8fec172c82c2b5f6f8e47ab837c1dc91ee3d1b87 * Architectures: powerpc * Toolchains: gcc-14 * Kconfigs: defconfig ## Build * Test log: https://storage.tuxsuite.com/public/linaro/lkft/builds/34dKrlb77LGOQQSoC8FH… * Test details: https://regressions.linaro.org/lkft/linux-next-master/next-20251027/build/g… * Build plan: https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/builds/34dKrlb77… * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/34dKrlb77LGOQQSoC8FH… * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/34dKrlb77LGOQQSoC8FH… -- Linaro LKFT

2 months, 2 weeks

2
2

[PATCH v8 0/8] liveupdate: Rework KHO for in-kernel users

by Pasha Tatashin

Changelog: v8: Added review-bys and addressed comments from Mike Rapoport and Pratyush Yadav. Added "memblock: Unpreserve memory in case of error" to handle rollback if preserve fails half way through. This series refactors the KHO framework to better support in-kernel users like the upcoming LUO. The current design, which relies on a notifier chain and debugfs for control, is too restrictive for direct programmatic use. The core of this rework is the removal of the notifier chain in favor of a direct registration API. This decouples clients from the shutdown-time finalization sequence, allowing them to manage their preserved state more flexibly and at any time. In support of this new model, this series also: - Exports kho_finalize() and kho_abort() for programmatic control. - Makes the debugfs interface optional. - Introduces APIs to unpreserve memory and fixes a bug in the abort path where client state was being incorrectly discarded. Note that this is an interim step, as a more comprehensive fix is planned as part of the stateless KHO work [1]. - Moves all KHO code into a new kernel/liveupdate/ directory to consolidate live update components. 
[1] https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com Mike Rapoport (Microsoft) (1): kho: drop notifiers Pasha Tatashin (7): kho: allow to drive kho from within kernel kho: make debugfs interface optional kho: add interfaces to unpreserve folios and page ranges kho: don't unpreserve memory during abort liveupdate: kho: move to kernel/liveupdate liveupdate: kho: move kho debugfs directory to liveupdate memblock: Unpreserve memory in case of error Documentation/core-api/kho/concepts.rst | 2 +- MAINTAINERS | 3 +- include/linux/kexec_handover.h | 53 +- init/Kconfig | 2 + kernel/Kconfig.kexec | 24 - kernel/Makefile | 3 +- kernel/kexec_handover_internal.h | 16 - kernel/liveupdate/Kconfig | 39 ++ kernel/liveupdate/Makefile | 5 + kernel/{ => liveupdate}/kexec_handover.c | 508 +++++++----------- .../{ => liveupdate}/kexec_handover_debug.c | 0 kernel/liveupdate/kexec_handover_debugfs.c | 219 ++++++++ kernel/liveupdate/kexec_handover_internal.h | 56 ++ lib/test_kho.c | 33 +- mm/memblock.c | 82 ++- tools/testing/selftests/kho/init.c | 2 +- tools/testing/selftests/kho/vmtest.sh | 1 + 17 files changed, 590 insertions(+), 458 deletions(-) delete mode 100644 kernel/kexec_handover_internal.h create mode 100644 kernel/liveupdate/Kconfig create mode 100644 kernel/liveupdate/Makefile rename kernel/{ => liveupdate}/kexec_handover.c (80%) rename kernel/{ => liveupdate}/kexec_handover_debug.c (100%) create mode 100644 kernel/liveupdate/kexec_handover_debugfs.c create mode 100644 kernel/liveupdate/kexec_handover_internal.h base-commit: 72fb0170ef1f45addf726319c52a0562b6913707 -- 2.51.1.821.gb6fe4d2222-goog
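The v8 changelog adds rollback for the case where preservation fails partway through. The usual shape of that all-or-nothing pattern is sketched below. struct region, preserve_region(), unpreserve_region() and preserve_all() are hypothetical stand-ins, not KHO or memblock APIs, and a zero-sized region is used only to simulate a failure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region bookkeeping; not the KHO or memblock API. */
struct region {
	unsigned long start;
	unsigned long size;
	bool preserved;
};

/* Hypothetical primitive: a zero-sized region simulates a failure. */
static int preserve_region(struct region *r)
{
	if (r->size == 0)
		return -1;
	r->preserved = true;
	return 0;
}

static void unpreserve_region(struct region *r)
{
	r->preserved = false;
}

/* All-or-nothing: on the first failure, undo every region preserved so far
 * (newest first) and return the error, leaving no partial state behind. */
static int preserve_all(struct region *regs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = preserve_region(&regs[i]);

		if (err) {
			while (i-- > 0)
				unpreserve_region(&regs[i]);
			return err;
		}
	}
	return 0;
}
```

Undoing in reverse order mirrors the order of acquisition, which matters when later preservations depend on earlier ones.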

2 months, 2 weeks

4
18

[PATCH 0/4] SBI MPXY support for KVM Guest

by Anup Patel

This series adds SBI MPXY support for KVM Guest/VM which will enable QEMU-KVM or KVMTOOL to emulate RPMI MPXY channels for the Guest/VM. These patches can also be found in riscv_kvm_sbi_mpxy_v1 branch at: https://github.com/avpatel/linux.git Anup Patel (4): RISC-V: KVM: Convert kvm_riscv_vcpu_sbi_forward() into extension handler RISC-V: KVM: Add separate source for forwarded SBI extensions RISC-V: KVM: Add SBI MPXY extension support for Guest KVM: riscv: selftests: Add SBI MPXY extension to get-reg-list arch/riscv/include/asm/kvm_vcpu_sbi.h | 5 ++- arch/riscv/include/uapi/asm/kvm.h | 1 + arch/riscv/kvm/Makefile | 1 + arch/riscv/kvm/vcpu_sbi.c | 10 +++++- arch/riscv/kvm/vcpu_sbi_base.c | 28 +-------------- arch/riscv/kvm/vcpu_sbi_forward.c | 34 +++++++++++++++++++ arch/riscv/kvm/vcpu_sbi_replace.c | 32 ----------------- arch/riscv/kvm/vcpu_sbi_system.c | 4 +-- arch/riscv/kvm/vcpu_sbi_v01.c | 3 +- .../selftests/kvm/riscv/get-reg-list.c | 4 +++ 10 files changed, 56 insertions(+), 66 deletions(-) create mode 100644 arch/riscv/kvm/vcpu_sbi_forward.c -- 2.43.0

2 months, 2 weeks

3
9

[PATCH] selftests: tty: add tty_tiocsti_test to .gitignore

by Gopi Krishna Menon

Add the tty_tiocsti_test binary to .gitignore to avoid accidentally staging the build artifact. Signed-off-by: Gopi Krishna Menon <krishnagopi487(a)gmail.com> --- tools/testing/selftests/tty/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/tty/.gitignore b/tools/testing/selftests/tty/.gitignore index fe70462a4aad..2453685d2493 100644 --- a/tools/testing/selftests/tty/.gitignore +++ b/tools/testing/selftests/tty/.gitignore @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only +tty_tiocsti_test tty_tstamp_update -- 2.43.0

2 months, 2 weeks

3
6

[PATCH v3 0/9] riscv: vector: misc ptrace fixes for debug use-cases

by Sergey Matyukevich

This patch series suggests fixes for several corner cases in the RISC-V vector ptrace implementation: - init vector context with proper vlenb, to avoid reading zero vlenb by an early attached debugger - follow gdbserver expectations and return ENODATA instead of EINVAL if vector extension is supported but not yet activated for the traced process - validate input vector csr registers in ptrace, to maintain an accurate view of the tracee's vector context across multiple halt/resume debug cycles For detailed description see the appropriate commit messages. A new test suite v_ptrace is added into the tools/testing/selftests/riscv/vector to verify some of the vector ptrace functionality and corner cases. Previous versions: - v2: https://lore.kernel.org/linux-riscv/20250821173957.563472-1-geomatsi@gmail.… - v1: https://lore.kernel.org/linux-riscv/20251007115840.2320557-1-geomatsi@gmail… Changes in v3: Address the review comments by Andy Chiu and rework the approach: - drop forced vector context save entirely - perform strict validation of vector csr regs in ptrace Changes in v2: - add thread_info flag to allow to force vector context save - force vector context save after vector ptrace to ensure valid vector context in the next ptrace operations - force vector context save on the first context switch after vector context init to get proper vlenb --- Ilya Mamay (1): riscv: ptrace: return ENODATA for inactive vector extension Sergey Matyukevich (8): selftests: riscv: test ptrace vector interface selftests: riscv: verify initial vector state with ptrace riscv: vector: init vector context with proper vlenb riscv: csr: define vector registers elements riscv: ptrace: validate input vector csr registers selftests: riscv: verify ptrace rejects invalid vector csr inputs selftests: riscv: verify ptrace accepts valid vector csr values selftests: riscv: verify syscalls discard vector context arch/riscv/include/asm/csr.h | 11 + arch/riscv/kernel/ptrace.c | 72 +- 
arch/riscv/kernel/vector.c | 12 +- .../testing/selftests/riscv/vector/.gitignore | 1 + tools/testing/selftests/riscv/vector/Makefile | 5 +- .../testing/selftests/riscv/vector/v_ptrace.c | 737 ++++++++++++++++++ 6 files changed, 830 insertions(+), 8 deletions(-) create mode 100644 tools/testing/selftests/riscv/vector/v_ptrace.c base-commit: 72761a7e31225971d0b29d9195e0ffa986b77867 -- 2.51.0

2 months, 2 weeks

1
9

[PATCH AUTOSEL 6.17-5.4] selftests/Makefile: include $(INSTALL_DEP_TARGETS) in clean target to clean net/lib dependency

by Sasha Levin

From: Nai-Chen Cheng <bleach1827(a)gmail.com> [ Upstream commit d3f7457da7b9527a06dbcbfaf666aa51ac2eeb53 ] The selftests 'make clean' does not clean the net/lib because it only processes $(TARGETS) and ignores $(INSTALL_DEP_TARGETS). This leaves compiled objects in net/lib after cleaning, requiring manual cleanup. Include $(INSTALL_DEP_TARGETS) in clean target to ensure net/lib dependency is properly cleaned. Signed-off-by: Nai-Chen Cheng <bleach1827(a)gmail.com> Reviewed-by: Simon Horman <horms(a)kernel.org> Tested-by: Simon Horman <horms(a)kernel.org> # build-tested Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Link: https://patch.msgid.link/20250910-selftests-makefile-clean-v1-1-29e7f496cd8… Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- LLM Generated explanations, may be completely bogus: YES – updating `tools/testing/selftests/Makefile:319` to iterate over `$(TARGETS) $(INSTALL_DEP_TARGETS)` brings the `clean` recipe in line with the existing `all` and `install` loops (`tools/testing/selftests/Makefile:211` and `:276`), so the implicit net/lib helper that gets built for networking selftests is now removed during `make clean`. This fixes a real annoyance where stale objects remain under `net/lib/` after cleaning, forcing manual removal and risking rebuild flakiness if headers or tooling change. The change is tiny, selftests-only, and mirrors logic already exercised in other targets, so regression risk is negligible. Just make sure the stable branch you target already carries the dependency hook that introduced `INSTALL_DEP_TARGETS` (commit b86761ff6374813cdf64ffd6b95ddd1813c435d8 or equivalent); older branches without that infrastructure don’t need this patch. Natural follow-up: once backported, run `make -C tools/testing/selftests clean` after building the net tests to confirm the stale net/lib objects are cleared. 
tools/testing/selftests/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 030da61dbff3a..a2d8e1093b005 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -314,7 +314,7 @@ gen_tar: install @echo "Created ${TAR_PATH}" clean: - @for TARGET in $(TARGETS); do \ + @for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET clean;\ done; -- 2.51.0

2 months, 2 weeks

1
0

[PATCH] kunit: prevent log overwrite in param_tests

by Carlos Llamas

When running parameterized tests, each test case is initialized with kunit_init_test(). This function takes the test_case->log as a parameter but it clears it via string_stream_clear() on each iteration. This results in only the log from the last parameter being preserved in the test_case->log and the results from the previous parameters are lost from the debugfs entry. Fix this by manually setting the param_test.log to the test_case->log after it has been initialized. This prevents kunit_init_test() from clearing the log on each iteration. Fixes: 4b59300ba4d2 ("kunit: Add parent kunit for parameterized test context") Signed-off-by: Carlos Llamas <cmllamas(a)google.com> --- lib/kunit/test.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/kunit/test.c b/lib/kunit/test.c index bb66ea1a3eac..62eb529824c6 100644 --- a/lib/kunit/test.c +++ b/lib/kunit/test.c @@ -745,7 +745,8 @@ int kunit_run_tests(struct kunit_suite *suite) .param_index = ++test.param_index, .parent = &test, }; - kunit_init_test(&param_test, test_case->name, test_case->log); + kunit_init_test(&param_test, test_case->name, NULL); + param_test.log = test_case->log; kunit_run_case_catch_errors(suite, test_case, &param_test); if (param_desc[0] == '\0') { -- 2.51.1.821.gb6fe4d2222-goog
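The bug and the fix can be modeled outside the kernel. In this simplified sketch, init_test() plays the role of kunit_init_test() and clears whatever log it is given, as the patch describes. struct log_buf, struct test_ctx and the other names are hypothetical, and a fixed-size char array stands in for KUnit's string_stream.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: a fixed buffer replaces KUnit's string_stream. */
struct log_buf {
	char text[256];
};

struct test_ctx {
	const char *name;
	struct log_buf *log;
};

/* Models kunit_init_test() as the patch describes it: the log passed in is
 * cleared during initialization. */
static void init_test(struct test_ctx *t, const char *name, struct log_buf *log)
{
	t->name = name;
	t->log = log;
	if (log)
		log->text[0] = '\0';
}

static void log_append(struct test_ctx *t, const char *msg)
{
	size_t used;

	if (!t->log)
		return;
	used = strlen(t->log->text);
	strncat(t->log->text, msg, sizeof(t->log->text) - used - 1);
}

/* Runs n parameters that share one case log. fixed = 0 passes the log to
 * init_test() (cleared every iteration); fixed = 1 initializes with NULL and
 * attaches the log afterwards, mirroring the patch. */
static void run_params(struct log_buf *case_log, int n, int fixed)
{
	for (int i = 0; i < n; i++) {
		struct test_ctx param;
		char line[32];

		if (fixed) {
			init_test(&param, "case", NULL);
			param.log = case_log;
		} else {
			init_test(&param, "case", case_log);
		}
		snprintf(line, sizeof(line), "param %d ok\n", i);
		log_append(&param, line);
	}
}
```

With three parameters, the first variant leaves only the last line in the shared log, while the second keeps all three, which is the behavior the patch restores for the debugfs entry.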

2 months, 2 weeks

2
1

[PATCH v2] selftest: net: prevent use of uninitialized variable

by Alessandro Zanni

Initialize the `ret` variable to avoid using it uninitialized in the following macro expansions. This fixes the following warning: In file included from netlink-dumps.c:21: netlink-dumps.c: In function ‘dump_extack’: ../kselftest_harness.h:788:35: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized] 788 | intmax_t __exp_print = (intmax_t)__exp; \ | ^~~~~~~~~~~ ../kselftest_harness.h:631:9: note: in expansion of macro ‘__EXPECT’ 631 | __EXPECT(expected, #expected, seen, #seen, ==, 0) | ^~~~~~~~ netlink-dumps.c:169:9: note: in expansion of macro ‘EXPECT_EQ’ 169 | EXPECT_EQ(ret, FOUND_EXTACK); | ^~~~~~~~~ The issue can be reproduced by building the tests with the command: make -C tools/testing/selftests TARGETS=net Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com> --- Notes: v2: applied the reverse Christmas tree order tools/testing/selftests/net/netlink-dumps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/netlink-dumps.c b/tools/testing/selftests/net/netlink-dumps.c index 7618ebe528a4..7de360c029c6 100644 --- a/tools/testing/selftests/net/netlink-dumps.c +++ b/tools/testing/selftests/net/netlink-dumps.c @@ -111,8 +111,8 @@ static const struct { TEST(dump_extack) { + int i, cnt, ret = 0; int netlink_sock; - int i, cnt, ret; char buf[8192]; int one = 1; ssize_t n; -- 2.43.0

2 months, 2 weeks

2
1

[PATCH net-next] selftests: drv-net: replace the nsim ring test with a drv-net one

by Jakub Kicinski

We are trying to move away from netdevsim-only tests and towards
tests which can be run both against netdevsim and real drivers.

Replace the simple bash script we have for checking ethtool -g/-G
on netdevsim with a Python test tweaking those params as well as
channel count.

The new test is not exactly equivalent to the netdevsim one, but
real drivers don't often support random ring sizes, let alone
modifying max values via debugfs.

Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: andrew(a)lunn.ch
CC: shuah(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
---
 .../drivers/net/netdevsim/ethtool-ring.sh     |  85 ---------
 .../selftests/drivers/net/ring_reconfig.py    | 167 ++++++++++++++++++
 2 files changed, 167 insertions(+), 85 deletions(-)
 delete mode 100755 tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh
 create mode 100755 tools/testing/selftests/drivers/net/ring_reconfig.py

diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh
deleted file mode 100755
index c969559ffa7a..000000000000
--- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0-only
-
-source ethtool-common.sh
-
-function get_value {
-	local query="${SETTINGS_MAP[$1]}"
-
-	echo $(ethtool -g $NSIM_NETDEV | \
-		tail -n +$CURR_SETT_LINE | \
-		awk -F':' -v pattern="$query:" '$0 ~ pattern {gsub(/[\t ]/, "", $2); print $2}')
-}
-
-function update_current_settings {
-	for key in ${!SETTINGS_MAP[@]}; do
-		CURRENT_SETTINGS[$key]=$(get_value $key)
-	done
-	echo ${CURRENT_SETTINGS[@]}
-}
-
-if ! ethtool -h | grep -q set-ring >/dev/null; then
-	echo "SKIP: No --set-ring support in ethtool"
-	exit 4
-fi
-
-NSIM_NETDEV=$(make_netdev)
-
-set -o pipefail
-
-declare -A SETTINGS_MAP=(
-	["rx"]="RX"
-	["rx-mini"]="RX Mini"
-	["rx-jumbo"]="RX Jumbo"
-	["tx"]="TX"
-)
-
-declare -A EXPECTED_SETTINGS=(
-	["rx"]=""
-	["rx-mini"]=""
-	["rx-jumbo"]=""
-	["tx"]=""
-)
-
-declare -A CURRENT_SETTINGS=(
-	["rx"]=""
-	["rx-mini"]=""
-	["rx-jumbo"]=""
-	["tx"]=""
-)
-
-MAX_VALUE=$((RANDOM % $((2**32-1))))
-RING_MAX_LIST=$(ls $NSIM_DEV_DFS/ethtool/ring/)
-
-for ring_max_entry in $RING_MAX_LIST; do
-	echo $MAX_VALUE > $NSIM_DEV_DFS/ethtool/ring/$ring_max_entry
-done
-
-CURR_SETT_LINE=$(ethtool -g $NSIM_NETDEV | grep -i -m1 -n 'Current hardware settings' | cut -f1 -d:)
-
-# populate the expected settings map
-for key in ${!SETTINGS_MAP[@]}; do
-	EXPECTED_SETTINGS[$key]=$(get_value $key)
-done
-
-# test
-for key in ${!SETTINGS_MAP[@]}; do
-	value=$((RANDOM % $MAX_VALUE))
-
-	ethtool -G $NSIM_NETDEV "$key" "$value"
-
-	EXPECTED_SETTINGS[$key]="$value"
-	expected=${EXPECTED_SETTINGS[@]}
-	current=$(update_current_settings)
-
-	check $? "$current" "$expected"
-	set +x
-done
-
-if [ $num_errors -eq 0 ]; then
-	echo "PASSED all $((num_passes)) checks"
-	exit 0
-else
-	echo "FAILED $num_errors/$((num_errors+num_passes)) checks"
-	exit 1
-fi
diff --git a/tools/testing/selftests/drivers/net/ring_reconfig.py b/tools/testing/selftests/drivers/net/ring_reconfig.py
new file mode 100755
index 000000000000..2251efe63014
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/ring_reconfig.py
@@ -0,0 +1,167 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Test channel and ring size configuration via ethtool (-L / -G).
+"""
+
+from lib.py import ksft_run, ksft_exit, ksft_pr
+from lib.py import ksft_eq
+from lib.py import NetDrvEpEnv, EthtoolFamily, GenerateTraffic
+from lib.py import defer, NlError
+
+
+def channels(cfg) -> None:
+    """
+    Twiddle channel counts in various combinations of parameters.
+    We're only looking for the driver adhering to the requested config
+    if the config is accepted, and for crashes.
+    """
+    ehdr = {'header':{'dev-index': cfg.ifindex}}
+    chans = cfg.eth.channels_get(ehdr)
+
+    all_keys = ["rx", "tx", "combined"]
+    mixes = [{"combined"}, {"rx", "tx"}, {"rx", "combined"}, {"tx", "combined"},
+             {"rx", "tx", "combined"},]
+
+    # Get the set of keys that the device actually supports
+    restore = {}
+    supported = set()
+    for key in all_keys:
+        if key + "-max" in chans:
+            supported.add(key)
+            restore |= {key + "-count": chans[key + "-count"]}
+
+    defer(cfg.eth.channels_set, ehdr | restore)
+
+    def test_config(config):
+        try:
+            cfg.eth.channels_set(ehdr | config)
+            get = cfg.eth.channels_get(ehdr)
+            for k, v in config.items():
+                ksft_eq(get.get(k, 0), v)
+        except NlError as e:
+            failed.append(mix)
+            ksft_pr("Can't set", config, e)
+        else:
+            ksft_pr("Okay", config)
+
+    failed = []
+    for mix in mixes:
+        if not mix.issubset(supported):
+            continue
+
+        # Set all the values in the mix to 1, other supported to 0
+        config = {}
+        for key in all_keys:
+            config[key + "-count"] = 1 if key in mix else 0
+        test_config(config)
+
+    for mix in mixes:
+        if not mix.issubset(supported):
+            continue
+        if mix in failed:
+            continue
+
+        # Set all the values in the mix to max, other supported to 0
+        config = {}
+        for key in all_keys:
+            config[key + "-count"] = chans[key + '-max'] if key in mix else 0
+        test_config(config)
+
+
+def _configure_min_ring_cnt(cfg) -> None:
+    """ Try to configure a single Rx/Tx ring. """
+    ehdr = {'header':{'dev-index': cfg.ifindex}}
+    chans = cfg.eth.channels_get(ehdr)
+
+    all_keys = ["rx-count", "tx-count", "combined-count"]
+    restore = {}
+    config = {}
+    for key in all_keys:
+        if key in chans:
+            restore[key] = chans[key]
+            config[key] = 0
+
+    if chans.get('combined-count', 0) > 1:
+        config['combined-count'] = 1
+    elif chans.get('rx-count', 0) > 1 and chans.get('tx-count', 0) > 1:
+        config['tx-count'] = 1
+        config['rx-count'] = 1
+    else:
+        # looks like we're already on 1 channel
+        return
+
+    cfg.eth.channels_set(ehdr | config)
+    defer(cfg.eth.channels_set, ehdr | restore)
+
+
+def ringparam(cfg) -> None:
+    """
+    Tweak the ringparam configuration. Try to run some traffic over min
+    ring size to make sure it actually functions.
+    """
+    ehdr = {'header':{'dev-index': cfg.ifindex}}
+    rings = cfg.eth.rings_get(ehdr)
+
+    restore = {}
+    maxes = {}
+    params = set()
+    for key in rings.keys():
+        if 'max' in key:
+            param = key[:-4]
+            maxes[param] = rings[key]
+            params.add(param)
+            restore[param] = rings[param]
+
+    defer(cfg.eth.rings_set, ehdr | restore)
+
+    # Speed up the reconfig by configuring just one ring
+    _configure_min_ring_cnt(cfg)
+
+    # Try to reach min on all settings
+    for param in params:
+        val = rings[param]
+        while True:
+            try:
+                cfg.eth.rings_set({'header':{'dev-index': cfg.ifindex},
+                                   param: val // 2})
+                val //= 2
+                if val <= 1:
+                    break
+            except NlError:
+                break
+
+        get = cfg.eth.rings_get(ehdr)
+        ksft_eq(get[param], val)
+
+        ksft_pr(f"Reached min for '{param}' at {val} (max {rings[param]})")
+
+    GenerateTraffic(cfg).wait_pkts_and_stop(50000)
+
+    # Try max across all params, if the driver supports large rings
+    # this may OOM so we ignore errors
+    try:
+        ksft_pr("Applying max settings")
+        config = {p: maxes[p] for p in params}
+        cfg.eth.rings_set(ehdr | config)
+    except NlError as e:
+        ksft_pr("Can't set max params", config, e)
+    else:
+        GenerateTraffic(cfg).wait_pkts_and_stop(50000)
+
+
+def main() -> None:
+    """ Ksft boilerplate main """
+
+    with NetDrvEpEnv(__file__) as cfg:
+        cfg.eth = EthtoolFamily()
+
+        ksft_run([channels,
+                  ringparam],
+                 args=(cfg, ))
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
-- 
2.51.0

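The min-ring search in ringparam() halves the requested size until the driver rejects it or the value drops to 1. Below is a minimal offline sketch of that loop. It assumes a hypothetical `make_fake_driver()` whose setter stands in for `cfg.eth.rings_set()`, a made-up driver floor of 64 entries, and a placeholder `RingRejected` exception in place of `NlError`; none of these are part of the selftest library.

```python
class RingRejected(Exception):
    """Hypothetical stand-in for the NlError raised on a rejected config."""


def make_fake_driver(floor: int):
    """Return a setter that rejects ring sizes below `floor` (hypothetical)."""
    state = {}

    def set_ring(size: int) -> None:
        if size < floor:
            raise RingRejected(f"ring size {size} below floor {floor}")
        state["size"] = size

    return set_ring, state


def find_min_ring(set_ring, start: int) -> int:
    """Halve the ring size until the setter rejects it or it reaches 1.

    Like the loop in ringparam(), `val` is only updated after a
    successful set, so the result is the last size that was accepted.
    """
    val = start
    while True:
        try:
            set_ring(val // 2)
            val //= 2
            if val <= 1:
                break
        except RingRejected:
            break
    return val


set_ring, state = make_fake_driver(floor=64)
print(find_min_ring(set_ring, 1024))  # 64
```

Because the value only changes after the setter succeeds, the `ksft_eq(get[param], val)` check in the test compares the device state against the smallest size the driver actually accepted, not the one it refused.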
2 months, 2 weeks

1
0
0 0

[PATCH bpf-next 0/2] bpf: Skip bounds adjustment for conditional jumps on same register

by KaFai Wan

This small patchset avoids a verifier bug warning triggered by a conditional jump that compares a register with itself when that register holds a scalar with a range.
---
KaFai Wan (2):
  bpf: Skip bounds adjustment for conditional jumps on same register
  selftests/bpf: Add test for conditional jumps on same register

 kernel/bpf/verifier.c                            |  4 ++++
 .../selftests/bpf/progs/verifier_bounds.c        | 17 +++++++++++++++++
 2 files changed, 21 insertions(+)
--
2.43.0
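When both operands of a conditional jump are the same register, the outcome of most comparisons is fixed regardless of the register's value, so narrowing its bounds as if two independent operands had been compared adds nothing. The sketch below is a toy model in Python, not verifier code: the operator names mirror BPF jump opcodes, the unsigned `[umin, umax]` interval is a simplification of the verifier's register state, and `same_reg_outcome()` is a hypothetical helper.

```python
from typing import Optional

# Outcome of "if x OP x" that holds for every value of x.
# True = branch always taken, False = never taken.
_FIXED = {
    "JEQ": True, "JNE": False,
    "JGT": False, "JLT": False, "JSGT": False, "JSLT": False,
    "JGE": True, "JLE": True, "JSGE": True, "JSLE": True,
}


def same_reg_outcome(op: str, umin: int, umax: int) -> Optional[bool]:
    """Return the known outcome of `if x OP x`, or None if it depends on x.

    [umin, umax] is the unsigned range tracked for x in this toy model.
    """
    if op in _FIXED:
        return _FIXED[op]
    if op == "JSET":  # taken iff (x & x) != 0, i.e. iff x != 0
        if umin > 0:
            return True
        if umax == 0:
            return False
        return None
    raise ValueError(f"unknown op {op}")


print(same_reg_outcome("JLT", 0, 10))   # False
print(same_reg_outcome("JSET", 0, 10))  # None
```

In this model, a branch such as `x < x` is simply never taken, so there is no consistent range to derive for it; leaving the register's bounds untouched is the conservative choice.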

2 months, 2 weeks

4
14
0 0

[PATCHv7 0/7] liveupdate: Rework KHO for in-kernel users

by Pasha Tatashin

These patches are taken from the LUOv4 series [1] and address recent comments from Pratyush. They apply onto mm/mm-nonmm-unstable.

This series refactors the KHO framework to better support in-kernel users like the upcoming LUO. The current design, which relies on a notifier chain and debugfs for control, is too restrictive for direct programmatic use.

The core of this rework is the removal of the notifier chain in favor of a direct registration API. This decouples clients from the shutdown-time finalization sequence, allowing them to manage their preserved state more flexibly and at any time.

In support of this new model, this series also:
- Exports kho_finalize() and kho_abort() for programmatic control.
- Makes the debugfs interface optional.
- Introduces APIs to unpreserve memory and fixes a bug in the abort path where client state was being incorrectly discarded. Note that this is an interim step, as a more comprehensive fix is planned as part of the stateless KHO work [2].
- Moves all KHO code into a new kernel/liveupdate/ directory to consolidate live update components.
[1] https://lore.kernel.org/all/20250929010321.3462457-1-pasha.tatashin@soleen.… [2] https://lore.kernel.org/all/20251001011941.1513050-1-jasonmiu@google.com Mike Rapoport (Microsoft) (1): kho: drop notifiers Pasha Tatashin (6): kho: allow to drive kho from within kernel kho: make debugfs interface optional kho: add interfaces to unpreserve folios and page ranges kho: don't unpreserve memory during abort liveupdate: kho: move to kernel/liveupdate liveupdate: kho: move kho debugfs directory to liveupdate Documentation/core-api/kho/concepts.rst | 2 +- MAINTAINERS | 3 +- include/linux/kexec_handover.h | 53 +- init/Kconfig | 2 + kernel/Kconfig.kexec | 24 - kernel/Makefile | 3 +- kernel/kexec_handover_internal.h | 16 - kernel/liveupdate/Kconfig | 39 ++ kernel/liveupdate/Makefile | 5 + kernel/{ => liveupdate}/kexec_handover.c | 513 +++++++----------- .../{ => liveupdate}/kexec_handover_debug.c | 0 kernel/liveupdate/kexec_handover_debugfs.c | 219 ++++++++ kernel/liveupdate/kexec_handover_internal.h | 56 ++ lib/test_kho.c | 30 +- mm/memblock.c | 62 +-- tools/testing/selftests/kho/init.c | 2 +- tools/testing/selftests/kho/vmtest.sh | 1 + 17 files changed, 576 insertions(+), 454 deletions(-) delete mode 100644 kernel/kexec_handover_internal.h create mode 100644 kernel/liveupdate/Kconfig create mode 100644 kernel/liveupdate/Makefile rename kernel/{ => liveupdate}/kexec_handover.c (80%) rename kernel/{ => liveupdate}/kexec_handover_debug.c (100%) create mode 100644 kernel/liveupdate/kexec_handover_debugfs.c create mode 100644 kernel/liveupdate/kexec_handover_internal.h base-commit: 4d90027271cfa0d89473d1e288af52fda9a74935 -- 2.51.0.915.g61a8936c21-goog
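To make the shift from a notifier chain to direct registration concrete, here is a toy Python model. Every name in it (`ToyKho`, `preserve`, `unpreserve`, `finalize`, `abort`) is hypothetical and only loosely echoes the cover letter; it is not the kernel API. It shows clients adding and removing preserved ranges whenever they like, `finalize()` taking a snapshot, and `abort()` discarding only that snapshot while leaving client state intact, which is the abort-path behavior the series describes.

```python
class ToyKho:
    """Toy model of a handover registry; not the kernel's KHO API."""

    def __init__(self):
        self.preserved = set()  # (start, size) ranges registered by clients
        self.snapshot = None    # frozen view produced by finalize()

    def preserve(self, start: int, size: int) -> None:
        # In this toy, changes are refused while a snapshot exists.
        if self.snapshot is not None:
            raise RuntimeError("already finalized")
        self.preserved.add((start, size))

    def unpreserve(self, start: int, size: int) -> None:
        if self.snapshot is not None:
            raise RuntimeError("already finalized")
        self.preserved.discard((start, size))

    def finalize(self) -> frozenset:
        self.snapshot = frozenset(self.preserved)
        return self.snapshot

    def abort(self) -> None:
        # Drop only the snapshot; clients keep their preserved ranges.
        self.snapshot = None


kho = ToyKho()
kho.preserve(0x1000, 4096)
kho.preserve(0x3000, 8192)
kho.unpreserve(0x3000, 8192)
print(sorted(kho.finalize()))  # [(4096, 4096)]
kho.abort()
print(sorted(kho.preserved))   # [(4096, 4096)]
```

Because clients call into the registry directly rather than waiting for a notifier callback during finalization, they can change what is preserved at any point before the snapshot is taken.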

2 months, 2 weeks

3
28
0 0

[PATCH bpf-next v2 0/4] selftests/bpf: convert test_tc_tunnel.sh to test_progs

by Alexis Lothoré (eBPF Foundation)

Hello,

this is the v2 of the test_tc_tunnel conversion into the test_progs framework.

test_tc_tunnel.sh tests a variety of tunnels based on BPF: packets are encapsulated by a BPF program on the client egress. We then check that those packets can be decapsulated on the server ingress side, either by kernel-based or BPF-based decapsulation. These tests run over two veths in two dedicated namespaces.

- patches 1 and 2 are preparatory patches
- patch 3 introduces the tc_tunnel test into test_progs
- patch 4 gets rid of the test_tc_tunnel.sh script

The new test has been executed both in a local x86 qemu machine and in CI:

  # ./test_progs -a tc_tunnel
  #454/1 tc_tunnel/ipip_none:OK
  #454/2 tc_tunnel/ipip6_none:OK
  #454/3 tc_tunnel/ip6tnl_none:OK
  #454/4 tc_tunnel/sit_none:OK
  #454/5 tc_tunnel/vxlan_eth:OK
  #454/6 tc_tunnel/ip6vxlan_eth:OK
  #454/7 tc_tunnel/gre_none:OK
  #454/8 tc_tunnel/gre_eth:OK
  #454/9 tc_tunnel/gre_mpls:OK
  #454/10 tc_tunnel/ip6gre_none:OK
  #454/11 tc_tunnel/ip6gre_eth:OK
  #454/12 tc_tunnel/ip6gre_mpls:OK
  #454/13 tc_tunnel/udp_none:OK
  #454/14 tc_tunnel/udp_eth:OK
  #454/15 tc_tunnel/udp_mpls:OK
  #454/16 tc_tunnel/ip6udp_none:OK
  #454/17 tc_tunnel/ip6udp_eth:OK
  #454/18 tc_tunnel/ip6udp_mpls:OK
  #454 tc_tunnel:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v2:
- declare a single tc_prog_attach helper rather than multiple, intermediate helpers
- move the new helper to network_helpers.c rather than a dedicated file
- do not rename existing tc_helpers.c/h pair (drop patch)
- keep only the minimal set of needed NS switches
- Link to v1: https://lore.kernel.org/r/20251017-tc_tunnel-v1-0-2d86808d86b2@bootlin.com

---
Alexis Lothoré (eBPF Foundation) (4):
      selftests/bpf: add tc helpers
      selftests/bpf: make test_tc_tunnel.bpf.c compatible with big endian platforms
      selftests/bpf: integrate test_tc_tunnel.sh tests into test_progs
      selftests/bpf: remove test_tc_tunnel.sh
tools/testing/selftests/bpf/Makefile | 1 - tools/testing/selftests/bpf/network_helpers.c | 45 ++ tools/testing/selftests/bpf/network_helpers.h | 16 + .../selftests/bpf/prog_tests/test_tc_tunnel.c | 660 +++++++++++++++++++++ .../testing/selftests/bpf/prog_tests/test_tunnel.c | 107 +--- tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 95 ++- tools/testing/selftests/bpf/test_tc_tunnel.sh | 320 ---------- 7 files changed, 776 insertions(+), 468 deletions(-) --- base-commit: b92bbe400a50e4eb033b378252292d1cc19cabae change-id: 20250811-tc_tunnel-c61342683f18 Best regards, -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
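The ipip_* cases wrap an IPv4 packet in an outer IPv4 header whose protocol field is 4 (IP-in-IP). As a rough userspace sketch of the header the BPF program on the client egress has to build (this is not the selftest's BPF code; the addresses and TTL are arbitrary), the outer header can be packed in network byte order with Python's `struct`. It also shows why endianness matters for the series: every multi-byte field must be written big-endian regardless of the host.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IP-in-IP protocol number


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipip_encap(inner: bytes, src: str, dst: str, ttl: int = 64) -> bytes:
    """Prepend a 20-byte outer IPv4 header carrying `inner` as IP-in-IP."""
    ver_ihl = (4 << 4) | 5  # IPv4, header length 5 x 32-bit words, no options
    total_len = 20 + len(inner)
    hdr = struct.pack("!BBHHHBBH4s4s",
                      ver_ihl, 0, total_len,
                      0, 0,                      # identification, flags/frag offset
                      ttl, IPPROTO_IPIP, 0,      # checksum is filled in below
                      socket.inet_aton(src), socket.inet_aton(dst))
    csum = ipv4_checksum(hdr)
    hdr = hdr[:10] + struct.pack("!H", csum) + hdr[12:]
    return hdr + inner
```

A receiver can validate the result by running `ipv4_checksum()` over the finished 20-byte header, which yields 0 when the checksum field is correct.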

2 months, 2 weeks

3
8
0 0

[PATCH bpf-next v1 0/2] bpf: Add kfuncs and selftests for detecting execution context and selftests

by Jiayuan Chen

This patch introduces several kfuncs to help BPF programs determine their current execution context.

When hooking functions for statistics, we often need to use current->comm to get the process name. However, these hooked functions can be called from either process context or interrupt context. When called from interrupt context, the current task we obtain may be the process that was interrupted, which may not be what we need.

These new kfuncs expose APIs that allow users to determine the actual execution context.

Jiayuan Chen (2):
  bpf: Add kfuncs for detecting execution context
  selftests/bpf: Add selftests for context detection kfuncs

 kernel/bpf/helpers.c                          | 45 +++++++++++++++++++
 .../selftests/bpf/prog_tests/context.c        | 32 +++++++++++++
 .../selftests/bpf/progs/context_prog.c        | 33 ++++++++++++++
 3 files changed, 110 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/context.c
 create mode 100644 tools/testing/selftests/bpf/progs/context_prog.c
--
2.43.0
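The cover letter does not name the new kfuncs, so this is only a hedged illustration of the underlying idea: in the kernel, helpers such as in_nmi(), in_hardirq(), in_serving_softirq() and in_task() are derived from bit fields of preempt_count. The Python sketch below decodes a raw preempt_count value using the mask layout documented in include/linux/preempt.h in recent kernels (SOFTIRQ 0x0000ff00, HARDIRQ 0x000f0000, NMI 0x00f00000); the function name `classify_context()` is hypothetical, older kernels used a narrower NMI field, and the series' kfuncs may be implemented differently.

```python
# preempt_count bit layout as documented in include/linux/preempt.h of
# recent kernels (reproduced for illustration; check your kernel version).
HARDIRQ_MASK = 0x000F0000
NMI_MASK = 0x00F00000
SOFTIRQ_OFFSET = 0x00000100          # set while a softirq is being served
SOFTIRQ_DISABLE_OFFSET = 0x00000200  # added by each local_bh_disable()


def classify_context(preempt_count: int) -> str:
    """Return the innermost execution context encoded in preempt_count."""
    if preempt_count & NMI_MASK:
        return "nmi"
    if preempt_count & HARDIRQ_MASK:
        return "hardirq"
    if preempt_count & SOFTIRQ_OFFSET:
        return "softirq"
    # Bottom halves may be disabled, but this is still task context,
    # where `current` really is the process doing the work.
    return "task"


print(classify_context(0))                                # task
print(classify_context(SOFTIRQ_DISABLE_OFFSET))           # task
print(classify_context(SOFTIRQ_OFFSET))                   # softirq
print(classify_context(0x00010000 | SOFTIRQ_OFFSET))      # hardirq
```

The ordering matters: an interrupt can arrive while a softirq is being served, so the outermost nesting level (NMI, then hardirq) is checked first. In any non-task result, `current` is the interrupted task rather than the code being measured.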

2 months, 2 weeks

3
4
0 0

[PATCH 0/2] KHO: Fix metadata allocation in scratch area

by Pasha Tatashin

This series fixes a memory corruption bug in KHO that occurs when KFENCE is enabled.

The root cause is that KHO metadata, allocated via kzalloc(), can be randomly serviced by kfence_alloc(). When a kernel boots via KHO, the early memblock allocator is restricted to a "scratch area". This forces the KFENCE pool to be allocated within this scratch area, creating a conflict. If KHO metadata is subsequently placed in this pool, it gets corrupted during the next kexec operation.

The series is structured in two parts:

Patch 1/2 introduces a debug-only feature (CONFIG_KEXEC_HANDOVER_DEBUG) that adds checks to detect and fail any operation that attempts to place KHO metadata or preserved memory within the scratch area. This serves as a validation and diagnostic tool to confirm the problem without affecting production builds.

Patch 2/2 provides the fix by modifying KHO to allocate its metadata directly from the buddy allocator instead of SLUB. This bypasses the KFENCE interception entirely.

Pasha Tatashin (2):
  liveupdate: kho: warn and fail on metadata or preserved memory in scratch area
  liveupdate: kho: allocate metadata directly from the buddy allocator

 kernel/liveupdate/Kconfig                   | 15 ++++++
 kernel/liveupdate/kexec_handover.c          | 51 ++++++++++++++++-----
 kernel/liveupdate/kexec_handover_debug.c    | 18 ++++++++
 kernel/liveupdate/kexec_handover_internal.h |  9 ++++
 4 files changed, 81 insertions(+), 12 deletions(-)

base-commit: 0b2f041c47acb45db82b4e847af6e17eb66cd32d
--
2.51.0.788.g6d19910ace-goog
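Patch 1/2's debug check comes down to refusing any preservation whose physical range intersects a scratch area. The series' own check is not shown in this cover letter, so the following is only a minimal sketch of the overlap test in Python, assuming half-open `[start, end)` ranges and a hypothetical list of scratch regions; `overlaps()` and `check_preserve()` are illustrative names, not kernel functions.

```python
from typing import Iterable, Tuple

Range = Tuple[int, int]  # half-open [start, end) physical address range


def overlaps(a: Range, b: Range) -> bool:
    """True if two half-open ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def check_preserve(start: int, size: int, scratch: Iterable[Range]) -> None:
    """Raise if [start, start + size) touches any scratch region (hypothetical)."""
    rng = (start, start + size)
    for area in scratch:
        if overlaps(rng, area):
            raise ValueError(
                f"range {rng[0]:#x}-{rng[1]:#x} overlaps scratch "
                f"{area[0]:#x}-{area[1]:#x}")


scratch = [(0x1000_0000, 0x2000_0000)]
check_preserve(0x0800_0000, 0x1000, scratch)  # below the scratch area: accepted
try:
    check_preserve(0x1FFF_F000, 0x2000, scratch)  # straddles the upper edge
except ValueError as err:
    print(err)
```

Using half-open ranges means a region that ends exactly where the scratch area begins (or starts exactly where it ends) is not flagged, which avoids off-by-one false positives at the boundaries.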

2 months, 2 weeks

5
23
0 0

Need Customer Service? Here’s How to Talk to a Live Agent Spirit Airlines

by a.ishv.u26.0＠gmail.com

Need to talk to a real person Spirit Airlines? Calling Spirit Airlines directly at 📞 1-866-284-3022. Whether you are trying to make a flight change, cancel your booking, ask about baggage, or resolve a booking issue, reaching a live agent can save you time, stress, and confusion. While automated systems are useful for simple tasks, some situations just need a human touch. In this guide, we will walk you through exactly how to reach a live person, what to prepare before calling, and alternate methods if the phone lines are busy. ☎️ First Things First: Call 1-866-284-3022 The most direct and reliable way to speak with a real Spirit Airlines representative is by calling 📞 1-866-284-3022. This is Frontier’s official customer service number and should be your go-to for: ✈️ Flight changes or cancellations 🧾 Refund or credit questions 🛄 Baggage issues 🔁 Name corrections 🛑 Check-in or boarding problems 💺 Seat selection and upgrades Pro Tip: When calling, try to do so during non-peak hours — early mornings or late evenings — to reduce your hold time. 🎧 How to Navigate the Automated Menu When you call 1-866-284-3022, you will first hear an automated system. To get through to a live person faster, follow these steps: Dial 1-866-284-3022 Wait for the automated greeting to begin Press “1” for English (or “2” for Spanish) Press “2” for existing reservations Press “0” to speak with an agent (you may need to press "0" more than once) 👉 If “0” does not work immediately, stay on the line. Sometimes the system transfers you to an agent after a brief wait without needing more input. If the lines are busy and you are placed on hold, do not hang up — wait times vary, but you will eventually reach someone. 🧾 What to Have Ready Before You Call To help the agent assist you faster, make sure you have the following details on hand: 📌 Your confirmation code or booking number 📌 The full name on the reservation 📌 Your flight date and destination 📌 Any relevant documents (ID, credit card, etc.) 
📌 A notepad for writing down instructions or confirmation numbers If you are calling to fix a mistake or request a refund, be prepared to briefly explain the issue and possibly provide documentation via email upon request. ⏰ Best Times to Call 1-866-284-3022 Customer service lines can be busy, especially during: ⚠️ Holidays ⚠️ Severe weather or flight delays ⚠️ Early morning flight hours 🎯 For the best chance at a short wait, try calling during these times: 🕔 5:00 AM – 7:00 AM (EST) 🕘 9:00 PM – 11:00 PM (EST) 📅 Midweek (Tuesdays and Wednesdays) Avoid Mondays if possible — it is the busiest day for airlines. 🧑‍💻 Alternative Ways to Reach Spirit Airlines (If Phone Fails) If calling 📞 1-866-284-3022 does not work or you are stuck in a long queue, here are a few alternate ways to get help: 💬 1. Online Chat (Limited Availability) Visit www.flyfrontier.com Scroll down and look for the “Let’s Chat” option. This can connect you to a live agent or AI assistant, depending on availability. 📧 2. Email Support You can also submit a help request through their Customer Support Form online. Use this for non-urgent matters like refund requests or documentation review. 📱 3. Social Media Tweet or DM Spirit Airlines on platforms like Twitter/X (@FlyFrontier) or send a message via Facebook. Sometimes social media agents respond faster than the phone team during high-volume periods. 📲 4. Mobile App Download the Spirit Airlines App, log in, and navigate to “My Trips” or “Support” for quick options. While this will not guarantee a live agent, you might find answers to basic questions faster. ❗ Common Issues That Require a Live Agent While many tasks can be done online, certain problems are best resolved with a real person at 1-866-284-3022: 🛑 Double charges or billing issues 🔄 Complex flight changes involving multiple passengers 🛄 Lost or delayed baggage ✍️ Legal name changes (marriage, divorce, etc.) 
🧑‍⚕️ Medical or accessibility needs during travel In these cases, avoid wasting time — call directly and ask for a live agent. 🚨 Beware of Fake Numbers and Scams Only use the official number: 📞 1-866-284-3022. Scammers often post fake “Frontier support” numbers online, asking for credit card info or login credentials. 🛡️ Never share your full credit card number or personal information with an unverified source. ✅ Final Thoughts Talking to a live person at an airline should not be this hard — but when it comes to Spirit Airlines, knowing the right steps and phone number makes all the difference. 🧠 Remember: Dial 📞 1-866-284-3022 Press 0 to reach a live agent Call during off-peak hours Have your booking info ready Use alternative methods if the line is too busy 💡 The sooner you reach out, the more options you'll have to resolve your issue. ✈️ Whether you are rebooking, fixing an error, or checking a flight, calling 1-866-284-3022 connects you with someone who can truly help.

2 months, 2 weeks

1
0
0 0

Step-by-Step Guide: How to Speak with a Live Agent Southwest Airlines

by a.ishv.u26.0＠gmail.com

Need to talk to a real person Southwest Airlines? Calling Southwest Airlines directly at 📞 1-866-284-3022. Whether you are trying to make a flight change, cancel your booking, ask about baggage, or resolve a booking issue, reaching a live agent can save you time, stress, and confusion. While automated systems are useful for simple tasks, some situations just need a human touch. In this guide, we will walk you through exactly how to reach a live person, what to prepare before calling, and alternate methods if the phone lines are busy. ☎️ First Things First: Call 1-866-284-3022 The most direct and reliable way to speak with a real Southwest Airlines representative is by calling 📞 1-866-284-3022. This is Frontier’s official customer service number and should be your go-to for: ✈️ Flight changes or cancellations 🧾 Refund or credit questions 🛄 Baggage issues 🔁 Name corrections 🛑 Check-in or boarding problems 💺 Seat selection and upgrades Pro Tip: When calling, try to do so during non-peak hours — early mornings or late evenings — to reduce your hold time. 🎧 How to Navigate the Automated Menu When you call 1-866-284-3022, you will first hear an automated system. To get through to a live person faster, follow these steps: Dial 1-866-284-3022 Wait for the automated greeting to begin Press “1” for English (or “2” for Spanish) Press “2” for existing reservations Press “0” to speak with an agent (you may need to press "0" more than once) 👉 If “0” does not work immediately, stay on the line. Sometimes the system transfers you to an agent after a brief wait without needing more input. If the lines are busy and you are placed on hold, do not hang up — wait times vary, but you will eventually reach someone. 
🧾 What to Have Ready Before You Call To help the agent assist you faster, make sure you have the following details on hand: 📌 Your confirmation code or booking number 📌 The full name on the reservation 📌 Your flight date and destination 📌 Any relevant documents (ID, credit card, etc.) 📌 A notepad for writing down instructions or confirmation numbers If you are calling to fix a mistake or request a refund, be prepared to briefly explain the issue and possibly provide documentation via email upon request. ⏰ Best Times to Call 1-866-284-3022 Customer service lines can be busy, especially during: ⚠️ Holidays ⚠️ Severe weather or flight delays ⚠️ Early morning flight hours 🎯 For the best chance at a short wait, try calling during these times: 🕔 5:00 AM – 7:00 AM (EST) 🕘 9:00 PM – 11:00 PM (EST) 📅 Midweek (Tuesdays and Wednesdays) Avoid Mondays if possible — it is the busiest day for airlines. 🧑‍💻 Alternative Ways to Reach Southwest Airlines (If Phone Fails) If calling 📞 1-866-284-3022 does not work or you are stuck in a long queue, here are a few alternate ways to get help: 💬 1. Online Chat (Limited Availability) Visit www.flyfrontier.com Scroll down and look for the “Let’s Chat” option. This can connect you to a live agent or AI assistant, depending on availability. 📧 2. Email Support You can also submit a help request through their Customer Support Form online. Use this for non-urgent matters like refund requests or documentation review. 📱 3. Social Media Tweet or DM Southwest Airlines on platforms like Twitter/X (@FlyFrontier) or send a message via Facebook. Sometimes social media agents respond faster than the phone team during high-volume periods. 📲 4. Mobile App Download the Southwest Airlines App, log in, and navigate to “My Trips” or “Support” for quick options. While this will not guarantee a live agent, you might find answers to basic questions faster. 
❗ Common Issues That Require a Live Agent While many tasks can be done online, certain problems are best resolved with a real person at 1-866-284-3022: 🛑 Double charges or billing issues 🔄 Complex flight changes involving multiple passengers 🛄 Lost or delayed baggage ✍️ Legal name changes (marriage, divorce, etc.) 🧑‍⚕️ Medical or accessibility needs during travel In these cases, avoid wasting time — call directly and ask for a live agent. 🚨 Beware of Fake Numbers and Scams Only use the official number: 📞 1-866-284-3022. Scammers often post fake “Frontier support” numbers online, asking for credit card info or login credentials. 🛡️ Never share your full credit card number or personal information with an unverified source. ✅ Final Thoughts Talking to a live person at an airline should not be this hard — but when it comes to Southwest Airlines, knowing the right steps and phone number makes all the difference. 🧠 Remember: Dial 📞 1-866-284-3022 Press 0 to reach a live agent Call during off-peak hours Have your booking info ready Use alternative methods if the line is too busy 💡 The sooner you reach out, the more options you'll have to resolve your issue. ✈️ Whether you are rebooking, fixing an error, or checking a flight, calling 1-866-284-3022 connects you with someone who can truly help.

2 months, 2 weeks

1
0
0 0

Contacting a SkyWest Airlines Live Agent: A Simple Guide

by a.ishv.u26.0＠gmail.com

Need to talk to a real person SkyWest Airlines? Calling SkyWest Airlines directly at 📞 1-866-284-3022. Whether you are trying to make a flight change, cancel your booking, ask about baggage, or resolve a booking issue, reaching a live agent can save you time, stress, and confusion. While automated systems are useful for simple tasks, some situations just need a human touch. In this guide, we will walk you through exactly how to reach a live person, what to prepare before calling, and alternate methods if the phone lines are busy. ☎️ First Things First: Call 1-866-284-3022 The most direct and reliable way to speak with a real SkyWest Airlines representative is by calling 📞 1-866-284-3022. This is Frontier’s official customer service number and should be your go-to for: ✈️ Flight changes or cancellations 🧾 Refund or credit questions 🛄 Baggage issues 🔁 Name corrections 🛑 Check-in or boarding problems 💺 Seat selection and upgrades Pro Tip: When calling, try to do so during non-peak hours — early mornings or late evenings — to reduce your hold time. 🎧 How to Navigate the Automated Menu When you call 1-866-284-3022, you will first hear an automated system. To get through to a live person faster, follow these steps: Dial 1-866-284-3022 Wait for the automated greeting to begin Press “1” for English (or “2” for Spanish) Press “2” for existing reservations Press “0” to speak with an agent (you may need to press "0" more than once) 👉 If “0” does not work immediately, stay on the line. Sometimes the system transfers you to an agent after a brief wait without needing more input. If the lines are busy and you are placed on hold, do not hang up — wait times vary, but you will eventually reach someone. 
🧾 What to Have Ready Before You Call To help the agent assist you faster, make sure you have the following details on hand: 📌 Your confirmation code or booking number 📌 The full name on the reservation 📌 Your flight date and destination 📌 Any relevant documents (ID, credit card, etc.) 📌 A notepad for writing down instructions or confirmation numbers If you are calling to fix a mistake or request a refund, be prepared to briefly explain the issue and possibly provide documentation via email upon request. ⏰ Best Times to Call 1-866-284-3022 Customer service lines can be busy, especially during: ⚠️ Holidays ⚠️ Severe weather or flight delays ⚠️ Early morning flight hours 🎯 For the best chance at a short wait, try calling during these times: 🕔 5:00 AM – 7:00 AM (EST) 🕘 9:00 PM – 11:00 PM (EST) 📅 Midweek (Tuesdays and Wednesdays) Avoid Mondays if possible — it is the busiest day for airlines. 🧑‍💻 Alternative Ways to Reach SkyWest Airlines (If Phone Fails) If calling 📞 1-866-284-3022 does not work or you are stuck in a long queue, here are a few alternate ways to get help: 💬 1. Online Chat (Limited Availability) Visit www.flyfrontier.com Scroll down and look for the “Let’s Chat” option. This can connect you to a live agent or AI assistant, depending on availability. 📧 2. Email Support You can also submit a help request through their Customer Support Form online. Use this for non-urgent matters like refund requests or documentation review. 📱 3. Social Media Tweet or DM SkyWest Airlines on platforms like Twitter/X (@FlyFrontier) or send a message via Facebook. Sometimes social media agents respond faster than the phone team during high-volume periods. 📲 4. Mobile App Download the SkyWest Airlines App, log in, and navigate to “My Trips” or “Support” for quick options. While this will not guarantee a live agent, you might find answers to basic questions faster. 
❗ Common Issues That Require a Live Agent While many tasks can be done online, certain problems are best resolved with a real person at 1-866-284-3022: 🛑 Double charges or billing issues 🔄 Complex flight changes involving multiple passengers 🛄 Lost or delayed baggage ✍️ Legal name changes (marriage, divorce, etc.) 🧑‍⚕️ Medical or accessibility needs during travel In these cases, avoid wasting time — call directly and ask for a live agent. 🚨 Beware of Fake Numbers and Scams Only use the official number: 📞 1-866-284-3022. Scammers often post fake “Frontier support” numbers online, asking for credit card info or login credentials. 🛡️ Never share your full credit card number or personal information with an unverified source. ✅ Final Thoughts Talking to a live person at an airline should not be this hard — but when it comes to SkyWest Airlines, knowing the right steps and phone number makes all the difference. 🧠 Remember: Dial 📞 1-866-284-3022 Press 0 to reach a live agent Call during off-peak hours Have your booking info ready Use alternative methods if the line is too busy 💡 The sooner you reach out, the more options you'll have to resolve your issue. ✈️ Whether you are rebooking, fixing an error, or checking a flight, calling 1-866-284-3022 connects you with someone who can truly help.

2 months, 2 weeks

1
0
0 0

How to Contact a Live Agent Scandinavian Airlines (SAS) for Assistance

by a.ishv.u26.0＠gmail.com

Need to talk to a real person Scandinavian Airlines (SAS)? Calling Scandinavian Airlines (SAS) directly at 📞 1-866-284-3022. Whether you are trying to make a flight change, cancel your booking, ask about baggage, or resolve a booking issue, reaching a live agent can save you time, stress, and confusion. While automated systems are useful for simple tasks, some situations just need a human touch. In this guide, we will walk you through exactly how to reach a live person, what to prepare before calling, and alternate methods if the phone lines are busy. ☎️ First Things First: Call 1-866-284-3022 The most direct and reliable way to speak with a real Scandinavian Airlines (SAS) representative is by calling 📞 1-866-284-3022. This is Frontier’s official customer service number and should be your go-to for: ✈️ Flight changes or cancellations 🧾 Refund or credit questions 🛄 Baggage issues 🔁 Name corrections 🛑 Check-in or boarding problems 💺 Seat selection and upgrades Pro Tip: When calling, try to do so during non-peak hours — early mornings or late evenings — to reduce your hold time. 🎧 How to Navigate the Automated Menu When you call 1-866-284-3022, you will first hear an automated system. To get through to a live person faster, follow these steps: Dial 1-866-284-3022 Wait for the automated greeting to begin Press “1” for English (or “2” for Spanish) Press “2” for existing reservations Press “0” to speak with an agent (you may need to press "0" more than once) 👉 If “0” does not work immediately, stay on the line. Sometimes the system transfers you to an agent after a brief wait without needing more input. If the lines are busy and you are placed on hold, do not hang up — wait times vary, but you will eventually reach someone. 

2 months, 2 weeks

1
0
0 0

How to Reach a Live Agent Mesa Airlines: A Quick Guide

by a.ishv.u26.0＠gmail.com

Need to talk to a real person Mesa Airlines? Calling Mesa Airlines directly at 📞 1-866-284-3022. Whether you are trying to make a flight change, cancel your booking, ask about baggage, or resolve a booking issue, reaching a live agent can save you time, stress, and confusion. While automated systems are useful for simple tasks, some situations just need a human touch. In this guide, we will walk you through exactly how to reach a live person, what to prepare before calling, and alternate methods if the phone lines are busy. ☎️ First Things First: Call 1-866-284-3022 The most direct and reliable way to speak with a real Mesa Airlines representative is by calling 📞 1-866-284-3022. This is Frontier’s official customer service number and should be your go-to for: ✈️ Flight changes or cancellations 🧾 Refund or credit questions 🛄 Baggage issues 🔁 Name corrections 🛑 Check-in or boarding problems 💺 Seat selection and upgrades Pro Tip: When calling, try to do so during non-peak hours — early mornings or late evenings — to reduce your hold time. 🎧 How to Navigate the Automated Menu When you call 1-866-284-3022, you will first hear an automated system. To get through to a live person faster, follow these steps: Dial 1-866-284-3022 Wait for the automated greeting to begin Press “1” for English (or “2” for Spanish) Press “2” for existing reservations Press “0” to speak with an agent (you may need to press "0" more than once) 👉 If “0” does not work immediately, stay on the line. Sometimes the system transfers you to an agent after a brief wait without needing more input. If the lines are busy and you are placed on hold, do not hang up — wait times vary, but you will eventually reach someone. 🧾 What to Have Ready Before You Call To help the agent assist you faster, make sure you have the following details on hand: 📌 Your confirmation code or booking number 📌 The full name on the reservation 📌 Your flight date and destination 📌 Any relevant documents (ID, credit card, etc.) 
📌 A notepad for writing down instructions or confirmation numbers If you are calling to fix a mistake or request a refund, be prepared to briefly explain the issue and possibly provide documentation via email upon request. ⏰ Best Times to Call 1-866-284-3022 Customer service lines can be busy, especially during: ⚠️ Holidays ⚠️ Severe weather or flight delays ⚠️ Early morning flight hours 🎯 For the best chance at a short wait, try calling during these times: 🕔 5:00 AM – 7:00 AM (EST) 🕘 9:00 PM – 11:00 PM (EST) 📅 Midweek (Tuesdays and Wednesdays) Avoid Mondays if possible — it is the busiest day for airlines. 🧑‍💻 Alternative Ways to Reach Mesa Airlines (If Phone Fails) If calling 📞 1-866-284-3022 does not work or you are stuck in a long queue, here are a few alternate ways to get help: 💬 1. Online Chat (Limited Availability) Visit www.flyfrontier.com Scroll down and look for the “Let’s Chat” option. This can connect you to a live agent or AI assistant, depending on availability. 📧 2. Email Support You can also submit a help request through their Customer Support Form online. Use this for non-urgent matters like refund requests or documentation review. 📱 3. Social Media Tweet or DM Mesa Airlines on platforms like Twitter/X (@FlyFrontier) or send a message via Facebook. Sometimes social media agents respond faster than the phone team during high-volume periods. 📲 4. Mobile App Download the Mesa Airlines App, log in, and navigate to “My Trips” or “Support” for quick options. While this will not guarantee a live agent, you might find answers to basic questions faster. ❗ Common Issues That Require a Live Agent While many tasks can be done online, certain problems are best resolved with a real person at 1-866-284-3022: 🛑 Double charges or billing issues 🔄 Complex flight changes involving multiple passengers 🛄 Lost or delayed baggage ✍️ Legal name changes (marriage, divorce, etc.) 
🧑‍⚕️ Medical or accessibility needs during travel In these cases, avoid wasting time — call directly and ask for a live agent. 🚨 Beware of Fake Numbers and Scams Only use the official number: 📞 1-866-284-3022. Scammers often post fake “Frontier support” numbers online, asking for credit card info or login credentials. 🛡️ Never share your full credit card number or personal information with an unverified source. ✅ Final Thoughts Talking to a live person at an airline should not be this hard — but when it comes to Mesa Airlines, knowing the right steps and phone number makes all the difference. 🧠 Remember: Dial 📞 1-866-284-3022 Press 0 to reach a live agent Call during off-peak hours Have your booking info ready Use alternative methods if the line is too busy 💡 The sooner you reach out, the more options you'll have to resolve your issue. ✈️ Whether you are rebooking, fixing an error, or checking a flight, calling 1-866-284-3022 connects you with someone who can truly help.

2 months, 2 weeks

1
0
0 0

[PATCH] selftests/unix: Add test for ECONNRESET and EOF behaviour

by Sunday Adelodun

Add selftests verifying the EOF and ECONNRESET behaviour of UNIX domain
sockets (SOCK_STREAM and SOCK_DGRAM). The tests document Linux's
semantics and clarify the long-standing differences with BSD.

Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com>
---
 tools/testing/selftests/net/unix/Makefile     |   5 +
 .../selftests/net/unix/test_unix_connreset.c  | 147 ++++++++++++++++++
 2 files changed, 152 insertions(+)
 create mode 100644 tools/testing/selftests/net/unix/Makefile
 create mode 100644 tools/testing/selftests/net/unix/test_unix_connreset.c

diff --git a/tools/testing/selftests/net/unix/Makefile b/tools/testing/selftests/net/unix/Makefile
new file mode 100644
index 000000000000..a52992ba23d9
--- /dev/null
+++ b/tools/testing/selftests/net/unix/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_GEN_PROGS := test_unix_connreset
+
+include ../../lib.mk
+
diff --git a/tools/testing/selftests/net/unix/test_unix_connreset.c b/tools/testing/selftests/net/unix/test_unix_connreset.c
new file mode 100644
index 000000000000..a8720c7565cb
--- /dev/null
+++ b/tools/testing/selftests/net/unix/test_unix_connreset.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Selftest for UNIX socket close and ECONNRESET behaviour.
+ *
+ * This test verifies that:
+ * 1. SOCK_STREAM sockets return EOF when peer closes normally.
+ * 2. SOCK_STREAM sockets return ECONNRESET if peer closes with unread data.
+ * 3. SOCK_DGRAM sockets do not return ECONNRESET when peer closes,
+ *    unlike BSD where this error is observed.
+ *
+ * These tests document the intended Linux behaviour, distinguishing it from BSD.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include "../../kselftest_harness.h"
+
+#define SOCK_PATH "/tmp/test_unix_connreset.sock"
+
+static void remove_socket_file(void)
+{
+	unlink(SOCK_PATH);
+}
+
+/* Test 1: peer closes normally */
+TEST(stream_eof)
+{
+	int server, client, child;
+	struct sockaddr_un addr = {0};
+	char buf[16] = {0};
+	ssize_t n;
+
+	server = socket(AF_UNIX, SOCK_STREAM, 0);
+	ASSERT_GE(server, 0);
+
+	addr.sun_family = AF_UNIX;
+	strcpy(addr.sun_path, SOCK_PATH);
+	remove_socket_file();
+
+	ASSERT_EQ(bind(server, (struct sockaddr *)&addr, sizeof(addr)), 0);
+	ASSERT_EQ(listen(server, 1), 0);
+
+	client = socket(AF_UNIX, SOCK_STREAM, 0);
+	ASSERT_GE(client, 0);
+	ASSERT_EQ(connect(client, (struct sockaddr *)&addr, sizeof(addr)), 0);
+
+	child = accept(server, NULL, NULL);
+	ASSERT_GE(child, 0);
+
+	/* Peer closes normally */
+	close(child);
+
+	n = recv(client, buf, sizeof(buf), 0);
+	EXPECT_EQ(n, 0);
+	TH_LOG("recv=%zd errno=%d (%s)", n, errno, strerror(errno));
+
+	close(client);
+	close(server);
+	remove_socket_file();
+}
+
+/* Test 2: peer closes with unread data */
+TEST(stream_reset_unread)
+{
+	int server, client, child;
+	struct sockaddr_un addr = {0};
+	char buf[16] = {0};
+	ssize_t n;
+
+	server = socket(AF_UNIX, SOCK_STREAM, 0);
+	ASSERT_GE(server, 0);
+
+	addr.sun_family = AF_UNIX;
+	strcpy(addr.sun_path, SOCK_PATH);
+	remove_socket_file();
+
+	ASSERT_EQ(bind(server, (struct sockaddr *)&addr, sizeof(addr)), 0);
+	ASSERT_EQ(listen(server, 1), 0);
+
+	client = socket(AF_UNIX, SOCK_STREAM, 0);
+	ASSERT_GE(client, 0);
+	ASSERT_EQ(connect(client, (struct sockaddr *)&addr, sizeof(addr)), 0);
+
+	child = accept(server, NULL, NULL);
+	ASSERT_GE(child, 0);
+
+	/* Send data that will remain unread by client */
+	send(client, "hello", 5, 0);
+	close(child);
+
+	n = recv(client, buf, sizeof(buf), 0);
+	EXPECT_LT(n, 0);
+	EXPECT_EQ(errno, ECONNRESET);
+	TH_LOG("recv=%zd errno=%d (%s)", n, errno, strerror(errno));
+
+	close(client);
+	close(server);
+	remove_socket_file();
+}
+
+/* Test 3: SOCK_DGRAM peer close */
+TEST(dgram_reset)
+{
+	int server, client;
+	int flags;
+	struct sockaddr_un addr = {0};
+	char buf[16] = {0};
+	ssize_t n;
+
+	server = socket(AF_UNIX, SOCK_DGRAM, 0);
+	ASSERT_GE(server, 0);
+
+	addr.sun_family = AF_UNIX;
+	strcpy(addr.sun_path, SOCK_PATH);
+	remove_socket_file();
+
+	ASSERT_EQ(bind(server, (struct sockaddr *)&addr, sizeof(addr)), 0);
+
+	client = socket(AF_UNIX, SOCK_DGRAM, 0);
+	ASSERT_GE(client, 0);
+	ASSERT_EQ(connect(client, (struct sockaddr *)&addr, sizeof(addr)), 0);
+
+	send(client, "hello", 5, 0);
+	close(server);
+
+	flags = fcntl(client, F_GETFL, 0);
+	fcntl(client, F_SETFL, flags | O_NONBLOCK);
+
+	n = recv(client, buf, sizeof(buf), 0);
+	TH_LOG("recv=%zd errno=%d (%s)", n, errno, strerror(errno));
+	/* Expect EAGAIN or EWOULDBLOCK because there is no datagram and peer is closed. */
+	EXPECT_LT(n, 0);
+	EXPECT_TRUE(errno == EAGAIN);
+
+	close(client);
+	remove_socket_file();
+}
+
+TEST_HARNESS_MAIN
+
--
2.43.0
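The stream semantics that tests 1 and 2 assert can be reproduced outside the kselftest harness. The sketch below is not part of the patch: it uses socketpair() instead of the patch's bind()/listen()/accept() sequence for brevity, and the helper name recv_after_peer_close() is hypothetical. It assumes a Linux kernel, where closing a stream socket that still has unread data in its own receive queue makes the peer's next recv() fail with ECONNRESET, while a clean close is reported as EOF (a return value of 0).

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Close one end of a connected AF_UNIX SOCK_STREAM pair and return what
 * recv() reports on the other end. When leave_unread is nonzero, five
 * bytes are first sent towards the end that is about to be closed and
 * are never read there. *err is set to errno when recv() fails, else 0.
 * Returns -2 if the setup itself fails.
 */
static ssize_t recv_after_peer_close(int leave_unread, int *err)
{
	int sv[2];
	char buf[16];
	ssize_t n;

	*err = 0;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -2;

	if (leave_unread && send(sv[0], "hello", 5, 0) != 5) {
		close(sv[0]);
		close(sv[1]);
		return -2;
	}

	close(sv[1]);			/* the peer goes away */

	n = recv(sv[0], buf, sizeof(buf), 0);
	if (n < 0)
		*err = errno;
	close(sv[0]);
	return n;
}
```

Calling it with leave_unread set to 0 and then to 1 should mirror stream_eof and stream_reset_unread respectively; note that the unread bytes sit in the queue of the socket being closed, not of the one calling recv().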

2 months, 2 weeks

2
2

[PATCH net-next 2/2] selftests: bridge_mdb: Add a test for MDB flush on snooping disable

by Petr Machata

Check that non-permanent MDB entries are removed as IGMP / MLD snooping is
disabled.

Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
---

Notes:
    CC: linux-kselftest(a)vger.kernel.org
    CC: Shuah Khan <shuah(a)kernel.org>

 .../selftests/net/forwarding/bridge_mdb.sh    | 100 +++++++++++++++++-
 1 file changed, 98 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index 8c1597ebc2d3..e86d77946585 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -28,6 +28,7 @@ ALL_TESTS="
 	cfg_test
 	fwd_test
 	ctrl_test
+	disable_test
 "
 
 NUM_NETIFS=4
@@ -64,7 +65,10 @@ h2_destroy()
 
 switch_create()
 {
-	ip link add name br0 type bridge vlan_filtering 1 vlan_default_pvid 0 \
+	local vlan_filtering=$1; shift
+
+	ip link add name br0 type bridge \
+		vlan_filtering "$vlan_filtering" vlan_default_pvid 0 \
 		mcast_snooping 1 mcast_igmp_version 3 mcast_mld_version 2
 	bridge vlan add vid 10 dev br0 self
 	bridge vlan add vid 20 dev br0 self
@@ -118,7 +122,7 @@ setup_prepare()
 
 	h1_create
 	h2_create
-	switch_create
+	switch_create 1
 }
 
 cleanup()
@@ -1357,6 +1361,98 @@ ctrl_test()
 	ctrl_mldv2_is_in_test
 }
 
+check_group()
+{
+	local group=$1; shift
+	local vid=$1; shift
+	local should_fail=$1; shift
+	local when=$1; shift
+	local -a vidkws
+
+	if ((vid)); then
+		vidkws=(vid "$vid")
+	fi
+
+	bridge mdb get dev br0 grp "$group" "${vidkws[@]}" 2>/dev/null |
+		grep -q "port $swp1"
+	check_err_fail "$should_fail" $? "$group seen $when snooping disable:"
+}
+
+__disable_test()
+{
+	local vid=$1; shift
+	local what=$1; shift
+	local -a vidkws
+
+	if ((vid)); then
+		vidkws=(vid "$vid")
+	fi
+
+	RET=0
+
+	bridge mdb add dev br0 port "$swp1" grp ff0e::1 permanent \
+		"${vidkws[@]}" filter_mode include source_list 2001:db8:1::1
+	bridge mdb add dev br0 port "$swp1" grp ff0e::2 permanent \
+		"${vidkws[@]}" filter_mode exclude
+
+	bridge mdb add dev br0 port "$swp1" grp ff0e::3 \
+		"${vidkws[@]}" filter_mode include source_list 2001:db8:1::2
+	bridge mdb add dev br0 port "$swp1" grp ff0e::4 \
+		"${vidkws[@]}" filter_mode exclude
+
+	bridge mdb add dev br0 port "$swp1" grp 239.1.1.1 permanent \
+		"${vidkws[@]}" filter_mode include source_list 192.0.2.1
+	bridge mdb add dev br0 port "$swp1" grp 239.1.1.2 permanent \
+		"${vidkws[@]}" filter_mode exclude
+
+	bridge mdb add dev br0 port "$swp1" grp 239.1.1.3 \
+		"${vidkws[@]}" filter_mode include source_list 192.0.2.2
+	bridge mdb add dev br0 port "$swp1" grp 239.1.1.4 \
+		"${vidkws[@]}" filter_mode exclude
+
+	check_group ff0e::1 "$vid" 0 "before"
+	check_group ff0e::2 "$vid" 0 "before"
+	check_group ff0e::3 "$vid" 0 "before"
+	check_group ff0e::4 "$vid" 0 "before"
+
+	check_group 239.1.1.1 "$vid" 0 "before"
+	check_group 239.1.1.2 "$vid" 0 "before"
+	check_group 239.1.1.3 "$vid" 0 "before"
+	check_group 239.1.1.4 "$vid" 0 "before"
+
+	ip link set dev br0 type bridge mcast_snooping 0
+
+	check_group ff0e::1 "$vid" 0 "after"
+	check_group ff0e::2 "$vid" 0 "after"
+	check_group ff0e::3 "$vid" 1 "after"
+	check_group ff0e::4 "$vid" 1 "after"
+
+	check_group 239.1.1.1 "$vid" 0 "after"
+	check_group 239.1.1.2 "$vid" 0 "after"
+	check_group 239.1.1.3 "$vid" 1 "after"
+	check_group 239.1.1.4 "$vid" 1 "after"
+
+	log_test "$what: Flush after disable"
+
+	ip link set dev br0 type bridge mcast_snooping 1
+	sleep 10
+}
+
+disable_test()
+{
+	__disable_test 10 802.1q
+
+	switch_destroy
+	switch_create 0
+	setup_wait
+
+	__disable_test 0 802.1d
+
+	switch_destroy
+	switch_create 1
+	setup_wait
+}
+
 if ! bridge mdb help 2>&1 | grep -q "flush"; then
 	echo "SKIP: iproute2 too old, missing bridge mdb flush support"
 	exit $ksft_skip
--
2.49.0
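The expectations encoded in __disable_test can be summarised as: entries added with "permanent" must still be visible after mcast_snooping is set to 0, and the others must be gone (check_group is called with should_fail set to 1 for them). The C sketch below is only a toy model of that expected outcome, not the bridge implementation; struct mdb_entry and both helpers are hypothetical names, and group addresses are kept as plain strings for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified MDB entry: only what the test looks at. */
struct mdb_entry {
	const char *group;	/* e.g. "ff0e::3" or "239.1.1.3" */
	bool permanent;		/* added with the "permanent" keyword */
};

/*
 * Toy model of the expected outcome of "mcast_snooping 0": permanent
 * entries stay, all others are flushed. Compacts the array in place and
 * returns the number of remaining entries.
 */
static size_t flush_on_snooping_disable(struct mdb_entry *tbl, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (tbl[i].permanent)
			tbl[kept++] = tbl[i];
	return kept;
}

/* Counterpart of check_group(): is @group among the first @n entries? */
static bool group_present(const struct mdb_entry *tbl, size_t n,
			  const char *group)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].group, group) == 0)
			return true;
	return false;
}
```

Feeding it the eight groups from __disable_test (ff0e::1/::2 and 239.1.1.1/.2 permanent, the rest not) should leave exactly the four permanent ones, matching the "after" checks.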

2 months, 2 weeks

2
1

[PATCH v1] selftests: cachestat: Fix warning on declaration under label

by Sidharth Seela

Fix a warning caused by a declaration under a case label. The proper way
is to declare the variable at the beginning of the function. The warning
came from running clang using LLVM=1, and is as follows:

--
-test_cachestat.c:260:3: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
  260 |                 char *map = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
      |

Signed-off-by: Sidharth Seela <sidharthseela(a)gmail.com>
---
diff --git a/tools/testing/selftests/cachestat/test_cachestat.c b/tools/testing/selftests/cachestat/test_cachestat.c
index c952640f163b..0305e736f2b8 100644
--- a/tools/testing/selftests/cachestat/test_cachestat.c
+++ b/tools/testing/selftests/cachestat/test_cachestat.c
@@ -226,7 +226,7 @@ bool run_cachestat_test(enum file_type type)
 	int syscall_ret;
 	size_t compute_len = PS * 512;
 	struct cachestat_range cs_range = { PS, compute_len };
-	char *filename = "tmpshmcstat";
+	char *filename = "tmpshmcstat", *map;
 	struct cachestat cs;
 	bool ret = true;
 	int fd;
@@ -257,7 +257,7 @@ bool run_cachestat_test(enum file_type type)
 		}
 		break;
 	case FILE_MMAP:
-		char *map = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
+		map = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
 				 MAP_SHARED, fd, 0);
 
 		if (map == MAP_FAILED) {
--
2.47.3
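The warning is about C syntax rather than cachestat itself: before C23 a case label has to be followed by a statement, and a declaration is not one, which clang reports as a C23 extension as in the log above. Besides the patch's approach of hoisting the declaration to the top of the function, giving the case body its own braces is a common alternative. A minimal sketch of that second option follows; enum kind and fill() are hypothetical and only stand in for the FILE_MMAP case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum kind { KIND_PLAIN, KIND_BUFFER };

static int fill(enum kind k)
{
	int ret = 0;

	switch (k) {
	case KIND_PLAIN:
		ret = 1;
		break;
	case KIND_BUFFER: {
		/*
		 * The braces open a compound statement, so the label is
		 * followed by a statement and the declaration is valid in
		 * C99/C11 as well as C23.
		 */
		char *buf = malloc(16);

		if (!buf)
			return -1;
		memset(buf, 'x', 16);
		ret = buf[15] == 'x' ? 2 : -1;
		free(buf);
		break;
	}
	}
	return ret;
}
```

The braces also limit the variable's scope to the one case that uses it; the patch's hoisted declaration instead matches the existing style of the function.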

2 months, 2 weeks

6
7

[PATCH 0/3] compiler_types: Introduce __counted_by_ptr()

by Kees Cook

Hi,

Add the __counted_by_ptr() macro for annotating pointer struct members
with the "counted_by" attribute. Add LKDTM test, and a first user.

-Kees

Kees Cook (3):
  compiler_types: Introduce __counted_by_ptr()
  lkdtm/bugs: Add __counted_by_ptr() test PTR_BOUNDS
  coredump: Use __counted_by_ptr for struct core_name::corename

 init/Kconfig                            | 11 +++
 Makefile                                |  4 ++
 include/linux/compiler_types.h          | 21 +++++-
 include/uapi/linux/stddef.h             |  4 ++
 drivers/misc/lkdtm/bugs.c               | 90 ++++++++++++++++++++++---
 fs/coredump.c                           |  8 +--
 tools/testing/selftests/lkdtm/tests.txt |  2 +
 7 files changed, 127 insertions(+), 13 deletions(-)

--
2.34.1
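For readers unfamiliar with the attribute: counted_by tells the compiler which struct member holds the element count of an array, so bounds checking (for example -fsanitize=bounds or __builtin_dynamic_object_size()) can use it. The sketch below assumes a compiler that supports it on flexible array members and falls back to an empty macro otherwise. The macro names my_counted_by/my_counted_by_ptr and the feature macro HAVE_COUNTED_BY_PTR are hypothetical stand-ins, not the definitions from this series; pointer-member support is newer, and the series' init/Kconfig change presumably probes for it, so here the pointer variant stays empty unless HAVE_COUNTED_BY_PTR is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Flexible-array form: used only when the compiler knows the attribute. */
#if defined(__has_attribute)
# if __has_attribute(__counted_by__)
#  define my_counted_by(member) __attribute__((__counted_by__(member)))
# endif
#endif
#ifndef my_counted_by
# define my_counted_by(member)
#endif

/* Pointer form: opt-in via a hypothetical feature macro. */
#ifdef HAVE_COUNTED_BY_PTR
# define my_counted_by_ptr(member) __attribute__((__counted_by__(member)))
#else
# define my_counted_by_ptr(member)
#endif

struct samples {
	size_t count;
	int vals[] my_counted_by(count);	/* flexible array member */
};

struct name_buf {
	size_t size;
	char *buf my_counted_by_ptr(size);	/* pointer member */
};

static struct samples *samples_alloc(size_t n)
{
	struct samples *s = malloc(sizeof(*s) + n * sizeof(s->vals[0]));

	if (!s)
		return NULL;
	s->count = n;		/* set the counter before touching vals[] */
	for (size_t i = 0; i < n; i++)
		s->vals[i] = (int)i * 2;
	return s;
}

static int name_buf_init(struct name_buf *nb, size_t size)
{
	nb->buf = malloc(size);
	if (!nb->buf)
		return -1;
	nb->size = size;	/* keep the counter in sync with the buffer */
	memset(nb->buf, 0, size);
	return 0;
}
```

Assigning the count before the first element access matters: a bounds check reads the counter at access time, so writing vals[] while count is still stale could be flagged as out of bounds.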

2 months, 2 weeks

4
12

[PATCH v22 00/28] riscv control-flow integrity for usermode

by Deepak Gupta

v22: fixes a build error caused by -march=zicfiss being accepted by gcc-13 and above even though they do not do any codegen for zicfiss or recognize its instructions. v22 makes CONFIG_RISCV_USER_CFI depend on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.

v21: fixed build errors.

Basics and overview
===================

Software with larger attack surfaces (e.g. network-facing apps like databases, browsers or apps relying on browser runtimes) suffers from memory corruption issues which attackers can use to bend the control flow of the program and eventually gain control (by making their payload executable). Attackers perform such attacks by leveraging call sites that rely on indirect calls, or return sites that obtain the return address from stack memory.

To mitigate such attacks, the risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad`, else the cpu raises a software check exception (a new cpu exception cause code on riscv).

Similarly, for return flow, the risc-v extension zicfiss extends the architecture with:
- `sspush` instruction to push the return address on a shadow stack
- `sspopchk` instruction to pop the return address from the shadow stack and compare it with the input operand (i.e. the return address on the stack)
- `sspopchk` to raise a software check exception if the comparison above was a mismatch
- a protection mechanism that makes the shadow stack non-writeable via regular store instructions

More information and details can be found in the extensions' github repo [1].

The equivalent of the landing pad (zicfilp) is the `ENDBRANCH` instruction in Intel CET [3] on x86 and branch target identification (BTI) [4] on arm. Similarly, x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6], which are very similar to risc-v's zicfiss shadow stack.

x86 and arm64 support for user mode shadow stack is already in mainline.
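The sspush/sspopchk pairing described above can be illustrated with a toy model in C. This is only an illustration under simplifying assumptions, not how the hardware or the kernel implements it: the shadow stack here is an ordinary array, so the protection against regular stores is not modelled, and a false return value stands in for the software check exception. struct shadow_stack, ss_push() and ss_popchk() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64

/* Toy shadow stack: a separate LIFO of return addresses. */
struct shadow_stack {
	uintptr_t slot[SS_DEPTH];
	size_t top;
};

/* Models sspush at function entry; overflow is reported as failure. */
static bool ss_push(struct shadow_stack *ss, uintptr_t ret_addr)
{
	if (ss->top == SS_DEPTH)
		return false;
	ss->slot[ss->top++] = ret_addr;
	return true;
}

/*
 * Models sspopchk before returning: pop the shadow copy and compare it
 * with the return address taken from the regular stack. Returns false
 * where the CPU would raise a software check exception.
 */
static bool ss_popchk(struct shadow_stack *ss, uintptr_t ret_addr_on_stack)
{
	if (ss->top == 0)
		return false;
	return ss->slot[--ss->top] == ret_addr_on_stack;
}
```

An overwritten return address on the regular stack no longer matches its shadow copy, which is exactly the mismatch the real sspopchk traps on.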
Kernel awareness for user control flow integrity
================================================

This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series.

Enabling:
In order to maintain compatibility and not break anything in user mode, the kernel doesn't enable control flow integrity cpu extensions on binaries by default. Instead it exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (the loader) to enumerate whether all objects in its address space are compiled with shadow stack and landing pad support and enable the feature accordingly. Additionally, if a subsequent `dlopen` happens on a library, user mode can decide again to disable the feature (if the incoming library is not compiled with support) OR terminate the task (if the user mode policy strictly requires all objects in the address space to be compiled with the control flow integrity cpu feature). The prctl to enable shadow stack results in allocating a shadow stack from virtual memory and activating it for the user address space. x86 and arm64 are also following the same direction for similar reason(s).

clone/fork:
On clone and fork, the cfi state of the task is inherited by the child. The shadow stack is part of virtual memory and is writeable memory from the kernel's perspective (writeable via a restricted set of instructions, aka shadow stack instructions). Thus the kernel changes ensure that this memory is converted to read-only when fork/clone happens and COWed when a fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, the kernel automatically allocates a shadow stack for that clone call.

map_shadow_stack:
x86 introduced the `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space.
It is useful for allocating shadow stacks for different contexts managed by a single thread (green threads or contexts). risc-v implements this system call as well.

signal management:
If shadow stack is enabled for a task, the kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that the original execution can be resumed. Even though the resume context is prepared by the kernel, it is in user space memory and is subject to memory corruption, and corruption bugs can be used by an attacker in this race window to perform an arbitrary sigreturn and eventually bypass the cfi mechanism. Another issue is how to ensure that the cfi-related state in the sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.

In order to mitigate control-flow hijacking, the kernel prepares a token, places it on the shadow stack before signal delivery, and places the address of the token in the sigcontext structure. During sigreturn, the kernel obtains the address of the token from the sigcontext structure, reads the token from the shadow stack, validates it, and only then allows sigreturn to succeed.

The compatibility issue is solved by adopting the dynamic sigcontext management introduced for the vector extension. This series refactors the code a little to make future sigcontext management easier (as proposed by Andy Chiu from SiFive).

config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks up the kernel support for user control flow integrity. This option is presented only if the toolchain has shadow stack and landing pad support, and it is guarded by toolchain support on purpose: eventually the vDSO also needs to be compiled with shadow stack and landing pad support. vDSO compile patches are not included as of now because the landing pad labeling scheme is yet to settle for usermode runtime.
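Because the series reuses the arch-agnostic shadow stack prctl() that mainline already carries from arm64/gcs (per the v9 changelog entry), a program can at least query its own state. The sketch below only reads the status with PR_GET_SHADOW_STACK_STATUS and deliberately enables nothing: turning on a shadow stack in the middle of a running function leaves the return addresses pushed earlier absent from it, which is presumably one reason the cover letter leaves enabling to the loader. The numeric fallbacks are assumed to match include/uapi/linux/prctl.h and should be checked against your headers; the helper names are hypothetical. On kernels or architectures without this prctl the call is expected to fail, typically with EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/prctl.h>

/*
 * Fallback values, assumed to match include/uapi/linux/prctl.h, for
 * systems whose headers predate the generic shadow stack prctls.
 */
#ifndef PR_GET_SHADOW_STACK_STATUS
# define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
# define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Read (never change) the calling thread's shadow stack status.
 * Returns 0 and fills *status on success, otherwise the errno value
 * reported by prctl(), e.g. EINVAL where the option is unsupported.
 */
static int shadow_stack_status(unsigned long *status)
{
	/* Unused arguments are passed as zero, at unsigned long width. */
	if (prctl(PR_GET_SHADOW_STACK_STATUS, status, 0UL, 0UL, 0UL) == 0)
		return 0;
	return errno;
}

/* True if the status word reports the shadow stack as enabled. */
static bool shadow_stack_enabled(unsigned long status)
{
	return (status & PR_SHADOW_STACK_ENABLE) != 0;
}
```

A loader-style policy check would call shadow_stack_status() once and branch on shadow_stack_enabled(); enabling, disabling and locking go through the corresponding set and lock prctls described in the "Enabling" paragraph.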
To get more information on kernel interactions with respect to zicfilp and zicfiss, the patch series adds documentation for `zicfilp` and `zicfiss` in the following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst

How to test this series
=======================

Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)

Qemu
----
Get the latest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)

Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic

Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.

$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)

Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true

References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/

To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org

changelog
---------
v22:
- CONFIG_RISCV_USER_CFI was "n" by default. With dual vdso support it defaults to "y" (if the toolchain supports it).
  Fixing the build error due to "-march=zicfiss" being accepted by gcc-13, which supports it only partially: gcc-13 recognizes the flag but does not do any codegen for zicfiss or recognize its instructions. The change in v22 makes CONFIG_RISCV_USER_CFI depend on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.
- picked up tags and some cosmetic changes in the commit message for the dual vdso patch.

v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h.
  Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h.
  vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.

v20:
- rebased on v6.18-rc1.
- Added support for two vDSOs. If `CONFIG_RISCV_USER_CFI` is selected, two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). The kernel exposes the RVA23 vDSO if the hardware/cpu implements zimop, else it exposes the existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"

v19:
- riscv_nousercfi was `int`; changed it to unsigned long. Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restoring ssp on return to usermode was being done before `riscv_v_context_nesting_end` on the trap exit path. If kernel shadow stack were enabled, this would result in the kernel operating on the user shadow stack and panicking (as I found in my testing of the kcfi patch series). So fixed that.

v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in the sstatus image in pt_regs
- vdso was missing the shadow stack elf note for object files; added that. An additional asm file for vdso needed the elf marker flag. The toolchain should complain if `-fcf-protection=full` is used and the marker is missing for an object generated from an asm file. Asked the toolchain folks to fix this, although there is no reason to gate the merge on that.
- Split up compile options for march and fcf-protection in the vdso Makefile
- CONFIG_RISCV_USER_CFI option is moved under the "Kernel features" menu.
  Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI

v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took the below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling"
  https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/

v16:
- If FWFT is not implemented or returns an error for shadow stack activation, then no_usercfi is set to disable shadow stack, although this should be picked up by extension validation and activation. Fixed this bug for both zicfilp and zicfiss. Thanks to Charlie Jenkins for reporting this.
- If the toolchain doesn't support cfi, the cfi kselftest shouldn't build. Suggested by Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested keeping it off until we have more hardware availability with the RVA23 profile and zimop/zcmop implemented. Otherwise this will start breaking people's workflows.
- Includes the fix for the FWFT definitions in asm-offsets.c causing an error if "!RV64 and !SBI".

v15:
- The toolchain has been updated to include the `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile the vDSO and selftest with the `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- The patch to enable shadow stack for the kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI; fixed that.

v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches.
  Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of the thread_info block so that the current layout is not disturbed with respect to member fields of thread_info in a single cacheline.
v13: - cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr use riscv_has_extension_unlikely(). - Uses nops(count) to create a nop slide. - RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`; removed it. - Changed ternaries to simply use implicit casting to convert to bool. - The kernel command line allows disabling zicfilp and zicfiss independently; updated kernel-parameters.txt. - The ptrace user ABI for CFI uses bitmasks instead of bitfields. Added a ptrace kselftest. - Cosmetic and grammatical changes to the documentation. v12: - It seems I had accidentally squashed the arch-agnostic indirect branch tracking prctl and the riscv implementation of those prctls; split them again. - set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available, as suggested by Zong Li. - Some minor cleanup in kselftests, as suggested by Zong Li. v11: - The patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for the vDSO compile; fixed that. Changed `lpad 1` to `lpad 0`. v10: - Dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not essential to this series for RISC-V, and there are instances in arch directories where the VM_SHADOW_STACK flag is used directly anyway. Dropped it to expedite merging in the riscv tree. - Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate the presence of CFI based on config. - Added a patch for the vDSO to use `lpad 0`. I had omitted this earlier to make sure we add a single vDSO object with CFI enabled. But a vDSO object using zero-labeled landing pads is the least common denominator and should work with both zero-labeled and function-signature-labeled objects.
v9: - Rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion"). - Dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs). - Dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs). v8: - Rebased on palmer/for-next. - Dropped Samuel Holland's `envcfg` context switch patches; they are in palmer/for-next. v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" and used the `deactivate_mm` flow to clean up instead. See here for more context: https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.… - Changed the header include in `kselftest`. Hopefully this fixes the compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in the patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE". - The lock interfaces for shadow stack and indirect branch tracking expect arg == 0. Any future evolution of this interface should define how arg should be set up. - `mm/mmap.c` has an instance of using `VM_SHADOW_STACK`; fixed it to use the helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv… v6: - Picked up Samuel Holland's changes as is, with `envcfg` placed in `thread` instead of `thread_info`. - Fixed unaligned newline escapes in kselftest. - Cleaned up messages in kselftest and included test output in the commit message. - Fixed a bug in the clone path reported by Zong Li. - Fixed a build issue when CONFIG_RISCV_ISA_V is not selected (introduced by refactoring the signal context management code). v5: - Rebased on v6.12-rc1. - Fixed schema-related issues in the device tree file. - Fixed some documentation issues in zicfilp/ss.rst (style issues and added index). - Added `SHADOW_STACK_SET_MARKER` so that the implementation can define the base of the shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected.
- Adopted context-header-based signal handling as proposed by Andy Chiu. - Added support for enabling kernel mode access to the shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…). - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: due to an issue in my workflow, the version number wasn't picked up correctly while sending out the patches.) v4: - Rebased on 6.11-rc6. - envcfg: Converged with Samuel Holland's patches for per-thread envcfg management. - vma_is_shadow_stack is renamed to is_vma_shadow_stack. - Picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch. - Signal context: using extended context management to maintain compatibility. - Fixed `-Wmissing-prototypes` compiler warnings for prctl functions. - Documentation fixes and typo corrections. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/ v3: - The envcfg logic to pick up the base envcfg had a bug where `ENVCFG_CBZE` could have been picked on a per-task basis even though the CPU didn't implement it. Fixed in this series. - dt-bindings: as suggested, split into a separate commit, and fixed the messaging that the spec is in public review. - arch_is_shadow_stack change: arch_is_shadow_stack changed to vma_is_shadow_stack. - hwprobe: zicfiss / zicfilp, if present, will get enumerated in hwprobe. - selftests: as suggested, added object and binary filenames to .gitignore. Selftest binaries need to be compiled with a CFI-enabled compiler anyway, which makes sure that landing pad and shadow stack are enabled, so the separate enable/disable tests were removed. Cleaned up tests a bit. - Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ v2: - Using the config `CONFIG_RISCV_USER_CFI`, kernel support for RISC-V control flow integrity for user mode programs can be compiled into the kernel.
- Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Changes in v22: - Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@ri… Changes in v21: - Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@ri… Changes in v20: - Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@ri… Changes in v19: - Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri… Changes in v18: - Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri… Changes in v17: - Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri… Changes in v16: - Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri… Changes in v15: - changelog posted just below cover letter - Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri… Changes in v14: - changelog posted just below cover letter - Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri… Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri… Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Deepak Gupta (26): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / 
zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call arch/riscv: dual vdso creation logic and select vdso based on hw riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad and shadow stack note Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 22 + arch/riscv/Makefile | 8 +- arch/riscv/configs/hardening.config | 4 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + 
arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vdso.h | 13 +- arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 38 ++ arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 54 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso.c | 7 + arch/riscv/kernel/vdso/Makefile | 40 +- arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +- arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/note.S | 3 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +- arch/riscv/kernel/vdso_cfi/Makefile | 25 + arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 62 files changed, 2475 
insertions(+), 41 deletions(-) --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug
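The series implements the arch-agnostic shadow stack prctls on riscv, while actually enabling CFI for a program is left to the user runtime. A minimal userspace sketch for querying the calling thread's shadow stack status, assuming `PR_GET_SHADOW_STACK_STATUS` (74) and `PR_SHADOW_STACK_ENABLE` as defined in upstream `<linux/prctl.h>`; the fallback defines and the helper name `shadow_stack_enabled()` are illustrative. On kernels or architectures that do not implement the query, the generic prctl path fails with `EINVAL`.

```c
#include <sys/prctl.h>

/* Fallbacks for older headers; values assumed from upstream <linux/prctl.h>. */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Hypothetical helper: returns 1 if the shadow stack is enabled for the
 * calling thread, 0 if it is disabled, and -1 (errno set by prctl) if the
 * kernel or architecture does not support the query.
 */
static int shadow_stack_enabled(void)
{
	unsigned long status = 0;

	/* Unused arguments must be zero; pass them as unsigned long. */
	if (prctl(PR_GET_SHADOW_STACK_STATUS, &status, 0UL, 0UL, 0UL) != 0)
		return -1;
	return (status & PR_SHADOW_STACK_ENABLE) ? 1 : 0;
}
```

A CFI-aware runtime would typically check this before relying on shadow stack protection; on a non-CFI kernel the helper simply reports -1.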

2 months, 2 weeks

[PATCH net v2 0/3] mptcp: Fix conflicts between MPTCP and sockmap

by Jiayuan Chen

Overall, we encountered a warning [1] that can be triggered by running the selftest I provided. MPTCP creates subflows for data transmission between two endpoints. However, BPF can use sockops to perform additional operations when TCP completes the three-way handshake. The issue arose because we used sockmap in sockops, which replaces sk->sk_prot and some handlers. Since subflows also have their own specialized handlers, this creates a conflict and leads to traffic failure. Therefore, we need to reject operations targeting subflows. This patchset simply prevents the combination of subflows and sockmap without changing any functionality. A complete integration of MPTCP and sockmap would require more effort; for example, we would need to retrieve the parent socket from subflows in sockmap and implement handlers like read_skb. If maintainers don't object, we can further improve this in subsequent work.

v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linu…

[1] truncated warning:
[ 18.234652] ------------[ cut here ]------------
[ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
[ 18.234726] Modules linked in:
[ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
[ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
[ 18.234800] PKRU: 55555554
[ 18.234806] Call Trace:
[ 18.234810] <TASK>
[ 18.234837] do_accept+0xeb/0x190
[ 18.234861] ? __x64_sys_pselect6+0x61/0x80
[ 18.234898] ? _raw_spin_unlock+0x12/0x30
[ 18.234915] ? alloc_fd+0x11e/0x190
[ 18.234925] __sys_accept4+0x8c/0x100
[ 18.234930] __x64_sys_accept+0x1f/0x30
[ 18.234933] x64_sys_call+0x202f/0x20f0
[ 18.234966] do_syscall_64+0x72/0x9a0
[ 18.234979] ? switch_fpu_return+0x60/0xf0
[ 18.234993] ? irqentry_exit_to_user_mode+0xdb/0x1e0
[ 18.235002] ? irqentry_exit+0x3f/0x50
[ 18.235005] ? clear_bhb_loop+0x50/0xa0
[ 18.235022] ? clear_bhb_loop+0x50/0xa0
[ 18.235025] ? clear_bhb_loop+0x50/0xa0
[ 18.235028] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 18.235066] </TASK>
[ 18.235109] ---[ end trace 0000000000000000 ]---
[ 18.235677] sockmap: MPTCP sockets are not supported

Jiayuan Chen (3):
  net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap
  bpf,sockmap: disallow MPTCP sockets from sockmap updates
  selftests/bpf: Add mptcp test with sockmap

 net/core/sock_map.c | 9 ++
 net/mptcp/protocol.c | 7 +-
 .../testing/selftests/bpf/prog_tests/mptcp.c | 136 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c | 43 ++++++
 4 files changed, 193 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
--
2.43.0

2 months, 2 weeks

[PATCH 0/9] mm/damon: support pin-point targets removal

by SeongJae Park

DAMON maintains the targets in a list, and allows committing only an entire list of targets having the new parameters. Targets having the same index on the lists are treated as matching source and destination targets. If an existing target cannot find a matching one in the source list, the target is removed. This means that there is no way to remove only a specific monitoring target in the middle of the current targets list. Such pin-point target removal is really needed in some use cases, though. Monitoring access patterns on the virtual address spaces of processes that were spawned from the same ancestor is one example. If a process of the group is terminated, the user may want to remove the matching DAMON target as soon as possible, to save the in-kernel memory used for the unnecessary target data. The user may also want to do that without turning DAMON off or removing unnecessary targets, to keep the current monitoring results for other active processes. Extend the DAMON kernel API and sysfs ABI to support pin-point removal in the following way. For the API, add a new damon_target field, namely 'obsolete'. If the field is set on a parameters commit source target, the matching destination target is obsolete, and the parameters commit logic removes that destination target from the existing targets list. For the sysfs ABI, add a new file under the target directory, namely 'obsolete_target'. It is connected to the 'obsolete' field of the commit source targets, so it internally uses the new API. Also add a selftest for the new feature. The related helper scripts for manipulating the sysfs interface and dumping the in-kernel DAMON status are also extended for this. Note that the selftest part was initially posted as an individual RFC series [1], but is now merged into this one. Bijan Tabatabai (bijan311(a)gmail.com) originally reported this issue and participated in designing this solution on a GitHub issue [1] for the DAMON user-space tool.
Changes from RFC (https://lore.kernel.org/20251016214736.84286-1-sj@kernel.org) - Wordsmith commit messages - Add Reviewed-by: tags from Bijan - Add a kselftest for the functionality of the new feature (https://lore.kernel.org/20251018204448.8906-1-sj@kernel.org) [1] https://github.com/damonitor/damo/issues/36 SeongJae Park (9): mm/damon/core: add damon_target->obsolete for pin-point removal mm/damon/sysfs: test commit input against realistic destination mm/damon/sysfs: implement obsolete_target file Docs/admin-guide/mm/damon/usage: document obsolete_target file Docs/ABI/damon: document obsolete_target sysfs file selftests/damon/_damon_sysfs: support obsolete_target file drgn_dump_damon_status: dump damon_target->obsolete sysfs.py: extend assert_ctx_committed() for monitoring targets selftests/damon/sysfs: add obsolete_target test .../ABI/testing/sysfs-kernel-mm-damon | 7 +++ Documentation/admin-guide/mm/damon/usage.rst | 13 +++-- include/linux/damon.h | 6 +++ mm/damon/core.c | 10 +++- mm/damon/sysfs.c | 51 ++++++++++++++++++- tools/testing/selftests/damon/_damon_sysfs.py | 11 +++- .../selftests/damon/drgn_dump_damon_status.py | 1 + tools/testing/selftests/damon/sysfs.py | 48 +++++++++++++++++ 8 files changed, 140 insertions(+), 7 deletions(-) base-commit: a3e008fdd7964bc3e6d876491c202d476406ed59 -- 2.47.3
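From userspace, the sysfs flow for pin-point removal would be: mark the target's `obsolete_target` file, then ask the kdamond to commit its parameters. A minimal sketch, assuming the usual DAMON sysfs layout under `/sys/kernel/mm/damon/admin`, that `obsolete_target` accepts a boolean such as `Y`, and that writing `commit` to the kdamond's `state` file triggers the commit; the helper names are hypothetical, and Documentation/admin-guide/mm/damon/usage.rst remains the authoritative reference.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DAMON_SYSFS_ADMIN "/sys/kernel/mm/damon/admin"

/* Build the path of a target's obsolete_target file; returns 0 on success. */
static int damon_obsolete_path(char *buf, size_t size, const char *base,
			       int kdamond, int ctx, int target)
{
	int n = snprintf(buf, size,
			 "%s/kdamonds/%d/contexts/%d/targets/%d/obsolete_target",
			 base, kdamond, ctx, target);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write a short string to a sysfs file; returns 0 on success, -1 on error. */
static int sysfs_write(const char *path, const char *val)
{
	ssize_t len = (ssize_t)strlen(val);
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, len) != len) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* Mark one target obsolete, then ask the kdamond to commit the change. */
static int damon_remove_target(const char *base, int kdamond, int ctx,
			       int target)
{
	char path[256];
	int n;

	if (damon_obsolete_path(path, sizeof(path), base, kdamond, ctx, target))
		return -1;
	if (sysfs_write(path, "Y"))
		return -1;
	n = snprintf(path, sizeof(path), "%s/kdamonds/%d/state", base, kdamond);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return sysfs_write(path, "commit");
}
```

For example, `damon_remove_target(DAMON_SYSFS_ADMIN, 0, 0, 1)` would (under these assumptions, and with root privileges) drop the second target of the first context while the other targets keep their monitoring results.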

2 months, 3 weeks

[PATCH net-next 0/5] psp: track stats from core and provide a driver stats api

by Daniel Zahka

This series introduces stats counters for PSP. Device key rotations and so-called 'stale-events' are common to all drivers and are tracked by the core. A driver-facing API is provided for reporting the stats required by the "Implementation Requirements" section of the PSP Architecture Specification; drivers must implement these stats. Lastly, implementations of the driver stats API for mlx5 and netdevsim are included. Here is the output of running the PSP selftest suite and then printing out stats with the ynl CLI on a system with a PSP-capable CX7:

$ ./ksft-psp-stats/drivers/net/psp.py
TAP version 13
1..28
ok 1 psp.test_case # SKIP Test requires IPv4 connectivity
ok 2 psp.data_basic_send_v0_ip6
ok 3 psp.test_case # SKIP Test requires IPv4 connectivity
ok 4 psp.data_basic_send_v1_ip6
ok 5 psp.test_case # SKIP Test requires IPv4 connectivity
ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 7 psp.test_case # SKIP Test requires IPv4 connectivity
ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 9 psp.test_case # SKIP Test requires IPv4 connectivity
ok 10 psp.data_mss_adjust_ip6
ok 11 psp.dev_list_devices
ok 12 psp.dev_get_device
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate
ok 15 psp.dev_rotate_spi
ok 16 psp.assoc_basic
ok 17 psp.assoc_bad_dev
ok 18 psp.assoc_sk_only_conn
ok 19 psp.assoc_sk_only_mismatch
ok 20 psp.assoc_sk_only_mismatch_tx
ok 21 psp.assoc_sk_only_unconn
ok 22 psp.assoc_version_mismatch
ok 23 psp.assoc_twice
ok 24 psp.data_send_bad_key
ok 25 psp.data_send_disconnect
ok 26 psp.data_stale_key
ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim
ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim
# Totals: pass:19 fail:0 xfail:2 xpass:0 skip:7 error:0
#
# Responder logs (0):
# STDERR:
# Set PSP enable on device 1 to 0x3
# Set PSP enable on device 1 to 0x0
$ cd ynl/
$ ./pyynl/cli.py --spec netlink/specs/psp.yaml --dump get-stats
[{'dev-id': 1,
  'key-rotations': 5,
  'rx-auth-fail': 21,
  'rx-bad': 0,
  'rx-bytes': 11844,
  'rx-error': 0,
  'rx-packets': 94,
  'stale-events': 6,
  'tx-bytes': 1128456,
  'tx-error': 0,
  'tx-packets': 780}]

Daniel Zahka (2):
  selftests: drv-net: psp: add assertions on core-tracked psp dev stats
  netdevsim: implement psp device stats

Jakub Kicinski (3):
  psp: report basic stats from the core
  psp: add stats from psp spec to driver facing api
  net/mlx5e: Add PSP stats support for Rx/Tx flows

 Documentation/netlink/specs/psp.yaml | 95 +++++++
 .../mellanox/mlx5/core/en_accel/psp.c | 239 ++++++++++++++++--
 .../mellanox/mlx5/core/en_accel/psp.h | 18 ++
 .../mellanox/mlx5/core/en_accel/psp_rxtx.c | 1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 5 +
 drivers/net/netdevsim/netdevsim.h | 5 +
 drivers/net/netdevsim/psp.c | 27 ++
 include/net/psp/types.h | 35 +++
 include/uapi/linux/psp.h | 18 ++
 net/psp/psp-nl-gen.c | 19 ++
 net/psp/psp-nl-gen.h | 2 +
 net/psp/psp_main.c | 3 +-
 net/psp/psp_nl.c | 99 ++++++++
 net/psp/psp_sock.c | 4 +-
 tools/testing/selftests/drivers/net/psp.py | 13 +
 15 files changed, 566 insertions(+), 17 deletions(-)
--
2.47.3
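The selftest patch adds assertions on the core-tracked counters. The idea can be sketched as comparing two snapshots of the `get-stats` dump around an operation: after device key rotations, `key-rotations` should grow by the number of rotations requested, assuming nothing else rotates the key concurrently. The struct and helper below are hypothetical, mirror only two field names from the dump above, and are neither the in-kernel nor the netlink API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of two core-tracked PSP device counters. */
struct psp_core_stats {
	uint64_t key_rotations; /* "key-rotations" in the get-stats dump */
	uint64_t stale_events;  /* "stale-events" in the get-stats dump */
};

/*
 * Check that exactly `rotations` key rotations happened between two
 * snapshots. Counters are assumed to be monotonic, so a decrease fails.
 */
static bool psp_rotations_match(const struct psp_core_stats *before,
				const struct psp_core_stats *after,
				uint64_t rotations)
{
	if (after->key_rotations < before->key_rotations ||
	    after->stale_events < before->stale_events)
		return false;
	return after->key_rotations - before->key_rotations == rotations;
}
```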

2 months, 3 weeks

[PATCH 0/4] KVM: selftests: Add test of SET_NESTED_STATE with 48-bit L2 on 57-bit L1

by Jim Mattson

Prior to commit 9245fd6b8531 ("KVM: x86: model canonical checks more precisely"), KVM_SET_NESTED_STATE would fail if the state was captured with L2 active, L1 had CR4.LA57 set, L2 did not, and the VMCS12.HOST_GSBASE (or other host-state field checked for canonicality) had an address greater than 48 bits wide. Add a regression test that reproduces the KVM_SET_NESTED_STATE failure conditions. To do so, the first three patches add support for 5-level paging in the selftest L1 VM. Jim Mattson (4): KVM: selftests: Use a loop to create guest page tables KVM: selftests: Use a loop to walk guest page tables KVM: selftests: Add VM_MODE_PXXV57_4K VM mode KVM: selftests: Add a VMX test for LA57 nested state tools/testing/selftests/kvm/Makefile.kvm | 1 + .../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 21 +++ .../testing/selftests/kvm/lib/x86/processor.c | 66 ++++----- tools/testing/selftests/kvm/lib/x86/vmx.c | 7 +- .../kvm/x86/vmx_la57_nested_state_test.c | 137 ++++++++++++++++++ 6 files changed, 195 insertions(+), 38 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86/vmx_la57_nested_state_test.c -- 2.51.0.470.ga7dc726c21-goog
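The failure hinges on canonicality: an address is canonical for an N-bit virtual address width when bits 63 through N-1 are all copies of bit N-1. A host-state value such as HOST_GSBASE with bits above 47 set can be canonical for an L1 with CR4.LA57 (57-bit) while failing a 48-bit check. A minimal sketch of the check; the helper name is illustrative and this is not KVM's internal function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if addr is canonical for a vaddr_bits-wide virtual address
 * space (48 for 4-level paging, 57 for 5-level paging with CR4.LA57).
 * Sign-extends from bit (vaddr_bits - 1) and compares with the original.
 */
static bool is_canonical(uint64_t addr, unsigned int vaddr_bits)
{
	unsigned int shift;

	assert(vaddr_bits > 0 && vaddr_bits < 64);
	shift = 64 - vaddr_bits;
	/*
	 * The signed conversion and arithmetic right shift are
	 * implementation-defined in C; GCC and Clang sign-extend as intended.
	 */
	return (uint64_t)((int64_t)(addr << shift) >> shift) == addr;
}
```

For instance, 0x00ff800000000000 passes the 57-bit check but fails the 48-bit one, which shows how validating an L1 host-state field against the wrong width would reject an otherwise valid value.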

2 months, 3 weeks

[PATCH net 0/2] netconsole: Fix userdata race condition

by Gustavo Luiz Duarte

This series fixes a race condition in netconsole's userdata handling where concurrent message transmission could read partially updated userdata fields, resulting in corrupted netconsole output. The first patch adds a selftest that reproduces the race condition by continuously sending messages while rapidly changing userdata values, detecting any torn reads in the output. The second patch fixes the issue by ensuring update_userdata() holds the target_list_lock while updating both extradata_complete and userdata_length, preventing readers from seeing inconsistent state. This targets net tree as it fixes a bug introduced in commit df03f830d099 ("net: netconsole: cache userdata formatted string in netconsole_target"). Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com> --- Gustavo Luiz Duarte (2): selftests: netconsole: Add race condition test for userdata corruption netconsole: Fix race condition in between reader and writer of userdata drivers/net/netconsole.c | 5 ++ .../selftests/drivers/net/netcons_race_userdata.sh | 87 ++++++++++++++++++++++ 2 files changed, 92 insertions(+) --- base-commit: ffff5c8fc2af2218a3332b3d5b97654599d50cde change-id: 20251020-netconsole-fix-race-f465f37b57ea Best regards, -- Gustavo Luiz Duarte <gustavold(a)gmail.com>
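The fix follows a common pattern: a buffer and its length must change together under one lock, and readers must take the same lock; otherwise a reader can pair a new length with old contents (or vice versa) and emit a torn record. A userspace analogue using a pthread mutex; the struct and function names are hypothetical and this is not netconsole's code, which uses `target_list_lock`.

```c
#include <pthread.h>
#include <string.h>

#define USERDATA_MAX 256

/* Hypothetical analogue of a cached, preformatted userdata string. */
struct userdata_cache {
	pthread_mutex_t lock; /* plays the role of target_list_lock */
	char buf[USERDATA_MAX];
	size_t len;
};

/* Writer: update contents and length together while holding the lock. */
static void userdata_update(struct userdata_cache *u, const char *s)
{
	size_t n = strlen(s);

	if (n > USERDATA_MAX - 1)
		n = USERDATA_MAX - 1; /* truncate, leaving room for the NUL */
	pthread_mutex_lock(&u->lock);
	memcpy(u->buf, s, n);
	u->buf[n] = '\0';
	u->len = n;
	pthread_mutex_unlock(&u->lock);
}

/*
 * Reader: copy a consistent snapshot under the same lock.
 * out_size must be at least 1. Returns the number of bytes copied.
 */
static size_t userdata_snapshot(struct userdata_cache *u, char *out,
				size_t out_size)
{
	size_t n;

	pthread_mutex_lock(&u->lock);
	n = u->len < out_size - 1 ? u->len : out_size - 1;
	memcpy(out, u->buf, n);
	out[n] = '\0';
	pthread_mutex_unlock(&u->lock);
	return n;
}
```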

2 months, 3 weeks

[PATCH RESEND] selftests/cachestat: add tmpshmcstat file to .gitignore

by Madhur Kumar

Add the tmpshmcstat file to .gitignore to avoid accidentally staging the build artifact.

Signed-off-by: Madhur Kumar <madhurkumar004(a)gmail.com>
---
 tools/testing/selftests/cachestat/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/cachestat/.gitignore b/tools/testing/selftests/cachestat/.gitignore
index d6c30b43a4bb..abbb13b6e96b 100644
--- a/tools/testing/selftests/cachestat/.gitignore
+++ b/tools/testing/selftests/cachestat/.gitignore
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 test_cachestat
+tmpshmcstat
--
2.51.0

2 months, 3 weeks

[PATCH v4 00/23] ARM64 PMU Partitioning

by Colton Lewis

This series creates a new PMU scheme on ARM, a partitioned PMU that allows reserving a subset of counters for more direct guest access, significantly reducing overhead. More details, including performance benchmarks, can be read in the v1 cover letter linked below. v4: * Apply Mark Brown's non-UNDEF FGT control commit to the PMU FGT controls and calculate those controls with the others in kvm_calculate_traps(). * Introduce lazy context swaps that are only turned on for guests that have enabled partitioning and accessed PMU registers. * Rename pmu-part.c to pmu-direct.c because future features might achieve direct PMU access without partitioning. * Better explain certain commits, such as why the untrapped registers are safe to untrap. * Reduce the PMU include cleanup to only what is still necessary and explain why. v3: https://lore.kernel.org/kvm/20250626200459.1153955-1-coltonlewis@google.com/ v2: https://lore.kernel.org/kvm/20250620221326.1261128-1-coltonlewis@google.com/ v1: https://lore.kernel.org/kvm/20250602192702.2125115-1-coltonlewis@google.com/ Colton Lewis (21): arm64: cpufeature: Add cpucap for HPMN0 KVM: arm64: Reorganize PMU functions perf: arm_pmuv3: Introduce method to partition the PMU perf: arm_pmuv3: Generalize counter bitmasks perf: arm_pmuv3: Keep out of guest counter partition KVM: arm64: Account for partitioning in kvm_pmu_get_max_counters() KVM: arm64: Set up FGT for Partitioned PMU KVM: arm64: Writethrough trapped PMEVTYPER register KVM: arm64: Use physical PMSELR for PMXEVTYPER if partitioned KVM: arm64: Writethrough trapped PMOVS register KVM: arm64: Write fast path PMU register handlers KVM: arm64: Setup MDCR_EL2 to handle a partitioned PMU KVM: arm64: Account for partitioning in PMCR_EL0 access KVM: arm64: Context swap Partitioned PMU guest registers KVM: arm64: Enforce PMU event filter at vcpu_load() KVM: arm64: Extract enum debug_owner to enum vcpu_register_owner KVM: arm64: Implement lazy PMU context swaps perf: arm_pmuv3:
Handle IRQs for Partitioned PMU guest counters KVM: arm64: Inject recorded guest interrupts KVM: arm64: Add ioctl to partition the PMU when supported KVM: arm64: selftests: Add test case for partitioned PMU Marc Zyngier (1): KVM: arm64: Reorganize PMU includes Mark Brown (1): KVM: arm64: Introduce non-UNDEF FGT control Documentation/virt/kvm/api.rst | 21 + arch/arm/include/asm/arm_pmuv3.h | 38 + arch/arm64/include/asm/arm_pmuv3.h | 61 +- arch/arm64/include/asm/kvm_host.h | 34 +- arch/arm64/include/asm/kvm_pmu.h | 123 +++ arch/arm64/include/asm/kvm_types.h | 7 +- arch/arm64/kernel/cpufeature.c | 8 + arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/arm.c | 22 + arch/arm64/kvm/debug.c | 33 +- arch/arm64/kvm/hyp/include/hyp/debug-sr.h | 6 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 181 ++++- arch/arm64/kvm/pmu-direct.c | 395 ++++++++++ arch/arm64/kvm/pmu-emul.c | 674 +--------------- arch/arm64/kvm/pmu.c | 725 ++++++++++++++++++ arch/arm64/kvm/sys_regs.c | 137 +++- arch/arm64/tools/cpucaps | 1 + arch/arm64/tools/sysreg | 6 +- drivers/perf/arm_pmuv3.c | 128 +++- include/linux/perf/arm_pmu.h | 1 + include/linux/perf/arm_pmuv3.h | 14 +- include/uapi/linux/kvm.h | 4 + tools/include/uapi/linux/kvm.h | 2 + .../selftests/kvm/arm64/vpmu_counter_access.c | 62 +- 24 files changed, 1910 insertions(+), 775 deletions(-) create mode 100644 arch/arm64/kvm/pmu-direct.c base-commit: 79150772457f4d45e38b842d786240c36bb1f97f -- 2.50.0.727.gbf7dc18ff4-goog
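Partitioning splits the PMU's general-purpose event counters into a guest range and a host range. On Arm, MDCR_EL2.HPMN sets how many event counters are accessible from EL1/EL0, with the counters at indices HPMN and above reserved for EL2. A minimal sketch of the resulting bitmasks, assuming the guest gets counters [0, HPMN) and the host keeps [HPMN, N); the helper names are hypothetical and the cycle counter is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmask with bits [lo, hi) set; expects lo <= hi <= 31. */
static uint32_t counter_range_mask(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi <= 31);
	return (uint32_t)(((1UL << hi) - 1) & ~((1UL << lo) - 1));
}

/* Hypothetical: event counters below HPMN form the guest partition. */
static uint32_t guest_counter_mask(unsigned int hpmn, unsigned int nr_counters)
{
	assert(hpmn <= nr_counters);
	return counter_range_mask(0, hpmn);
}

/* Hypothetical: event counters from HPMN up to N stay with the host. */
static uint32_t host_counter_mask(unsigned int hpmn, unsigned int nr_counters)
{
	assert(hpmn <= nr_counters);
	return counter_range_mask(hpmn, nr_counters);
}
```

With 6 counters and HPMN = 4, the guest mask is 0xF and the host mask is 0x30. HPMN = 0, which is only permitted on CPUs implementing FEAT_HPMN0, yields an empty guest mask.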

2 months, 3 weeks

next-20251020: selftests: helpers.h:10:10: fatal error: kselftest.h: No such file or directory

by Naresh Kamboju

The selftests x86_64 builds failed due to the following build warnings/errors on the Linux next-20251020 and next-20251021 tags with gcc-14 and clang-21.

First seen on next-20251020
Good: next-20251017
Bad: next-20251020

Regression Analysis:
- New regression? yes
- Reproducibility? yes

### Build errors
x86_64-linux-gnu-gcc -m64 -o kselftest/x86/single_step_syscall_64 -O2 -g -std=gnu99 -pthread -Wall -isystem usr/include -no-pie -DCAN_BUILD_64 single_step_syscall.c -lrt -ldl
In file included from single_step_syscall.c:34:
helpers.h:10:10: fatal error: kselftest.h: No such file or directory
   10 | #include "kselftest.h"
      |          ^~~~~~~~~~~~~
compilation terminated.
make[4]: *** [Makefile:86: kselftest/x86/single_step_syscall_64] Error 1

### Suspected patch
git log --oneline next-20251017..next-20251020 -- tools/testing/selftests/x86/
4d89827dfb274 selftests: complete kselftest include centralization

Build regressions: next-20251020: selftests: helpers.h:10:10: fatal error: kselftest.h: No such file or directory

Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>

### Steps to reproduce
- tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-14 \
  --kconfig https://storage.tuxsuite.com/public/linaro/lkft/builds/34JgN0fZ9uXj6HVnjvjq… \
  debugkernel cpupower headers kernel kselftest modules

## Source
* Kernel version: 6.18.0-rc2
* Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
* Git describe: next-20251021 and next-20251020
* Git commit: fe45352cd106ae41b5ad3f0066c2e54dbb2dfd70
* Architectures: x86_64
* Toolchains: gcc-14 and clang-21
* Kconfigs: defconfig+selftests/*/configs

## Build
* Build log: https://storage.tuxsuite.com/public/linaro/lkft/builds/34JgN0fZ9uXj6HVnjvjq…
* Build details: https://regressions.linaro.org/lkft/linux-next-master/next-20251020/kselfte…
* Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/34JgN0fZ9uXj6HVnjvjq…
* Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/34JgN0fZ9uXj6HVnjvjq…

--
Linaro LKFT
https://lkft.linaro.org

2 months, 3 weeks

[PATCH] selftest: net: prevent use of uninitialized variable

by Alessandro Zanni

Initialize the `ret` variable to avoid using it uninitialized in the following macro expansions. This fixes the following warning:

In file included from netlink-dumps.c:21:
netlink-dumps.c: In function ‘dump_extack’:
../kselftest_harness.h:788:35: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized]
  788 | intmax_t __exp_print = (intmax_t)__exp; \
      |          ^~~~~~~~~~~
../kselftest_harness.h:631:9: note: in expansion of macro ‘__EXPECT’
  631 | __EXPECT(expected, #expected, seen, #seen, ==, 0)
      | ^~~~~~~~
netlink-dumps.c:169:9: note: in expansion of macro ‘EXPECT_EQ’
  169 | EXPECT_EQ(ret, FOUND_EXTACK);
      | ^~~~~~~~~

The issue can be reproduced by building the tests with the command:

make -C tools/testing/selftests TARGETS=net

Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
 tools/testing/selftests/net/netlink-dumps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/netlink-dumps.c b/tools/testing/selftests/net/netlink-dumps.c
index 7618ebe528a4..8ebb8b1b9c5c 100644
--- a/tools/testing/selftests/net/netlink-dumps.c
+++ b/tools/testing/selftests/net/netlink-dumps.c
@@ -112,7 +112,7 @@ static const struct {
 TEST(dump_extack)
 {
 	int netlink_sock;
-	int i, cnt, ret;
+	int i, cnt, ret = 0;
 	char buf[8192];
 	int one = 1;
 	ssize_t n;
--
2.43.0
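The warning arises because `ret` is assigned only on some paths before it is read. A reduced illustration of the pattern, with hypothetical names and result codes: initializing `ret` to a value distinct from the "found" result makes the not-found path well defined, so an `EXPECT_EQ(ret, FOUND_EXTACK)`-style check fails deterministically instead of comparing an indeterminate value (assuming the found value is nonzero).

```c
#include <stddef.h>

enum { NOT_FOUND = 0, FOUND_EXTACK = 1 }; /* hypothetical result codes */

/*
 * Scan message types for one carrying an extended ACK. Without the
 * "= NOT_FOUND" initializer, ret would be read uninitialized whenever the
 * loop finds nothing, which is what -Wmaybe-uninitialized warns about.
 */
static int scan_for_extack(const int *types, size_t cnt, int extack_type)
{
	int ret = NOT_FOUND;

	for (size_t i = 0; i < cnt; i++) {
		if (types[i] == extack_type) {
			ret = FOUND_EXTACK;
			break;
		}
	}
	return ret;
}
```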

2 months, 3 weeks

[PATCH nf-next v7 0/3] Add IPIP flowtable SW acceleration

by Lorenzo Bianconi

Introduce SW acceleration for IPIP tunnels in the netfilter flowtable infrastructure. This series also introduces basic infrastructure to accelerate other tunnel types (e.g. IP6IP6).

---
Changes in v7:
- Introduce SW acceleration for the TX path of IPIP tunnels
- Rely on exact match during flowtable entry lookup
- Fix typos
- Link to v6: https://lore.kernel.org/r/20250818-nf-flowtable-ipip-v6-0-eda90442739c@kern…

Changes in v6:
- Rebase on top of nf-next main branch
- Link to v5: https://lore.kernel.org/r/20250721-nf-flowtable-ipip-v5-0-0865af9e58c6@kern…

Changes in v5:
- Rely on __ipv4_addr_hash() to compute the hash used as encap ID
- Remove unnecessary pskb_may_pull() in nf_flow_tuple_encap()
- Add nf_flow_ip4_ecanp_pop utility routine
- Link to v4: https://lore.kernel.org/r/20250718-nf-flowtable-ipip-v4-0-f8bb1c18b986@kern…

Changes in v4:
- Use the hash value of the saddr, daddr and protocol of the outer IP header as encapsulation ID
- Link to v3: https://lore.kernel.org/r/20250703-nf-flowtable-ipip-v3-0-880afd319b9f@kern…

Changes in v3:
- Add outer IP header sanity checks
- Target nf-next tree instead of net-next
- Link to v2: https://lore.kernel.org/r/20250627-nf-flowtable-ipip-v2-0-c713003ce75b@kern…

Changes in v2:
- Introduce IPIP flowtable selftest
- Link to v1: https://lore.kernel.org/r/20250623-nf-flowtable-ipip-v1-1-2853596e3941@kern…

---
Lorenzo Bianconi (3):
      net: netfilter: Add IPIP flowtable rx sw acceleration
      net: netfilter: Add IPIP flowtable tx sw acceleration
      selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest

 include/linux/netdevice.h                    |  16 +++
 include/net/netfilter/nf_flow_table.h        |  26 +++++
 net/ipv4/ipip.c                              |  29 +++++
 net/netfilter/nf_flow_table_core.c           |  10 ++
 net/netfilter/nf_flow_table_ip.c             | 118 ++++++++++++++++++++-
 net/netfilter/nft_flow_offload.c             |  79 ++++++++++++--
 .../selftests/net/netfilter/nft_flowtable.sh |  40 +++++++
 7 files changed, 307 insertions(+), 11 deletions(-)
---
base-commit: d1d7998df9d7d3ee20bcfc876065fa897b11506d
change-id: 20250623-nf-flowtable-ipip-1b3d7b08d067

Best regards,
--
Lorenzo Bianconi <lorenzo(a)kernel.org>
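The v3 changelog adds outer IP header sanity checks, and the RX path appears to strip ("pop") the outer IPv4 header before the flowtable lookup. The userspace C sketch below shows what such checks might look like on a raw packet buffer: it validates the outer header and returns the offset of the inner IPv4 header. The function name and the exact set of checks (for example, rejecting fragments) are assumptions for illustration, not the series' nf_flow_table_ip.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IP protocol number for IPv4-in-IPv4 (IPIP) encapsulation. */
#define SKETCH_IPPROTO_IPIP 4

/*
 * Hypothetical helper: validate the outer IPv4 header of an IPIP packet and
 * return the offset of the inner IPv4 header, or -1 if a check fails.
 */
static long ipip_inner_offset(const uint8_t *pkt, size_t len)
{
	size_t ihl;
	uint16_t tot_len;

	if (len < 20)                          /* shorter than a minimal IPv4 header */
		return -1;
	if ((pkt[0] >> 4) != 4)                /* outer version must be 4 */
		return -1;
	ihl = (size_t)(pkt[0] & 0x0f) * 4;     /* header length in bytes */
	if (ihl < 20 || ihl > len)
		return -1;
	tot_len = (uint16_t)((pkt[2] << 8) | pkt[3]);  /* network byte order */
	if (tot_len < ihl || tot_len > len)
		return -1;
	if (pkt[9] != SKETCH_IPPROTO_IPIP)     /* payload must be IPv4-in-IPv4 */
		return -1;
	if ((pkt[6] & 0x3f) || pkt[7])         /* assumption: reject fragments (MF or offset set) */
		return -1;
	if (tot_len - ihl < 20 || (pkt[ihl] >> 4) != 4)  /* inner header must be IPv4 too */
		return -1;
	return (long)ihl;
}
```

Keeping the checks in one helper that returns an offset mirrors the idea of a single "pop" routine that every caller can reuse before building the lookup tuple from the inner header.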

2 months, 3 weeks

Refund

by Eryk Wawrzyn

Good morning, I am contacting you on behalf of a law firm specializing in debt management. For years we have been helping companies recover receivables. We provide comprehensive support at the pre-litigation, litigation and enforcement stages, tailoring our actions to the client's industry. When can we talk? Regards, Eryk Wawrzyn

2 months, 3 weeks

[PATCH net 0/5] mptcp: handle late ADD_ADDR + selftests skip

by Matthieu Baerts (NGI0)

Here are a few independent fixes related to MPTCP and its selftests:

- Patch 1: correctly handle ADD_ADDR being received after the switch to 'fully-established'. This fixes another recent fix that was backported as far as v5.14.

- Patches 2-5: properly mark some MPTCP Join subtests as 'skipped' if the tested kernel doesn't support the feature being validated. These are fixes for issues dating back to v5.13, v5.18, v6.11 and v6.18-rc1 respectively.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (5):
      mptcp: pm: in-kernel: C-flag: handle late ADD_ADDR
      selftests: mptcp: join: mark 'flush re-add' as skipped if not supported
      selftests: mptcp: join: mark implicit tests as skipped if not supported
      selftests: mptcp: join: mark 'delete re-add signal' as skipped if not supported
      selftests: mptcp: join: mark laminar tests as skipped if not supported

 net/mptcp/pm_kernel.c                           |  6 ++++++
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 18 +++++++++---------
 2 files changed, 15 insertions(+), 9 deletions(-)
---
base-commit: ffff5c8fc2af2218a3332b3d5b97654599d50cde
change-id: 20251020-net-mptcp-c-flag-late-add-addr-1d954e7b63d2

Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
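Patches 2-5 report a missing kernel feature as "skipped" instead of "failed". In kselftest, a skip is signalled with exit code 4 (KSFT_SKIP from kselftest.h). The minimal C sketch below captures that decision; the function and its boolean inputs are hypothetical, and mptcp_join.sh itself implements the logic in shell.

```c
#include <assert.h>
#include <stdbool.h>

/* kselftest result codes (values as in tools/testing/selftests/kselftest.h). */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

/*
 * Hypothetical per-subtest verdict: an unsupported feature is reported as a
 * skip, so that older kernels don't show spurious failures.
 */
static int subtest_result(bool feature_supported, bool subtest_passed)
{
	if (!feature_supported)
		return KSFT_SKIP;
	return subtest_passed ? KSFT_PASS : KSFT_FAIL;
}
```

Checking for support before judging the outcome is what keeps a kernel that lacks, say, the laminar endpoints from being counted as broken.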

2 months, 3 weeks

[PATCH net-next v7 00/26] vsock: add namespace support to vhost-vsock

by Bobby Eshleman

This series adds namespace support to vhost-vsock and loopback. It does not add namespaces to any of the other guest transports (virtio-vsock, hyperv, or vmci).

The current revision supports two modes: local and global. In local mode, namespaces are completely isolated from each other; in global mode, CIDs are shared across all namespaces (the original behavior). The mode is set using /proc/sys/net/vsock/ns_mode.

Modes are per-netns and write-once. This allows a system to configure namespaces independently (some may share CIDs, others may be completely isolated). It also leaves room for possible future mixed use cases, where namespaces in global mode spin up VMs while mixed-mode namespaces provide services to those VMs but are not allowed to allocate from the global CID pool (this mode is not implemented in this series).

If a socket or VM is created while a namespace is global and the namespace later changes to local, the socket or VM continues working normally. That is, the socket or VM keeps the mode behavior the namespace had when the socket/VM was created. The original mode is captured in vsock_create(), which happens at socket(2) and accept(2) time for sockets, and at open(2) on /dev/vhost-vsock for VMs. This prevents a socket/VM connection from suddenly breaking due to a namespace mode change. Any new sockets/VMs created after the mode change adopt the new mode's behavior.
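The mode rules above can be captured in a small toy model: global-mode namespaces draw CIDs from one shared pool, local-mode namespaces each have their own pool, the mode can be written only once, and a VM records the mode in effect when it was created. All names and data structures below are hypothetical; this is a sketch of the described semantics, not the af_vsock implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

#define MAX_CIDS 16

struct cid_table {
	unsigned int cids[MAX_CIDS];
	size_t n;
};

struct netns {
	enum ns_mode mode;       /* zero-initialized: global, the original behavior */
	bool mode_written;       /* ns_mode is write-once */
	struct cid_table local;  /* CIDs of VMs created while in local mode */
};

/* One pool shared by every namespace that is in global mode. */
static struct cid_table global_cids;

static bool table_has(const struct cid_table *t, unsigned int cid)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->cids[i] == cid)
			return true;
	return false;
}

/* Write-once: the first write succeeds, every later write is refused. */
static bool ns_write_mode(struct netns *ns, enum ns_mode mode)
{
	if (ns->mode_written)
		return false;
	ns->mode = mode;
	ns->mode_written = true;
	return true;
}

/*
 * "Create a VM" with the given CID. The namespace mode is sampled once, here,
 * and handed back so the VM can keep it for its whole lifetime.
 */
static bool vm_create(struct netns *ns, unsigned int cid, enum ns_mode *vm_mode)
{
	struct cid_table *t = ns->mode == NS_MODE_GLOBAL ? &global_cids : &ns->local;

	if (table_has(t, cid) || t->n == MAX_CIDS)
		return false;    /* CID already taken in this pool (or pool full) */
	t->cids[t->n++] = cid;
	*vm_mode = ns->mode;
	return true;
}
```

The four CID-collision outcomes listed in the test results (global/global fails, local/local, global/local and local/global succeed) all fall out of choosing the pool by the namespace's mode at creation time.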
Additionally, this series adds tests for the new namespace features:

tools/testing/selftests/vsock/vmtest.sh
1..30
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_host_vsock_ns_mode_ok
ok 5 ns_host_vsock_ns_mode_write_once_ok
ok 6 ns_global_same_cid_fails
ok 7 ns_local_same_cid_ok
ok 8 ns_global_local_same_cid_ok
ok 9 ns_local_global_same_cid_ok
ok 10 ns_diff_global_host_connect_to_global_vm_ok
ok 11 ns_diff_global_host_connect_to_local_vm_fails
ok 12 ns_diff_global_vm_connect_to_global_host_ok
ok 13 ns_diff_global_vm_connect_to_local_host_fails
ok 14 ns_diff_local_host_connect_to_local_vm_fails
ok 15 ns_diff_local_vm_connect_to_local_host_fails
ok 16 ns_diff_global_to_local_loopback_local_fails
ok 17 ns_diff_local_to_global_loopback_fails
ok 18 ns_diff_local_to_local_loopback_fails
ok 19 ns_diff_global_to_global_loopback_ok
ok 20 ns_same_local_loopback_ok
ok 21 ns_same_local_host_connect_to_local_vm_ok
ok 22 ns_same_local_vm_connect_to_local_host_ok
ok 23 ns_mode_change_connection_continue_vm_ok
ok 24 ns_mode_change_connection_continue_host_ok
ok 25 ns_mode_change_connection_continue_both_ok
ok 26 ns_delete_vm_ok
ok 27 ns_delete_host_ok
ok 28 ns_delete_both_ok
ok 29 ns_loopback_global_global_late_module_load_ok
ok 30 ns_loopback_local_local_late_module_load_fails
SUMMARY: PASS=30 SKIP=0 FAIL=0

Thanks again for everyone's help and reviews!

Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
To: Stefano Garzarella <sgarzare(a)redhat.com>
To: Shuah Khan <shuah(a)kernel.org>
To: David S. Miller <davem(a)davemloft.net>
To: Eric Dumazet <edumazet(a)google.com>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Paolo Abeni <pabeni(a)redhat.com>
To: Simon Horman <horms(a)kernel.org>
To: Stefan Hajnoczi <stefanha(a)redhat.com>
To: Michael S. Tsirkin <mst(a)redhat.com>
To: Jason Wang <jasowang(a)redhat.com>
To: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
To: Eugenio Pérez <eperezma(a)redhat.com>
To: K. Y. Srinivasan <kys(a)microsoft.com>
To: Haiyang Zhang <haiyangz(a)microsoft.com>
To: Wei Liu <wei.liu(a)kernel.org>
To: Dexuan Cui <decui(a)microsoft.com>
To: Bryan Tan <bryan-bt.tan(a)broadcom.com>
To: Vishnu Dasa <vishnu.dasa(a)broadcom.com>
To: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: virtualization(a)lists.linux.dev
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: linux-hyperv(a)vger.kernel.org
Cc: berrange(a)redhat.com

Changes in v7:
- fix hv_sock build
- break out vmtest patches into distinct, more well-scoped patches
- change `orig_net_mode` to `net_mode`
- many fixes and style changes in per-patch change sets (see individual patches for specific changes)
- optimize `virtio_vsock_skb_cb` layout
- update commit messages with more useful descriptions
- vsock_loopback: use orig_net_mode instead of current net mode
- add tests for edge cases (ns deletion, mode changing, loopback module load ordering)
- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com

Changes in v6:
- define behavior when mode changes to local while socket/VM is alive
- af_vsock: clarify description of CID behavior
- af_vsock: use stronger language around CID rules (don't use "may")
- af_vsock: improve naming of buf/buffer
- af_vsock: improve string length checking on proc writes
- vsock_loopback: add space in struct to clarify lock protection
- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
- vsock_loopback: set loopback to NULL after kfree()
- vsock_loopback: use pernet_operations and remove callback mechanism
- vsock_loopback: add macros for "global" and "local"
- vsock_loopback: fix length checking
- vmtest.sh: check for namespace support in vmtest.sh
- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com

Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com

Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context in commit message)
- lots of cleanup

Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2: https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com

Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1: https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com

Changes in v1:
- added 'netns' module param to vsock.ko to enable the network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/

---
Bobby Eshleman (26):
      vsock: a per-net vsock NS mode state
      vsock/virtio: pack struct virtio_vsock_skb_cb
      vsock: add netns to vsock skb cb
      vsock: add netns to vsock core
      vsock/loopback: add netns support
      vsock/virtio: add netns to virtio transport common
      vhost/vsock: add netns support
      selftests/vsock: improve logging in vmtest.sh
      selftests/vsock: make wait_for_listener() work even if pipefail is on
      selftests/vsock: reuse logic for vsock_test through wrapper functions
      selftests/vsock: avoid multi-VM pidfile collisions with QEMU
      selftests/vsock: do not unconditionally die if qemu fails
      selftests/vsock: speed up tests by reducing the QEMU pidfile timeout
      selftests/vsock: add check_result() for pass/fail counting
      selftests/vsock: identify and execute tests that can re-use VM
      selftests/vsock: add namespace initialization function
      selftests/vsock: remove namespaces in cleanup()
      selftests/vsock: prepare vm management helpers for namespaces
      selftests/vsock: add BUILD=0 definition
      selftests/vsock: avoid false-positives when checking dmesg
      selftests/vsock: add tests for proc sys vsock ns_mode
      selftests/vsock: add namespace tests for CID collisions
      selftests/vsock: add tests for host <-> vm connectivity with namespaces
      selftests/vsock: add tests for namespace deletion and mode changes
      selftests/vsock: add tests for module loading order
      selftests/vsock: add 1.37 to tested virtme-ng versions

 MAINTAINERS                             |    1 +
 drivers/vhost/vsock.c                   |   48 +-
 include/linux/virtio_vsock.h            |   47 +-
 include/net/af_vsock.h                  |   78 +-
 include/net/net_namespace.h             |    4 +
 include/net/netns/vsock.h               |   22 +
 net/vmw_vsock/af_vsock.c                |  264 ++++++-
 net/vmw_vsock/virtio_transport.c        |    7 +-
 net/vmw_vsock/virtio_transport_common.c |   21 +-
 net/vmw_vsock/vsock_loopback.c          |   89 ++-
 tools/testing/selftests/vsock/vmtest.sh | 1320 ++++++++++++++++++++++++++++---
 11 files changed, 1729 insertions(+), 172 deletions(-)
---
base-commit: 3ff9bcecce83f12169ab3e42671bd76554ca521a
change-id: 20250325-vsock-vmtest-b3a21d2102c2

Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>

2 months, 3 weeks

[PATCH net-next v7 1/2] net/tls: support setting the maximum payload size

by Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>

During a handshake, an endpoint may specify a maximum record size limit. Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the maximum record size, so outgoing records from the kernel can exceed a lower limit negotiated during the handshake. In that case, the TLS endpoint must send a fatal "record_overflow" alert [1], and the record is discarded.

Upcoming Western Digital NVMe-TCP hardware controllers implement TLS support. For these devices, supporting TLS record size negotiation is necessary because the maximum TLS record size supported by the controller is less than the default 16KB currently used by the kernel.

Currently, there is no way to inform the kernel of such a limit. This patch adds support for a new setsockopt() option, `TLS_TX_MAX_PAYLOAD_LEN`, which sets the maximum plaintext fragment size. Once it is set, outgoing records are no larger than the specified size. This option can be used to implement the record size limit.

[1] https://www.rfc-editor.org/rfc/rfc8449

Signed-off-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
---
V6 -> V7:
- Added more information to the description regarding record_size_limit
- For TLS 1.3, setsockopt() now allows a 63 byte minimum to account for the ContentType
- getsockopt() returns the total plaintext length; for TLS 1.3, this will be 1 byte higher than what is set using setsockopt().
---
 Documentation/networking/tls.rst | 22 +++++++++++
 include/net/tls.h                |  3 ++
 include/uapi/linux/tls.h         |  2 +
 net/tls/tls_device.c             |  2 +-
 net/tls/tls_main.c               | 68 ++++++++++++++++++++++++++++++++
 net/tls/tls_sw.c                 |  2 +-
 6 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst
index 36cc7afc2527..ecaa7631ec46 100644
--- a/Documentation/networking/tls.rst
+++ b/Documentation/networking/tls.rst
@@ -280,6 +280,28 @@ If the record decrypted turns out to had been padded or is not a data record
 it will be decrypted again into a kernel buffer without zero copy.
 Such events are counted in the ``TlsDecryptRetry`` statistic.
 
+TLS_TX_MAX_PAYLOAD_LEN
+~~~~~~~~~~~~~~~~~~~~~~
+
+Specifies the maximum size of the plaintext payload for transmitted TLS records.
+
+When this option is set, the kernel enforces the specified limit on all outgoing
+TLS records. No plaintext fragment will exceed this size. This option can be used
+to implement the TLS Record Size Limit extension [1].
+  - For TLS 1.2, the value corresponds directly to the record size limit.
+  - For TLS 1.3, the value should be set to record_size_limit - 1, since
+    the record size limit includes one additional byte for the ContentType
+    field.
+
+The valid range for this option is 64 to 16384 bytes for TLS 1.2, and 63 to
+16384 bytes for TLS 1.3. The lower minimum for TLS 1.3 accounts for the
+extra byte used by the ContentType field.
+
+For TLS 1.3, getsockopt() will return the total plaintext fragment length,
+inclusive of the ContentType field.
+
+[1] https://datatracker.ietf.org/doc/html/rfc8449
+
 Statistics
 ==========
 
diff --git a/include/net/tls.h b/include/net/tls.h
index 857340338b69..f2af113728aa 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -53,6 +53,8 @@ struct tls_rec;
 
 /* Maximum data size carried in a TLS record */
 #define TLS_MAX_PAYLOAD_SIZE		((size_t)1 << 14)
+/* Minimum record size limit as per RFC8449 */
+#define TLS_MIN_RECORD_SIZE_LIM		((size_t)1 << 6)
 
 #define TLS_HEADER_SIZE			5
 #define TLS_NONCE_OFFSET		TLS_HEADER_SIZE
@@ -226,6 +228,7 @@ struct tls_context {
 	u8 rx_conf:3;
 	u8 zerocopy_sendfile:1;
 	u8 rx_no_pad:1;
+	u16 tx_max_payload_len;
 
 	int (*push_pending_record)(struct sock *sk, int flags);
 	void (*sk_write_space)(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index b66a800389cc..b8b9c42f848c 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -41,6 +41,7 @@
 #define TLS_RX			2	/* Set receive parameters */
 #define TLS_TX_ZEROCOPY_RO	3	/* TX zerocopy (only sendfile now) */
 #define TLS_RX_EXPECT_NO_PAD	4	/* Attempt opportunistic zero-copy */
+#define TLS_TX_MAX_PAYLOAD_LEN	5	/* Maximum plaintext size */
 
 /* Supported versions */
 #define TLS_VERSION_MINOR(ver)	((ver) & 0xFF)
@@ -194,6 +195,7 @@ enum {
 	TLS_INFO_RXCONF,
 	TLS_INFO_ZC_RO_TX,
 	TLS_INFO_RX_NO_PAD,
+	TLS_INFO_TX_MAX_PAYLOAD_LEN,
 	__TLS_INFO_MAX,
 };
 #define TLS_INFO_MAX (__TLS_INFO_MAX - 1)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index caa2b5d24622..4d29b390aed9 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -462,7 +462,7 @@ static int tls_push_data(struct sock *sk,
 	/* TLS_HEADER_SIZE is not counted as part of the TLS record, and
 	 * we need to leave room for an authentication tag.
 	 */
-	max_open_record_len = TLS_MAX_PAYLOAD_SIZE +
+	max_open_record_len = tls_ctx->tx_max_payload_len +
 			      prot->prepend_size;
 	do {
 		rc = tls_do_allocation(sk, ctx, pfrag, prot->prepend_size);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 39a2ab47fe72..b234d44bd789 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -541,6 +541,32 @@ static int do_tls_getsockopt_no_pad(struct sock *sk, char __user *optval,
 	return 0;
 }
 
+static int do_tls_getsockopt_tx_payload_len(struct sock *sk, char __user *optval,
+					    int __user *optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	u16 payload_len = ctx->tx_max_payload_len;
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	/* For TLS 1.3 payload length includes ContentType */
+	if (ctx->prot_info.version == TLS_1_3_VERSION)
+		payload_len++;
+
+	if (len < sizeof(payload_len))
+		return -EINVAL;
+
+	if (put_user(sizeof(payload_len), optlen))
+		return -EFAULT;
+
+	if (copy_to_user(optval, &payload_len, sizeof(payload_len)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int do_tls_getsockopt(struct sock *sk, int optname,
 			     char __user *optval, int __user *optlen)
 {
@@ -560,6 +586,9 @@ static int do_tls_getsockopt(struct sock *sk, int optname,
 	case TLS_RX_EXPECT_NO_PAD:
 		rc = do_tls_getsockopt_no_pad(sk, optval, optlen);
 		break;
+	case TLS_TX_MAX_PAYLOAD_LEN:
+		rc = do_tls_getsockopt_tx_payload_len(sk, optval, optlen);
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		break;
@@ -809,6 +838,32 @@ static int do_tls_setsockopt_no_pad(struct sock *sk, sockptr_t optval,
 	return rc;
 }
 
+static int do_tls_setsockopt_tx_payload_len(struct sock *sk, sockptr_t optval,
+					    unsigned int optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
+	u16 value;
+	bool tls_13 = ctx->prot_info.version == TLS_1_3_VERSION;
+
+	if (sw_ctx && sw_ctx->open_rec)
+		return -EBUSY;
+
+	if (sockptr_is_null(optval) || optlen != sizeof(value))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&value, optval, sizeof(value)))
+		return -EFAULT;
+
+	if (value < TLS_MIN_RECORD_SIZE_LIM - (tls_13 ? 1 : 0) ||
+	    value > TLS_MAX_PAYLOAD_SIZE)
+		return -EINVAL;
+
+	ctx->tx_max_payload_len = value;
+
+	return 0;
+}
+
 static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 			     unsigned int optlen)
 {
@@ -830,6 +885,11 @@ static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 	case TLS_RX_EXPECT_NO_PAD:
 		rc = do_tls_setsockopt_no_pad(sk, optval, optlen);
 		break;
+	case TLS_TX_MAX_PAYLOAD_LEN:
+		lock_sock(sk);
+		rc = do_tls_setsockopt_tx_payload_len(sk, optval, optlen);
+		release_sock(sk);
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		break;
@@ -1019,6 +1079,7 @@ static int tls_init(struct sock *sk)
 
 	ctx->tx_conf = TLS_BASE;
 	ctx->rx_conf = TLS_BASE;
+	ctx->tx_max_payload_len = TLS_MAX_PAYLOAD_SIZE;
 	update_sk_prot(sk, ctx);
 out:
 	write_unlock_bh(&sk->sk_callback_lock);
@@ -1108,6 +1169,12 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
 			goto nla_failure;
 	}
 
+	err = nla_put_u16(skb, TLS_INFO_TX_MAX_PAYLOAD_LEN,
+			  ctx->tx_max_payload_len);
+
+	if (err)
+		goto nla_failure;
+
 	rcu_read_unlock();
 	nla_nest_end(skb, start);
 	return 0;
@@ -1129,6 +1196,7 @@ static size_t tls_get_info_size(const struct sock *sk, bool net_admin)
 		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TXCONF */
 		nla_total_size(0) +		/* TLS_INFO_ZC_RO_TX */
 		nla_total_size(0) +		/* TLS_INFO_RX_NO_PAD */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TX_MAX_PAYLOAD_LEN */
 		0;
 
 	return size;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index d17135369980..9937d4c810f2 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
 		orig_size = msg_pl->sg.size;
 		full_record = false;
 		try_to_copy = msg_data_left(msg);
-		record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
+		record_room = tls_ctx->tx_max_payload_len - msg_pl->sg.size;
 		if (try_to_copy >= record_room) {
 			try_to_copy = record_room;
 			full_record = true;
--
2.51.0
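The arithmetic behind the new option is easy to get wrong by one byte, so the C sketch below restates it outside the kernel: converting a negotiated RFC 8449 record_size_limit into the setsockopt() value, the same range check that do_tls_setsockopt_tx_payload_len() applies, and how many records a write is split into once record_room is bounded by the limit. The SKETCH_* constants copy the values from the patch; the function names are hypothetical. On a kernel carrying this patch, userspace would pass a 16-bit value (the setter rejects any optlen other than sizeof(u16)), e.g. `__u16 v = 4095; setsockopt(fd, SOL_TLS, TLS_TX_MAX_PAYLOAD_LEN, &v, sizeof(v));`, and must do so before a TX record is open, or it gets -EBUSY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of TLS_MAX_PAYLOAD_SIZE and TLS_MIN_RECORD_SIZE_LIM. */
#define SKETCH_MAX_PAYLOAD ((size_t)1 << 14)  /* 16384 */
#define SKETCH_MIN_RECORD  ((size_t)1 << 6)   /* 64 */

/* setsockopt() value for a negotiated record_size_limit: TLS 1.3 reserves
 * one byte of the limit for the ContentType field. */
static size_t payload_len_for_limit(size_t record_size_limit, bool tls13)
{
	return tls13 ? record_size_limit - 1 : record_size_limit;
}

/* Same bounds as the patch: 64..16384 for TLS 1.2, 63..16384 for TLS 1.3. */
static bool payload_len_valid(size_t value, bool tls13)
{
	return value >= SKETCH_MIN_RECORD - (tls13 ? 1 : 0) &&
	       value <= SKETCH_MAX_PAYLOAD;
}

/* Records needed for `len` plaintext bytes when each record carries at most
 * `max_payload` bytes (ignores a partially filled record already open). */
static size_t records_needed(size_t len, size_t max_payload)
{
	assert(max_payload > 0);
	return (len + max_payload - 1) / max_payload;
}
```

For example, a TLS 1.3 peer advertising record_size_limit = 4096 maps to a value of 4095, and a 10000-byte write is then sent as three records rather than one.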

2 months, 3 weeks

[PATCH] selftests: arg_parsing: Ensure data is flushed to disk before reading.

by Xing Guo

Recently, I noticed a selftest failure in my local environment. test_parse_test_list_file() writes some data to /tmp/bpf_arg_parsing_test.XXXXXX, and parse_test_list_file() then reads the data back. However, after writing the data to that file, we never call fsync(), and this causes the test to fail on my laptop. This patch fixes it by adding the missing fsync() call.

Signed-off-by: Xing Guo <higuoxing(a)gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/arg_parsing.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/arg_parsing.c b/tools/testing/selftests/bpf/prog_tests/arg_parsing.c
index bb143de68875..4f071943ffb0 100644
--- a/tools/testing/selftests/bpf/prog_tests/arg_parsing.c
+++ b/tools/testing/selftests/bpf/prog_tests/arg_parsing.c
@@ -140,6 +140,7 @@ static void test_parse_test_list_file(void)
 	fprintf(fp, "testA/subtest2\n");
 	fprintf(fp, "testC_no_eof_newline");
 	fflush(fp);
+	fsync(fd);
 
 	if (!ASSERT_OK(ferror(fp), "prepare tmp"))
 		goto out_fclose;
--
2.51.0
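The sequence the patch settles on is fflush() on the stdio stream followed by fsync() on the descriptor returned by mkstemp(). fflush() moves the data from stdio's user-space buffer into the kernel, and fsync() additionally asks the kernel to commit the file's data to storage. The self-contained sketch below reproduces that sequence and then reads the file back through a separate FILE; the path template and function name are made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* declares mkstemp(), fdopen(), fsync() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns the number of '\n' characters read back, or -1 on error. */
static int write_sync_and_count_newlines(void)
{
	char path[] = "/tmp/arg_parsing_sketch.XXXXXX";  /* hypothetical template */
	int fd, c, newlines = 0;
	FILE *fp, *in;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	fp = fdopen(fd, "w");
	if (!fp) {
		close(fd);
		unlink(path);
		return -1;
	}

	fprintf(fp, "testA/subtest2\n");
	fprintf(fp, "testC_no_eof_newline");
	/* Hand the buffered bytes to the kernel, then ask for them on disk. */
	if (fflush(fp) != 0 || fsync(fd) != 0) {
		fclose(fp);
		unlink(path);
		return -1;
	}

	in = fopen(path, "r");
	if (!in) {
		newlines = -1;
	} else {
		while ((c = fgetc(in)) != EOF)
			if (c == '\n')
				newlines++;
		fclose(in);
	}

	fclose(fp);  /* also closes fd */
	unlink(path);
	return newlines;
}
```

Checking the return values of both fflush() and fsync() keeps a failed flush from being mistaken for a parsing bug later in the test.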

2 months, 3 weeks

[PATCH v2 0/6] riscv: vector: misc ptrace fixes for debug use-cases

by Sergey Matyukevich

This patch series proposes fixes for several corner cases in the RISC-V vector ptrace implementation:

- follow gdbserver expectations and return ENODATA instead of EINVAL if the vector extension is supported but not yet activated for a traced process
- force a vector context save on the next context switch after a ptrace call that modified vector CSRs, so that subsequent ptrace calls do not read stale values
- force a vector context save on the first context switch after vector context initialization, so that an early-attached debugger does not read a zero vlenb

For detailed descriptions, see the corresponding commit messages. A new test is added to tools/testing/selftests/riscv/vector to verify the fixes; each fix is accompanied by its own test case. The initial version [1] of this series included only the last fix, for zero vlenb.

[1] https://lore.kernel.org/linux-riscv/20250821173957.563472-1-geomatsi@gmail.…

Ilya Mamay (1):
  riscv: ptrace: return ENODATA for inactive vector extension

Sergey Matyukevich (5):
  selftests: riscv: test ptrace vector interface
  selftests: riscv: set invalid vtype using ptrace
  riscv: vector: allow to force vector context save
  selftests: riscv: verify initial vector state with ptrace
  riscv: vector: initialize vlenb on the first context switch

 arch/riscv/include/asm/thread_info.h          |   2 +
 arch/riscv/include/asm/vector.h               |   3 +
 arch/riscv/kernel/process.c                   |   2 +
 arch/riscv/kernel/ptrace.c                    |  15 +-
 arch/riscv/kernel/vector.c                    |   4 +
 .../testing/selftests/riscv/vector/.gitignore |   1 +
 tools/testing/selftests/riscv/vector/Makefile |   5 +-
 .../testing/selftests/riscv/vector/v_ptrace.c | 302 ++++++++++++++++++
 8 files changed, 331 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/riscv/vector/v_ptrace.c

base-commit: c746c3b5169831d7fb032a1051d8b45592ae8d78
--
2.51.0
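With the first fix, a debugger can tell "vector extension present but not yet used by the tracee" apart from a real failure by checking errno after a PTRACE_GETREGSET request for the vector register set. The helper below is a hypothetical, minimal classification of that result; it assumes the caller passes in the request's return value and the errno it left behind, and it does not issue any ptrace calls itself.

```c
#include <assert.h>
#include <errno.h>

enum vreg_state {
	VREG_OK,        /* register set was read */
	VREG_INACTIVE,  /* extension supported, not yet activated in the tracee */
	VREG_ERROR      /* anything else, e.g. EINVAL for a bad request */
};

static enum vreg_state classify_getregset(long ret, int err)
{
	if (ret == 0)
		return VREG_OK;
	if (err == ENODATA)
		return VREG_INACTIVE;  /* nothing to show yet; not a failure */
	return VREG_ERROR;
}
```

Before the fix, the inactive case surfaced as EINVAL and would land in VREG_ERROR here, which is the ambiguity the series removes.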

2 months, 3 weeks

[PATCH] Fix up 'make versioncheck' issues

by Jesper Juhl

From d2e411b4cd37b1936a30d130e2b21e37e62e0cfb Mon Sep 17 00:00:00 2001
From: Jesper Juhl <jesperjuhl76(a)gmail.com>
Date: Tue, 21 Oct 2025 03:51:21 +0200
Subject: [PATCH] [PATCH] Fix up 'make versioncheck' issues

'make versioncheck' currently flags a few files that include <linux/version.h> even though they don't need it, and one file (tools/lib/bpf/bpf_helpers.h) that needs it but doesn't include it. This patch fixes that up.

Signed-Off-By: Jesper Juhl <jesperjuhl76(a)gmail.com>
---
 samples/bpf/spintest.bpf.c                                | 1 -
 tools/lib/bpf/bpf_helpers.h                               | 2 ++
 tools/testing/selftests/bpf/progs/dev_cgroup.c            | 1 -
 tools/testing/selftests/bpf/progs/netcnt_prog.c           | 2 --
 tools/testing/selftests/bpf/progs/test_map_lock.c         | 1 -
 tools/testing/selftests/bpf/progs/test_send_signal_kern.c | 1 -
 tools/testing/selftests/bpf/progs/test_spin_lock.c        | 1 -
 tools/testing/selftests/bpf/progs/test_tcp_estats.c       | 1 -
 tools/testing/selftests/wireguard/qemu/init.c             | 1 -
 9 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/samples/bpf/spintest.bpf.c b/samples/bpf/spintest.bpf.c
index cba5a9d507831..6278f6d0b731f 100644
--- a/samples/bpf/spintest.bpf.c
+++ b/samples/bpf/spintest.bpf.c
@@ -5,7 +5,6 @@
  * License as published by the Free Software Foundation.
  */
 #include "vmlinux.h"
-#include <linux/version.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
 
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 80c0285406561..393ce1063a977 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -2,6 +2,8 @@
 #ifndef __BPF_HELPERS__
 #define __BPF_HELPERS__
 
+#include <linux/version.h>
+
 /*
  * Note that bpf programs need to include either
  * vmlinux.h (auto-generated from BTF) or linux/types.h
diff --git a/tools/testing/selftests/bpf/progs/dev_cgroup.c b/tools/testing/selftests/bpf/progs/dev_cgroup.c
index c1dfbd2b56fc9..4c4e747bf827a 100644
--- a/tools/testing/selftests/bpf/progs/dev_cgroup.c
+++ b/tools/testing/selftests/bpf/progs/dev_cgroup.c
@@ -6,7 +6,6 @@
  */
 
 #include <linux/bpf.h>
-#include <linux/version.h>
 #include <bpf/bpf_helpers.h>
 
 SEC("cgroup/dev")
diff --git a/tools/testing/selftests/bpf/progs/netcnt_prog.c b/tools/testing/selftests/bpf/progs/netcnt_prog.c
index f9ef8aee56f16..3cf6b7a27a34a 100644
--- a/tools/testing/selftests/bpf/progs/netcnt_prog.c
+++ b/tools/testing/selftests/bpf/progs/netcnt_prog.c
@@ -1,7 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/bpf.h>
-#include <linux/version.h>
-
 #include <bpf/bpf_helpers.h>
 
 #include "netcnt_common.h"
diff --git a/tools/testing/selftests/bpf/progs/test_map_lock.c b/tools/testing/selftests/bpf/progs/test_map_lock.c
index 1c02511b73cdb..982bdbf0dba6b 100644
--- a/tools/testing/selftests/bpf/progs/test_map_lock.c
+++ b/tools/testing/selftests/bpf/progs/test_map_lock.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
 #include <linux/bpf.h>
-#include <linux/version.h>
 #include <bpf/bpf_helpers.h>
 
 #define VAR_NUM 16
diff --git a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
index 176a355e30624..e70b191162359 100644
--- a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
 #include <vmlinux.h>
-#include <linux/version.h>
 #include <bpf/bpf_helpers.h>
 
 struct task_struct *bpf_task_from_pid(int pid) __ksym;
diff --git a/tools/testing/selftests/bpf/progs/test_spin_lock.c b/tools/testing/selftests/bpf/progs/test_spin_lock.c
index d8d77bdffd3d2..9bcee268f828b 100644
--- a/tools/testing/selftests/bpf/progs/test_spin_lock.c
+++ b/tools/testing/selftests/bpf/progs/test_spin_lock.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
 #include <linux/bpf.h>
-#include <linux/version.h>
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
 
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_estats.c b/tools/testing/selftests/bpf/progs/test_tcp_estats.c
index e2ae049c2f850..eb0e55ba3f284 100644
--- a/tools/testing/selftests/bpf/progs/test_tcp_estats.c
+++ b/tools/testing/selftests/bpf/progs/test_tcp_estats.c
@@ -34,7 +34,6 @@
 #include <string.h>
 #include <linux/bpf.h>
 #include <linux/ipv6.h>
-#include <linux/version.h>
 #include <sys/socket.h>
 #include <bpf/bpf_helpers.h>
 
diff --git a/tools/testing/selftests/wireguard/qemu/init.c b/tools/testing/selftests/wireguard/qemu/init.c
index 3e49924dd77e8..20d8d3192f75c 100644
--- a/tools/testing/selftests/wireguard/qemu/init.c
+++ b/tools/testing/selftests/wireguard/qemu/init.c
@@ -24,7 +24,6 @@
 #include <sys/sysmacros.h>
 #include <sys/random.h>
 #include <linux/random.h>
-#include <linux/version.h>
 
 __attribute__((noreturn)) static void poweroff(void)
 {
--
2.51.1
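'make versioncheck' runs scripts/checkversion.pl. The sketch below is a rough approximation of the rule this patch satisfies, assuming the check keys on uses of LINUX_VERSION_CODE and KERNEL_VERSION: an include of <linux/version.h> that nothing uses is flagged, and so is a use of those macros without the include. It is a plain substring scan over a source string, not the Perl script's actual logic, and versioncheck() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum vc_result { VC_OK, VC_NEEDLESS_INCLUDE, VC_MISSING_INCLUDE };

static enum vc_result versioncheck(const char *src)
{
	bool includes = strstr(src, "#include <linux/version.h>") != NULL;
	/* Assumption: these two macros are what make the header "needed". */
	bool uses = strstr(src, "LINUX_VERSION_CODE") != NULL ||
		    strstr(src, "KERNEL_VERSION") != NULL;

	if (includes && !uses)
		return VC_NEEDLESS_INCLUDE;  /* the case fixed in eight files above */
	if (!includes && uses)
		return VC_MISSING_INCLUDE;   /* the case fixed in bpf_helpers.h */
	return VC_OK;
}
```

A real checker would also need to ignore matches inside comments and strings, which this substring scan does not attempt.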

2 months, 3 weeks

[PATCH nf-next v6 0/2] Add IPIP flowtable SW acceleration

by Lorenzo Bianconi

Introduce SW acceleration for IPIP tunnels in the netfilter flowtable infrastructure.

---
Changes in v6:
- Rebase on top of nf-next main branch
- Link to v5: https://lore.kernel.org/r/20250721-nf-flowtable-ipip-v5-0-0865af9e58c6@kern…

Changes in v5:
- Rely on __ipv4_addr_hash() to compute the hash used as encap ID
- Remove unnecessary pskb_may_pull() in nf_flow_tuple_encap()
- Add nf_flow_ip4_ecanp_pop utility routine
- Link to v4: https://lore.kernel.org/r/20250718-nf-flowtable-ipip-v4-0-f8bb1c18b986@kern…

Changes in v4:
- Use the hash value of the saddr, daddr and protocol of outer IP header as encapsulation id.
- Link to v3: https://lore.kernel.org/r/20250703-nf-flowtable-ipip-v3-0-880afd319b9f@kern…

Changes in v3:
- Add outer IP header sanity checks
- target nf-next tree instead of net-next
- Link to v2: https://lore.kernel.org/r/20250627-nf-flowtable-ipip-v2-0-c713003ce75b@kern…

Changes in v2:
- Introduce IPIP flowtable selftest
- Link to v1: https://lore.kernel.org/r/20250623-nf-flowtable-ipip-v1-1-2853596e3941@kern…

---
Lorenzo Bianconi (2):
      net: netfilter: Add IPIP flowtable SW acceleration
      selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest

 include/linux/netdevice.h                    |  1 +
 net/ipv4/ipip.c                              | 28 +++++++++++
 net/netfilter/nf_flow_table_ip.c             | 56 +++++++++++++++++++++-
 net/netfilter/nft_flow_offload.c             |  1 +
 .../selftests/net/netfilter/nft_flowtable.sh | 40 ++++++++++++++++
 5 files changed, 124 insertions(+), 2 deletions(-)
---
base-commit: bab3ce404553de56242d7b09ad7ea5b70441ea41
change-id: 20250623-nf-flowtable-ipip-1b3d7b08d067

Best regards,
--
Lorenzo Bianconi <lorenzo(a)kernel.org>
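The v4 and v5 changelogs describe deriving an encapsulation ID from the outer header's saddr, daddr and protocol, computed with __ipv4_addr_hash(). The plain-C sketch below only shows the shape of that idea: it substitutes 32-bit FNV-1a for the kernel's hash, so its values are not the kernel's, and encap_id() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* One FNV-1a step: xor in a byte, multiply by the 32-bit FNV prime. */
static uint32_t fnv1a_byte(uint32_t h, uint8_t b)
{
	return (h ^ b) * 16777619u;
}

/*
 * Hypothetical encapsulation ID over the outer IPv4 header fields, hashed in
 * a fixed order: saddr (4 bytes), daddr (4 bytes), protocol (1 byte).
 */
static uint32_t encap_id(uint32_t saddr, uint32_t daddr, uint8_t proto)
{
	uint32_t h = 2166136261u;  /* FNV-1a 32-bit offset basis */
	int shift;

	for (shift = 24; shift >= 0; shift -= 8)
		h = fnv1a_byte(h, (uint8_t)(saddr >> shift));
	for (shift = 24; shift >= 0; shift -= 8)
		h = fnv1a_byte(h, (uint8_t)(daddr >> shift));
	return fnv1a_byte(h, proto);
}
```

Folding the outer addresses into one ID lets a flow lookup keyed on the inner tuple still tell apart identical inner flows carried by different tunnels; a hash can collide, though, which may be why v7 notes relying on exact match during lookup.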

2 months, 3 weeks

[PATCH v6 00/15] Consolidate iommu page table implementations (AMD)

by Jason Gunthorpe

[All the precursor patches are merged now and the AMD/RISCV/VTD conversions are written]

Currently each of the iommu page table formats duplicates all of the logic to maintain the page table and perform map/unmap/etc operations. There are several different versions of the algorithms across the different formats. The io-pgtable system provides an interface to help isolate the page table code from the iommu driver, but doesn't provide tools to implement the common algorithms.

This makes it very hard to improve the state of the page table code under the iommu domains, as any proposed improvement needs to alter a large number of different driver code paths. Combined with a lack of software-based testing, this makes improvement in this area very hard.

iommufd wants several new page table operations:
- More efficient map/unmap operations, using iommufd's batching logic
- unmap that returns the physical addresses into a batch as it progresses
- cut that allows splitting areas so large pages can have holes poked in them dynamically (i.e. guestmemfd hitless shared/private transitions)
- More aggressive freeing of table memory to avoid waste
- Fragmenting large pages so that dirty tracking can be more granular
- Reassembling large pages so that VMs can run at full IO performance in migration/dirty tracking error flows
- KHO integration for kernel live upgrade

Together these are algorithmically complex enough that implementing them in every page table format we support would be a very significant task. Just the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86 PAE / AMDv1 / VT-D SS / RISCV).

Instead of doing the duplicated work, this series takes the first step to consolidate the algorithms into one place. In spirit it is similar to the work Christoph did a few years back to pull the redundant get_user_pages() implementations out of the arch code into core MM. This unlocked a great deal of improvement in that space in the following years.
I would like to see the same benefit in iommu as well.

My first RFC showed a bigger picture with almost all formats and more algorithms. This series reorganizes that to be narrowly focused on just enough to convert the AMD driver to use the new mechanism.

kunit tests are provided that allow good testing of the algorithms and all formats on x86; nothing is arch specific.

AMD is one of the simpler options, as the HW is quite uniform with few different options/bugs while still requiring the complicated contiguous pages support. The HW also has a very simple range-based invalidation approach that is easy to implement.

The AMD v1 and AMD v2 page table formats are implemented bit-for-bit identical to the current code, tested using a compare kunit test that checks against the io-pgtable version (on github, see below).

Updating the AMD driver to replace the io-pgtable layer with the new code is fairly straightforward now. The layering is fixed up in the new version so that all the invalidation goes through function pointers.

Several small fixing patches have come out of this, as I've been fixing the problems that the test suite uncovers in the current code and implementing the fixed version in iommupt.

On performance, there is quite a wide variety of implementation designs across all the drivers.
Looking at some key performance numbers across the main formats:

iommu_map():
  pgsz      avg new,old ns   min new,old ns   min % (+ve is better)
  2^12        53,66            51,63           19.19  (AMDv1)
  256*2^12   386,1909         367,1795         79.79
  256*2^21   362,1633         355,1556         77.77
  2^12        56,62            52,59           11.11  (AMDv2)
  256*2^12   405,1355         357,1292         72.72
  256*2^21   393,1160         358,1114         67.67
  2^12        55,65            53,62           14.14  (VTD second stage)
  256*2^12   391,518          332,512          35.35
  256*2^21   383,635          336,624          46.46
  2^12        57,65            55,63           12.12  (ARM 64 bit)
  256*2^12   380,389          361,369           2.02
  256*2^21   358,419          345,400          13.13

iommu_unmap():
  pgsz      avg new,old ns   min new,old ns   min % (+ve is better)
  2^12        69,88            65,85           23.23  (AMDv1)
  256*2^12   353,6498         331,6029         94.94
  256*2^21   373,6014         360,5706         93.93
  2^12        71,72            66,69            4.04  (AMDv2)
  256*2^12   228,891          206,871          76.76
  256*2^21   254,721          245,711          65.65
  2^12        69,87            65,82           20.20  (VTD second stage)
  256*2^12   210,321          200,315          36.36
  256*2^21   255,349          238,342          30.30
  2^12        72,77            68,74            8.08  (ARM 64 bit)
  256*2^12   521,357          447,346         -29.29
  256*2^21   489,358          433,345         -25.25

* Above numbers include additional patches to remove the iommu_pgsize() overheads. gcc 13.3.0, i7-12700

This version provides fairly consistent performance across formats. ARM unmap performance is quite different because this version supports contiguous pages and uses a very different algorithm for unmapping, though I haven't yet figured out why it is so much worse than AMDv1.

The per-format commits include a more detailed chart.
There is a second branch:
https://github.com/jgunthorpe/linux/commits/iommu_pt_all

It contains supporting work and future steps:
- ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats
- RISCV format and RISCV conversion
  https://github.com/jgunthorpe/linux/commits/iommu_pt_riscv
- Support for a DMA incoherent HW page table walker
- VT-D second stage format and VT-D conversion
  https://github.com/jgunthorpe/linux/commits/iommu_pt_vtd
- DART v1 & v2 format
- Draft of an iommufd 'cut' operation to break down huge pages
- A compare test that checks the iommupt formats against the io-pgtable interface, including updating AMD to have a working io-pgtable and patches to make VT-D have an io-pgtable for testing.
- A performance test to micro-benchmark map and unmap against io-pgtable

My strategy is to go one by one for the drivers:
- AMD driver conversion
- RISCV page table and driver
- Intel VT-D driver and VTDSS page table
- Flushing improvements for RISCV
- ARM SMMUv3

And concurrently work on the algorithm side:
- debugfs content dump, like VT-D has
- Cut support
- Increase/Decrease page size support
- map/unmap batching
- KHO

As we make more algorithm improvements, the value of converting the drivers increases.
This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt

v6:
 - Improve comments and documentation
 - Rename pt_entry_oa_full -> pt_entry_oa_exact
          pt_has_system_page -> pt_has_system_page_size
          pt_max_output_address_lg2 -> pt_max_oa_lg2
          log2_f*() -> vaf* / oaf* / f*_t
          pt_item_fully_covered -> pt_entry_fully_covered
 - Fix missed constant propagation causing division
 - Consolidate debugging checks to pt_check_install_leaf_args()
 - Change collect->ignore_mapped to check_mapped
 - Shuffle some hunks around to more appropriate patches
 - Two new mini kunit tests
v5: https://patch.msgid.link/r/0-v5-116c4948af3d+68091-iommu_pt_jgg@nvidia.com
 - Text grammar updates and kdoc fixes
v4: https://patch.msgid.link/r/0-v4-0d6a6726a372+18959-iommu_pt_jgg@nvidia.com
 - Rebase on v6.16-rc3
 - Integrate the HATS/HATDis changes
 - Remove 'default n' from kconfig
 - Remove unused 'PT_FIXED_TOP_LEVEL'
 - Improve comments and documentation
 - Fix some compile warnings from kbuild robots
v3: https://patch.msgid.link/r/0-v3-a93aab628dbc+521-iommu_pt_jgg@nvidia.com
 - Rebase on v6.16-rc2
 - s/PT_ENTRY_WORD_SIZE/PT_ITEM_WORD_SIZE/s to follow the language better
 - Comment and documentation updates
 - Add PT_TOP_PHYS_MASK to help manage alignment restrictions on the top pointer
 - Add missed force_aperture = true
 - Make pt_iommu_deinit() take care of the not-yet-inited error case internally as AMD/RISCV/VTD all shared this logic
 - Change gather_range() into gather_range_pages() so it also deals with the page list. This makes the following cache flushing series simpler
 - Fix missed update of unmap->unmapped in some error cases
 - Change clear_contig() to order the gather more logically
 - Remove goto from the error handling in __map_range_leaf()
 - s/log2_/oalog2_/ in places where the argument is an oaddr_t
 - Pass the pts to pt_table_install64/32()
 - Do not use SIGN_EXTEND for the AMDv2 page table because of Vasant's information on how PASID 0 works.
v2: https://patch.msgid.link/r/0-v2-5c26bde5c22d+58b-iommu_pt_jgg@nvidia.com
 - AMD driver only, many code changes
RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/

Cc: Michael Roth <michael.roth(a)amd.com>
Cc: Alexey Kardashevskiy <aik(a)amd.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: James Gowans <jgowans(a)amazon.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>

Alejandro Jimenez (1):
  iommu/amd: Use the generic iommu page table

Jason Gunthorpe (14):
  genpt: Generic Page Table base API
  genpt: Add Documentation/ files
  iommupt: Add the basic structure of the iommu implementation
  iommupt: Add the AMD IOMMU v1 page table format
  iommupt: Add iova_to_phys op
  iommupt: Add unmap_pages op
  iommupt: Add map_pages op
  iommupt: Add read_and_clear_dirty op
  iommupt: Add a kunit test for Generic Page Table
  iommupt: Add a mock pagetable format for iommufd selftest to use
  iommufd: Change the selftest to use iommupt instead of xarray
  iommupt: Add the x86 64 bit page table format
  iommu/amd: Remove AMD io_pgtable support
  iommupt: Add a kunit test for the IOMMU implementation

 .clang-format | 1 +
 Documentation/driver-api/generic_pt.rst | 142 ++
 Documentation/driver-api/index.rst | 1 +
 drivers/iommu/Kconfig | 2 +
 drivers/iommu/Makefile | 1 +
 drivers/iommu/amd/Kconfig | 5 +-
 drivers/iommu/amd/Makefile | 2 +-
 drivers/iommu/amd/amd_iommu.h | 1 -
 drivers/iommu/amd/amd_iommu_types.h | 110 +-
 drivers/iommu/amd/io_pgtable.c | 577 --------
 drivers/iommu/amd/io_pgtable_v2.c | 370 ------
 drivers/iommu/amd/iommu.c | 538 ++++----
 drivers/iommu/generic_pt/.kunitconfig | 13 +
 drivers/iommu/generic_pt/Kconfig | 67 +
 drivers/iommu/generic_pt/fmt/Makefile | 26 +
 drivers/iommu/generic_pt/fmt/amdv1.h | 408 ++++++
 drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 +
 drivers/iommu/generic_pt/fmt/defs_x86_64.h | 21 +
 drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 15 +
 drivers/iommu/generic_pt/fmt/iommu_mock.c | 10 +
 drivers/iommu/generic_pt/fmt/iommu_template.h | 48 +
 drivers/iommu/generic_pt/fmt/iommu_x86_64.c | 11 +
 drivers/iommu/generic_pt/fmt/x86_64.h | 251 ++++
 drivers/iommu/generic_pt/iommu_pt.h | 1157 +++++++++++++++++
 drivers/iommu/generic_pt/kunit_generic_pt.h | 713 ++++++++++
 drivers/iommu/generic_pt/kunit_iommu.h | 182 +++
 drivers/iommu/generic_pt/kunit_iommu_pt.h | 486 +++++++
 drivers/iommu/generic_pt/pt_common.h | 358 +++++
 drivers/iommu/generic_pt/pt_defs.h | 329 +++++
 drivers/iommu/generic_pt/pt_fmt_defaults.h | 233 ++++
 drivers/iommu/generic_pt/pt_iter.h | 636 +++++++++
 drivers/iommu/generic_pt/pt_log2.h | 122 ++
 drivers/iommu/io-pgtable.c | 4 -
 drivers/iommu/iommufd/Kconfig | 1 +
 drivers/iommu/iommufd/iommufd_test.h | 11 +-
 drivers/iommu/iommufd/selftest.c | 438 +++----
 include/linux/generic_pt/common.h | 167 +++
 include/linux/generic_pt/iommu.h | 270 ++++
 include/linux/io-pgtable.h | 2 -
 tools/testing/selftests/iommu/iommufd.c | 60 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 12 +
 41 files changed, 6212 insertions(+), 1610 deletions(-)
 create mode 100644 Documentation/driver-api/generic_pt.rst
 delete mode 100644 drivers/iommu/amd/io_pgtable.c
 delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c
 create mode 100644 drivers/iommu/generic_pt/.kunitconfig
 create mode 100644 drivers/iommu/generic_pt/Kconfig
 create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
 create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86_64.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c
 create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h
 create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/pt_common.h
 create mode 100644 drivers/iommu/generic_pt/pt_defs.h
 create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
 create mode 100644 drivers/iommu/generic_pt/pt_iter.h
 create mode 100644 drivers/iommu/generic_pt/pt_log2.h
 create mode 100644 include/linux/generic_pt/common.h
 create mode 100644 include/linux/generic_pt/iommu.h

base-commit: cc1d7df505790fe734117b41455f1fe82ebf5ae5
-- 
2.43.0

2 months, 3 weeks


[PATCH v6 00/10] liveupdate: Rework KHO for in-kernel users & Fix memory corruption

by Pasha Tatashin

This series addresses comments, combines the two series [1] and [2] into one, and adds Reviewed-by tags.

This series refactors the KHO framework to better support in-kernel users like the upcoming LUO. The current design, which relies on a notifier chain and debugfs for control, is too restrictive for direct programmatic use.

The core of this rework is the removal of the notifier chain in favor of a direct registration API. This decouples clients from the shutdown-time finalization sequence, allowing them to manage their preserved state more flexibly and at any time.

This series also fixes a memory corruption bug in KHO that occurs when KFENCE is enabled. The root cause is that KHO metadata, allocated via kzalloc(), can be randomly serviced by kfence_alloc(). When a kernel boots via KHO, the early memblock allocator is restricted to a "scratch area". This forces the KFENCE pool to be allocated within this scratch area, creating a conflict. If KHO metadata is subsequently placed in this pool, it gets corrupted during the next kexec operation.
[1] https://lore.kernel.org/all/20251007033100.836886-1-pasha.tatashin@soleen.c…
[2] https://lore.kernel.org/all/20251015053121.3978358-1-pasha.tatashin@soleen.…

Mike Rapoport (Microsoft) (1):
  kho: drop notifiers

Pasha Tatashin (9):
  kho: allow to drive kho from within kernel
  kho: make debugfs interface optional
  kho: add interfaces to unpreserve folios and page ranges
  kho: don't unpreserve memory during abort
  liveupdate: kho: move to kernel/liveupdate
  kho: move kho debugfs directory to liveupdate
  liveupdate: kho: warn and fail on metadata or preserved memory in scratch area
  liveupdate: kho: Increase metadata bitmap size to PAGE_SIZE
  liveupdate: kho: allocate metadata directly from the buddy allocator

 Documentation/core-api/kho/concepts.rst | 2 +-
 MAINTAINERS | 3 +-
 include/linux/kexec_handover.h | 53 +-
 init/Kconfig | 2 +
 kernel/Kconfig.kexec | 15 -
 kernel/Makefile | 2 +-
 kernel/liveupdate/Kconfig | 38 ++
 kernel/liveupdate/Makefile | 5 +
 kernel/{ => liveupdate}/kexec_handover.c | 588 +++++++++-----------
 kernel/liveupdate/kexec_handover_debug.c | 25 +
 kernel/liveupdate/kexec_handover_debugfs.c | 216 +++++++
 kernel/liveupdate/kexec_handover_internal.h | 56 ++
 lib/test_kho.c | 30 +-
 mm/memblock.c | 62 +--
 tools/testing/selftests/kho/init.c | 2 +-
 tools/testing/selftests/kho/vmtest.sh | 1 +
 16 files changed, 645 insertions(+), 455 deletions(-)
 create mode 100644 kernel/liveupdate/Kconfig
 create mode 100644 kernel/liveupdate/Makefile
 rename kernel/{ => liveupdate}/kexec_handover.c (78%)
 create mode 100644 kernel/liveupdate/kexec_handover_debug.c
 create mode 100644 kernel/liveupdate/kexec_handover_debugfs.c
 create mode 100644 kernel/liveupdate/kexec_handover_internal.h

base-commit: f406055cb18c6e299c4a783fc1effeb16be41803
-- 
2.51.0.915.g61a8936c21-goog

2 months, 3 weeks


[PATCH bpf-next v5 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework

by Bastien Curutchet (eBPF Foundation)

Hi all,

Now that the merge window is over, here's a respin of the previous iteration rebased on the latest bpf-next_base. The bug triggering the XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF failure when CONFIG_DEBUG_VM is enabled hasn't been fixed yet, so I've moved the test to the flaky table.

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs are defined in xskxceiver.c. Since this script is used to test real hardware, the goal here is to leave it as it is, and only integrate the tests that run on veth peers into the test_progs framework.

Some tests are flaky, so they can't be integrated in the CI as they are. I think that fixing their flakiness would require a significant amount of work. So, as a first step, I've excluded them from the list of tests migrated to the CI (cf. PATCH 14). If these tests get fixed at some point, integrating them into the CI will be straightforward.

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test.
PATCH 8 to 13 handle all errors to release resources instead of calling exit() when any error occurs.
PATCH 14 isolates some flaky tests.
PATCH 15 integrates the non-flaky tests into the test_progs framework.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/20250924-xsk-v4-0-20e57537b876@bootlin.com

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes testapp_stats_rx_dropped(), the second one fixes testapp_xdp_shared_umem(). The unnecessary frees (in testapp_stats_rx_full() and testapp_stats_fill_empty()) are removed.
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf: Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xskxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1, 7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com

---
Bastien Curutchet (eBPF Foundation) (15):
  selftests/bpf: test_xsk: Split xskxceiver
  selftests/bpf: test_xsk: Initialize bitmap before use
  selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
  selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
  selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
  selftests/bpf: test_xsk: Wrap test clean-up in functions
  selftests/bpf: test_xsk: Release resources when swap fails
  selftests/bpf: test_xsk: Add return value to init_iface()
  selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
  selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
  selftests/bpf: test_xsk: Don't exit immediately when workers fail
  selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
  selftests/bpf: test_xsk: Don't exit immediately on allocation failures
  selftests/bpf: test_xsk: Isolate flaky tests
  selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework

 tools/testing/selftests/bpf/Makefile | 11 +-
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2595 ++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/test_xsk.h | 294 +++
 tools/testing/selftests/bpf/prog_tests/xsk.c | 146 ++
 tools/testing/selftests/bpf/xskxceiver.c | 2696 +--------------------
 tools/testing/selftests/bpf/xskxceiver.h | 156 --
 6 files changed, 3174 insertions(+), 2724 deletions(-)
---
base-commit: bd61720310e0b11bfbb7c8e1f373bb87d98451d4
change-id: 20250218-xsk-0cf90e975d14

Best regards,
-- 
Bastien Curutchet (eBPF Foundation) <tux(a)bootlin.com>

2 months, 3 weeks


[PATCH] Documentation: kunit: add description of kunit.enable parameter

by Yuya Ishikawa

The current KUnit documentation does not mention the kunit.enable kernel parameter, making it unclear how to troubleshoot cases where KUnit tests do not run as expected. Add a note explaining the kunit.enable parameter. Disabling this parameter prevents all KUnit tests from running even if CONFIG_KUNIT is enabled.

Signed-off-by: Yuya Ishikawa <ishikawa.yuy-00(a)jp.fujitsu.com>
---
 Documentation/dev-tools/kunit/run_manual.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/dev-tools/kunit/run_manual.rst b/Documentation/dev-tools/kunit/run_manual.rst
index 699d92885075..98e8d5b28808 100644
--- a/Documentation/dev-tools/kunit/run_manual.rst
+++ b/Documentation/dev-tools/kunit/run_manual.rst
@@ -35,6 +35,12 @@ or be built into the kernel.
 a good way of quickly testing everything applicable to the current
 config.
 
+KUnit can be enabled or disabled at boot time, and this behavior is
+controlled by the kunit.enable kernel parameter.
+By default, kunit.enable is set to 1 because KUNIT_DEFAULT_ENABLED is
+enabled by default. To ensure that tests are executed as expected,
+verify that kunit.enable=1 at boot time.
+
 Once we have built our kernel (and/or modules), it is simple to run
 the tests. If the tests are built-in, they will run automatically on the
 kernel boot. The results will be written to the kernel log (``dmesg``)
-- 
2.47.3

2 months, 3 weeks


[PATCH rc] iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()

by Nicolin Chen

The ioctl returns 0 upon success, so with the '!' every successful call makes the function return -1, which breaks the selftest. Drop the '!' to fix it.

Fixes: 1d235d849425 ("iommu/selftest: prevent use of uninitialized variable")
Signed-off-by: Nicolin Chen <nicolinc(a)nvidia.com>
---
 tools/testing/selftests/iommu/iommufd_utils.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 772ca1db6e597..9f472c20c1905 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1044,8 +1044,8 @@ static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
 	};
 
 	while (nvevents--) {
-		if (!ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
-			   &trigger_vevent_cmd))
+		if (ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
+			  &trigger_vevent_cmd))
 			return -1;
 	}
 	return 0;
-- 
2.43.0

2 months, 3 weeks

