- Linux-kselftest-mirror - lists.linaro.org

[PATCH] kunit: Make filter parameters configurable via Kconfig

by Thomas Weißschuh

Enable the preset of filter parameters from kconfig options, similar to
how other KUnit configuration parameters are handled already.
This is useful to run a subset of tests even if the cmdline is not
readily modifiable.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
 lib/kunit/Kconfig    | 24 ++++++++++++++++++++++++
 lib/kunit/executor.c |  8 +++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index 7a6af361d2fc6276b9667be8c694b0c80e33c1e8..50ecf55d2b9c8a82f2aff7a0b4156bd6179b0a2f 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -93,6 +93,30 @@ config KUNIT_AUTORUN_ENABLED
 	  In most cases this should be left as Y. Only if additional opt-in
 	  behavior is needed should this be set to N.
 
+config KUNIT_DEFAULT_FILTER_GLOB
+	string "Default value of the filter_glob module parameter"
+	help
+	  Sets the default value of kunit.filter_glob. If set to a non-empty
+	  string only matching tests are executed.
+
+	  If unsure, leave empty so all tests are executed.
+
+config KUNIT_DEFAULT_FILTER
+	string "Default value of the filter module parameter"
+	help
+	  Sets the default value of kunit.filter. If set to a non-empty
+	  string only matching tests are executed.
+
+	  If unsure, leave empty so all tests are executed.
+
+config KUNIT_DEFAULT_FILTER_ACTION
+	string "Default value of the filter_action module parameter"
+	help
+	  Sets the default value of kunit.filter_action. If set to a non-empty
+	  string only matching tests are executed.
+
+	  If unsure, leave empty so all tests are executed.
+
 config KUNIT_DEFAULT_TIMEOUT
 	int "Default value of the timeout module parameter"
 	default 300
diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 0061d4c7e35170634a3c1d1cff7179037fb8ba07..02ff380ab7938cfac2be3f8c0e7630a78961cc3d 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -45,9 +45,11 @@ bool kunit_autorun(void)
 	return autorun_param;
 }
 
-static char *filter_glob_param;
-static char *filter_param;
-static char *filter_action_param;
+#define PARAM_FROM_CONFIG(config) (config[0] ? config : NULL)
+
+static char *filter_glob_param = PARAM_FROM_CONFIG(CONFIG_KUNIT_DEFAULT_FILTER_GLOB);
+static char *filter_param = PARAM_FROM_CONFIG(CONFIG_KUNIT_DEFAULT_FILTER);
+static char *filter_action_param = PARAM_FROM_CONFIG(CONFIG_KUNIT_DEFAULT_FILTER_ACTION);
 
 module_param_named(filter_glob, filter_glob_param, charp, 0600);
 MODULE_PARM_DESC(filter_glob,

---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20251106-kunit-filter-kconfig-f08998936fc6

Best regards,
-- 
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
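
For readers who want to see the PARAM_FROM_CONFIG idea outside the kernel, here is a minimal userspace sketch. The CONFIG_DEMO_* macros and the demo_* helpers are hypothetical stand-ins (the real symbols come from Kconfig), and the lookup is done inside functions rather than in static initializers so that the sketch does not depend on how a given compiler folds string-literal indexing.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for Kconfig string symbols. In a kernel build the
 * real CONFIG_* values come from the generated configuration header.
 */
#define CONFIG_DEMO_FILTER_GLOB "example_suite.*"
#define CONFIG_DEMO_FILTER ""

/* Same idea as the macro in the patch: an empty string means "not set". */
#define PARAM_FROM_CONFIG(config) ((config)[0] ? (config) : NULL)

/* Returns the preset glob, or NULL when the option was left empty. */
static const char *demo_filter_glob(void)
{
	return PARAM_FROM_CONFIG(CONFIG_DEMO_FILTER_GLOB);
}

/* Returns NULL here because CONFIG_DEMO_FILTER is the empty string. */
static const char *demo_filter(void)
{
	return PARAM_FROM_CONFIG(CONFIG_DEMO_FILTER);
}
```

Mapping the empty string to NULL keeps the "no filter" case identical to the previous behavior, where the parameter pointers started out as NULL.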

2 weeks, 6 days


[PATCH RESEND] selftests/futex: skip tests if shmget unsupported

by Carlos Llamas

On systems where the shmget() syscall is not supported, tests like
anon_page and shared_waitv will fail. Skip these tests in such cases to
allow the rest of the test suite to run.

Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
 tools/testing/selftests/futex/functional/futex_wait.c  | 2 ++
 tools/testing/selftests/futex/functional/futex_waitv.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/tools/testing/selftests/futex/functional/futex_wait.c b/tools/testing/selftests/futex/functional/futex_wait.c
index 152ca4612886..1269642bb662 100644
--- a/tools/testing/selftests/futex/functional/futex_wait.c
+++ b/tools/testing/selftests/futex/functional/futex_wait.c
@@ -71,6 +71,8 @@ TEST(anon_page)
 	/* Testing an anon page shared memory */
 	shm_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
 	if (shm_id < 0) {
+		if (errno == ENOSYS)
+			ksft_exit_skip("shmget syscall not supported\n");
 		perror("shmget");
 		exit(1);
 	}
diff --git a/tools/testing/selftests/futex/functional/futex_waitv.c b/tools/testing/selftests/futex/functional/futex_waitv.c
index c684b10eb76e..3bc4e5dc70e7 100644
--- a/tools/testing/selftests/futex/functional/futex_waitv.c
+++ b/tools/testing/selftests/futex/functional/futex_waitv.c
@@ -86,6 +86,8 @@ TEST(shared_waitv)
 		int shm_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
 
 		if (shm_id < 0) {
+			if (errno == ENOSYS)
+				ksft_exit_skip("shmget syscall not supported\n");
 			perror("shmget");
 			exit(1);
 		}
-- 
2.52.0.rc1.455.g30608eb744-goog
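
The skip-versus-fail decision in this patch can be sketched as a small standalone helper. This is only an illustration under stated assumptions: the demo_* names are hypothetical and the sketch returns a status instead of calling the kselftest helpers, so it compiles without the selftest headers.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical result codes for this sketch (not kselftest constants). */
enum demo_probe { DEMO_PROBE_OK, DEMO_PROBE_SKIP, DEMO_PROBE_FAIL };

/* ENOSYS means the syscall is absent: skip. Any other error is a failure. */
static enum demo_probe demo_classify_shmget_errno(int err)
{
	return err == ENOSYS ? DEMO_PROBE_SKIP : DEMO_PROBE_FAIL;
}

/* Try to create a private 4096-byte segment and classify the outcome. */
static enum demo_probe demo_probe_shmget(void)
{
	int shm_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);

	if (shm_id < 0)
		return demo_classify_shmget_errno(errno);

	/* Remove the segment again so the probe leaves nothing behind. */
	shmctl(shm_id, IPC_RMID, NULL);
	return DEMO_PROBE_OK;
}
```

Keeping the errno classification in its own function makes the policy (only ENOSYS turns into a skip) easy to check without needing a kernel that lacks SysV shared memory.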

3 weeks


[PATCH net v5 0/3] mptcp: Fix conflicts between MPTCP and sockmap

by Jiayuan Chen

Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.

sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and
implementing fast socket-level forwarding logic:
1. Users can obtain file descriptors through userspace socket()/accept()
   interfaces, then call BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs)
   to replace handlers when TCP connections enter ESTABLISHED state
   (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)

However, when combined with MPTCP, an issue arises: MPTCP creates subflow
sk's and performs TCP handshakes, so the BPF program obtains subflow sk's
and may incorrectly replace their sk_prot. We need to reject such
operations. In patch 1, we set psock_update_sk_prot to NULL in the
subflow's custom sk_prot.

Additionally, if the server's listening socket has MPTCP enabled and the
client's TCP also uses MPTCP, we should allow the combination of subflow
and sockmap. This is because the latest Golang programs have enabled MPTCP
for listening sockets by default [2]. For programs already using sockmap,
upgrading Golang should not cause sockmap functionality to fail.

Patch 2 prevents the WARNING from occurring.

Despite these patches fixing stream corruption, users of sockmap must set
GODEBUG=multipathtcp=0 to disable MPTCP until sockmap fully supports it.

[1] truncated warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
 <TASK>
 do_accept+0xeb/0x190
 ? __x64_sys_pselect6+0x61/0x80
 ? _raw_spin_unlock+0x12/0x30
 ? alloc_fd+0x11e/0x190
 __sys_accept4+0x8c/0x100
 __x64_sys_accept+0x1f/0x30
 x64_sys_call+0x202f/0x20f0
 do_syscall_64+0x72/0x9a0
 ? switch_fpu_return+0x60/0xf0
 ? irqentry_exit_to_user_mode+0xdb/0x1e0
 ? irqentry_exit+0x3f/0x50
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

[2]: https://go-review.googlesource.com/c/go/+/607715

---
v4 -> v5: Dropped redundant selftest code, updated the Fixes tag, and
          added a Reviewed-by tag.
v3 -> v4: Addressed questions from Matthieu and Paolo, explained sockmap's
          operational mechanism, and finalized the changes
v2 -> v3: Adopted Jakub Sitnicki's suggestions - atomic retrieval of
          sk_family is required
v1 -> v2: Had initial discussion with Matthieu on sockmap and MPTCP
          technical details

v4: https://lore.kernel.org/bpf/20251105113625.148900-1-jiayuan.chen@linux.dev/
v3: https://lore.kernel.org/bpf/20251023125450.105859-1-jiayuan.chen@linux.dev/
v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/…
v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linu…

Jiayuan Chen (3):
  mptcp: disallow MPTCP subflows from sockmap
  net,mptcp: fix proto fallback detection with BPF
  selftests/bpf: Add mptcp test with sockmap

 net/mptcp/protocol.c                          |   6 +-
 net/mptcp/subflow.c                           |   8 +
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 141 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c       |  43 ++++++
 4 files changed, 196 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c

base-commit: 8c0726e861f3920bac958d76cf134b5a3aa14ce4
-- 
2.43.0

3 weeks


Re: [PATCH] selftests/cgroup: conform test to TAP format output

by Sebastian Chlad

On Fri, Nov 14, 2025 at 4:59 AM Guopeng Zhang <zhangguopeng(a)kylinos.cn> wrote:
>
> Hi Michal,
>
> Thanks for reviewing and pointing out [1].
>
> > Could you please explain more why is the TAP layout beneficial?
> > (I understand selftest are for oneself, i.e. human readable only by default.)
>
> Actually, selftests are no longer just something for developers to view locally; they are now extensively
> run in CI and stable branch regression testing. Using a standardized layout means that general test runners
> and CI systems can parse the cgroup test results without any special handling.

I second that. In fact, we do run some of those tests in the CI; e.g.
https://openqa.opensuse.org/tests/5453031#external

We added this:
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Parser/Format/…
to our CI but frankly the use of KTAP across the selftests is very
inconsistent, so we need to post-process some of the output files quite a
lot. Therefore the more standardized the output, the better for any CI.

Small ask: should we amend the commit message to say KTAP?

That being said, the cgroup tests produce nice output which is easy to
parse and gives us no issues in our CI, apart from the shell tests,
specifically test_cpuset_prs.sh.
We currently run the cgroup tests only internally because some of them
tend to fail when crossing resource-usage boundaries and don’t report
clearly by how much.
That ties into my earlier effort Michal linked here:
https://lore.kernel.org/all/rua6ubri67gh3b7atarbm5mggqgjyh6646mzkry2n2547jn…

I’ll try to add the cgroup tests to the public openSUSE CI and will test
your patches.

>
> TAP provides a structured format that is both human-readable and machine-readable. The plan/result lines are parsed by tools,
> while the diagnostic lines can still contain human-readable debug information. Over time, other selftest suites (such as mm, KVM, mptcp, etc.)
> have also been converted to TAP-style output, so this change just brings the cgroup tests in line with that broader direction.
>
> > Or is this part of some tree-wide effort?
>
> This patch is not part of a formal, tree-wide conversion series I am running; it is an incremental step to align the
> cgroup C tests with the existing TAP usage. I started here because these tests already use ksft_test_result_*() and
> only require minor changes to generate proper TAP output.
>
> > I'm asking to better assess whether also the scripts listed in
> > Makefile:TEST_PROGS should be converted too.
>
> I agree that having them produce TAP output would benefit tooling and CI. I did not want to mix
> that into this change, but if you and other maintainers think this direction is reasonable,
> I would be happy to follow up and convert the cgroup shell tests to TAP as well.
>
> Thanks again for your review.
>
> Best regards,
> Guopeng
>
>
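
To make the "plan/result lines" point concrete, here is a small sketch of a TAP-style report writer. It is an assumption-laden illustration, not the kselftest implementation: the demo_* names are hypothetical, the header string follows TAP version 13, and the report is written into a caller-provided buffer so it can be inspected.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-test outcome used only by this sketch. */
enum demo_status { DEMO_PASS, DEMO_FAIL, DEMO_SKIP };

struct demo_result {
	const char *name;
	enum demo_status status;
};

/* Append formatted text at offset off; stops appending once buf is full. */
static size_t demo_append(char *buf, size_t len, size_t off, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (off >= len)
		return off;
	va_start(ap, fmt);
	n = vsnprintf(buf + off, len - off, fmt, ap);
	va_end(ap);
	return n < 0 ? off : off + (size_t)n;
}

/* Write header, plan line, and one result line per test; return failures. */
static int demo_format_tap(char *buf, size_t len,
			   const struct demo_result *res, size_t count)
{
	size_t off = 0, i;
	int failed = 0;

	if (len > 0)
		buf[0] = '\0';
	off = demo_append(buf, len, off, "TAP version 13\n");
	off = demo_append(buf, len, off, "1..%zu\n", count);
	for (i = 0; i < count; i++) {
		switch (res[i].status) {
		case DEMO_PASS:
			off = demo_append(buf, len, off, "ok %zu %s\n", i + 1, res[i].name);
			break;
		case DEMO_SKIP:
			off = demo_append(buf, len, off, "ok %zu %s # SKIP\n", i + 1, res[i].name);
			break;
		default:
			off = demo_append(buf, len, off, "not ok %zu %s\n", i + 1, res[i].name);
			failed++;
			break;
		}
	}
	return failed;
}
```

A parser only needs the plan line and the leading "ok" / "not ok" tokens, which is why consistent output matters more to CI than the exact wording of the test names.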

3 weeks


[PATCH] selftests/cgroup: conform test to TAP format output

by Guopeng Zhang

Conform the layout, informational and status messages to TAP. No
functional change is intended other than the layout of output messages.

Signed-off-by: Guopeng Zhang <zhangguopeng(a)kylinos.cn>
---
 tools/testing/selftests/cgroup/test_core.c       | 7 ++++---
 tools/testing/selftests/cgroup/test_cpu.c        | 7 ++++---
 tools/testing/selftests/cgroup/test_cpuset.c     | 7 ++++---
 tools/testing/selftests/cgroup/test_freezer.c    | 7 ++++---
 tools/testing/selftests/cgroup/test_kill.c       | 7 ++++---
 tools/testing/selftests/cgroup/test_kmem.c       | 7 ++++---
 tools/testing/selftests/cgroup/test_memcontrol.c | 7 ++++---
 tools/testing/selftests/cgroup/test_zswap.c      | 7 ++++---
 8 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c
index 5e5b8c4b8c0e..102262555a59 100644
--- a/tools/testing/selftests/cgroup/test_core.c
+++ b/tools/testing/selftests/cgroup/test_core.c
@@ -923,8 +923,10 @@ struct corecg_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), &nsdelegate)) {
 		if (setup_named_v1_root(root, sizeof(root), CG_NAMED_NAME))
 			ksft_exit_skip("cgroup v2 isn't mounted and could not setup named v1 hierarchy\n");
@@ -946,12 +948,11 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
 	cleanup_named_v1_root(root);
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_cpu.c b/tools/testing/selftests/cgroup/test_cpu.c
index 7d77d3d43c8e..c83f05438d7c 100644
--- a/tools/testing/selftests/cgroup/test_cpu.c
+++ b/tools/testing/selftests/cgroup/test_cpu.c
@@ -796,8 +796,10 @@ struct cpucg_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -814,11 +816,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_cpuset.c b/tools/testing/selftests/cgroup/test_cpuset.c
index 8094091a5857..c5cf8b56ceb8 100644
--- a/tools/testing/selftests/cgroup/test_cpuset.c
+++ b/tools/testing/selftests/cgroup/test_cpuset.c
@@ -247,8 +247,10 @@ struct cpuset_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -265,11 +267,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_freezer.c b/tools/testing/selftests/cgroup/test_freezer.c
index 714c963aa3f5..97fae92c8387 100644
--- a/tools/testing/selftests/cgroup/test_freezer.c
+++ b/tools/testing/selftests/cgroup/test_freezer.c
@@ -1488,8 +1488,10 @@ struct cgfreezer_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
@@ -1501,11 +1503,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_kill.c b/tools/testing/selftests/cgroup/test_kill.c
index a4dd326ced79..c8c9d306925b 100644
--- a/tools/testing/selftests/cgroup/test_kill.c
+++ b/tools/testing/selftests/cgroup/test_kill.c
@@ -274,8 +274,10 @@ struct cgkill_test {
 int main(int argc, char *argv[])
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
@@ -287,11 +289,10 @@ int main(int argc, char *argv[])
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 005a142f3492..ca38525484e3 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -421,8 +421,10 @@ struct kmem_test {
 int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -446,11 +448,10 @@ int main(int argc, char **argv)
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 2e9d78ab641c..4e1647568c5b 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -1650,8 +1650,10 @@ struct memcg_test {
 int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
-	int i, proc_status, ret = EXIT_SUCCESS;
+	int i, proc_status;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -1685,11 +1687,10 @@ int main(int argc, char **argv)
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index ab865d900791..64ebc3f3f203 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -597,8 +597,10 @@ static bool zswap_configured(void)
 int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
-	int i, ret = EXIT_SUCCESS;
+	int i;
 
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
@@ -625,11 +627,10 @@ int main(int argc, char **argv)
 			ksft_test_result_skip("%s\n", tests[i].name);
 			break;
 		default:
-			ret = EXIT_FAILURE;
 			ksft_test_result_fail("%s\n", tests[i].name);
 			break;
 		}
 	}
 
-	return ret;
+	ksft_finished();
 }
-- 
2.25.1

3 weeks


[PATCH v4 0/3] VMM can handle guest SEA via KVM_EXIT_ARM_SEA

by Jiaqi Yan

Problem
=======

When host APEI is unable to claim a synchronous external abort (SEA)
during guest abort, today KVM directly injects an asynchronous SError
into the VCPU then resumes it. The injected SError usually results in
unpleasant guest kernel panic.

One major source of guest SEA is a VCPU consuming a recoverable
uncorrected memory error (UER), which is not uncommon at all in modern
datacenter servers with large amounts of physical memory. Although SError
and guest panic are sufficient to stop the propagation of corrupted
memory, there is room to recover from a UER in a more graceful manner.

Proposed Solution
=================

The idea is, we can replay the SEA to the faulting VCPU. If the memory
error consumption or the fault that causes the SEA is not from guest
kernel, the blast radius can be limited to the poison-consuming guest
process, while the VM can keep running.

In addition, instead of handling this under the hood without involving
userspace, there are benefits to redirecting the SEA to the VMM:

- VM customers care about the disruptions caused by memory errors, and
  VMM usually has the responsibility to start the process of notifying
  the customers of memory error events in their VMs. For example some
  cloud provider emits a critical log in their observability UI [1], and
  provides a playbook for customers on how to mitigate disruptions to
  their workloads.

- VMM can protect future memory error consumption by unmapping the
  poisoned pages from stage-2 page table with KVM userfault [2], or by
  splitting the memslot that contains the poisoned pages.

- VMM can keep track of SEA events in the VM. When VMM thinks the status
  on the host or the VM is bad enough, e.g. number of distinct SEAs
  exceeds a threshold, it can restart the VM on another healthy host.

- Behavior parity with x86 architecture. When machine check exception
  (MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to
  let VMM either recover from the MCE, or terminate itself with VM.
  The prior RFC proposes to implement SIGBUS on arm64 as well, but
  Marc preferred KVM exit over signal [3]. However, implementation
  aside, returning SEA to VMM is on par with returning MCE to VMM.

Once SEA is redirected to VMM, among other actions, VMM is encouraged
to inject external aborts into the faulting VCPU.

New UAPIs
=========

This patchset introduces the following userspace-visible changes to
empower VMM to control what happens for SEA on guest memory:

- KVM_CAP_ARM_SEA_TO_USER. While taking SEA, if userspace has enabled
  this new capability at VM creation, and the SEA is not owned by kernel
  allocated memory, instead of injecting SError, return KVM_EXIT_ARM_SEA
  to userspace.

- KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details
  about the SEA are provided in arm_sea as much as possible, including
  sanitized ESR value at EL2, faulting guest virtual and physical
  addresses if available.

* From v3 [4]
  - Rebased on commit 3a8660878839 ("Linux 6.18-rc1").
  - In selftest, print a message if GVA or GPA expects to be valid.

* From v2 [5]:
  - Rebased on "[PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection" [6]
    and kvmarm/next commit 7b8346bd9fce6 ("KVM: arm64: Don't attempt vLPI
    mappings when vPE allocation is disabled")
  - Took the host_owns_sea implementation from Oliver [7, 8].
  - Excluded the guest SEA injection patches.
  - Updated selftest.

* From v1 [9]:
  - Rebased on commit 4d62121ce9b5 ("KVM: arm64: vgic-debug: Avoid
    dereferencing NULL ITE pointer").
  - Sanitize ESR_EL2 before reporting it to userspace.
  - Do not do KVM_EXIT_ARM_SEA when SEA is caused by memory allocated to
    stage-2 translation table.

[1] https://cloud.google.com/solutions/sap/docs/manage-host-errors
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com
[3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org
[4] https://lore.kernel.org/kvmarm/20250731205844.1346839-1-jiaqiyan@google.com
[5] https://lore.kernel.org/kvm/20250604050902.3944054-1-jiaqiyan@google.com
[6] https://lore.kernel.org/kvmarm/20250729182342.3281742-1-oliver.upton@linux.…
[7] https://lore.kernel.org/kvm/aHFohmTb9qR_JG1E@linux.dev
[8] https://lore.kernel.org/kvm/aHK-DPufhLy5Dtuk@linux.dev
[9] https://lore.kernel.org/kvm/20250505161412.1926643-1-jiaqiyan@google.com

Jiaqi Yan (3):
  KVM: arm64: VM exit to userspace to handle SEA
  KVM: selftests: Test for KVM_EXIT_ARM_SEA
  Documentation: kvm: new UAPI for handling SEA

 Documentation/virt/kvm/api.rst                |  61 ++++
 arch/arm64/include/asm/kvm_host.h             |   2 +
 arch/arm64/kvm/arm.c                          |   5 +
 arch/arm64/kvm/mmu.c                          |  68 +++-
 include/uapi/linux/kvm.h                      |  10 +
 tools/arch/arm64/include/asm/esr.h            |   2 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/arm64/sea_to_user.c | 331 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 9 files changed, 480 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c

-- 
2.51.0.760.g7b8bcc2412-goog

3 weeks


[PATCH v9 0/9] liveupdate: Rework KHO for in-kernel users

by Pasha Tatashin

Changelog:
v9: Added review-bys and addressed comments from Mike Rapoport and
    Pratyush Yadav.
    Dropped patch that moves abort/finalize to public header per Mike's
    request.
    Added patch from Zhu Yanjun to output errors by name.

This series applies against akpm's mm-unstable branch.

This series refactors the KHO framework to better support in-kernel
users like the upcoming LUO. The current design, which relies on a
notifier chain and debugfs for control, is too restrictive for direct
programmatic use.

The core of this rework is the removal of the notifier chain in favor of
a direct registration API. This decouples clients from the shutdown-time
finalization sequence, allowing them to manage their preserved state
more flexibly and at any time.

In support of this new model, this series also:
- Makes the debugfs interface optional.
- Introduces APIs to unpreserve memory and fixes a bug in the abort
  path where client state was being incorrectly discarded. Note that
  this is an interim step, as a more comprehensive fix is planned as
  part of the stateless KHO work [1].
- Moves all KHO code into a new kernel/liveupdate/ directory to
  consolidate live update components.

[1] https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com

Mike Rapoport (Microsoft) (1):
  kho: drop notifiers

Pasha Tatashin (7):
  kho: make debugfs interface optional
  kho: add interfaces to unpreserve folios, page ranges, and vmalloc
  memblock: Unpreserve memory in case of error
  test_kho: Unpreserve memory in case of error
  kho: don't unpreserve memory during abort
  liveupdate: kho: move to kernel/liveupdate
  MAINTAINERS: update KHO maintainers

Zhu Yanjun (1):
  liveupdate: kho: Use %pe format specifier for error pointer printing

 Documentation/core-api/kho/concepts.rst       |   2 +-
 MAINTAINERS                                   |   4 +-
 include/linux/kexec_handover.h                |  46 +-
 init/Kconfig                                  |   2 +
 kernel/Kconfig.kexec                          |  24 -
 kernel/Makefile                               |   3 +-
 kernel/kexec_handover_internal.h              |  16 -
 kernel/liveupdate/Kconfig                     |  39 ++
 kernel/liveupdate/Makefile                    |   5 +
 kernel/{ => liveupdate}/kexec_handover.c      | 532 +++++++-----------
 .../{ => liveupdate}/kexec_handover_debug.c   |   0
 kernel/liveupdate/kexec_handover_debugfs.c    | 221 ++++++++
 kernel/liveupdate/kexec_handover_internal.h   |  56 ++
 lib/test_kho.c                                | 128 +++--
 mm/memblock.c                                 |  93 +--
 tools/testing/selftests/kho/vmtest.sh         |   1 +
 16 files changed, 690 insertions(+), 482 deletions(-)
 delete mode 100644 kernel/kexec_handover_internal.h
 create mode 100644 kernel/liveupdate/Kconfig
 create mode 100644 kernel/liveupdate/Makefile
 rename kernel/{ => liveupdate}/kexec_handover.c (80%)
 rename kernel/{ => liveupdate}/kexec_handover_debug.c (100%)
 create mode 100644 kernel/liveupdate/kexec_handover_debugfs.c
 create mode 100644 kernel/liveupdate/kexec_handover_internal.h

base-commit: 9ef7b034116354ee75502d1849280a4d2ff98a7c
-- 
2.51.1.930.gacf6e81ea2-goog

3 weeks


[GIT PULL] kselftest fixes update for Linux 6.18-rc6

by Shuah Khan

Hi Linus,

Please pull this kselftest fixes update for Linux 6.18-rc6

Fixes event-filter-function.tc tracing test failure caused when a first
run to sample events triggers kmem_cache_free which interferes with the
rest of the test. Fix this by calling sample_events twice to eliminate
the kmem_cache_free related noise from the sampling.

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 920aa3a7705a061cb3004572d8b7932b54463dbf:

  selftests: cachestat: Fix warning on declaration under label (2025-10-22 09:23:18 -0600)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.18-rc6

for you to fetch changes up to dd4adb986a86727ed8f56c48b6d0695f1e211e65:

  selftests/tracing: Run sample events to clear page cache events (2025-11-10 18:00:07 -0700)

----------------------------------------------------------------
linux_kselftest-fixes-6.18-rc6

Fixes event-filter-function.tc tracing test failure caused when a first
run to sample events triggers kmem_cache_free which interferes with the
rest of the test. Fix this by calling sample_events twice to eliminate
the kmem_cache_free related noise from the sampling.

----------------------------------------------------------------
Steven Rostedt (1):
      selftests/tracing: Run sample events to clear page cache events

 tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------

3 weeks


[PATCH v2] selftests/mm/uffd: remove static address usage in shmem_allocate_area()

by Mehdi Ben Hadj Khelifa

The current shmem_allocate_area() implementation uses a hardcoded virtual
base address (BASE_PMD_ADDR) as a hint for mmap() when creating
shmem-backed test areas. This approach is fragile and may fail on systems
with ASLR or different virtual memory layouts, where the chosen address is
unavailable.

Replace the static base address with a dynamically reserved address range
obtained via mmap(NULL, ..., PROT_NONE). The memfd-backed areas and their
alias are then mapped into that reserved region using MAP_FIXED, preserving
the original layout and aliasing semantics while avoiding collisions with
unrelated mappings.

This change improves robustness and portability of the test suite without
altering its behavior or coverage.

Suggested-by: Mike Rapoport <rppt(a)kernel.org>
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa(a)gmail.com>
---
Testing (retested):
A diff between running the mm selftests on 6.18-rc5 from before and after
the change shows no regression on x86_64 architecture with 32GB DDR5 RAM.

ChangeLog:

Changes from v1:
- Implemented Mike's suggestions to make the cleanup code clearer.
Link: https://lore.kernel.org/all/20251111205739.420009-1-mehdi.benhadjkheli…

 tools/testing/selftests/mm/uffd-common.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 994fe8c03923..edd02328f77b 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -10,7 +10,6 @@
 uffd_test_ops_t *uffd_test_ops;
 uffd_test_case_ops_t *uffd_test_case_ops;
 
-#define BASE_PMD_ADDR ((void *)(1UL << 30))
 
 /* pthread_mutex_t starts at page offset 0 */
 pthread_mutex_t *area_mutex(char *area, unsigned long nr, uffd_global_test_opts_t *gopts)
@@ -142,30 +141,37 @@ static int shmem_allocate_area(uffd_global_test_opts_t *gopts, void **alloc_area
 	unsigned long offset = is_src ? 0 : bytes;
 	char *p = NULL, *p_alias = NULL;
 	int mem_fd = uffd_mem_fd_create(bytes * 2, false);
+	size_t region_size = bytes * 2 + hpage_size;
 
-	/* TODO: clean this up. Use a static addr is ugly */
-	p = BASE_PMD_ADDR;
-	if (!is_src)
-		/* src map + alias + interleaved hpages */
-		p += 2 * (bytes + hpage_size);
+	void *reserve = mmap(NULL, region_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
+			-1, 0);
+	if (reserve == MAP_FAILED) {
+		close(mem_fd);
+		return -errno;
+	}
+
+	p = reserve;
 	p_alias = p;
 	p_alias += bytes;
 	p_alias += hpage_size; /* Prevent src/dst VMA merge */
 
-	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			   mem_fd, offset);
 	if (*alloc_area == MAP_FAILED) {
 		*alloc_area = NULL;
+		munmap(reserve, region_size);
+		close(mem_fd);
 		return -errno;
 	}
 	if (*alloc_area != p)
 		err("mmap of memfd failed at %p", p);
 
-	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			  mem_fd, offset);
 	if (area_alias == MAP_FAILED) {
-		munmap(*alloc_area, bytes);
 		*alloc_area = NULL;
+		munmap(reserve, region_size);
+		close(mem_fd);
 		return -errno;
 	}
 	if (area_alias != p_alias)
-- 
2.51.2

3 weeks


[PATCH] selftests/mm/uffd: remove static address usage in shmem_allocate_area()

by Mehdi Ben Hadj Khelifa

The current shmem_allocate_area() implementation uses a hardcoded virtual base address (BASE_PMD_ADDR) as a hint for mmap() when creating shmem-backed test areas. This approach is fragile and may fail on systems with ASLR or different virtual memory layouts, where the chosen address is unavailable.

Replace the static base address with a dynamically reserved address range obtained via mmap(NULL, ..., PROT_NONE). The memfd-backed areas and their alias are then mapped into that reserved region using MAP_FIXED, preserving the original layout and aliasing semantics while avoiding collisions with unrelated mappings.

This change improves robustness and portability of the test suite without altering its behavior or coverage.

Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa(a)gmail.com>
---
Testing:
A diff between running the mm selftests on 6.18-rc5 before and after the change shows no regression on the x86_64 architecture with 32GB DDR5 RAM.

 tools/testing/selftests/mm/uffd-common.c | 25 +++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 994fe8c03923..492b21c960bb 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -6,11 +6,11 @@
  */
 
 #include "uffd-common.h"
+#include "asm-generic/mman-common.h"
 
 uffd_test_ops_t *uffd_test_ops;
 uffd_test_case_ops_t *uffd_test_case_ops;
 
-#define BASE_PMD_ADDR ((void *)(1UL << 30))
 
 /* pthread_mutex_t starts at page offset 0 */
 pthread_mutex_t *area_mutex(char *area, unsigned long nr, uffd_global_test_opts_t *gopts)
@@ -142,30 +142,37 @@ static int shmem_allocate_area(uffd_global_test_opts_t *gopts, void **alloc_area
 	unsigned long offset = is_src ? 0 : bytes;
 	char *p = NULL, *p_alias = NULL;
 	int mem_fd = uffd_mem_fd_create(bytes * 2, false);
+	size_t region_size = bytes * 2 + hpage_size;
 
-	/* TODO: clean this up. Use a static addr is ugly */
-	p = BASE_PMD_ADDR;
-	if (!is_src)
-		/* src map + alias + interleaved hpages */
-		p += 2 * (bytes + hpage_size);
+	void *reserve = mmap(NULL, region_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
+			-1, 0);
+	if (reserve == MAP_FAILED) {
+		close(mem_fd);
+		return -errno;
+	}
+
+	p = (char *)reserve;
 	p_alias = p;
 	p_alias += bytes;
 	p_alias += hpage_size; /* Prevent src/dst VMA merge */
 
-	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	*alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			   mem_fd, offset);
 	if (*alloc_area == MAP_FAILED) {
+		munmap(reserve, region_size);
 		*alloc_area = NULL;
+		close(mem_fd);
 		return -errno;
 	}
 	if (*alloc_area != p)
 		err("mmap of memfd failed at %p", p);
 
-	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
+	area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
 			  mem_fd, offset);
 	if (area_alias == MAP_FAILED) {
-		munmap(*alloc_area, bytes);
+		munmap(reserve, region_size);
 		*alloc_area = NULL;
+		close(mem_fd);
 		return -errno;
 	}
 	if (area_alias != p_alias)
-- 
2.51.2
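
For readers unfamiliar with the reserve-then-overlay pattern the commit message describes, the following is a minimal standalone sketch, not the selftest code itself. The helper name map_area_and_alias(), the memfd name and the gap parameter are assumptions made for illustration; only mmap(), munmap(), memfd_create(), ftruncate() and close() are real interfaces. It assumes Linux with glibc 2.27 or later and page-multiple sizes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `bytes` of a fresh memfd twice (area + alias)
 * inside one kernel-chosen reservation, leaving a `gap`-byte PROT_NONE
 * hole between the two mappings.
 * Returns 0 on success or a negative errno value.
 */
static int map_area_and_alias(size_t bytes, size_t gap, char **area, char **alias)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t region = 2 * bytes + gap;
	char *reserve, *p, *p_alias = MAP_FAILED;
	int fd, err;

	assert(bytes && bytes % page == 0 && gap % page == 0);

	fd = memfd_create("reserve-sketch", 0);
	if (fd < 0)
		return -errno;
	if (ftruncate(fd, (off_t)bytes) < 0) {
		err = -errno;
		close(fd);
		return err;
	}

	/* Step 1: let the kernel pick a free range; nothing is accessible yet. */
	reserve = mmap(NULL, region, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (reserve == MAP_FAILED) {
		err = -errno;
		close(fd);
		return err;
	}

	/* Step 2: overlay the shared mappings; MAP_FIXED only replaces our own pages. */
	p = mmap(reserve, bytes, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
	if (p != MAP_FAILED)
		p_alias = mmap(reserve + bytes + gap, bytes, PROT_READ | PROT_WRITE,
			       MAP_SHARED | MAP_FIXED, fd, 0);
	if (p == MAP_FAILED || p_alias == MAP_FAILED) {
		err = -errno;	/* save errno before cleanup can overwrite it */
		munmap(reserve, region);
		close(fd);
		return err;
	}

	close(fd);	/* the mappings keep the memfd contents alive */
	*area = p;
	*alias = p_alias;
	return 0;
}
```

Because both mappings share the same file offset, a store through the area pointer is visible through the alias, and the untouched gap stays inaccessible. The sketch records errno before calling munmap() or close(), since either call may change it.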

3 weeks


[PATCH v3 0/2] libbpf: fix BTF dedup to support recursive typedef

by Paul Houssel

Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm.

This patch extends btf_dedup_struct_types() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows.

Changes in v3:
1. Patch 1: Adjusted the comment of btf_dedup_ref_type() to refer to typedef as well.
2. Patch 2: Update of the "dedup: recursive typedef" test to include a duplicated version of the types to make sure deduplication still happens in this case.

Changes in v2:
1. Patch 1: Refactored code to prevent copying existing logic. Instead of adding a new function we modify the existing btf_dedup_struct_type() function to handle the BTF_KIND_TYPEDEF case. Calls to btf_hash_struct() and btf_shallow_equal_struct() are replaced with calls to functions that select btf_hash_struct() / btf_hash_typedef() based on the type.
2. Patch 2: Added tests

v2: https://lore.kernel.org/lkml/cover.1762956564.git.paul.houssel@orange.com/
v1: https://lore.kernel.org/lkml/20251107153408.159342-1-paulhoussel2@gmail.com/

Paul Houssel (2):
  libbpf: fix BTF dedup to support recursive typedef definitions
  selftests/bpf: add BTF dedup tests for recursive typedef definitions

 tools/lib/bpf/btf.c | 73 +++++++++++++++-----
 tools/testing/selftests/bpf/prog_tests/btf.c | 65 +++++++++++++++++
 2 files changed, 121 insertions(+), 17 deletions(-)

-- 
2.51.0

3 weeks


[PATCH v2 0/4] KVM ARM64 pre_fault_memory

by Jack Thomson

From: Jack Thomson <jackabt(a)amazon.com>

This patch series adds ARM64 support for the KVM_PRE_FAULT_MEMORY feature, which was previously only available on x86 [1]. This allows us to reduce the number of stage-2 faults during execution. This is of benefit in post-copy migration scenarios, particularly in memory-intensive applications, where we are experiencing high latencies due to the stage-2 faults.

Patch Overview:
 - The first patch adds support for the KVM_PRE_FAULT_MEMORY ioctl on arm64.
 - The second patch fixes an issue with unaligned mmap allocations in the selftests.
 - The third patch updates the pre_fault_memory_test to support arm64.
 - The last patch extends the pre_fault_memory_test to cover different vm memory backings.

=== Changes Since v1 [2] ===

Addressing feedback from Oliver:
 - No pre-fault flag is passed to user_mem_abort() or gmem_abort() now that aborts are synthesized.
 - Remove retry loop from kvm_arch_vcpu_pre_fault_memory()

[1]: https://lore.kernel.org/kvm/20240710174031.312055-1-pbonzini@redhat.com
[2]: https://lore.kernel.org/all/20250911134648.58945-1-jackabt.amazon@gmail.com

Jack Thomson (4):
  KVM: arm64: Add pre_fault_memory implementation
  KVM: selftests: Fix unaligned mmap allocations
  KVM: selftests: Enable pre_fault_memory_test for arm64
  KVM: selftests: Add option for different backing in pre-fault tests

 Documentation/virt/kvm/api.rst | 3 +-
 arch/arm64/kvm/Kconfig | 1 +
 arch/arm64/kvm/arm.c | 1 +
 arch/arm64/kvm/mmu.c | 73 +++++++++++-
 tools/testing/selftests/kvm/Makefile.kvm | 1 +
 tools/testing/selftests/kvm/lib/kvm_util.c | 12 +-
 .../selftests/kvm/pre_fault_memory_test.c | 110 +++++++++++++-----
 7 files changed, 163 insertions(+), 38 deletions(-)

base-commit: 42188667be387867d2bf763d028654cbad046f7b
-- 
2.43.0

3 weeks


[PATCH v5] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM, SOCK_SEQPACKET and SOCK_DGRAM) when a peer closes.

The tests verify that:
1. SOCK_STREAM returns EOF when the peer closes normally.
2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data.
3. SOCK_SEQPACKET returns EOF when the peer closes normally.
4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data.
5. SOCK_DGRAM does not return ECONNRESET when the peer closes.

This follows up on review feedback suggesting a selftest to clarify Linux’s semantics.

Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com>
---
Changelog:

Changes from v4 to v5:
1. Moved the send() call before the socket type check in Test 2 to ensure the unread data behavior is tested for SOCK_DGRAM as well.
2. Removed the misleading comment about accept() for clarity.
3. Applied indentation fixes for style consistency (alignment with open parenthesis).
4. Minor comment and formatting cleanups for clarity and adherence to kernel coding style.
tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/af_unix/Makefile | 1 + .../selftests/net/af_unix/unix_connreset.c | 178 ++++++++++++++++++ 3 files changed, 180 insertions(+) create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 439101b518ee..e89a60581a13 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -65,3 +65,4 @@ udpgso udpgso_bench_rx udpgso_bench_tx unix_connect +unix_connreset diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index de805cbbdf69..5826a8372451 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -7,6 +7,7 @@ TEST_GEN_PROGS := \ scm_pidfd \ scm_rights \ unix_connect \ + unix_connreset \ # end of TEST_GEN_PROGS include ../../lib.mk diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c new file mode 100644 index 000000000000..9cb0f48597eb --- /dev/null +++ b/tools/testing/selftests/net/af_unix/unix_connreset.c @@ -0,0 +1,178 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Selftest for AF_UNIX socket close and ECONNRESET behaviour. + * + * This test verifies: + * 1. SOCK_STREAM returns EOF when the peer closes normally. + * 2. SOCK_STREAM returns ECONNRESET if peer closes with unread data. + * 3. SOCK_SEQPACKET returns EOF when the peer closes normally. + * 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. + * 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. + * + * These tests document the intended Linux behaviour. 
+ * + */ + +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <errno.h> +#include <sys/socket.h> +#include <sys/un.h> +#include "../../kselftest_harness.h" + +#define SOCK_PATH "/tmp/af_unix_connreset.sock" + +static void remove_socket_file(void) +{ + unlink(SOCK_PATH); +} + +FIXTURE(unix_sock) +{ + int server; + int client; + int child; +}; + +FIXTURE_VARIANT(unix_sock) +{ + int socket_type; + const char *name; +}; + +FIXTURE_VARIANT_ADD(unix_sock, stream) { + .socket_type = SOCK_STREAM, + .name = "SOCK_STREAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, dgram) { + .socket_type = SOCK_DGRAM, + .name = "SOCK_DGRAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, seqpacket) { + .socket_type = SOCK_SEQPACKET, + .name = "SOCK_SEQPACKET", +}; + +FIXTURE_SETUP(unix_sock) +{ + struct sockaddr_un addr = {}; + int err; + + addr.sun_family = AF_UNIX; + strcpy(addr.sun_path, SOCK_PATH); + remove_socket_file(); + + self->server = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->server); + + err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + err = listen(self->server, 1); + ASSERT_EQ(0, err); + } + + self->client = socket(AF_UNIX, variant->socket_type | SOCK_NONBLOCK, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); +} + +FIXTURE_TEARDOWN(unix_sock) +{ + if ((variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) & self->child > 0) + close(self->child); + + close(self->client); + close(self->server); + remove_socket_file(); +} + +/* Test 1: peer closes normally */ +TEST_F(unix_sock, eof) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, 
self->child); + + close(self->child); + } else { + close(self->server); + } + + n = recv(self->client, buf, sizeof(buf), 0); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(0, n); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 2: peer closes with unread data */ +TEST_F(unix_sock, reset_unread_behavior) +{ + char buf[16] = {}; + ssize_t n; + + /* Send data that will remain unread */ + send(self->client, "hello", 5, 0); + + if (variant->socket_type == SOCK_DGRAM) { + /* No real connection, just close the server */ + close(self->server); + } else { + /* Accept client connection */ + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + + /* Peer closes before client reads */ + close(self->child); + } + + n = recv(self->client, buf, sizeof(buf), 0); + ASSERT_EQ(-1, n); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 3: closing unaccepted (embryo) server socket should reset client. */ +TEST_F(unix_sock, reset_closed_embryo) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_DGRAM) + SKIP(return, "This test only applies to SOCK_STREAM and SOCK_SEQPACKET"); + + /* Close server without accept()ing */ + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); +} + +TEST_HARNESS_MAIN + -- 2.43.0
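
The EOF versus ECONNRESET distinction that the selftest asserts can also be observed with a connected socketpair(), without the kselftest harness. The sketch below is an illustration under that assumption, not part of the patch; the helper name recv_after_peer_close() is made up here, and it covers only the connection-oriented socket types.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a connected AF_UNIX pair of the given type,
 * optionally leave unread data queued at the peer, close the peer, then
 * recv() on the surviving end.
 * Returns recv()'s byte count (0 means EOF) or a negative errno value.
 */
static int recv_after_peer_close(int type, int leave_unread)
{
	char buf[16];
	int sv[2], ret;
	ssize_t n;

	assert(type == SOCK_STREAM || type == SOCK_SEQPACKET);

	if (socketpair(AF_UNIX, type, 0, sv) < 0)
		return -errno;

	if (leave_unread && send(sv[0], "hello", 5, 0) != 5) {
		ret = -errno;
		close(sv[0]);
		close(sv[1]);
		return ret;
	}

	/* The peer goes away, possibly with "hello" still in its receive queue. */
	close(sv[1]);

	n = recv(sv[0], buf, sizeof(buf), 0);
	ret = n < 0 ? -errno : (int)n;
	close(sv[0]);
	return ret;
}
```

Calling the helper with leave_unread set to 0 models a normal close, and with 1 models a peer that closes while data it never read is still queued.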

3 weeks, 1 day


[PATCH v4] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes. The tests verify that: 1. SOCK_STREAM returns EOF when the peer closes normally. 2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data. 3. SOCK_SEQPACKET returns EOF when the peer closes normally. 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. This follows up on review feedback suggesting a selftest to clarify Linux’s semantics. Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com> Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com> --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/af_unix/Makefile | 1 + .../selftests/net/af_unix/unix_connreset.c | 178 ++++++++++++++++++ 3 files changed, 180 insertions(+) create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 439101b518ee..e89a60581a13 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -65,3 +65,4 @@ udpgso udpgso_bench_rx udpgso_bench_tx unix_connect +unix_connreset diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index de805cbbdf69..5826a8372451 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -7,6 +7,7 @@ TEST_GEN_PROGS := \ scm_pidfd \ scm_rights \ unix_connect \ + unix_connreset \ # end of TEST_GEN_PROGS include ../../lib.mk diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c new file mode 100644 index 000000000000..9413f8a0814f --- /dev/null +++ b/tools/testing/selftests/net/af_unix/unix_connreset.c @@ -0,0 +1,178 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Selftest for AF_UNIX socket 
close and ECONNRESET behaviour. + * + * This test verifies: + * 1. SOCK_STREAM returns EOF when the peer closes normally. + * 2. SOCK_STREAM returns ECONNRESET if peer closes with unread data. + * 3. SOCK_SEQPACKET returns EOF when the peer closes normally. + * 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. + * 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. + * + * These tests document the intended Linux behaviour. + * + */ + +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <errno.h> +#include <sys/socket.h> +#include <sys/un.h> +#include "../../kselftest_harness.h" + +#define SOCK_PATH "/tmp/af_unix_connreset.sock" + +static void remove_socket_file(void) +{ + unlink(SOCK_PATH); +} + +FIXTURE(unix_sock) +{ + int server; + int client; + int child; +}; + +FIXTURE_VARIANT(unix_sock) +{ + int socket_type; + const char *name; +}; + +FIXTURE_VARIANT_ADD(unix_sock, stream) { + .socket_type = SOCK_STREAM, + .name = "SOCK_STREAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, dgram) { + .socket_type = SOCK_DGRAM, + .name = "SOCK_DGRAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, seqpacket) { + .socket_type = SOCK_SEQPACKET, + .name = "SOCK_SEQPACKET", +}; + +FIXTURE_SETUP(unix_sock) +{ + struct sockaddr_un addr = {}; + int err; + + addr.sun_family = AF_UNIX; + strcpy(addr.sun_path, SOCK_PATH); + remove_socket_file(); + + self->server = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->server); + + err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + err = listen(self->server, 1); + ASSERT_EQ(0, err); + } + + self->client = socket(AF_UNIX, variant->socket_type | SOCK_NONBLOCK, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); +} + +FIXTURE_TEARDOWN(unix_sock) +{ + 
if ((variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) & self->child > 0) + close(self->child); + + close(self->client); + close(self->server); + remove_socket_file(); +} + +/* Test 1: peer closes normally */ +TEST_F(unix_sock, eof) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + + close(self->child); + } else { + close(self->server); + } + + n = recv(self->client, buf, sizeof(buf), 0); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(0, n); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 2: peer closes with unread data */ +TEST_F(unix_sock, reset_unread_behavior) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_DGRAM) { + /* No real connection, just close the server */ + close(self->server); + } else { + /* Establish full connection first */ + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + + /* Send data that will remain unread */ + send(self->client, "hello", 5, 0); + + /* Peer closes before client reads */ + close(self->child); + } + + n = recv(self->client, buf, sizeof(buf), 0); + ASSERT_EQ(-1, n); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 3: closing unaccepted (embryo) server socket should reset client. 
*/ +TEST_F(unix_sock, reset_closed_embryo) +{ + char buf[16] = {}; + ssize_t n; + + if (variant->socket_type == SOCK_DGRAM) + SKIP(return, "This test only applies to SOCK_STREAM and SOCK_SEQPACKET"); + + /* Close server without accept()ing */ + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); +} + +TEST_HARNESS_MAIN + -- 2.43.0

3 weeks, 1 day


[PATCH v1] cpuset: Avoid unnecessary partition invalidation

by Sun Shaojie

Currently, when a non-exclusive cpuset's "cpuset.cpus" overlaps with a partitioned sibling, the sibling's partition state becomes invalid. However, this invalidation is often unnecessary. This can be observed in specific configuration sequences: Case 1: Partition created first, then non-exclusive cpuset overlaps #1> mkdir -p /sys/fs/cgroup/A1 #2> echo "0-1" > /sys/fs/cgroup/A1/cpuset.cpus #3> echo "root" > /sys/fs/cgroup/A1/cpuset.cpus.partition #4> mkdir -p /sys/fs/cgroup/B1 #5> echo "0-3" > /sys/fs/cgroup/B1/cpuset.cpus // A1's partition becomes "root invalid" - this is unnecessary Case 2: Non-exclusive cpuset exists first, then partition created #1> mkdir -p /sys/fs/cgroup/B1 #2> echo "0-1" > /sys/fs/cgroup/B1/cpuset.cpus #3> mkdir -p /sys/fs/cgroup/A1 #4> echo "0-1" > /sys/fs/cgroup/A1/cpuset.cpus #5> echo "root" > /sys/fs/cgroup/A1/cpuset.cpus.partition // A1's partition becomes "root invalid" - this is unnecessary In Case 1, the effective CPU mask of B1 can differ from its requested mask. B1 can use CPUs 2-3 which don't overlap with A1's exclusive CPUs (0-1), thus not violating A1's exclusivity requirement. In Case 2, B1 can inherit the effective CPUs from its parent, so there is no need to invalidate A1's partition state. This patch relaxes the overlap check to only consider conflicts between partitioned siblings, not between a partitioned cpuset and a regular non-exclusive one. Signed-off-by: Sun Shaojie <sunshaojie(a)kylinos.cn> --- kernel/cgroup/cpuset.c | 8 ++++---- tools/testing/selftests/cgroup/test_cpuset_prs.sh | 10 +++++----- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 52468d2c178a..e0d27c9a101a 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -586,14 +586,14 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2) * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: - * 1. 
If either cpuset is CPU exclusive, they must be mutually exclusive + * 1. If both cpusets are exclusive, they must be mutually exclusive * 2. exclusive_cpus masks cannot intersect between cpusets * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs */ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) { - /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) + /* If both cpusets are exclusive, check if they are mutually exclusive */ + if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2)) return !cpusets_are_exclusive(cs1, cs2); /* Exclusive_cpus cannot intersect */ @@ -695,7 +695,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial) goto out; /* - * If either I or some sibling (!= me) is exclusive, we can't + * If both I and some sibling (!= me) are exclusive, we can't * overlap. exclusive_cpus cannot overlap with each other if set. */ ret = -EINVAL; diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..903dddfe88d7 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -269,7 +269,7 @@ TEST_MATRIX=( " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3" - " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2" + " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . 
P2 0 B1:4-5 B1:P2 4-5" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4" @@ -318,7 +318,7 @@ TEST_MATRIX=( # Invalid to valid local partition direct transition tests " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3" " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3" - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0" + " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0 0-4" " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3" # Local partition invalidation tests @@ -388,10 +388,10 @@ TEST_MATRIX=( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1" - # A non-exclusive cpuset.cpus change will invalidate partition and its siblings - " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0" + # A non-exclusive cpuset.cpus change will not invalidate partition and its siblings + " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0" " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1" - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1" + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1" # cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5" -- 2.25.1
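
The effect of switching the first conflict rule from "either" to "both" can be illustrated with a toy bitmask model of Case 1 from the commit message. This is only a sketch: struct toy_cpuset and the three helpers are hypothetical, they model rule 1 alone (rules 2 and 3 about exclusive_cpus are left out), and they are not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset: one bit per CPU. */
struct toy_cpuset {
	uint64_t cpus;		/* requested cpuset.cpus */
	bool exclusive;		/* true for a partition root */
};

static bool toy_overlaps(const struct toy_cpuset *a, const struct toy_cpuset *b)
{
	return (a->cpus & b->cpus) != 0;
}

/* Rule 1 before the patch: if either sibling is exclusive, any overlap conflicts. */
static bool toy_conflict_either(const struct toy_cpuset *a, const struct toy_cpuset *b)
{
	return (a->exclusive || b->exclusive) && toy_overlaps(a, b);
}

/* Rule 1 after the patch: only two exclusive siblings can conflict here. */
static bool toy_conflict_both(const struct toy_cpuset *a, const struct toy_cpuset *b)
{
	return (a->exclusive && b->exclusive) && toy_overlaps(a, b);
}

/* CPUs left to a non-exclusive sibling once the partition's CPUs are taken out. */
static uint64_t toy_effective_cpus(const struct toy_cpuset *cs,
				   const struct toy_cpuset *partition)
{
	return cs->cpus & ~partition->cpus;
}
```

With A1 as a partition on CPUs 0-1 (mask 0x3) and B1 as a regular cpuset requesting CPUs 0-3 (mask 0xF), the old rule reports a conflict while the new one does not, and B1 is left with CPUs 2-3 (mask 0xC), matching the description of Case 1.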

3 weeks, 1 day


[PATCH net-next v3 0/6] netconsole: support automatic target recovery

by Andre Carvalho

This patchset introduces target resume capability to netconsole, allowing it to recover targets when the underlying low-level interface comes back online.

The patchset starts by refactoring netconsole state representation in order to allow representing deactivated targets (targets that are disabled due to interfaces going down).

It then modifies netconsole to handle NETDEV_UP events for such targets and sets up netpoll. Targets are matched with incoming interfaces depending on how they were initially bound in netconsole (by mac or interface name).

The patchset includes a selftest that validates netconsole target state transitions and that the target is functional after being resumed.

Signed-off-by: Andre Carvalho <asantostc(a)gmail.com>
---
Changes in v3:
- Resume by mac or interface name depending on how target was created.
- Attempt to resume target without holding target list lock, by moving the target to a temporary list. This is required as netpoll may attempt to allocate memory.
- Link to v2: https://lore.kernel.org/r/20250921-netcons-retrigger-v2-0-a0e84006237f@gmai…

Changes in v2:
- Attempt to resume target in the same thread, instead of using workqueue.
- Add wrapper around __netpoll_setup (patch 4).
- Renamed resume_target to maybe_resume_target and moved conditionals to inside its implementation, keeping the code clearer.
- Verify that device addr matches target mac address when target was set up using mac.
- Update selftest to cover targets bound by mac and interface name.
- Fix typo in selftest comment and sort tests alphabetically in Makefile.
- Link to v1: https://lore.kernel.org/r/20250909-netcons-retrigger-v1-0-3aea904926cf@gmai…

---
Andre Carvalho (4):
      netconsole: convert 'enabled' flag to enum for clearer state management
      netpoll: add wrapper around __netpoll_setup with dev reference
      netconsole: resume previously deactivated target
      selftests: netconsole: validate target resume

Breno Leitao (2):
      netconsole: add target_state enum
      netconsole: add STATE_DEACTIVATED to track targets disabled by low level

 drivers/net/netconsole.c | 126 ++++++++++++++++-----
 include/linux/netpoll.h | 1 +
 net/core/netpoll.c | 20 ++++
 tools/testing/selftests/drivers/net/Makefile | 1 +
 .../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 ++++-
 .../selftests/drivers/net/netcons_resume.sh | 92 +++++++++++++++
 6 files changed, 238 insertions(+), 32 deletions(-)
---
base-commit: a0c3aefb08cd81864b17c23c25b388dba90b9dad
change-id: 20250816-netcons-retrigger-a4f547bfc867

Best regards,
-- 
Andre Carvalho <asantostc(a)gmail.com>
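
The state handling described in the cover letter can be pictured as a three-state machine in which a target deactivated by an interface going down is only resumed by a matching interface coming back. The sketch below is a userspace toy under that reading, not the netconsole code: every identifier in it (enum toy_state, struct toy_target, the toy_* helpers) is hypothetical, and the real driver works on net_device notifier events rather than plain strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical states: user-disabled, active, and disabled by link loss. */
enum toy_state { TOY_DISABLED, TOY_ENABLED, TOY_DEACTIVATED };

struct toy_target {
	enum toy_state state;
	bool bound_by_mac;		/* how the target was originally bound */
	char dev_name[16];
	unsigned char dev_mac[6];
};

/* Match an interface the same way the target was bound: by MAC or by name. */
static bool toy_matches(const struct toy_target *t, const char *name,
			const unsigned char mac[6])
{
	if (t->bound_by_mac)
		return memcmp(t->dev_mac, mac, 6) == 0;
	return strcmp(t->dev_name, name) == 0;
}

/* Interface went down: an active target on it becomes deactivated. */
static void toy_netdev_down(struct toy_target *t, const char *name,
			    const unsigned char mac[6])
{
	if (t->state == TOY_ENABLED && toy_matches(t, name, mac))
		t->state = TOY_DEACTIVATED;
}

/* Interface came up: only a deactivated, matching target is resumed. */
static bool toy_netdev_up(struct toy_target *t, const char *name,
			  const unsigned char mac[6])
{
	if (t->state != TOY_DEACTIVATED || !toy_matches(t, name, mac))
		return false;
	t->state = TOY_ENABLED;
	return true;
}
```

Keeping a separate deactivated state is what lets the up handler tell a target the user switched off apart from one that merely lost its interface; a single boolean flag could not express that difference.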

3 weeks, 1 day


[PATCH v3 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges

by Alex Mastro

Not all IOMMUs support the same virtual address width as the processor; for instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, using the virtual address as the IOVA and mappings at the top of the address space both fail.

VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator.

Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. This latter change doesn't test quite the same absolute end-of-address space behavior but still seems to have some value.

This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2].

Given David's plans to split IOMMU concerns from devices as described in [3], this series' home for `struct iova_allocator` and IOVA range helpers is likely to be short-lived, since they reside in vfio_pci_device.c. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code, once such a place exists.

[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t
[2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/
[3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/

To: Alex Williamson <alex(a)shazbot.org>
To: David Matlack <dmatlack(a)google.com>
To: Shuah Khan <shuah(a)kernel.org>
To: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: kvm(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Alex Mastro <amastro(a)fb.com>

Changes in v3:
- Update capability chain cycle detection
- Clarify the iova=vaddr commit message
- Link to v2: https://lore.kernel.org/r/20251111-iova-ranges-v2-0-0fa267ff9b78@fb.com

Changes in v2:
- Fix various nits
- calloc() where appropriate
- Update overflow test to run regardless of iova range constraints
- Change iova_allocator_init() to return an allocated struct
- Unfold iova_allocator_alloc()
- Fix iova allocator initial state bug
- Update vfio_pci_driver_test to use iova allocator
- Link to v1: https://lore.kernel.org/r/20251110-iova-ranges-v1-0-4d441cf5bf6d@fb.com

---
Alex Mastro (4):
      vfio: selftests: add iova range query helpers
      vfio: selftests: fix map limit tests to use last available iova
      vfio: selftests: add iova allocator
      vfio: selftests: replace iova=vaddr with allocated iovas

 .../testing/selftests/vfio/lib/include/vfio_util.h | 19 +-
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 246 ++++++++++++++++++++-
 .../testing/selftests/vfio/vfio_dma_mapping_test.c | 20 +-
 .../testing/selftests/vfio/vfio_pci_driver_test.c | 12 +-
 4 files changed, 288 insertions(+), 9 deletions(-)
---
base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316
change-id: 20251110-iova-ranges-1c09549fbf63

Best regards,
-- 
Alex Mastro <amastro(a)fb.com>
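
A "simple allocator" over an array of valid IOVA ranges could look roughly like the bump allocator below. This is a guess at the general shape for illustration only and does not reproduce the series' `struct iova_allocator`; the toy_* names, the inclusive start/last layout and the power-of-two size requirement are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range with inclusive bounds, as an IOMMU might report it. */
struct toy_iova_range {
	uint64_t start;
	uint64_t last;
};

struct toy_iova_allocator {
	const struct toy_iova_range *ranges;
	size_t nranges;
	size_t idx;		/* range currently being carved up */
	uint64_t offset;	/* bytes already handed out of ranges[idx] */
};

/*
 * Hand out `size` bytes aligned to `size` (a power of two) from the first
 * range that still has room. Returns 0 and sets *iova, or -1 when exhausted.
 */
static int toy_iova_alloc(struct toy_iova_allocator *a, uint64_t size, uint64_t *iova)
{
	assert(size && (size & (size - 1)) == 0);

	for (; a->idx < a->nranges; a->idx++, a->offset = 0) {
		const struct toy_iova_range *r = &a->ranges[a->idx];
		uint64_t cur = r->start + a->offset;
		uint64_t aligned = (cur + size - 1) & ~(size - 1);
		uint64_t last;

		if (aligned < cur)
			continue;	/* rounding up wrapped past 2^64 */
		last = aligned + (size - 1);
		if (last < aligned || last > r->last)
			continue;	/* does not fit; try the next range */

		*iova = aligned;
		if (last == r->last) {
			/* Range fully consumed: move on without overflowing offset. */
			a->idx++;
			a->offset = 0;
		} else {
			a->offset = last - r->start + 1;
		}
		return 0;
	}
	return -1;
}
```

The sketch never revisits a range once it has moved past it, so space skipped for alignment or left at the tail of a range is lost. That keeps the logic short at the cost of some waste, which seems an acceptable trade-off for test code.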

3 weeks, 1 day


[PATCH v3] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes. The tests verify that: 1. SOCK_STREAM returns EOF when the peer closes normally. 2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data. 3. SOCK_SEQPACKET returns EOF when the peer closes normally. 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. 5. SOCK_DGRAM does not return ECONNRESET when the peer closes. This follows up on review feedback suggesting a selftest to clarify Linux’s semantics. Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com> Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com> --- tools/testing/selftests/net/af_unix/Makefile | 1 + .../selftests/net/af_unix/unix_connreset.c | 179 ++++++++++++++++++ 2 files changed, 180 insertions(+) create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index de805cbbdf69..5826a8372451 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -7,6 +7,7 @@ TEST_GEN_PROGS := \ scm_pidfd \ scm_rights \ unix_connect \ + unix_connreset \ # end of TEST_GEN_PROGS include ../../lib.mk diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c new file mode 100644 index 000000000000..6f43435d96e2 --- /dev/null +++ b/tools/testing/selftests/net/af_unix/unix_connreset.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Selftest for AF_UNIX socket close and ECONNRESET behaviour. + * + * This test verifies: + * 1. SOCK_STREAM returns EOF when the peer closes normally. + * 2. SOCK_STREAM returns ECONNRESET if peer closes with unread data. + * 3. SOCK_SEQPACKET returns EOF when the peer closes normally. + * 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data. + * 5. 
SOCK_DGRAM does not return ECONNRESET when the peer closes. + * + * These tests document the intended Linux behaviour. + * + */ + +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <errno.h> +#include <sys/socket.h> +#include <sys/un.h> +#include "../../kselftest_harness.h" + +#define SOCK_PATH "/tmp/af_unix_connreset.sock" + +static void remove_socket_file(void) +{ + unlink(SOCK_PATH); +} + +FIXTURE(unix_sock) +{ + int server; + int client; + int child; +}; + +FIXTURE_VARIANT(unix_sock) +{ + int socket_type; + const char *name; +}; + +/* Define variants: stream and datagram */ +FIXTURE_VARIANT_ADD(unix_sock, stream) { + .socket_type = SOCK_STREAM, + .name = "SOCK_STREAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, dgram) { + .socket_type = SOCK_DGRAM, + .name = "SOCK_DGRAM", +}; + +FIXTURE_VARIANT_ADD(unix_sock, seqpacket) { + .socket_type = SOCK_SEQPACKET, + .name = "SOCK_SEQPACKET", +}; + +FIXTURE_SETUP(unix_sock) +{ + struct sockaddr_un addr = {}; + int err; + + addr.sun_family = AF_UNIX; + strcpy(addr.sun_path, SOCK_PATH); + remove_socket_file(); + + self->server = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->server); + + err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + err = listen(self->server, 1); + ASSERT_EQ(0, err); + + self->client = socket(AF_UNIX, variant->socket_type, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + + self->child = accept(self->server, NULL, NULL); + ASSERT_LT(-1, self->child); + } else { + /* Datagram: bind and connect only */ + self->client = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0); + ASSERT_LT(-1, self->client); + + err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr)); + ASSERT_EQ(0, err); + } +} + 
+FIXTURE_TEARDOWN(unix_sock) +{ + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) + close(self->child); + + close(self->client); + close(self->server); + remove_socket_file(); +} + +/* Test 1: peer closes normally */ +TEST_F(unix_sock, eof) +{ + char buf[16] = {}; + ssize_t n; + + /* Peer closes normally */ + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) + close(self->child); + else + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno)); + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(0, n); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 2: peer closes with unread data */ +TEST_F(unix_sock, reset_unread) +{ + char buf[16] = {}; + ssize_t n; + + /* Send data that will remain unread by client */ + send(self->client, "hello", 5, 0); + close(self->child); + + n = recv(self->client, buf, sizeof(buf), 0); + TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno)); + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +/* Test 3: SOCK_DGRAM peer close */ +TEST_F(unix_sock, dgram_reset) +{ + char buf[16] = {}; + ssize_t n; + + send(self->client, "hello", 5, 0); + close(self->server); + + n = recv(self->client, buf, sizeof(buf), 0); + TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno)); + if (variant->socket_type == SOCK_STREAM || + variant->socket_type == SOCK_SEQPACKET) { + ASSERT_EQ(-1, n); + ASSERT_EQ(ECONNRESET, errno); + } else { + ASSERT_EQ(-1, n); + ASSERT_EQ(EAGAIN, errno); + } +} + +TEST_HARNESS_MAIN + -- 2.43.0
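For readers who want to observe these close semantics outside the kselftest harness, the following is a minimal Python sketch (Linux only). It is not part of the patch: it uses socketpair() instead of the fixture's bind/listen/accept setup, and recv_after_peer_close() is a name made up for illustration.

```python
import errno
import socket


def recv_after_peer_close(sock_type: int, leave_unread: bool) -> str:
    """Close one end of an AF_UNIX socketpair and report what recv() sees
    on the surviving end: "eof", "data", or the symbolic errno name."""
    survivor, peer = socket.socketpair(socket.AF_UNIX, sock_type)
    try:
        if leave_unread:
            # Queue data for the peer, which then closes without reading it.
            survivor.send(b"hello")
        peer.close()
        # Non-blocking so the call cannot hang (SOCK_DGRAM has no EOF).
        survivor.setblocking(False)
        try:
            data = survivor.recv(16)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "eof" if data == b"" else "data"
    finally:
        survivor.close()


if __name__ == "__main__":
    for name in ("SOCK_STREAM", "SOCK_SEQPACKET", "SOCK_DGRAM"):
        for unread in (False, True):
            result = recv_after_peer_close(getattr(socket, name), unread)
            print(f"{name:15} unread={unread!s:5} -> {result}")
```

Going by the behaviour the patch documents, the stream and seqpacket rows should report EOF for a normal close and ECONNRESET when data was left unread, while the datagram rows should never report ECONNRESET.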

3 weeks, 1 day


[PATCH v2 0/2] libbpf: fix BTF dedup to support recursive typedef

by Paul Houssel

Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm. This patch extends btf_dedup_struct_types() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows. Changes in v2: 1. Patch 1: Refactored code to prevent copying existing logic. Instead of adding a new function () we modify the existing btf_dedup_struct_type() function to handle the BTF_KIND_TYPEDEF case. Calls to btf_hash_struct() and btf_shallow_equal_struct() are replaced with calls to functions that select btf_hash_struct() / btf_hash_typedef() based on the type. 2. Patch 2: Added tests v1: https://lore.kernel.org/lkml/20251107153408.159342-1-paulhoussel2@gmail.com/ Paul Houssel (2): libbpf: fix BTF dedup to support recursive typedef definitions selftests/bpf: add BTF dedup tests for recursive typedef definitions tools/lib/bpf/btf.c | 59 +++++++++++++++---- tools/testing/selftests/bpf/prog_tests/btf.c | 61 ++++++++++++++++++++ 2 files changed, 110 insertions(+), 10 deletions(-) -- 2.51.0
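To illustrate why a reference loop needs special handling during deduplication, here is a deliberately simplified Python model. It is not libbpf's code: BtfType, is_equiv() and the hypothesis map are stand-ins chosen for this sketch, and real BTF types carry far more information. The point is only that a naive recursive comparison of a self-referential typedef would never terminate, whereas remembering which pairs are already assumed equivalent lets the loop close.

```python
from typing import Dict, NamedTuple, Optional


class BtfType(NamedTuple):
    kind: str            # e.g. "typedef", "ptr" (illustrative subset)
    name: str
    ref: Optional[int]   # id of the referenced type, None for a leaf


def is_equiv(types: Dict[int, BtfType], cand_id: int, canon_id: int,
             hypot: Optional[Dict[int, int]] = None) -> bool:
    """Return True if the type graphs rooted at cand_id and canon_id match."""
    if hypot is None:
        hypot = {}
    if canon_id in hypot:
        # This canonical type was already paired up: the loop closes here
        # instead of recursing forever.
        return hypot[canon_id] == cand_id
    hypot[canon_id] = cand_id
    cand, canon = types[cand_id], types[canon_id]
    if (cand.kind, cand.name) != (canon.kind, canon.name):
        return False
    if cand.ref is None or canon.ref is None:
        return cand.ref is None and canon.ref is None
    return is_equiv(types, cand.ref, canon.ref, hypot)


# Two copies of the same self-referential typedef (F -> ptr -> F),
# plus a differently named loop (G -> ptr -> G).
TYPES = {
    1: BtfType("typedef", "F", 2), 2: BtfType("ptr", "", 1),
    3: BtfType("typedef", "F", 4), 4: BtfType("ptr", "", 3),
    5: BtfType("typedef", "G", 6), 6: BtfType("ptr", "", 5),
}

if __name__ == "__main__":
    print("F vs F:", is_equiv(TYPES, 3, 1))
    print("G vs F:", is_equiv(TYPES, 5, 1))
```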

3 weeks, 1 day


[PATCHSET v10 sched_ext/for-6.19] Add a deadline server for sched_ext tasks

by Andrea Righi

sched_ext tasks can be starved by long-running RT tasks, especially since RT throttling was replaced by deadline servers to boost only SCHED_NORMAL tasks. Several users in the community have reported issues with RT stalling sched_ext tasks. This is fairly common on distributions or environments where applications like video compositors, audio services, etc. run as RT tasks by default. Example trace (showing a per-CPU kthread stalled due to the sway Wayland compositor running as an RT task): runnable task stall (kworker/0:0[106377] failed to run for 5.043s) ... CPU 0 : nr_run=3 flags=0xd cpu_rel=0 ops_qseq=20646200 pnt_seq=45388738 curr=sway[994] class=rt_sched_class R kworker/0:0[106377] -5043ms scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=0/0 sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=0 slice=20000000 cpus=01 This is often perceived as a bug in the BPF schedulers, but in reality schedulers can't do much: RT tasks run outside their control and can potentially consume 100% of the CPU bandwidth. Fix this by adding a sched_ext deadline server, so that sched_ext tasks are also boosted and do not suffer starvation. Two kselftests are also provided to verify the starvation fixes and bandwidth allocation is correct. 
== Highlights in this version == - wait for inactive_task_timer() to fire before removing the bandwidth reservation (Juri/Peter: please check if this new dl_server_remove_params() implementation makes sense to you) - removed the explicit dl_server_stop() from dequeue_task_scx() and rely on the delayed stop behavior (Juri/Peter: ditto) This patchset is also available in the following git branch: git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git scx-dl-server Changes in v10: - reordered patches to better isolate sched_ext changes vs sched/deadline changes (Andrea Righi) - define ext_server only with CONFIG_SCHED_CLASS_EXT=y (Andrea Righi) - add WARN_ON_ONCE(!cpus) check in dl_server_apply_params() (Andrea Righi) - wait for inactive_task_timer to fire before removing the bandwidth reservation (Juri Lelli) - remove explicit dl_server_stop() in dequeue_task_scx() to reduce timer reprogramming overhead (Juri Lelli) - do not restart pick_task() when invoked by the dl_server (Tejun Heo) - rename rq_dl_server to dl_server (Peter Zijlstra) - fixed a missing dl_server start in dl_server_on() (Christian Loehle) - add a comment to the rt_stall selftest to better explain the 4% threshold (Emil Tsalapatis) Changes in v9: - Drop the ->balance() logic as its functionality is now integrated into ->pick_task(), allowing dl_server to call pick_task_scx() directly - Link to v8: https://lore.kernel.org/all/20250903095008.162049-1-arighi@nvidia.com/ Changes in v8: - Add tj's patch to de-couple balance and pick_task and avoid changing sched/core callbacks to propagate @rf - Simplify dl_se->dl_server check (suggested by PeterZ) - Small coding style fixes in the kselftests - Link to v7: https://lore.kernel.org/all/20250809184800.129831-1-joelagnelf@nvidia.com/ Changes in v7: - Rebased to Linus master - Link to v6: https://lore.kernel.org/all/20250702232944.3221001-1-joelagnelf@nvidia.com/ Changes in v6: - Added Acks to few patches - Fixes to few nits suggested by Tejun - Link 
to v5: https://lore.kernel.org/all/20250620203234.3349930-1-joelagnelf@nvidia.com/ Changes in v5: - Added a kselftest (total_bw) to sched_ext to verify bandwidth values from debugfs - Address comment from Andrea about redundant rq clock invalidation - Link to v4: https://lore.kernel.org/all/20250617200523.1261231-1-joelagnelf@nvidia.com/ Changes in v4: - Fixed issues with hotplugged CPUs having their DL server bandwidth altered due to loading SCX - Fixed other issues - Rebased on Linus master - All sched_ext kselftests reliably pass now, also verified that the total_bw in debugfs (CONFIG_SCHED_DEBUG) is conserved with these patches - Link to v3: https://lore.kernel.org/all/20250613051734.4023260-1-joelagnelf@nvidia.com/ Changes in v3: - Removed code duplication in debugfs. Made ext interface separate - Fixed issue where rq_lock_irqsave was not used in the relinquish patch - Fixed running bw accounting issue in dl_server_remove_params - Link to v2: https://lore.kernel.org/all/20250602180110.816225-1-joelagnelf@nvidia.com/ Changes in v2: - Fixed a hang related to using rq_lock instead of rq_lock_irqsave - Added support to remove BW of DL servers when they are switched to/from EXT - Link to v1: https://lore.kernel.org/all/20250315022158.2354454-1-joelagnelf@nvidia.com/ Andrea Righi (5): sched/deadline: Add support to initialize and remove dl_server bandwidth sched_ext: Add a DL server for sched_ext tasks sched/deadline: Account ext server bandwidth sched_ext: Selectively enable ext and fair DL servers selftests/sched_ext: Add test for sched_ext dl_server Joel Fernandes (6): sched/debug: Fix updating of ppos on server write ops sched/debug: Stop and start server based on if it was active sched/deadline: Clear the defer params sched/deadline: Add a server arg to dl_server_update_idle_time() sched/debug: Add support to change sched_ext server params selftests/sched_ext: Add test for DL server total_bw consistency kernel/sched/core.c | 3 + kernel/sched/deadline.c | 169 
+++++++++++--- kernel/sched/debug.c | 171 +++++++++++--- kernel/sched/ext.c | 144 +++++++++++- kernel/sched/fair.c | 2 +- kernel/sched/idle.c | 2 +- kernel/sched/sched.h | 8 +- kernel/sched/topology.c | 5 + tools/testing/selftests/sched_ext/Makefile | 2 + tools/testing/selftests/sched_ext/rt_stall.bpf.c | 23 ++ tools/testing/selftests/sched_ext/rt_stall.c | 222 ++++++++++++++++++ tools/testing/selftests/sched_ext/total_bw.c | 281 +++++++++++++++++++++++ 12 files changed, 955 insertions(+), 77 deletions(-) create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c create mode 100644 tools/testing/selftests/sched_ext/total_bw.c

3 weeks, 1 day


[PATCH 0/9] mm/damon: misc cleanups

by SeongJae Park

Yet another batch of misc cleanups and refactoring for DAMON code, tests, and documents.

The first two patches (1 and 2) rename DAMOS core filter related code for readability. The following three patches (3-5) refactor the page table walk callback functions in DAMON, as suggested by Hugh and David and as I promised. The next two patches (6 and 7) refactor the DAMON core layer kunit test and the sysfs interface selftest to be simpler and deduplicated. The final two patches (8 and 9) fix Sphinx and grammatical errors in the documents.

SeongJae Park (9):
  mm/damon: rename damos core filter helpers to have word core
  mm/damon: rename damos->filters to damos->core_filters
  mm/damon/vaddr: cleanup using pmd_trans_huge_lock()
  mm/damon/vaddr: use vm_normal_folio{,_pmd}() instead of damon_get_folio()
  mm/damon/vaddr: consistently use only pmd_entry for damos_migrate
  mm/damon/tests/core-kunit: remove DAMON_MIN_REGION redefinition
  selftests/damon/sysfs.py: merge DAMON status dumping into commitment assertion
  Docs/mm/damon/maintainer-profile: fix a typo on mm-untable link
  Docs/mm/damon/maintainer-profile: fix grammartical errors

 .clang-format | 4 +-
 Documentation/mm/damon/maintainer-profile.rst | 10 +-
 include/linux/damon.h | 14 +-
 mm/damon/core.c | 25 ++-
 mm/damon/tests/core-kunit.h | 59 ++++----
 mm/damon/vaddr.c | 143 +++++++-----------
 .../selftests/damon/drgn_dump_damon_status.py | 8 +-
 tools/testing/selftests/damon/sysfs.py | 45 ++----
 8 files changed, 121 insertions(+), 187 deletions(-)

base-commit: 4e9ec347bc14de636aec3014dee3b5d279ca33bf
-- 
2.47.3

3 weeks, 1 day


[PATCH bpf-next] selftests/bpf: Fix htab_update/reenter_update selftest failure

by Saket Kumar Bhaskar

Since commit 31158ad02ddb ("rqspinlock: Add deadlock detection and recovery"), the update path on re-entrancy reports the deadlock via -EDEADLK instead of the previous -EBUSY. Update the selftest so that the expected errno matches the kernel's current behavior.

Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com>
---
 tools/testing/selftests/bpf/prog_tests/htab_update.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
index 2bc85f4814f4..98d52bb1446f 100644
--- a/tools/testing/selftests/bpf/prog_tests/htab_update.c
+++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
@@ -40,7 +40,7 @@ static void test_reenter_update(void)
 	if (!ASSERT_OK(err, "add element"))
 		goto out;
 
-	ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
+	ASSERT_EQ(skel->bss->update_err, -EDEADLK, "no reentrancy");
 out:
 	htab_update__destroy(skel);
 }
-- 
2.51.0

3 weeks, 1 day


[PATCH v5 net-next 00/14] AccECN protocol case handling series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>

Hello,

Please find the v5 AccECN case handling patch series, which covers the handling of several exceptional cases of the Accurate ECN spec (RFC9768), adds new identifiers to be used by CC modules, adds ecn_delta into rate_sample, and keeps the ACE counter for computation, etc.

This patch series is part of the full AccECN patch series, which is available at https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

Best regards,
Chia-Yu

---
v5:
- Move previous #11 in v4 to a later patch after discussion with the RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav(a)nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet(a)google.com>)
- Add Fixes: tag to #7. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message of #8 and the if condition check. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #13. (Paolo Abeni <pabeni(a)redhat.com>)

v4:
- Add previous #13 in v2 back after discussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.

v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni(a)redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into an individual flag and add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni(a)redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni(a)redhat.com>)
- Move patch #3 in v2 to a later Prague patch series and remove patch #13 in v2. (Paolo Abeni <pabeni(a)redhat.com>)

---
Chia-Yu Chang (12):
  net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  selftests/net: gro: add self-test for TCP CWR flag
  tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: move increment of num_retrans
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
  tcp: accecn: enable AccECN

Ilpo Järvinen (2):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN

 Documentation/networking/ip-sysctl.rst | 4 +-
 .../networking/net_cachelines/tcp_sock.rst | 1 +
 include/linux/skbuff.h | 13 ++-
 include/linux/tcp.h | 4 +-
 include/net/inet_ecn.h | 20 +++-
 include/net/tcp.h | 32 ++++++-
 include/net/tcp_ecn.h | 92 ++++++++++++++-----
 net/ipv4/sysctl_net_ipv4.c | 4 +-
 net/ipv4/tcp.c | 2 +
 net/ipv4/tcp_cong.c | 10 +-
 net/ipv4/tcp_input.c | 37 +++++++-
 net/ipv4/tcp_minisocks.c | 40 +++++---
 net/ipv4/tcp_offload.c | 3 +-
 net/ipv4/tcp_output.c | 42 ++++++---
 tools/testing/selftests/net/gro.c | 80 +++++++++++-----
 15 files changed, 294 insertions(+), 90 deletions(-)
-- 
2.34.1

3 weeks, 1 day


[PATCH net-next v4 00/12] selftests/vsock: refactor and improve vmtest infrastructure

by Bobby Eshleman

Hey all, This patch series refactors the vsock selftest VM infrastructure to improve test run times, improve logging, and prepare for future tests which make heavy usage of these refactored functions and have new requirements such as simultaneous QEMU processes. These patches were broken off from this prior series: https://lore.kernel.org/all/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.co… To: Stefano Garzarella <sgarzare(a)redhat.com> To: Shuah Khan <shuah(a)kernel.org> Cc: virtualization(a)lists.linux.dev Cc: netdev(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Cc: Simon Horman <horms(a)kernel.org> Changes in v4: - fix messed up rebase (wrt check_result() and shared_vm_test() patches) - more consistent variable quotes style - use associative array for pidfiles, remove after terminate - Link to v3: https://lore.kernel.org/r/20251106-vsock-selftests-fixes-and-improvements-v… Changes in v3: - see per-patch changes - Link to v2: https://lore.kernel.org/all/20251104-vsock-selftests-fixes-and-improvements… Changes in v2: - remove "Fixes" for some patches because they do not fix bugs in kselftest runs (some fix bugs only when using bash args that kselftest does not use or otherwise prepare functions for new usage) - broke out one fixes patch for "net" - per-patch changes - add patch for shellcheck declaration to disable false positives - Link to v1: https://lore.kernel.org/r/20251022-vsock-selftests-fixes-and-improvements-v… --- Bobby Eshleman (12): selftests/vsock: improve logging in vmtest.sh selftests/vsock: make wait_for_listener() work even if pipefail is on selftests/vsock: reuse logic for vsock_test through wrapper functions selftests/vsock: avoid multi-VM pidfile collisions with QEMU selftests/vsock: do not unconditionally die if qemu fails selftests/vsock: speed up tests by reducing the QEMU pidfile timeout selftests/vsock: add check_result() for pass/fail counting selftests/vsock: identify and execute tests that 
can re-use VM selftests/vsock: add BUILD=0 definition selftests/vsock: add 1.37 to tested virtme-ng versions selftests/vsock: add vsock_loopback module loading selftests/vsock: disable shellcheck SC2317 and SC2119 tools/testing/selftests/vsock/vmtest.sh | 346 +++++++++++++++++++++----------- 1 file changed, 233 insertions(+), 113 deletions(-) --- base-commit: a0c3aefb08cd81864b17c23c25b388dba90b9dad change-id: 20251021-vsock-selftests-fixes-and-improvements-057440ffb2fa Best regards, -- Bobby Eshleman <bobbyeshleman(a)meta.com>

3 weeks, 1 day


[PATCH net-next v14 0/7] bonding: Extend arp_ip_target format to allow for a list of vlan tags.

by David Wilder

The current implementation of the arp monitor builds a list of vlan-tags by following the chain of net_devices above the bond. See bond_verify_device_path(). Unfortunately, with some configurations, this is not possible. One example is when an ovs switch is configured above the bond. This change extends the "arp_ip_target" parameter format to allow for a list of vlan tags to be included for each arp target. This new list of tags is optional and may be omitted to preserve the current format and process of discovering vlans. The new format for arp_ip_target is: arp_ip_target ipv4-address[vlan-tag\...],... For example: arp_ip_target 10.0.0.1[10/20] arp_ip_target 10.0.0.1[] (used to disable vlan discovery) Changes since V13 Thanks for the help Paolo: - Changed first argument of bond_option_arp_ip_target_add() to a const. - Changed first argument of bond_arp_target_to_string to a const. - Added compiler time check of size argument to: bond_arp_target_to_string(), BUILD_BUG_ON(size != BOND_OPTION_STRING_MAX_SIZE); - In bond_arp_send_all() I changed the condition for both the allocation and the free calls to be the same to improve the clarity of the code. - Removed extra tab in bond_fill_info(). - Updated update bond_get_size() to reflect the increased payload for the arp_ip_target option. - Corrected indentation and alignment in bond-arp-ip-target.sh. Changes since V12 Fixed uninitialized variable in bond_option_arp_ip_targets_set() (patch 4) causing a CI failure. Changes since V11 No Change. Changes since V10 Thanks Paolo: - 1/7 Changed the layout of struct bond_arp_target to reduce size of the struct. - 3/7 Fixed format 'size-num' -> 'size - num' - 7/7 Updated selftest (bond-arp-ip-target.sh). Removed sleep 10 in check_failure_count(). Added call to tc to verify arp probes are reaching the target interface. Then I verify that the Link Failure counts are not increasing over "time". Arp probes are sent every 100ms, two missed probes will trigger a Link failure. 
A one second wait between checking counts should be more than sufficient. This speeds up the execution of the test.

Thanks Nikolay:
- 4/7 In bond_option_arp_ip_targets_clear() I changed the definition of empty_target to empty_target = {}.
- bond_validate_tags() now verifies input is a multiple of sizeof(struct bond_vlan_tag). Updated VID validity check to use: (!tags->vlan_id || tags->vlan_id >= VLAN_VID_MASK) as suggested.
- In bond_option_arp_ip_targets_set() removed the redundant length check of target.target_ip.
- Added kfree(target.tags) when bond_option_arp_ip_target_add() results in an error.
- Removed the caching of struct bond_vlan_tag returned by bond_verify_device_path(). Nikolay pointed out that caching tags prevented the detection of VLAN configuration changes. Added a kfree(tags) for tags allocated in bond_verify_device_path().

Jay, Nikolay and I had a discussion regarding locking when adding, deleting or changing vlan tags. Jay pointed out that user supplied tags are stashed in the bond configuration and can only be changed via user space, so this can be done safely in an RCU manner as netlink always operates with RTNL held. If user space provided tags and then replumbs things, it'll be on user space to update the tags in a safe manner. I was concerned about changing options on a configured bond. I found that attempting to change a bond's configuration (using "ip set") will abort the attempt to make a change if the bond's state is "UP" or has slaves configured. Therefore the configuration and operational sides of a bond are separated. I agree with Jay that the existing locking scheme is sufficient.

Changes since V9
Fix kdoc build error.

Changes since V8:
Moved the #define BOND_MAX_VLAN_TAGS from patch 6 to patch 3. Thanks Simon for catching the bisection break.

Changes since V7:
These changes should eliminate the CI failures I have been seeing.
1) Patch 2, changed type of bond_opt_value.extra_len to size_t.
2) Patch 4, added bond_validate_tags() to validate the array of bond_vlan_tag provided by the user.

Changes since V6:
1) I made a number of changes to fix the failure seen in the kernel CI. I am still unable to reproduce this failure, hopefully I have fixed it. These changes are in patch #4 to functions: bond_option_arp_ip_targets_clear() and bond_option_arp_ip_targets_set()

Changes since V5:
Only the last 2 patches have changed since V5.
1) Fixed sparse warning in bond_fill_info().
2) Also in bond_fill_info() I resolved data.addr being uninitialized when the if condition is not met. Thank you Simon for catching this. Note: The change is different than what I shared earlier.
3) Fixed shellcheck warnings in test script: Blocked source warning, ignored specific unassigned references and exported ALL_TESTS to resolve a reference warning.

Changes since V4:
1) Dropped changes to proc and sysfs APIs to bonding. These APIs do not need to be updated to support new functionality. Netlink and iproute2 have been updated to do the right thing, but the other APIs are more or less frozen in the past.
2) Jakub reported a warning triggered in bond_info_seq_show() during testing. I was unable to reproduce this warning or identify it with code inspection. However, all my changes to bond_info_seq_show() have been dropped as unnecessary (see above). Hopefully this will resolve the issue.
3) Selftest script has been updated based on the results of shellcheck. Two unresolved references that are not possible to resolve are all that remain.
4) A patch was added updating bond_fill_info() to support the "ip -d show <bond-device>" command. The inclusion of a list of vlan tags is optional. The new logic preserves both forward and backward compatibility with the kernel and iproute2 versions.

Changes since V3:
1) Moved the parsing of the extended arp_ip_target out of the kernel and into userspace (ip command). A separate patch to iproute2 to follow shortly.
2) Split up the patch set to make review easier.
Please see iproute changes in a separate posting. Thank you for your time and reviews. Signed-off-by: David Wilder <wilder(a)us.ibm.com> David Wilder (7): bonding: Adding struct bond_arp_target bonding: Adding extra_len field to struct bond_opt_value. bonding: arp_ip_target helpers. bonding: Processing extended arp_ip_target from user space. bonding: Update to bond_arp_send_all() to use supplied vlan tags bonding: Update for extended arp_ip_target format. bonding: Selftest and documentation for the arp_ip_target parameter. Documentation/networking/bonding.rst | 11 + drivers/net/bonding/bond_main.c | 48 +++-- drivers/net/bonding/bond_netlink.c | 39 +++- drivers/net/bonding/bond_options.c | 146 ++++++++++--- drivers/net/bonding/bond_procfs.c | 4 +- drivers/net/bonding/bond_sysfs.c | 4 +- include/net/bond_options.h | 29 ++- include/net/bonding.h | 67 +++++- .../selftests/drivers/net/bonding/Makefile | 1 + .../drivers/net/bonding/bond-arp-ip-target.sh | 204 ++++++++++++++++++ 10 files changed, 474 insertions(+), 79 deletions(-) create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-arp-ip-target.sh -- 2.50.1
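The extended arp_ip_target format described in this cover letter can be sketched as a small parser. This is only an illustration in Python, not the iproute2 or kernel code: parse_arp_ip_target() is a hypothetical name, the "/" separator is assumed from the 10.0.0.1[10/20] example, and the range check mirrors the VLAN_VID_MASK rule quoted in the changelog (VLAN_VID_MASK is 0x0fff in include/linux/if_vlan.h).

```python
import ipaddress

VLAN_VID_MASK = 0x0FFF  # kernel constant; ids 1..4094 pass the quoted check


def parse_arp_ip_target(spec: str):
    """Parse 'ipv4-address[tag/tag...]' into (address, tags).

    tags is None when no bracket list is given (automatic VLAN discovery
    stays in effect) and [] for an empty list (discovery disabled).
    """
    addr_part, bracket, rest = spec.partition("[")
    address = ipaddress.IPv4Address(addr_part)  # ValueError if malformed
    if not bracket:
        return address, None
    if not rest.endswith("]"):
        raise ValueError(f"missing closing bracket in {spec!r}")
    body = rest[:-1]
    tags = [int(tok) for tok in body.split("/")] if body else []
    for vid in tags:
        # Reject 0 and anything at or above VLAN_VID_MASK.
        if not 0 < vid < VLAN_VID_MASK:
            raise ValueError(f"invalid VLAN id {vid}")
    return address, tags


if __name__ == "__main__":
    for spec in ("10.0.0.1", "10.0.0.1[]", "10.0.0.1[10/20]"):
        print(spec, "->", parse_arp_ip_target(spec))
```

Returning None versus an empty list keeps the two cases from the cover letter apart: an omitted list preserves the existing discovery behaviour, while an explicit empty list turns it off.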

3 weeks, 2 days


[PATCH v22 00/28] riscv control-flow integrity for usermode

by Deepak Gupta via B4 Relay

v22: fixing build error due to -march=zicfiss being picked in gcc-13 and above but not actually doing any codegen or recognizing instructions for zicfiss. The change in v22 adds a dependence on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support, and only then will CONFIG_RISCV_USER_CFI be visible in menuconfig.

v21: fixed build errors.

Basics and overview
===================

Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffers from memory corruption issues which can be utilized by attackers to bend the control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining the return address from stack memory.

To mitigate such attacks, the risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad`, else the cpu will raise a software check exception (a new cpu exception cause code on riscv). Similarly for return flow, the risc-v extension zicfiss extends the architecture with

- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if the comparison above was a mismatch
- Protection mechanism using which shadow stack is not writeable via regular store instructions

More information and details can be found at the extensions github repo [1].

The equivalent to the landing pad (zicfilp) on x86 is the `ENDBRANCH` instruction in Intel CET [3], and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has a shadow stack [5] and arm64 has the guarded control stack (GCS) [6], which are very similar to risc-v's zicfiss shadow stack.

x86 and arm64 support for user mode shadow stack is already in mainline.
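The return-flow protection described above can be pictured with a toy model. The Python sketch below is purely illustrative: ShadowStack, SoftwareCheckException and call_and_return() are invented names, and real zicfiss hardware works on protected memory rather than a Python list. It only shows the push-on-call, pop-and-compare-on-return idea.

```python
class SoftwareCheckException(Exception):
    """Stands in for the RISC-V software check exception."""


class ShadowStack:
    """Toy shadow stack: only sspush/sspopchk may touch the slots."""

    def __init__(self):
        self._slots = []

    def sspush(self, return_address: int) -> None:
        self._slots.append(return_address)

    def sspopchk(self, return_address: int) -> None:
        expected = self._slots.pop()
        if expected != return_address:
            raise SoftwareCheckException(
                f"return to {return_address:#x}, expected {expected:#x}")


def call_and_return(shadow: ShadowStack, return_address: int,
                    corrupt_to=None) -> int:
    """Model one call/return pair, optionally corrupting the regular stack."""
    shadow.sspush(return_address)       # function prologue
    on_regular_stack = return_address   # copy in ordinary, writable memory
    if corrupt_to is not None:
        on_regular_stack = corrupt_to   # simulated memory-corruption bug
    shadow.sspopchk(on_regular_stack)   # function epilogue
    return on_regular_stack


if __name__ == "__main__":
    print(hex(call_and_return(ShadowStack(), 0x1000)))
    try:
        call_and_return(ShadowStack(), 0x1000, corrupt_to=0xDEAD)
    except SoftwareCheckException as exc:
        print("software check:", exc)
```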
Kernel awareness for user control flow integrity
================================================

This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series.

Enabling:

In order to maintain compatibility and not break anything in user mode, the kernel doesn't enable control flow integrity cpu extensions on a binary by default. Instead it exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (the loader) to enumerate whether all objects in its address space are compiled with shadow stack and landing pad support and enable the feature accordingly. Additionally, if a subsequent `dlopen` happens on a library, user mode can decide again to disable the feature (if the incoming library is not compiled with support) OR terminate the task (if user mode policy strictly requires all objects in the address space to be compiled with the control flow integrity cpu feature). The prctl to enable shadow stack results in allocating a shadow stack from virtual memory and activating it for the user address space. x86 and arm64 are also following the same direction for similar reasons.

clone/fork:

On clone and fork, the cfi state of the task is inherited by the child. The shadow stack is part of virtual memory and is writeable memory from the kernel's perspective (writeable via a restricted set of instructions, aka shadow stack instructions). Thus kernel changes ensure that this memory is converted to read-only when fork/clone happens and COWed when a fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, the kernel will automatically allocate a shadow stack for that clone call.

map_shadow_stack:

x86 introduced the `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space.
It is useful to allocate shadow stacks for different contexts managed by a single thread (green threads or contexts). risc-v implements this system call as well.

signal management:

If shadow stack is enabled for a task, the kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that the original execution can be resumed. Even though the resume context is prepared by the kernel, it is in user space memory and is subject to memory corruption, and corruption bugs can be utilized by an attacker in this race window to perform an arbitrary sigreturn and eventually bypass the cfi mechanism. Another issue is how to ensure that cfi related state in the sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.

In order to mitigate control-flow hijacking, the kernel prepares a token and places it on the shadow stack before signal delivery, and places the address of the token in the sigcontext structure. During sigreturn, the kernel obtains the address of the token from the sigcontext structure, reads the token from the shadow stack and validates it, and only then allows sigreturn to succeed. The compatibility issue is solved by adopting the dynamic sigcontext management introduced for the vector extension. This series refactors the code a little bit to make future sigcontext management easy (as proposed by Andy Chiu from SiFive).

config and compilation:

Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This option is presented only if the toolchain has shadow stack and landing pad support, and is guarded by toolchain support on purpose. The reason is that eventually the vDSO also needs to be compiled with shadow stack and landing pad support. vDSO compile patches are not included as of now because the landing pad labeling scheme is yet to settle for the usermode runtime.
To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst How to test this series ======================= Toolchain --------- $ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc) Qemu ---- Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc) Opensbi ------- $ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic Linux ----- Running defconfig is fine. CFI is enabled by default if the toolchain supports it. $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) Running ------- Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true References ========== [1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c… [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific… [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i… [6] - https://lwn.net/Articles/940403/ To: Thomas Gleixner <tglx(a)linutronix.de> To: Ingo Molnar <mingo(a)redhat.com> To: Borislav Petkov <bp(a)alien8.de> To: Dave Hansen <dave.hansen(a)linux.intel.com> To: x86(a)kernel.org To: H. 
Peter Anvin <hpa(a)zytor.com> To: Andrew Morton <akpm(a)linux-foundation.org> To: Liam R. Howlett <Liam.Howlett(a)oracle.com> To: Vlastimil Babka <vbabka(a)suse.cz> To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> To: Paul Walmsley <paul.walmsley(a)sifive.com> To: Palmer Dabbelt <palmer(a)dabbelt.com> To: Albert Ou <aou(a)eecs.berkeley.edu> To: Conor Dooley <conor(a)kernel.org> To: Rob Herring <robh(a)kernel.org> To: Krzysztof Kozlowski <krzk+dt(a)kernel.org> To: Arnd Bergmann <arnd(a)arndb.de> To: Christian Brauner <brauner(a)kernel.org> To: Peter Zijlstra <peterz(a)infradead.org> To: Oleg Nesterov <oleg(a)redhat.com> To: Eric Biederman <ebiederm(a)xmission.com> To: Kees Cook <kees(a)kernel.org> To: Jonathan Corbet <corbet(a)lwn.net> To: Shuah Khan <shuah(a)kernel.org> To: Jann Horn <jannh(a)google.com> To: Conor Dooley <conor+dt(a)kernel.org> To: Miguel Ojeda <ojeda(a)kernel.org> To: Alex Gaynor <alex.gaynor(a)gmail.com> To: Boqun Feng <boqun.feng(a)gmail.com> To: Gary Guo <gary(a)garyguo.net> To: Björn Roy Baron <bjorn3_gh(a)protonmail.com> To: Benno Lossin <benno.lossin(a)proton.me> To: Andreas Hindborg <a.hindborg(a)kernel.org> To: Alice Ryhl <aliceryhl(a)google.com> To: Trevor Gross <tmgross(a)umich.edu> Cc: linux-kernel(a)vger.kernel.org Cc: linux-fsdevel(a)vger.kernel.org Cc: linux-mm(a)kvack.org Cc: linux-riscv(a)lists.infradead.org Cc: devicetree(a)vger.kernel.org Cc: linux-arch(a)vger.kernel.org Cc: linux-doc(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: alistair.francis(a)wdc.com Cc: richard.henderson(a)linaro.org Cc: jim.shu(a)sifive.com Cc: andybnac(a)gmail.com Cc: kito.cheng(a)sifive.com Cc: charlie(a)rivosinc.com Cc: atishp(a)rivosinc.com Cc: evan(a)rivosinc.com Cc: cleger(a)rivosinc.com Cc: alexghiti(a)rivosinc.com Cc: samitolvanen(a)google.com Cc: broonie(a)kernel.org Cc: rick.p.edgecombe(a)intel.com Cc: rust-for-linux(a)vger.kernel.org changelog --------- v22: - CONFIG_RISCV_USER_CFI was by default "n". 
With dual vdso support it is default "y" (if toolchain supports it). Fixing build error due to "-march=zicfiss" being picked in gcc-13 partially. gcc-13 only recognizes the flag but not actually doing any codegen or recognizing instruction for zicfiss. Change in v22 makes dependence on `-fcf-protection=full` compiler flag to ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be visible in menuconfig. - picked up tags and some cosmetic changes in commit message for dual vdso patch. v21: - Fixing build errors due to changes in arch/riscv/include/asm/vdso.h Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected. v20: - rebased on v6.18-rc1. - Added two vDSO support. If `CONFIG_RISCV_USER_CFI` is selected two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). Kernel exposes RVA23 vDSO if hardware/cpu implements zimop else exposes existing vDSO to userspace. - default selection for `CONFIG_RISCV_USER_CFI` is "Yes". - replaced "__ASSEMBLY__" with "__ASSEMBLER__" v19: - riscv_nousercfi was `int`. changed it to unsigned long. Thanks to Alex Ghiti for reporting it. It was a bug. - ELP is cleared on trap entry only when CONFIG_64BIT. - restore ssp back on return to usermode was being done before `riscv_v_context_nesting_end` on trap exit path. If kernel shadow stack were enabled this would result in kernel operating on user shadow stack and panic (as I found in my testing of kcfi patch series). So fixed that. v18: - rebased on 6.16-rc1 - uprobe handling clears ELP in sstatus image in pt_regs - vdso was missing shadow stack elf note for object files. added that. Additional asm file for vdso needed the elf marker flag. toolchain should complain if `-fcf-protection=full` and marker is missing for object generated from asm file. Asked toolchain folks to fix this. Although no reason to gate the merge on that. 
- Split up compile options for march and fcf-protection in vdso Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu. Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI

v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling" https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/

v16:
- If FWFT is not implemented or returns error for shadow stack activation, then no_usercfi is set to disable shadow stack. Although this should be picked up by extension validation and activation. Fixed this bug for zicfilp and zicfiss both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to keep it off till we have more hardware availability with RVA23 profile and zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in asm-offsets.c error.

v15:
- Toolchain has been updated to include `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile vDSO and selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.

v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches. Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation is not disturbed with respect to member fields of thread_info in single cacheline.

v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently. updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest.
- cosmetic and grammatical changes to documentation.

v12:
- It seems like I had accidentally squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.

v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to `lpad 0`.

v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects.

v9: - rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion") - dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs) - dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs) v8: - rebased on palmer/for-next - dropped samuel holland's `envcfg` context switch patches. they are in parlmer/for-next v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.… - Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE" - Lock interfaces for shadow stack and indirect branch tracking expect arg == 0 Any future evolution of this interface should accordingly define how arg should be setup. - `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv… v6: - Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info` - fixed unaligned newline escapes in kselftest - cleaned up messages in kselftest and included test output in commit message - fixed a bug in clone path reported by Zong Li - fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code) v5: - rebased on v6.12-rc1 - Fixed schema related issues in device tree file - Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index) - added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected. 
- Adopted context header based signal handling as proposed by Andy Chiu - Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…) - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches) v4: - rebased on 6.11-rc6 - envcfg: Converged with Samuel Holland's patches for envcfg management on per- thread basis. - vma_is_shadow_stack is renamed to is_vma_shadow_stack - picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch - signal context: using extended context management to maintain compatibility. - fixed `-Wmissing-prototypes` compiler warnings for prctl functions - Documentation fixes and amending typos. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/ v3: - envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series. - dt-bindings As suggested, split into separate commit. fixed the messaging that spec is in public review - arch_is_shadow_stack change arch_is_shadow_stack changed to vma_is_shadow_stack - hwprobe zicfiss / zicfilp if present will get enumerated in hwprobe - selftests As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit. - Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ v2: - Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel. 
- Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Changes in v22: - Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@ri… Changes in v21: - Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@ri… Changes in v20: - Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@ri… Changes in v19: - Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri… Changes in v18: - Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri… Changes in v17: - Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri… Changes in v16: - Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri… Changes in v15: - changelog posted just below cover letter - Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri… Changes in v14: - changelog posted just below cover letter - Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri… Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri… Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Deepak Gupta (26): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / 
zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call arch/riscv: dual vdso creation logic and select vdso based on hw riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad and shadow stack note Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 22 + arch/riscv/Makefile | 8 +- arch/riscv/configs/hardening.config | 4 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + 
arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vdso.h | 13 +- arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 38 ++ arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 54 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso.c | 7 + arch/riscv/kernel/vdso/Makefile | 40 +- arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +- arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/note.S | 3 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +- arch/riscv/kernel/vdso_cfi/Makefile | 25 + arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 62 files changed, 2475 
insertions(+), 41 deletions(-) --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug

3 weeks, 2 days


[PATCH net 0/6] selftests: mptcp: join: fix some flaky tests

by Matthieu Baerts (NGI0)

When looking at the recent CI results on NIPA and MPTCP CIs, a few MPTCP Join tests are marked as unstable. Here are some fixes for that.

- Patch 1: a small fix for mptcp_connect.sh, printing a note as initially intended. For >=v5.13.
- Patch 2: avoid unexpected reset when closing subflows. For >= 5.13.
- Patches 3-4: longer transfer when not waiting for the end. For >=5.18.
- Patch 5: read all received data when expecting a reset. For >= v6.1.
- Patch 6: a fix to properly kill background tasks. For >= v6.5.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (6):
      selftests: mptcp: connect: fix fallback note due to OoO
      selftests: mptcp: join: rm: set backup flag
      selftests: mptcp: join: endpoints: longer transfer
      selftests: mptcp: join: userspace: longer transfer
      selftests: mptcp: connect: trunc: read all recv data
      selftests: mptcp: join: properly kill background tasks

 tools/testing/selftests/net/mptcp/mptcp_connect.c  | 18 +++--
 tools/testing/selftests/net/mptcp/mptcp_connect.sh |  2 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh    | 90 +++++++++++-----------
 tools/testing/selftests/net/mptcp/mptcp_lib.sh     | 21 +++++
 4 files changed, 80 insertions(+), 51 deletions(-)
---
base-commit: 96a9178a29a6b84bb632ebeb4e84cf61191c73d5
change-id: 20251108-net-mptcp-sft-join-unstable-5a28cdb6ea54

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

3 weeks, 2 days


[PATCH net-next v3 00/11] selftests/vsock: refactor and improve vmtest infrastructure

by Bobby Eshleman

Hey all, This patch series refactors the vsock selftest VM infrastructure to improve test run times, improve logging, and prepare for future tests which make heavy usage of these refactored functions and have new requirements such as simultaneous QEMU processes. These patches were broken off from this prior series: https://lore.kernel.org/all/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.co… To: Stefano Garzarella <sgarzare(a)redhat.com> To: Shuah Khan <shuah(a)kernel.org> Cc: virtualization(a)lists.linux.dev Cc: netdev(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Changes in v3: - see per-patch changes Changes in v2: - remove "Fixes" for some patches because they do not fix bugs in kselftest runs (some fix bugs only when using bash args that kselftest does not use or otherwise prepare functions for new usage) - broke out one fixes patch for "net" - per-patch changes - add patch for shellcheck declaration to disable false positives - Link to v1: https://lore.kernel.org/r/20251022-vsock-selftests-fixes-and-improvements-v… --- Bobby Eshleman (11): selftests/vsock: improve logging in vmtest.sh selftests/vsock: make wait_for_listener() work even if pipefail is on selftests/vsock: reuse logic for vsock_test through wrapper functions selftests/vsock: avoid multi-VM pidfile collisions with QEMU selftests/vsock: do not unconditionally die if qemu fails selftests/vsock: speed up tests by reducing the QEMU pidfile timeout selftests/vsock: add check_result() for pass/fail counting selftests/vsock: add BUILD=0 definition selftests/vsock: add 1.37 to tested virtme-ng versions selftests/vsock: add vsock_loopback module loading selftests/vsock: disable shellcheck SC2317 and SC2119 tools/testing/selftests/vsock/vmtest.sh | 355 ++++++++++++++++++++++---------- 1 file changed, 243 insertions(+), 112 deletions(-) --- base-commit: 8a25a2e34157d882032112e4194ccdfb29c499e8 change-id: 20251021-vsock-selftests-fixes-and-improvements-057440ffb2fa 
Best regards, -- Bobby Eshleman <bobbyeshleman(a)meta.com>

3 weeks, 2 days


[PATCH][next] selftests/bpf: test_xsk: Fix spelling mistake "conigure" -> "configure"

by Colin Ian King

There is a spelling mistake in an ASSERT_OK message. Fix it. Signed-off-by: Colin Ian King <coking(a)nvidia.com> --- tools/testing/selftests/bpf/prog_tests/xsk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xsk.c b/tools/testing/selftests/bpf/prog_tests/xsk.c index dd4c35c0e428..04f9a5e73e5e 100644 --- a/tools/testing/selftests/bpf/prog_tests/xsk.c +++ b/tools/testing/selftests/bpf/prog_tests/xsk.c @@ -74,7 +74,7 @@ static void test_xsk(const struct test_spec *test_to_run, enum test_mode mode) if (!ASSERT_OK_PTR(ifobj_rx, "create ifobj_rx")) goto delete_tx; - if (!ASSERT_OK(configure_ifobj(ifobj_tx, ifobj_rx), "conigure ifobj")) + if (!ASSERT_OK(configure_ifobj(ifobj_tx, ifobj_rx), "configure ifobj")) goto delete_rx; ret = get_hw_ring_size(ifobj_tx->ifname, &ifobj_tx->ring); -- 2.51.0

3 weeks, 2 days


[PATCH v2 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges

by Alex Mastro

Not all IOMMUs support the same virtual address width as the processor; for instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, using the virtual address as the IOVA and mappings at the top of the address space both fail.

VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator and record the maximum supported IOVA address.

Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. This latter change doesn't test quite the same absolute end-of-address space behavior but still seems to have some value. Testing for overflow is skipped when a reduced address space is supported as the desired errno is not generated.

This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2].

Given David's plans to split IOMMU concerns from devices as described in [3], the home this series gives `struct iova_allocator` and the IOVA range helpers is likely to be short-lived, since they reside in vfio_pci_device.c. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code, once such a place exists.
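As an illustration of the "simple allocator" idea described above, here is a minimal bump-allocator sketch over an array of inclusive IOVA ranges. It is not the code from this series: `struct iova_range_sketch`, `struct iova_allocator_sketch` and both functions are hypothetical names, and the sketch assumes the ranges are sorted, non-overlapping, and that every requested size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive range [start, last], as reported by a range query. */
struct iova_range_sketch {
	uint64_t start;
	uint64_t last;
};

/* Hypothetical allocator state: walks the ranges once, never frees. */
struct iova_allocator_sketch {
	const struct iova_range_sketch *ranges;
	size_t nranges;
	size_t idx;      /* range currently being carved up */
	uint64_t cursor; /* next candidate address inside ranges[idx] */
};

static struct iova_allocator_sketch
iova_allocator_sketch_init(const struct iova_range_sketch *ranges, size_t nranges)
{
	struct iova_allocator_sketch a = {
		.ranges = ranges,
		.nranges = nranges,
		.idx = 0,
		.cursor = nranges ? ranges[0].start : 0,
	};
	return a;
}

/* Move on to the next range, if there is one. */
static void iova_sketch_next_range(struct iova_allocator_sketch *a)
{
	a->idx++;
	if (a->idx < a->nranges)
		a->cursor = a->ranges[a->idx].start;
}

/* Allocate a size-aligned block; returns false once all ranges are used up. */
static bool iova_sketch_alloc(struct iova_allocator_sketch *a, uint64_t size,
			      uint64_t *iova_out)
{
	assert(size != 0 && (size & (size - 1)) == 0); /* power of two only */

	while (a->idx < a->nranges) {
		const struct iova_range_sketch *r = &a->ranges[a->idx];
		uint64_t iova = (a->cursor + size - 1) & ~(size - 1); /* align up */

		/* iova < cursor means the align-up wrapped around 2^64. */
		if (iova >= a->cursor && iova <= r->last &&
		    r->last - iova >= size - 1) {
			*iova_out = iova;
			if (r->last - iova == size - 1)
				iova_sketch_next_range(a); /* range exhausted */
			else
				a->cursor = iova + size;
			return true;
		}
		iova_sketch_next_range(a);
	}
	return false;
}
```

The comparisons are written as `r->last - iova >= size - 1` rather than `iova + size - 1 <= r->last` so that a range ending at the very top of the 64-bit address space does not overflow. The last range's `last` field is also the natural value to use as the maximum supported IOVA in limit tests.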
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t [2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/ [3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/ To: Alex Williamson <alex(a)shazbot.org> To: David Matlack <dmatlack(a)google.com> To: Shuah Khan <shuah(a)kernel.org> To: Jason Gunthorpe <jgg(a)ziepe.ca> Cc: kvm(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Alex Mastro <amastro(a)fb.com> Changes in v2: - Fix various nits - calloc() where appropriate - Update overflow test to run regardless of iova range constraints - Change iova_allocator_init() to return an allocated struct - Unfold iova_allocator_alloc() - Fix iova allocator initial state bug - Update vfio_pci_driver_test to use iova allocator - Link to v1: https://lore.kernel.org/r/20251110-iova-ranges-v1-0-4d441cf5bf6d@fb.com --- Alex Mastro (4): vfio: selftests: add iova range query helpers vfio: selftests: fix map limit tests to use last available iova vfio: selftests: add iova allocator vfio: selftests: replace iova=vaddr with allocated iovas .../testing/selftests/vfio/lib/include/vfio_util.h | 19 +- tools/testing/selftests/vfio/lib/vfio_pci_device.c | 241 ++++++++++++++++++++- .../testing/selftests/vfio/vfio_dma_mapping_test.c | 20 +- .../testing/selftests/vfio/vfio_pci_driver_test.c | 12 +- 4 files changed, 283 insertions(+), 9 deletions(-) --- base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316 change-id: 20251110-iova-ranges-1c09549fbf63 Best regards, -- Alex Mastro <amastro(a)fb.com>

3 weeks, 2 days


[PATCH 00/11] mm/damon/tests: add more tests for online parameters commit

by SeongJae Park

A DAMON feature called parameters "commit" allows DAMON API callers and ABI users to update nearly every DAMON parameter while DAMON is running. This is being used for flexible DAMON use cases such as taking a snapshot of the monitoring results with minimum overhead, or adjusting access-aware system operations (DAMOS) for user-space driven auto-tuning or investigations. Compared to the usefulness of the feature and size of the implementation, the test coverage is pretty small. Only the filter commit part has a single test case, namely damos_test_commit_filter(). Actually, we found and fixed a few bugs of the feature in the past. The single existing test was also added to avoid reintroduction of a found bug. Add more unit tests for the feature. First four patches (1-4) refactor and extend the existing test for DAMOS filter commit for multiple test cases. Next three patches (5-7) add tests for DAMOS quota commit. Next two patches (8 and 9) refactor damos_commit_dests() for ease of code reading and test writing, and implement a new unit test of the function that is being refactored in a test-friendly way. Final two patches (10 and 11) further add new unit tests for damos_commit() and damon_commit_target_regions(). 
SeongJae Park (11): mm/damon/tests/core-kunit: remove dynamic allocs on damos_test_commit_filter() mm/damon/tests/core-kunit: split out damos_test_commit_filter() core logic mm/damon/tests/core-kunit: extend damos_test_commit_filter_for() for union fields mm/damon/tests/core-kunit: add test cases to damos_test_commit_filter() mm/damon/tests/core-kunit: add damos_commit_quota_goal() test mm/damon/tests/core-kunit: add damos_commit_quota_goals() test mm/damon/tests/core-kunit: add damos_commit_quota() test mm/damon/core: pass migrate_dests to damos_commit_dests() mm/damon/tests/core-kunit: add damos_commit_dests() test mm/damon/tests/core-kunit: add damos_commit() test mm/damon/tests/core-kunit: add damon_commit_target_regions() test mm/damon/core.c | 38 ++- mm/damon/tests/core-kunit.h | 544 +++++++++++++++++++++++++++++++++++- 2 files changed, 547 insertions(+), 35 deletions(-) base-commit: 620a4c1c5116eb811807ea7e63d61846015f69c8 -- 2.47.3

3 weeks, 2 days


[PATCH 00/10] selftests: vDSO: Stop using libc types for vDSO calls

by Thomas Weißschuh

Currently the vDSO selftests use the time-related types from libc. This works on glibc by chance today but will break with other libc implementations or on distributions which switch to 64-bit times everywhere. The kernel's UAPI headers provide the proper types to use with the vDSO (and raw syscalls) but are not necessarily compatible with libc types. Introduce a new header which makes the UAPI headers compatible with the libc. Also contains some related cleanups. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Thomas Weißschuh (10): Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers" selftests: vDSO: Introduce vdso_types.h selftests: vDSO: vdso_test_abi: Use types from vdso_types.h selftests: vDSO: vdso_test_abi: Provide compatibility with 32-bit musl selftests: vDSO: vdso_test_gettimeofday: Remove nolibc checks selftests: vDSO: vdso_test_gettimeofday: Use types from vdso_types.h selftests: vDSO: vdso_test_correctness: Drop SYS_getcpu fallbacks selftests: vDSO: vdso_test_correctness: Use types from vdso_types.h selftests: vDSO: vdso_test_correctness: Provide compatibility with 32-bit musl selftests: vDSO: vdso_test_correctness: Use facilities from parse_vdso.c tools/testing/selftests/vDSO/Makefile | 6 +- tools/testing/selftests/vDSO/parse_vdso.c | 3 +- tools/testing/selftests/vDSO/vdso_test_abi.c | 35 ++++----- .../testing/selftests/vDSO/vdso_test_correctness.c | 85 +++++++++------------- .../selftests/vDSO/vdso_test_gettimeofday.c | 9 +-- tools/testing/selftests/vDSO/vdso_types.h | 70 ++++++++++++++++++ 6 files changed, 121 insertions(+), 87 deletions(-) --- base-commit: 8c6abf7bda867b82f8a6d60a0d5ce9cb1da6c433 change-id: 20251110-vdso-test-types-68ce0c712b79 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

3 weeks, 2 days


[PATCH] selftest: net: fix variable sized type not at the end of struct warnings

by Ankit Khushwaha

Some network selftests define fields of variable-sized types that are not at the end of a struct, causing -Wgnu-variable-sized-type-not-at-end warnings:

timestamping.c:285:18: warning: field 'cm' with variable sized type 'struct cmsghdr' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
  285 |         struct cmsghdr cm;
      |                        ^
ipsec.c:835:5: warning: field 'u' with variable sized type 'union (unnamed union at ipsec.c:831:3)' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
  835 |         } u;
      |           ^

Move these fields to the end of their structs to fix the warnings.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/net/ipsec.c        | 2 +-
 tools/testing/selftests/net/timestamping.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
index 0ccf484b1d9d..36083c8f884f 100644
--- a/tools/testing/selftests/net/ipsec.c
+++ b/tools/testing/selftests/net/ipsec.c
@@ -828,12 +828,12 @@ static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz,
 		struct xfrm_desc *desc)
 {
 	struct {
+		char buf[XFRM_ALGO_KEY_BUF_SIZE];
 		union {
 			struct xfrm_algo alg;
 			struct xfrm_algo_aead aead;
 			struct xfrm_algo_auth auth;
 		} u;
-		char buf[XFRM_ALGO_KEY_BUF_SIZE];
 	} alg = {};
 	size_t alen, elen, clen, aelen;
 	unsigned short type;
diff --git a/tools/testing/selftests/net/timestamping.c b/tools/testing/selftests/net/timestamping.c
index 044bc0e9ed81..ad2be2143698 100644
--- a/tools/testing/selftests/net/timestamping.c
+++ b/tools/testing/selftests/net/timestamping.c
@@ -282,8 +282,8 @@ static void recvpacket(int sock, int recvmsg_flags,
 	struct iovec entry;
 	struct sockaddr_in from_addr;
 	struct {
-		struct cmsghdr cm;
 		char control[512];
+		struct cmsghdr cm;
 	} control;
 	int res;
 
-- 
2.51.0
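For readers unfamiliar with the warning, the following standalone sketch shows the shape of the construct involved. The type names are hypothetical and unrelated to the selftests: `struct var_hdr` ends in a flexible array member, which makes it a "variable sized type" in clang's terminology, and clang only accepts it silently as a member of another struct when it is the last field.

```c
#include <stddef.h>

/* Hypothetical header type ending in a flexible array member. */
struct var_hdr {
	size_t len;
	unsigned char data[]; /* makes the type "variable sized" */
};

/*
 * Variable-sized member placed last: the layout the warning asks for.
 * Declaring 'hdr' before 'buf' instead is what triggers
 * -Wgnu-variable-sized-type-not-at-end under clang.
 */
struct wrapper_ok {
	char buf[64];
	struct var_hdr hdr;
};

/* Offset of the variable-sized member inside the wrapper. */
static size_t wrapper_hdr_offset(void)
{
	return offsetof(struct wrapper_ok, hdr);
}
```

Note that reordering fields changes the memory layout, so any code that relies on a header being at the start of a buffer has to be checked when such a field is moved.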

3 weeks, 2 days


[PATCH v5 00/34] sparc64: vdso: Switch to the generic vDSO library

by Thomas Weißschuh

The generic vDSO provides a lot of common functionality shared between different architectures. SPARC is the last architecture not using it, preventing some necessary code cleanup. Make use of the generic infrastructure.

Follow-up to and replacement for Arnd's SPARC vDSO removal patches:
https://lore.kernel.org/lkml/20250707144726.4008707-1-arnd@kernel.org/

SPARC64 cannot map .bss into userspace, so the vDSO datapages are switched over to be allocated dynamically. This requires changes to the s390 and random subsystem vDSO initialization as preparation. The random subsystem changes in turn require some cleanup of the vDSO headers to not end up as an ugly #ifdef mess.

Tested on a Niagara T4 and QEMU.

This has a semantic conflict with my series "vdso: Reject absolute relocations during build" [0]. The last patch of this series expects all users of the generic vDSO library to use the vdsocheck tool. This is not the case (yet) for SPARC64. I do have the patches for the integration, the specifics will depend on which series is applied first.

Based on v6.18-rc1.

[0] https://lore.kernel.org/lkml/20250812-vdso-absolute-reloc-v4-0-61a8b615e5ec…

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v5:
- Merge the patches for 'struct page' mapping and dynamic allocation
- Zero out newly-allocated data pages
- Pick up review tags
- Link to v4: https://lore.kernel.org/r/20251014-vdso-sparc64-generic-2-v4-0-e0607bf49dea…

Changes in v4:
- Rebase on v6.18-rc1.
- Keep inclusion of asm/clocksource.h from linux/clocksource.h - Reword description of "s390/time: Set up vDSO datapage later" - Link to v3: https://lore.kernel.org/r/20250917-vdso-sparc64-generic-2-v3-0-3679b1bc8ee8… Changes in v3: - Allocate vDSO data pages dynamically (and lots of preparations for that) - Drop clock_getres() - Fix 32bit clock_gettime() syscall fallback - Link to v2: https://lore.kernel.org/r/20250815-vdso-sparc64-generic-2-v2-0-b5ff80672347… Changes in v2: - Rebase on v6.17-rc1 - Drop RFC state - Fix typo in commit message - Drop duplicate 'select GENERIC_TIME_VSYSCALL' - Merge "sparc64: time: Remove architecture-specific clocksource data" into the main conversion patch. It violated the check in __clocksource_register_scale() - Link to v1: https://lore.kernel.org/r/20250724-vdso-sparc64-generic-2-v1-0-e376a3bd24d1… --- Arnd Bergmann (1): clocksource: remove ARCH_CLOCKSOURCE_DATA Thomas Weißschuh (33): selftests: vDSO: vdso_test_correctness: Handle different tv_usec types arm64: vDSO: getrandom: Explicitly include asm/alternative.h arm64: vDSO: gettimeofday: Explicitly include vdso/clocksource.h arm64: vDSO: compat_gettimeofday: Add explicit includes ARM: vdso: gettimeofday: Add explicit includes powerpc/vdso/gettimeofday: Explicitly include vdso/time32.h powerpc/vdso: Explicitly include asm/cputable.h and asm/feature-fixups.h LoongArch: vDSO: Explicitly include asm/vdso/vdso.h MIPS: vdso: Add include guard to asm/vdso/vdso.h MIPS: vdso: Explicitly include asm/vdso/vdso.h random: vDSO: Add explicit includes vdso/gettimeofday: Add explicit includes vdso/helpers: Explicitly include vdso/processor.h vdso/datapage: Remove inclusion of gettimeofday.h vdso/datapage: Trim down unnecessary includes random: vDSO: trim vDSO includes random: vDSO: remove ifdeffery random: vDSO: split out datapage update into helper functions random: vDSO: only access vDSO datapage after random_init() s390/time: Set up vDSO datapage later vdso/datastore: Reduce scope of some 
variables in vvar_fault() vdso/datastore: Drop inclusion of linux/mmap_lock.h vdso/datastore: Allocate data pages dynamically sparc64: vdso: Link with -z noexecstack sparc64: vdso: Remove obsolete "fake section table" reservation sparc64: vdso: Replace code patching with runtime conditional sparc64: vdso: Move hardware counter read into header sparc64: vdso: Move syscall fallbacks into header sparc64: vdso: Introduce vdso/processor.h sparc64: vdso: Switch to the generic vDSO library sparc64: vdso2c: Drop sym_vvar_start handling sparc64: vdso2c: Remove symbol handling sparc64: vdso: Implement clock_gettime64() arch/arm/include/asm/vdso/gettimeofday.h | 2 + arch/arm64/include/asm/vdso/compat_gettimeofday.h | 3 + arch/arm64/include/asm/vdso/gettimeofday.h | 2 + arch/arm64/kernel/vdso/vgetrandom.c | 2 + arch/loongarch/kernel/process.c | 1 + arch/loongarch/kernel/vdso.c | 1 + arch/mips/include/asm/vdso/vdso.h | 5 + arch/mips/kernel/vdso.c | 1 + arch/powerpc/include/asm/vdso/gettimeofday.h | 1 + arch/powerpc/include/asm/vdso/processor.h | 3 + arch/s390/kernel/time.c | 4 +- arch/sparc/Kconfig | 3 +- arch/sparc/include/asm/clocksource.h | 9 - arch/sparc/include/asm/processor.h | 3 + arch/sparc/include/asm/processor_32.h | 2 - arch/sparc/include/asm/processor_64.h | 25 -- arch/sparc/include/asm/vdso.h | 2 - arch/sparc/include/asm/vdso/clocksource.h | 10 + arch/sparc/include/asm/vdso/gettimeofday.h | 184 ++++++++++ arch/sparc/include/asm/vdso/processor.h | 41 +++ arch/sparc/include/asm/vdso/vsyscall.h | 10 + arch/sparc/include/asm/vvar.h | 75 ---- arch/sparc/kernel/Makefile | 1 - arch/sparc/kernel/time_64.c | 6 +- arch/sparc/kernel/vdso.c | 69 ---- arch/sparc/vdso/Makefile | 8 +- arch/sparc/vdso/vclock_gettime.c | 380 ++------------------- arch/sparc/vdso/vdso-layout.lds.S | 26 +- arch/sparc/vdso/vdso.lds.S | 2 - arch/sparc/vdso/vdso2c.c | 24 -- arch/sparc/vdso/vdso2c.h | 45 +-- arch/sparc/vdso/vdso32/vdso32.lds.S | 4 +- arch/sparc/vdso/vma.c | 274 +-------------- 
drivers/char/random.c | 71 ++-- include/linux/clocksource.h | 6 +- include/linux/vdso_datastore.h | 6 + include/vdso/datapage.h | 23 +- include/vdso/helpers.h | 1 + init/main.c | 2 + kernel/time/Kconfig | 4 - lib/vdso/datastore.c | 74 ++-- lib/vdso/getrandom.c | 3 + lib/vdso/gettimeofday.c | 17 + .../testing/selftests/vDSO/vdso_test_correctness.c | 8 +- 44 files changed, 449 insertions(+), 994 deletions(-) --- base-commit: 28b1ac5ccd8d4900a8f53f0e6e84d517a7ccc71f change-id: 20250722-vdso-sparc64-generic-2-25f2e058e92c Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

3 weeks, 3 days


[PATCH net v10 0/4] net: netpoll: fix memory leak and add comprehensive selftests

by Breno Leitao

Fix a memory leak in netpoll and introduce netconsole selftests that expose the issue when running with kmemleak detection enabled. This patchset includes a selftest for netpoll with multiple concurrent users (netconsole + bonding), which simulates the scenario from test[1] that originally demonstrated the issue allegedly fixed by commit efa95b01da18 ("netpoll: fix use after free") - a commit that is now being reverted. Sending this to "net" branch because this is a fix, and the selftest might help with the backports validation. Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1] Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v10: - Get rid of the create_and_enable_dynamic_target() (Simon) - Link to v9: https://lore.kernel.org/r/20251106-netconsole_torture-v9-0-f73cd147c13c@deb… Changes in v9: - Reordered the config entries in tools/testing/selftests/drivers/net/bonding/config (NIPA) - Link to v8: https://lore.kernel.org/r/20251104-netconsole_torture-v8-0-5288440e2fa0@deb… Changes in v8: - Sending it again, now that commit 1a8fed52f7be1 ("netdevsim: set the carrier when the device goes up") has landed in net - Created one namespace for TX and one for RX (Paolo) - Used additional helpers to create and delete netdevsim (Paolo) - Link to v7: https://lore.kernel.org/r/20251003-netconsole_torture-v7-0-aa92fcce62a9@deb… Changes in v7: - Rebased on top of `net` - Link to v6: https://lore.kernel.org/r/20251002-netconsole_torture-v6-0-543bf52f6b46@deb… Changes in v6: - Expand the tests even more and some small fixups - Moved the test to bonding selftests - Link to v5: https://lore.kernel.org/r/20250918-netconsole_torture-v5-0-77e25e0a4eb6@deb… Changes in v5: - Set CONFIG_BONDING=m in selftests/drivers/net/config. 
- Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb… Changes in v4: - Added an additional selftest to test multiple netpoll users in parallel - Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb… Changes in v3: - This patchset is a merge of the fix and the selftest together as recommended by Jakub. Changes in v2: - Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring the create_dynamic_target() (Jakub) - Move the "wait" to after all the messages has been sent. - Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb… --- Breno Leitao (4): net: netpoll: fix incorrect refcount handling causing incorrect cleanup selftest: netcons: refactor target creation selftest: netcons: create a torture test selftest: netcons: add test for netconsole over bonded interfaces net/core/netpoll.c | 7 +- tools/testing/selftests/drivers/net/Makefile | 1 + .../testing/selftests/drivers/net/bonding/Makefile | 2 + tools/testing/selftests/drivers/net/bonding/config | 4 + .../drivers/net/bonding/netcons_over_bonding.sh | 361 +++++++++++++++++++++ .../selftests/drivers/net/lib/sh/lib_netcons.sh | 78 ++++- .../selftests/drivers/net/netcons_torture.sh | 130 ++++++++ 7 files changed, 566 insertions(+), 17 deletions(-) --- base-commit: 7d1988a943850c584e8e2e4bcc7a3b5275024072 change-id: 20250902-netconsole_torture-8fc23f0aca99 Best regards, -- Breno Leitao <leitao(a)debian.org>

3 weeks, 3 days


[PATCH] selftests/tracing: Run sample events to clear page cache events

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org> The tracing selftest "event-filter-function.tc" was failing because it first runs the "sample_events" function that triggers the kmem_cache_free event and it looks at what function was used during a call to "ls". But the first time it calls this, it could trigger events that are used to pull pages into the page cache. The rest of the test uses the function it finds during that call to see if it will be called in subsequent "sample_events" calls. But if there's no need to pull pages into the page cache, it will not trigger that function and the test will fail. Call the "sample_events" twice to trigger all the page cache work before it calls it to find a function to use in subsequent checks. Cc: stable(a)vger.kernel.org Fixes: eb50d0f250e96 ("selftests/ftrace: Choose target function for filter test from samples") Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- .../selftests/ftrace/test.d/filter/event-filter-function.tc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc index c62165fabd0c..003f612f57b0 100644 --- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc +++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc @@ -20,6 +20,10 @@ sample_events() { echo 0 > tracing_on echo 0 > events/enable +# Clear functions caused by page cache; run sample_events twice +sample_events +sample_events + echo "Get the most frequently calling function" echo > trace sample_events -- 2.51.0

3 weeks, 3 days


[PATCH 0/4] vfio: selftests: update DMA mapping tests to use queried IOVA ranges

by Alex Mastro

Not all IOMMUs support the same virtual address width as the processor, for instance older Intel consumer platforms only support 39-bits of IOMMU address space. On such platforms, using the virtual address as the IOVA and mappings at the top of the address space both fail. VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator and record the maximum supported IOVA address. Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. This latter change doesn't test quite the same absolute end-of-address space behavior but still seems to have some value. Testing for overflow is skipped when a reduced address space is supported as the desired errno is not generated. This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2]. Given David's plans to split IOMMU concerns from devices as described in [3], this series' home for `struct iova_allocator` is likely to be short lived, since it resides in vfio_pci_device.c. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code, once such a place exists. 
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t [2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/ [3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/ Signed-off-by: Alex Mastro <amastro(a)fb.com> --- Alex Mastro (4): vfio: selftests: add iova range query helpers vfio: selftests: fix map limit tests to use last available iova vfio: selftests: add iova allocator vfio: selftests: update vfio_dma_mapping_test to allocate iovas .../testing/selftests/vfio/lib/include/vfio_util.h | 22 +- tools/testing/selftests/vfio/lib/vfio_pci_device.c | 226 ++++++++++++++++++++- .../testing/selftests/vfio/vfio_dma_mapping_test.c | 25 ++- 3 files changed, 268 insertions(+), 5 deletions(-) --- base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316 change-id: 20251110-iova-ranges-1c09549fbf63 Best regards, -- Alex Mastro <amastro(a)fb.com>
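
The cover letter describes building "a simple allocator" from the array of valid IOVA ranges. As a rough sketch of that idea only, here is a bump allocator over inclusive ranges. The type and function names (`iova_range`, `iova_bump`, `iova_bump_alloc()`) are hypothetical and are not the definitions used in the series; the power-of-two size and natural alignment are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the selftest's definitions. */
struct iova_range {
	uint64_t start;
	uint64_t last;		/* inclusive upper bound */
};

struct iova_bump {
	const struct iova_range *ranges;
	size_t nranges;
	size_t idx;		/* range currently being carved up */
	uint64_t next;		/* next candidate address in ranges[idx] */
};

/* Move the cursor to the start of range `idx` (or past the end). */
static void iova_bump_seek(struct iova_bump *a, size_t idx)
{
	a->idx = idx;
	a->next = idx < a->nranges ? a->ranges[idx].start : 0;
}

static void iova_bump_init(struct iova_bump *a,
			   const struct iova_range *ranges, size_t nranges)
{
	a->ranges = ranges;
	a->nranges = nranges;
	iova_bump_seek(a, 0);
}

/*
 * Allocate `size` bytes aligned to `size` (assumed to be a power of two).
 * Returns 0 and stores the address in *out, or -1 if nothing fits.
 * Space skipped in an earlier range is never revisited.
 */
static int iova_bump_alloc(struct iova_bump *a, uint64_t size, uint64_t *out)
{
	if (size == 0 || (size & (size - 1)))
		return -1;

	while (a->idx < a->nranges) {
		const struct iova_range *r = &a->ranges[a->idx];
		uint64_t base = (a->next + size - 1) & ~(size - 1);

		/* base < a->next means the round-up wrapped past UINT64_MAX */
		if (base >= a->next && base <= r->last &&
		    r->last - base >= size - 1) {
			*out = base;
			if (r->last - base == size - 1)
				iova_bump_seek(a, a->idx + 1);	/* range used up */
			else
				a->next = base + size;
			return 0;
		}
		iova_bump_seek(a, a->idx + 1);
	}
	return -1;
}
```

All comparisons are written so that `base + size` is never computed when it could exceed the range, which matters for a range that ends at the top of the 64-bit space.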

3 weeks, 3 days


[PATCH] selftest/mm: fix pointer comparison in mremap_test

by Ankit Khushwaha

Pointer arithmetic on 'void *addr' and 'unsigned long long dest_alignment' triggers the following warning:

mremap_test.c:1035:31: warning: pointer comparison always evaluates to false [-Wtautological-compare]
 1035 |   if (addr + c.dest_alignment < addr) {
      |                               ^

Cast 'addr' to 'unsigned long long' to fix the pointer comparison.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/mm/mremap_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index a95c0663a011..5ae0400176af 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -1032,7 +1032,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 	/* Don't destroy existing mappings unless expected to overlap */
 	while (!is_remap_region_valid(addr, c.region_size) && !c.overlapping) {
 		/* Check for unsigned overflow */
-		if (addr + c.dest_alignment < addr) {
+		if ((unsigned long long) addr + c.dest_alignment < (unsigned long long) addr) {
 			ksft_print_msg("Couldn't find a valid region to remap to\n");
 			ret = -1;
 			goto clean_up_src;
--
2.51.0
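
As background for the warning: a compiler may assume that pointer arithmetic never wraps, so a check of the form `addr + n < addr` can be folded to false. A minimal sketch of a wraparound check done purely on integers follows. The helper name `addr_add_would_wrap()` is made up for this example and does not appear in mremap_test.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative helper, not from the patch: report whether advancing an
 * address by `step` bytes would wrap around the address space. The test
 * is done on uintptr_t, where the arithmetic is well defined, and is
 * phrased as a subtraction so that no overflowing sum is ever computed.
 */
static bool addr_add_would_wrap(const void *addr, unsigned long long step)
{
	uintptr_t a = (uintptr_t)addr;

	return step > UINTPTR_MAX - a;
}
```

A caller would test `addr_add_would_wrap(addr, c.dest_alignment)` before advancing `addr`, instead of comparing the advanced pointer with the original one.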

3 weeks, 3 days


[PATCH] mm/selftests: Fix -Wtautological-compare warning in mremap_test.c

by Wake Liu

The compiler warns about a tautological comparison in mremap_test.c: "pointer comparison always evaluates to false [-Wtautological-compare]" This occurs when checking for unsigned overflow: if (addr + c.dest_alignment < addr) Cast 'addr' to 'unsigned long long' to ensure the comparison is performed with a wider type, correctly detecting potential overflow and resolving the warning. Signed-off-by: Wake Liu <wakel(a)google.com> --- tools/testing/selftests/mm/mremap_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c index bf2863b102e3..c4933f4cbd48 100644 --- a/tools/testing/selftests/mm/mremap_test.c +++ b/tools/testing/selftests/mm/mremap_test.c @@ -1032,7 +1032,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb, /* Don't destroy existing mappings unless expected to overlap */ while (!is_remap_region_valid(addr, c.region_size) && !c.overlapping) { /* Check for unsigned overflow */ - if (addr + c.dest_alignment < addr) { + if ((unsigned long long)addr + c.dest_alignment < (unsigned long long)addr) { ksft_print_msg("Couldn't find a valid region to remap to\n"); ret = -1; goto clean_up_src; -- 2.51.2.1041.gc1ab5b90ca-goog

3 weeks, 3 days


[PATCH net v9 0/4] net: netpoll: fix memory leak and add comprehensive selftests

by Breno Leitao

Fix a memory leak in netpoll and introduce netconsole selftests that expose the issue when running with kmemleak detection enabled. This patchset includes a selftest for netpoll with multiple concurrent users (netconsole + bonding), which simulates the scenario from test[1] that originally demonstrated the issue allegedly fixed by commit efa95b01da18 ("netpoll: fix use after free") - a commit that is now being reverted. Sending this to "net" branch because this is a fix, and the selftest might help with the backports validation. Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1] Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v9: - Reordered the config entries in tools/testing/selftests/drivers/net/bonding/config (NIPA) - Link to v8: https://lore.kernel.org/r/20251104-netconsole_torture-v8-0-5288440e2fa0@deb… Changes in v8: - Sending it again, now that commit 1a8fed52f7be1 ("netdevsim: set the carrier when the device goes up") has landed in net - Created one namespace for TX and one for RX (Paolo) - Used additional helpers to create and delete netdevsim (Paolo) - Link to v7: https://lore.kernel.org/r/20251003-netconsole_torture-v7-0-aa92fcce62a9@deb… Changes in v7: - Rebased on top of `net` - Link to v6: https://lore.kernel.org/r/20251002-netconsole_torture-v6-0-543bf52f6b46@deb… Changes in v6: - Expand the tests even more and some small fixups - Moved the test to bonding selftests - Link to v5: https://lore.kernel.org/r/20250918-netconsole_torture-v5-0-77e25e0a4eb6@deb… Changes in v5: - Set CONFIG_BONDING=m in selftests/drivers/net/config. - Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb… Changes in v4: - Added an additional selftest to test multiple netpoll users in parallel - Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb… Changes in v3: - This patchset is a merge of the fix and the selftest together as recommended by Jakub. 
Changes in v2: - Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring the create_dynamic_target() (Jakub) - Move the "wait" to after all the messages has been sent. - Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb… --- Breno Leitao (4): net: netpoll: fix incorrect refcount handling causing incorrect cleanup selftest: netcons: refactor target creation selftest: netcons: create a torture test selftest: netcons: add test for netconsole over bonded interfaces net/core/netpoll.c | 7 +- tools/testing/selftests/drivers/net/Makefile | 1 + .../testing/selftests/drivers/net/bonding/Makefile | 2 + tools/testing/selftests/drivers/net/bonding/config | 4 + .../drivers/net/bonding/netcons_over_bonding.sh | 361 +++++++++++++++++++++ .../selftests/drivers/net/lib/sh/lib_netcons.sh | 82 ++++- .../selftests/drivers/net/netcons_torture.sh | 130 ++++++++ 7 files changed, 569 insertions(+), 18 deletions(-) --- base-commit: 7d1988a943850c584e8e2e4bcc7a3b5275024072 change-id: 20250902-netconsole_torture-8fc23f0aca99 Best regards, -- Breno Leitao <leitao(a)debian.org>

3 weeks, 3 days


[PATCH v7 00/12] Direct Map Removal Support for guest_memfd

by Patrick Roy

From: Patrick Roy <roypat(a)amazon.co.uk> [ based on kvm/next ] Unmapping virtual machine guest memory from the host kernel's direct map is a successful mitigation against Spectre-style transient execution issues: If the kernel page tables do not contain entries pointing to guest memory, then any attempted speculative read through the direct map will necessarily be blocked by the MMU before any observable microarchitectural side-effects happen. This means that Spectre-gadgets and similar cannot be used to target virtual machine memory. Roughly 60% of speculative execution issues fall into this category [1, Table 1]. This patch series extends guest_memfd with the ability to remove its memory from the host kernel's direct map, to be able to attain the above protection for KVM guests running inside guest_memfd. Additionally, a Firecracker branch with support for these VMs can be found on GitHub [2]. For more details, please refer to the v5 cover letter [v5]. No substantial changes in design have taken place since. === Changes Since v6 === - Drop patch for passing struct address_space to ->free_folio(), due to possible races with freeing of the address_space. (Hugh) - Stop using PG_uptodate / gmem preparedness tracking to keep track of direct map state. Instead, use the lowest bit of folio->private. (Mike, David) - Do direct map removal when establishing mapping of gmem folio instead of at allocation time, due to impossibility of handling direct map removal errors in kvm_gmem_populate(). (Patrick) - Do TLB flushes after direct map removal, and provide a module parameter to opt out from them, and a new patch to export flush_tlb_kernel_range() to KVM. 
(Will) [1]: https://download.vusec.net/papers/quarantine_raid23.pdf [2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hidi… [RFCv1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/ [RFCv2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/ [RFCv3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/ [v4]: https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/ [v5]: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk/ [v6]: https://lore.kernel.org/kvm/20250912091708.17502-1-roypat@amazon.co.uk/ Patrick Roy (12): arch: export set_direct_map_valid_noflush to KVM module x86/tlb: export flush_tlb_kernel_range to KVM module mm: introduce AS_NO_DIRECT_MAP KVM: guest_memfd: Add stub for kvm_arch_gmem_invalidate KVM: guest_memfd: Add flag to remove from direct map KVM: guest_memfd: add module param for disabling TLB flushing KVM: selftests: load elf via bounce buffer KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 KVM: selftests: Add guest_memfd based vm_mem_backing_src_types KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing selftests KVM: selftests: stuff vm_mem_backing_src_type into vm_shape KVM: selftests: Test guest execution from direct map removed gmem Documentation/virt/kvm/api.rst | 5 ++ arch/arm64/include/asm/kvm_host.h | 12 ++++ arch/arm64/mm/pageattr.c | 1 + arch/loongarch/mm/pageattr.c | 1 + arch/riscv/mm/pageattr.c | 1 + arch/s390/mm/pageattr.c | 1 + arch/x86/include/asm/tlbflush.h | 3 +- arch/x86/mm/pat/set_memory.c | 1 + arch/x86/mm/tlb.c | 1 + include/linux/kvm_host.h | 9 +++ include/linux/pagemap.h | 16 +++++ include/linux/secretmem.h | 18 ----- include/uapi/linux/kvm.h | 2 + lib/buildid.c | 4 +- mm/gup.c | 19 ++---- mm/mlock.c | 2 +- mm/secretmem.c | 8 +-- .../testing/selftests/kvm/guest_memfd_test.c | 2 + .../testing/selftests/kvm/include/kvm_util.h | 37 ++++++++--- 
.../testing/selftests/kvm/include/test_util.h | 8 +++ tools/testing/selftests/kvm/lib/elf.c | 8 +-- tools/testing/selftests/kvm/lib/io.c | 23 +++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 61 +++++++++-------- tools/testing/selftests/kvm/lib/test_util.c | 8 +++ tools/testing/selftests/kvm/lib/x86/sev.c | 1 + .../selftests/kvm/pre_fault_memory_test.c | 1 + .../selftests/kvm/set_memory_region_test.c | 50 ++++++++++++-- .../kvm/x86/private_mem_conversions_test.c | 7 +- virt/kvm/guest_memfd.c | 66 +++++++++++++++++-- virt/kvm/kvm_main.c | 8 +++ 30 files changed, 290 insertions(+), 94 deletions(-) base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383 -- 2.51.0
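
The v7 changelog mentions tracking direct map state in the lowest bit of folio->private. The general technique of storing a flag in the low bit of an aligned pointer-sized field can be sketched as follows. This is a userspace illustration only: `fake_folio`, `STATE_BIT`, and the helpers are invented names, and nothing here reflects how the series actually encodes or accesses the field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in type for illustration; not the kernel's struct folio. */
struct fake_folio {
	void *private;
};

/* Bit 0 is free whenever the stored pointer is at least 2-byte aligned. */
#define STATE_BIT ((uintptr_t)1)

static bool fake_folio_state_test(const struct fake_folio *f)
{
	return ((uintptr_t)f->private & STATE_BIT) != 0;
}

static void fake_folio_state_set(struct fake_folio *f)
{
	f->private = (void *)((uintptr_t)f->private | STATE_BIT);
}

static void fake_folio_state_clear(struct fake_folio *f)
{
	f->private = (void *)((uintptr_t)f->private & ~STATE_BIT);
}

/* Recover the original pointer value with the flag masked off. */
static void *fake_folio_private_ptr(const struct fake_folio *f)
{
	return (void *)((uintptr_t)f->private & ~STATE_BIT);
}
```

The appeal of this pattern is that the flag travels with the object and needs no separate tracking structure; the cost is that every reader of the field must mask the bit before using the value as a pointer.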

3 weeks, 3 days


[PATCH v2] selftest/mm: fix pointer comparison in mremap_test

by Ankit Khushwaha

Pointer arithmetic on 'void *addr' and 'ulong dest_alignment' triggers the following warning:

mremap_test.c:1035:31: warning: pointer comparison always evaluates to false [-Wtautological-compare]
 1035 |   if (addr + c.dest_alignment < addr) {
      |                               ^

This warning is raised by clang version 20.1.8 (Fedora 20.1.8-4.fc42).

Use 'void *tmp_addr' to do the pointer arithmetic.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
Changelog:
v2:
- use 'void *tmp_addr' for pointer arithmetic instead of typecasting 'addr' to 'unsigned long long' as suggested by Andrew.

v1: https://lore.kernel.org/linux-kselftest/20251106104917.39890-1-ankitkhushwa…
---
 tools/testing/selftests/mm/mremap_test.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index a95c0663a011..308576437228 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -994,7 +994,7 @@ static void mremap_move_multi_invalid_vmas(FILE *maps_fp, unsigned long page_siz
 static long long remap_region(struct config c, unsigned int threshold_mb,
 			      char *rand_addr)
 {
-	void *addr, *src_addr, *dest_addr, *dest_preamble_addr = NULL;
+	void *addr, *tmp_addr, *src_addr, *dest_addr, *dest_preamble_addr = NULL;
 	unsigned long long t, d;
 	struct timespec t_start = {0, 0}, t_end = {0, 0};
 	long long start_ns, end_ns, align_mask, ret, offset;
@@ -1032,7 +1032,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 	/* Don't destroy existing mappings unless expected to overlap */
 	while (!is_remap_region_valid(addr, c.region_size) && !c.overlapping) {
 		/* Check for unsigned overflow */
-		if (addr + c.dest_alignment < addr) {
+		tmp_addr = addr + c.dest_alignment;
+		if (tmp_addr < addr) {
 			ksft_print_msg("Couldn't find a valid region to remap to\n");
 			ret = -1;
 			goto clean_up_src;
--
2.51.1

3 weeks, 4 days


[PATCH net-next 0/3] Add YNL test framework and library improvements

by Hangbin Liu

This series adds some functionality to the YNL tools and adds a YNL selftest framework.

Changes include:
- Add MAC address parsing support in YNL library
- Fix rt-rule spec consistency with other rt-* families
- Add selftests covering CLI and ethtool functionality

The tests provide usage examples and regression testing for YNL tools.

Hangbin Liu (3):
  tools: ynl: Add MAC address parsing support
  netlink: specs: update rt-rule src/dst attribute types to support IPv4 addresses
  selftests: net: add YNL test framework

 Documentation/netlink/specs/rt-rule.yaml   |   6 +-
 tools/net/ynl/pyynl/lib/ynl.py             |   9 +
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/net/ynl/Makefile   |  18 ++
 tools/testing/selftests/net/ynl/cli.sh     | 234 +++++++++++++++++++++
 tools/testing/selftests/net/ynl/config     |   6 +
 tools/testing/selftests/net/ynl/ethtool.sh | 188 +++++++++++++++++
 tools/testing/selftests/net/ynl/settings   |   1 +
 8 files changed, 461 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/net/ynl/Makefile
 create mode 100755 tools/testing/selftests/net/ynl/cli.sh
 create mode 100644 tools/testing/selftests/net/ynl/config
 create mode 100755 tools/testing/selftests/net/ynl/ethtool.sh
 create mode 100644 tools/testing/selftests/net/ynl/settings

--
2.50.1

3 weeks, 4 days


[PATCH] tools/nolibc: add support for fchdir()

by Thomas Weißschuh

Add support for the file descriptor based variant of chdir(). Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- tools/include/nolibc/sys.h | 13 +++++++++++++ tools/testing/selftests/nolibc/nolibc-test.c | 2 ++ 2 files changed, 15 insertions(+) diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index c5564f57deec88b8aa70291fcf6f9ca4dbc1d03f..a4b0fdb9b641230174f5e62d62762f59af81a00e 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -118,6 +118,7 @@ void *sbrk(intptr_t inc) /* * int chdir(const char *path); + * int fchdir(int fildes); */ static __attribute__((unused)) @@ -132,6 +133,18 @@ int chdir(const char *path) return __sysret(sys_chdir(path)); } +static __attribute__((unused)) +int sys_fchdir(int fildes) +{ + return my_syscall1(__NR_fchdir, fildes); +} + +static __attribute__((unused)) +int fchdir(int fildes) +{ + return __sysret(sys_fchdir(fildes)); +} + /* * int chmod(const char *path, mode_t mode); diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c index 29de21595fc95341c2aa975375a8d471cb3933fc..5927a84466cc0ede3b99611e134a8c6b8ab91e72 100644 --- a/tools/testing/selftests/nolibc/nolibc-test.c +++ b/tools/testing/selftests/nolibc/nolibc-test.c @@ -1343,6 +1343,8 @@ int run_syscall(int min, int max) CASE_TEST(dup3_0); tmp = dup3(0, 100, 0); EXPECT_SYSNE(1, tmp, -1); close(tmp); break; CASE_TEST(dup3_m1); tmp = dup3(-1, 100, 0); EXPECT_SYSER(1, tmp, -1, EBADF); if (tmp != -1) close(tmp); break; CASE_TEST(execve_root); EXPECT_SYSER(1, execve("/", (char*[]){ [0] = "/", [1] = NULL }, NULL), -1, EACCES); break; + CASE_TEST(fchdir_stdin); EXPECT_SYSER(1, fchdir(STDIN_FILENO), -1, ENOTDIR); break; + CASE_TEST(fchdir_badfd); EXPECT_SYSER(1, fchdir(-1), -1, EBADF); break; CASE_TEST(file_stream); EXPECT_SYSZR(1, test_file_stream()); break; CASE_TEST(fork); EXPECT_SYSZR(1, test_fork(FORK_STANDARD)); break; CASE_TEST(getdents64_root); EXPECT_SYSNE(1, 
test_getdents64("/"), -1); break; --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20251107-nolibc-fchdir-2645c298a538 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

3 weeks, 5 days


[PATCH net-next v3 0/5] psp: track stats from core and provide a driver stats api

by Daniel Zahka

This series introduces stats counters for psp. Device key rotations, and so called 'stale-events' are common to all drivers and are tracked by the core. A driver facing api is provided for reporting stats required by the "Implementation Requirements" section of the PSP Architecture Specification. Drivers must implement these stats. Lastly, implementations of the driver stats api for mlx5 and netdevsim are included. Here is the output of running the psp selftest suite and then printing out stats with the ynl cli on system with a psp-capable CX7: $ ./ksft-psp-stats/drivers/net/psp.py TAP version 13 1..28 ok 1 psp.test_case # SKIP Test requires IPv4 connectivity ok 2 psp.data_basic_send_v0_ip6 ok 3 psp.test_case # SKIP Test requires IPv4 connectivity ok 4 psp.data_basic_send_v1_ip6 ok 5 psp.test_case # SKIP Test requires IPv4 connectivity ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128') ok 7 psp.test_case # SKIP Test requires IPv4 connectivity ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256') ok 9 psp.test_case # SKIP Test requires IPv4 connectivity ok 10 psp.data_mss_adjust_ip6 ok 11 psp.dev_list_devices ok 12 psp.dev_get_device ok 13 psp.dev_get_device_bad ok 14 psp.dev_rotate ok 15 psp.dev_rotate_spi ok 16 psp.assoc_basic ok 17 psp.assoc_bad_dev ok 18 psp.assoc_sk_only_conn ok 19 psp.assoc_sk_only_mismatch ok 20 psp.assoc_sk_only_mismatch_tx ok 21 psp.assoc_sk_only_unconn ok 22 psp.assoc_version_mismatch ok 23 psp.assoc_twice ok 24 psp.data_send_bad_key ok 25 psp.data_send_disconnect ok 26 psp.data_stale_key ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim # Totals: pass:19 fail:0 xfail:2 xpass:0 skip:7 error:0 # # Responder logs (0): # STDERR: # Set PSP enable on device 1 to 0x3 # Set PSP enable on device 1 to 0x0 $ cd ynl/ $ ./pyynl/cli.py --spec netlink/specs/psp.yaml --dump get-stats [{'dev-id': 1, 
'key-rotations': 5, 'rx-auth-fail': 21, 'rx-bad': 0, 'rx-bytes': 11844, 'rx-error': 0, 'rx-packets': 94, 'stale-events': 6, 'tx-bytes': 1128456, 'tx-error': 0, 'tx-packets': 780}] CHANGES: v3: - simplify error path in accel_psp_fs_init_tx() - avoid casting argument in mlx5e_accel_psp_fs_get_stats_fill() - delete unused member stats member in mlx5e_psp - remove zero length array from psp_dev_stats v2: https://lore.kernel.org/netdev/20251028000018.3869664-1-daniel.zahka@gmail.… - don't return skb->len from psp_nl_get_stats_dumpit() on success and EMSGSIZE - use %pe to print PTR_ERR() v1: https://lore.kernel.org/netdev/20251022193739.1376320-1-daniel.zahka@gmail.… Daniel Zahka (2): selftests: drv-net: psp: add assertions on core-tracked psp dev stats netdevsim: implement psp device stats Jakub Kicinski (3): psp: report basic stats from the core psp: add stats from psp spec to driver facing api net/mlx5e: Add PSP stats support for Rx/Tx flows Documentation/netlink/specs/psp.yaml | 95 +++++++ .../mellanox/mlx5/core/en_accel/psp.c | 233 ++++++++++++++++-- .../mellanox/mlx5/core/en_accel/psp.h | 16 ++ .../mellanox/mlx5/core/en_accel/psp_rxtx.c | 1 + .../net/ethernet/mellanox/mlx5/core/en_main.c | 5 + drivers/net/netdevsim/netdevsim.h | 5 + drivers/net/netdevsim/psp.c | 27 ++ include/net/psp/types.h | 32 +++ include/uapi/linux/psp.h | 18 ++ net/psp/psp-nl-gen.c | 19 ++ net/psp/psp-nl-gen.h | 2 + net/psp/psp_main.c | 3 +- net/psp/psp_nl.c | 93 +++++++ net/psp/psp_sock.c | 4 +- tools/testing/selftests/drivers/net/psp.py | 13 + 15 files changed, 549 insertions(+), 17 deletions(-) -- 2.47.3


[PATCH net v3] selftests: net: local_termination: Wait for interfaces to come up

by A. Sverdlin

From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>

It seems that most of the tests prepare the interfaces once before the
test run (setup_prepare()), rely on setup_wait() to wait for link and
only then run the test(s).

local_termination brings the physical interfaces down and up during the
test run but never waits for them to come up. If auto-negotiation takes
a few seconds, the first test packets are lost, which leads to
false-negative test results.

Use setup_wait() in run_test() to make sure auto-negotiation has
completed after all simple_if_init() calls on physical interfaces, so
that test packets are not lost to the race against link establishment.

Fixes: 90b9566aa5cd3f ("selftests: forwarding: add a test for local_termination.sh")
Reviewed-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
---
Changelog:
v3:
- moved setup_wait() from individual test groups into run_test()
v2:
- replaced "setup_wait_dev $h1; setup_wait_dev $h2" with setup_wait()

 tools/testing/selftests/net/forwarding/local_termination.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/local_termination.sh b/tools/testing/selftests/net/forwarding/local_termination.sh
index ecd34f364125c..892895659c7e4 100755
--- a/tools/testing/selftests/net/forwarding/local_termination.sh
+++ b/tools/testing/selftests/net/forwarding/local_termination.sh
@@ -176,6 +176,8 @@ run_test()
 	local rcv_dmac=$(mac_get $rcv_if_name)
 	local should_receive
 
+	setup_wait
+
 	tcpdump_start $rcv_if_name
 
 	mc_route_prepare $send_if_name
-- 
2.51.1
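The race described in this patch (sending before the link is up) is a general pattern, and the usual remedy is to poll for readiness with a timeout before starting traffic. The sketch below illustrates that idea only. It is not how setup_wait() in the forwarding selftest library is implemented; wait_until() and operstate_is_up() are hypothetical helpers, and reading operstate from sysfs is just one possible readiness probe.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def operstate_is_up(ifname, sysfs_root="/sys/class/net"):
    """Report whether the kernel currently lists the interface as 'up'."""
    try:
        with open(f"{sysfs_root}/{ifname}/operstate") as f:
            return f.read().strip() == "up"
    except OSError:
        # A missing or unreadable interface counts as not ready.
        return False
```

A test would call something like wait_until(lambda: operstate_is_up("eth0")) after bringing the interface up and before sending the first packet; "eth0" is a placeholder name.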


[PATCH net-next 0/4] netconsole: Allow userdata buffer to grow dynamically

by Gustavo Luiz Duarte

The current netconsole implementation allocates a static buffer for
extradata (userdata + sysdata) with a fixed size of
MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS bytes for every target,
regardless of whether userspace actually uses this feature. This forces
us to keep MAX_EXTRADATA_ITEMS small (16), which is restrictive for
users who need to attach more metadata to their log messages.

This patch series enables dynamic allocation of the userdata buffer,
allowing it to grow on-demand based on actual usage.

The series:
1. Refactors send_fragmented_body() to simplify handling of separated
   userdata and sysdata (patch 1/4)
2. Splits userdata and sysdata into separate buffers (patch 2/4)
3. Implements dynamic allocation for the userdata buffer (patch 3/4)
4. Increases MAX_USERDATA_ITEMS from 16 to 256 now that we can do so
   without memory waste (patch 4/4)

Benefits:
- No memory waste when userdata is not used
- Targets that use userdata only consume what they need
- Users can attach significantly more metadata without impacting
  systems that don't use this feature

Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com>
---
Gustavo Luiz Duarte (4):
      netconsole: Simplify send_fragmented_body()
      netconsole: Split userdata and sysdata
      netconsole: Dynamic allocation of userdata buffer
      netconsole: Increase MAX_USERDATA_ITEMS

 drivers/net/netconsole.c                           | 338 +++++++++------------
 .../selftests/drivers/net/netcons_overflow.sh      |   2 +-
 2 files changed, 152 insertions(+), 188 deletions(-)
---
base-commit: 89aec171d9d1ab168e43fcf9754b82e4c0aef9b9
change-id: 20251007-netconsole_dynamic_extradata-21bd9d726568

Best regards,
-- 
Gustavo Duarte <gustavold(a)meta.com>
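To make the memory argument in this cover letter concrete, here is a small back-of-the-envelope sketch. The item counts (16 and 256) come from the cover letter. The per-entry length is not stated there, so ASSUMED_ENTRY_LEN = 256 is an assumption used only for illustration, and both functions are hypothetical helpers rather than anything in netconsole.

```python
# Assumption: bytes per entry. The cover letter names the constant
# MAX_EXTRADATA_ENTRY_LEN but does not give its value.
ASSUMED_ENTRY_LEN = 256


def static_bytes(max_items, entry_len=ASSUMED_ENTRY_LEN):
    """Bytes reserved per target when the buffer is sized for the worst case."""
    return max_items * entry_len


def dynamic_bytes(entries):
    """Bytes needed per target when only the attached entries are stored."""
    return sum(len(entry) for entry in entries)


# Worst-case static reservation at the old and new item limits.
print(static_bytes(16), static_bytes(256))
# A target with no userdata, and one with a single short entry.
print(dynamic_bytes([]), dynamic_bytes([b" release=prod\n"]))
```

Under that assumed entry length, raising the limit to 256 items with a static buffer would cost every target 16 times as much memory, whether or not it uses userdata. That is the waste the dynamic allocation avoids.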


[PATCH] kselftest/arm64: Align zt-test register dumps

by Mark Rutland

The zt-test output is awkward to read, as the 'Expected' value isn't
dumped on its own line and isn't aligned with the 'Got' value beneath.
For example:

  Mismatch: PID=5281, iteration=3270249	Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
  	Got      [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
  	SVCR: 2

Add a newline, matching the other FPSIMD/SVE/SME tests, so that we get
output that can be read more easily:

  Mismatch: PID=5281, iteration=3270249
  	Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
  	Got      [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
  	SVCR: 2

Admittedly this isn't all that important when the 'Got' value is all
zeroes, but otherwise this would be a major help for identifying which
portion of the 'Got' value is not as expected.

Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-kselftest(a)vger.kernel.org
---
 tools/testing/selftests/arm64/fp/zt-test.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/arm64/fp/zt-test.S b/tools/testing/selftests/arm64/fp/zt-test.S
index 38080f3c32804..a8df057716707 100644
--- a/tools/testing/selftests/arm64/fp/zt-test.S
+++ b/tools/testing/selftests/arm64/fp/zt-test.S
@@ -276,7 +276,7 @@ function barf
 	bl	putdec
 	puts	", iteration="
 	mov	x0, x22
-	bl	putdec
+	bl	putdecn
 	puts	"\tExpected ["
 	mov	x0, x10
 	mov	x1, x12
-- 
2.30.2


[PATCH v6 0/2] KVM: guest_memfd: use write for population

by Kalyazin, Nikita

[ based on kvm/next ]

Implement guest_memfd population via the write syscall. This is useful
in non-CoCo use cases where the host can access guest memory. Even
though the same can also be achieved via userspace mapping and
memcpying from userspace, write provides a more performant option
because it does not need to set up page tables and it does not cause a
page fault for every page like memcpy would. Note that memcpy cannot be
accelerated via MADV_POPULATE_WRITE as it is not supported by
guest_memfd and relies on GUP.

Populating 512MiB of guest_memfd on an x86 machine:
 - via memcpy: 436 ms
 - via write:  202 ms (-54%)

The write syscall support is conditional on kvm_gmem_supports_mmap.
When in-place shared/private conversion is supported, write should only
be allowed on shared pages.

v6:
 - Make write support conditional on mmap support instead of relying on
   the up-to-date flag to decide whether writing to a page is allowed
 - James: Remove dependencies on folio_test_large
 - James: Remove page alignment restriction
 - James: Formatting fixes

v5:
 - https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com/
 - Replace the call to the unexported filemap_remove_folio with zeroing
   the bytes that could not be copied
 - Fix checkpatch findings

v4:
 - https://lore.kernel.org/kvm/20250828153049.3922-1-kalyazin@amazon.com
 - Switch from implementing the write callback to write_iter
 - Remove conditional compilation

v3:
 - https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com
 - David/Mike D: Only compile support for the write syscall if
   CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.

v2:
 - https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com
 - Switch from an ioctl to the write syscall to implement population

v1:
 - https://lore.kernel.org/kvm/20241024095429.54052-1-kalyazin@amazon.com

Nikita Kalyazin (2):
  KVM: guest_memfd: add generic population via write
  KVM: selftests: update guest_memfd write tests

 .../testing/selftests/kvm/guest_memfd_test.c  | 51 ++++++++++++++++---
 virt/kvm/guest_memfd.c                        | 49 ++++++++++++++++++
 2 files changed, 94 insertions(+), 6 deletions(-)

base-commit: 6b36119b94d0b2bb8cea9d512017efafd461d6ac
-- 
2.50.1
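The two population paths compared in this cover letter (copying through a userspace mapping versus handing the data to the kernel with write) can be sketched from userspace. A guest_memfd can only be created through KVM, so the example below uses an ordinary temporary file as a stand-in. It shows the shape of the two approaches, not guest_memfd behavior or the measured timings, and the function names are hypothetical.

```python
import mmap
import os
import tempfile


def populate_via_write(fd, payload):
    """Copy payload into the file with pwrite(); no user mapping is needed."""
    offset = 0
    while offset < len(payload):
        # pwrite() may write fewer bytes than requested, so loop until done.
        offset += os.pwrite(fd, payload[offset:], offset)
    return offset


def populate_via_mmap(fd, payload):
    """Map the file and copy into the mapping, as a memcpy-style path would."""
    with mmap.mmap(fd, len(payload)) as view:
        view[:] = payload
    return len(payload)


def demo(size=4 * mmap.PAGESIZE):
    """Populate a stand-in file both ways and verify the contents each time."""
    payload = bytes(range(256)) * (size // 256)
    results = {}
    for name, populate in (("write", populate_via_write),
                           ("mmap", populate_via_mmap)):
        with tempfile.TemporaryFile() as f:
            os.ftruncate(f.fileno(), size)
            populate(f.fileno(), payload)
            results[name] = os.pread(f.fileno(), size, 0) == payload
    return results


print(demo())
```

In the mmap path every page of the mapping is touched from userspace, which corresponds to the per-page fault cost the cover letter attributes to memcpy. The write path leaves the copy to the kernel.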


[PATCH v11 0/9] support FEAT_LSUI

by Yeoreum Yun

Since Armv9.6, FEAT_LSUI provides load/store instructions that let
privileged code access user memory without clearing the PSTATE.PAN bit.

This patchset adds support for FEAT_LSUI and applies it to the futex
atomic operations and the user_swpX emulation, replacing the
ldxr/st{l}xr pair implementation, which requires clearing PSTATE.PAN,
with the corresponding unprivileged load/store atomic operations, which
do not.

Patch Sequences
================

Patch #1 adds cpufeature for FEAT_LSUI

Patch #2-#3 expose FEAT_LSUI to guest

Patch #4 adds Kconfig for FEAT_LSUI

Patch #5-#6 support futex atomic-op with FEAT_LSUI

Patch #7-#9 support user_swpX emulation with FEAT_LSUI

Patch History
==============
from v10 to v11:
  - use cast instruction to emulate deprecated swpb instruction
  - https://lore.kernel.org/all/20251103163224.818353-1-yeoreum.yun@arm.com/

from v9 to v10:
  - apply FEAT_LSUI to user_swpX emulation.
  - add test coverage for LSUI bit in ID_AA64ISAR3_EL1
  - rebase to v6.18-rc4
  - https://lore.kernel.org/all/20250922102244.2068414-1-yeoreum.yun@arm.com/

from v8 to v9:
  - refactoring __lsui_cmpxchg64()
  - rebase to v6.17-rc7
  - https://lore.kernel.org/all/20250917110838.917281-1-yeoreum.yun@arm.com/

from v7 to v8:
  - implement futex_atomic_eor() and futex_atomic_cmpxchg() with casalt
    with C helper.
  - drop the small optimisation on ll/sc futex_atomic_set operation.
  - modify some commit messages.
  - https://lore.kernel.org/all/20250816151929.197589-1-yeoreum.yun@arm.com/

from v6 to v7:
  - wrap FEAT_LSUI with CONFIG_AS_HAS_LSUI in cpufeature
  - remove unnecessary addition of indentation.
  - remove unnecessary mte_tco_enable()/disable() on LSUI operation.
  - https://lore.kernel.org/all/20250811163635.1562145-1-yeoreum.yun@arm.com/

from v5 to v6:
  - rebase to v6.17-rc1
  - https://lore.kernel.org/all/20250722121956.1509403-1-yeoreum.yun@arm.com/

from v4 to v5:
  - remove futex_ll_sc.h futext_lsui and lsui.h and move them to futex.h
  - reorganize the patches.
  - https://lore.kernel.org/all/20250721083618.2743569-1-yeoreum.yun@arm.com/

from v3 to v4:
  - rebase to v6.16-rc7
  - modify some patch titles.
  - https://lore.kernel.org/all/20250617183635.1266015-1-yeoreum.yun@arm.com/

from v2 to v3:
  - expose FEAT_LSUI to guest
  - add help section for LSUI Kconfig
  - https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/

from v1 to v2:
  - remove empty v9.6 menu entry
  - locate HAS_LSUI in cpucaps in order
  - https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/

Yeoreum Yun (9):
  arm64: cpufeature: add FEAT_LSUI
  KVM: arm64: expose FEAT_LSUI to guest
  KVM: arm64: kselftest: set_id_regs: add test for FEAT_LSUI
  arm64: Kconfig: Detect toolchain support for LSUI
  arm64: futex: refactor futex atomic operation
  arm64: futex: support futex with FEAT_LSUI
  arm64: separate common LSUI definitions into lsui.h
  arm64: armv8_deprecated: convert user_swpX to inline function
  arm64: armv8_deprecated: apply FEAT_LSUI for swpX emulation.

 arch/arm64/Kconfig                            |   5 +
 arch/arm64/include/asm/futex.h                | 291 +++++++++++++++---
 arch/arm64/include/asm/lsui.h                 |  25 ++
 arch/arm64/kernel/armv8_deprecated.c          | 111 +++++--
 arch/arm64/kernel/cpufeature.c                |  10 +
 arch/arm64/kvm/sys_regs.c                     |   3 +-
 arch/arm64/tools/cpucaps                      |   1 +
 .../testing/selftests/kvm/arm64/set_id_regs.c |   1 +
 8 files changed, 381 insertions(+), 66 deletions(-)
 create mode 100644 arch/arm64/include/asm/lsui.h

base-commit: 6146a0f1dfae5d37442a9ddcba012add260bceb0
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}

