- Linux-kselftest-mirror - lists.linaro.org

[PATCH bpf-next v3 0/2] bpf: Allow XDP_REDIRECT for XDP dev-bound programs

by Lorenzo Bianconi

In the current implementation if the program is dev-bound to a specific device, it will not be possible to perform XDP_REDIRECT into a DEVMAP or CPUMAP even if the program is running in the driver NAPI context. Fix the issue introducing __bpf_prog_map_compatible utility routine in order to avoid bpf_prog_is_dev_bound() during the XDP program load. Continue forbidding to attach a dev-bound program to XDP maps. --- Changes in v3: - move seltest changes in a dedicated patch - Link to v2: https://lore.kernel.org/r/20250423-xdp-prog-bound-fix-v2-1-51742a5dfbce@ker… Changes in v2: - Introduce __bpf_prog_map_compatible() utility routine in order to skip bpf_prog_is_dev_bound check in bpf_check_tail_call() - Extend xdp_metadata selftest - Link to v1: https://lore.kernel.org/r/20250422-xdp-prog-bound-fix-v1-1-0b581fa186fe@ker… --- Lorenzo Bianconi (2): bpf: Allow XDP dev-bound programs to perform XDP_REDIRECT into maps selftests/bpf: xdp_metadata: check XDP_REDIRCT support for dev-bound progs kernel/bpf/core.c | 27 +++++++++++++--------- .../selftests/bpf/prog_tests/xdp_metadata.c | 22 +++++++++++++++++- tools/testing/selftests/bpf/progs/xdp_metadata.c | 13 +++++++++++ 3 files changed, 50 insertions(+), 12 deletions(-) --- base-commit: 91dbac4076537b464639953c055c460d2bdfc7ea change-id: 20250422-xdp-prog-bound-fix-9f30f3e134aa Best regards, -- Lorenzo Bianconi <lorenzo(a)kernel.org>

8 months, 2 weeks

2
3
0 0

[PATCH 0/5] riscv: misaligned: fix interruptible context and add tests

by Clément Léger

This series fixes misaligned access handling when in non interruptible context by reenabling interrupts when possible. A previous commit changed raw_copy_from_user() with copy_from_user() which enables page faulting and thus can sleep. While correct, a warning is now triggered due to being called in an invalid context (sleeping in non-interruptible). This series fixes that problem by factorizing misaligned load/store entry in a single function than reenables interrupt if the interrupted context had interrupts enabled. In order for misaligned handling problems to be caught sooner, add a kselftest for all the currently supported instructions . Note: these commits were actually part of another larger series for misaligned request delegation but was split since it isn't directly required. Clément Léger (5): riscv: misaligned: factorize trap handling riscv: misaligned: enable IRQs while handling misaligned accesses riscv: misaligned: use get_user() instead of __get_user() Documentation/sysctl: add riscv to unaligned-trap supported archs selftests: riscv: add misaligned access testing Documentation/admin-guide/sysctl/kernel.rst | 4 +- arch/riscv/kernel/traps.c | 57 ++-- arch/riscv/kernel/traps_misaligned.c | 2 +- .../selftests/riscv/misaligned/.gitignore | 1 + .../selftests/riscv/misaligned/Makefile | 12 + .../selftests/riscv/misaligned/common.S | 33 +++ .../testing/selftests/riscv/misaligned/fpu.S | 180 +++++++++++++ tools/testing/selftests/riscv/misaligned/gp.S | 103 +++++++ .../selftests/riscv/misaligned/misaligned.c | 254 ++++++++++++++++++ 9 files changed, 614 insertions(+), 32 deletions(-) create mode 100644 tools/testing/selftests/riscv/misaligned/.gitignore create mode 100644 tools/testing/selftests/riscv/misaligned/Makefile create mode 100644 tools/testing/selftests/riscv/misaligned/common.S create mode 100644 tools/testing/selftests/riscv/misaligned/fpu.S create mode 100644 tools/testing/selftests/riscv/misaligned/gp.S create mode 100644 tools/testing/selftests/riscv/misaligned/misaligned.c -- 2.49.0

8 months, 2 weeks

4
14
0 0

[RFC net-next v1 1/2] udp: Introduce UDP_STOP_RCV option for UDP

by Jiayuan Chen

For some services we are using "established-over-unconnected" model. ''' // create unconnected socket and 'listen()' srv_fd = socket(AF_INET, SOCK_DGRAM) setsockopt(srv_fd, SO_REUSEPORT) bind(srv_fd, SERVER_ADDR, SERVER_PORT) // 'accept()' data, client_addr = recvmsg(srv_fd) // create a connected socket for this request cli_fd = socket(AF_INET, SOCK_DGRAM) setsockopt(cli_fd, SO_REUSEPORT) bind(cli_fd, SERVER_ADDR, SERVER_PORT) connect(cli, client_addr) ... // do handshake with cli_fd ''' This programming pattern simulates accept() using UDP, creating a new socket for each client request. The server can then use separate sockets to handle client requests, avoiding the need to use a single UDP socket for I/O transmission. But there is a race condition between the bind() and connect() of the connected socket: We might receive unexpected packets belonging to the unconnected socket before connect() is executed, which is not what we need. (Of course, before connect(), the unconnected socket will also receive packets from the connected socket, which is easily resolved because upper-layer protocols typically require explicit boundaries, and we receive a complete packet before creating a connected socket.) Before this patch, the connected socket had to filter requests at recvmsg time, acting as a dispatcher to some extent. With this patch, we can consider the bind and connect operations to be atomic. Signed-off-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> --- include/linux/udp.h | 1 + include/uapi/linux/udp.h | 1 + net/ipv4/udp.c | 13 ++++++++++--- net/ipv6/udp.c | 5 +++-- 4 files changed, 15 insertions(+), 5 deletions(-) diff --git a/include/linux/udp.h b/include/linux/udp.h index 895240177f4f..8d281a0c0d9d 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -42,6 +42,7 @@ enum { UDP_FLAGS_ENCAP_ENABLED, /* This socket enabled encap */ UDP_FLAGS_UDPLITE_SEND_CC, /* set via udplite setsockopt */ UDP_FLAGS_UDPLITE_RECV_CC, /* set via udplite setsockopt */ + UDP_FLAGS_STOP_RCV, /* Stop receiving packets */ }; struct udp_sock { diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h index edca3e430305..bb8e0a749a55 100644 --- a/include/uapi/linux/udp.h +++ b/include/uapi/linux/udp.h @@ -34,6 +34,7 @@ struct udphdr { #define UDP_NO_CHECK6_RX 102 /* Disable accepting checksum for UDP6 */ #define UDP_SEGMENT 103 /* Set GSO segmentation size */ #define UDP_GRO 104 /* This socket can receive UDP GRO packets */ +#define UDP_STOP_RCV 105 /* This socket will not receive any packets */ /* UDP encapsulation types */ #define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* unused draft-ietf-ipsec-nat-t-ike-00/01 */ diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index f9f5b92cf4b6..764d337ab1b3 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -376,7 +376,8 @@ static int compute_score(struct sock *sk, const struct net *net, if (!net_eq(sock_net(sk), net) || udp_sk(sk)->udp_port_hash != hnum || - ipv6_only_sock(sk)) + ipv6_only_sock(sk) || + udp_test_bit(STOP_RCV, sk)) return -1; if (sk->sk_rcv_saddr != daddr) @@ -494,7 +495,7 @@ static struct sock *udp4_lib_lookup2(const struct net *net, result = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), saddr, sport, daddr, hnum, udp_ehashfn); - if (!result) { + if (!result || udp_test_bit(STOP_RCV, result)) { result = sk; continue; } @@ -3031,7 +3032,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname, set_xfrm_gro_udp_encap_rcv(up->encap_type, sk->sk_family, sk); sockopt_release_sock(sk); break; - + case UDP_STOP_RCV: + udp_assign_bit(STOP_RCV, sk, valbool); + break; /* * UDP-Lite's partial checksum coverage (RFC 3828). */ @@ -3120,6 +3123,10 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname, val = udp_test_bit(GRO_ENABLED, sk); break; + case UDP_STOP_RCV: + val = udp_test_bit(STOP_RCV, sk); + break; + /* The following two cannot be changed on UDP sockets, the return is * always 0 (which corresponds to the full checksum coverage of UDP). */ case UDPLITE_SEND_CSCOV: diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 7317f8e053f1..55896a78e94b 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -137,7 +137,8 @@ static int compute_score(struct sock *sk, const struct net *net, if (!net_eq(sock_net(sk), net) || udp_sk(sk)->udp_port_hash != hnum || - sk->sk_family != PF_INET6) + sk->sk_family != PF_INET6 || + udp_test_bit(STOP_RCV, sk)) return -1; if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr)) @@ -245,7 +246,7 @@ static struct sock *udp6_lib_lookup2(const struct net *net, result = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), saddr, sport, daddr, hnum, udp6_ehashfn); - if (!result) { + if (!result || udp_test_bit(STOP_RCV, result)) { result = sk; continue; } -- 2.47.1

8 months, 2 weeks

3
5
0 0

[PATCH v3 0/3] RISC-V KVM selftests improvements

by Atish Patra

This series improves the following tests. 1. Get-reg-list : Adds vector support 2. SBI PMU test : Distinguish between different types of illegal exception The first patch is just helper patch that adds stval support during exception handling. Signed-off-by: Atish Patra <atishp(a)rivosinc.com> --- Changes in v3: - Dropped the redundant macros and rv32 specific csr details. - Changed to vcpu_get_reg from __vcpu_get_reg based on suggestion from Drew. - Added RB tags from Drew. - Link to v2: https://lore.kernel.org/r/20250429-kvm_selftest_improve-v2-0-51713f91e04a@r… Changes in v2: - Rebased on top of Linux 6.15-rc4 - Changed from ex_regs to pt_regs based on Drew's suggestion. - Dropped Anup's review on PATCH1 as it is significantly changed from last review. - Moved the instruction decoding macros to a common header file. - Improved the vector reg list test as per the feedback. - Link to v1: https://lore.kernel.org/r/20250324-kvm_selftest_improve-v1-0-583620219d4f@r… --- Atish Patra (3): KVM: riscv: selftests: Align the trap information wiht pt_regs KVM: riscv: selftests: Decode stval to identify exact exception type KVM: riscv: selftests: Add vector extension tests .../selftests/kvm/include/riscv/processor.h | 23 +++- tools/testing/selftests/kvm/lib/riscv/handlers.S | 139 +++++++++++---------- tools/testing/selftests/kvm/lib/riscv/processor.c | 2 +- tools/testing/selftests/kvm/riscv/arch_timer.c | 2 +- tools/testing/selftests/kvm/riscv/ebreak_test.c | 2 +- tools/testing/selftests/kvm/riscv/get-reg-list.c | 132 +++++++++++++++++++ tools/testing/selftests/kvm/riscv/sbi_pmu_test.c | 24 +++- 7 files changed, 247 insertions(+), 77 deletions(-) --- base-commit: f15d97df5afae16f40ecef942031235d1c6ba14f change-id: 20250324-kvm_selftest_improve-9bedb9f0a6d3 -- Regards, Atish patra

8 months, 2 weeks

2
4
0 0

[PATCH 0/2] Update kunit doc and tool with tips to build errors

by Shuah Khan

kunit kernel build could fail if there are ny build artifacts from a prior kernel build. These can be hard to debug if the build artifact happens to be generated header file. It took me a while to debug kunit build fail on ARCH=x86_64 in a tree which had a generated header file arch/x86/realmode/rm/pasyms.h make ARCH=um mrproper will not clean the tree. It is necessary to run make ARCH=x86_64 mrproper Example work-flow that could lead to this: make allmodconfig (x86_64) make ./tools/testing/kunit/kunit.py run Add this to the documentation and kunit.py build help message. Shuah Khan (2): doc: kunit: add information about cleaning source trees kunit: add tips to clean source tree to build help message Documentation/dev-tools/kunit/start.rst | 12 ++++++++++++ tools/testing/kunit/kunit.py | 2 +- 2 files changed, 13 insertions(+), 1 deletion(-) -- 2.47.2

8 months, 2 weeks

5
11
0 0

[PATCH RESEND] selftests/seccomp: fix syscall_restart test for arm compat

by Neill Kapron

The inconsistencies in the systcall ABI between arm and arm-compat can can cause a failure in the syscall_restart test due to the logic attempting to work around the differences. The 'machine' field for an ARM64 device running in compat mode can report 'armv8l' or 'armv8b' which matches with the string 'arm' when only examining the first three characters of the string. This change adds additional validation to the workaround logic to make sure we only take the arm path when running natively, not in arm-compat. Fixes: 256d0afb11d6 ("selftests/seccomp: build and pass on arm64") Signed-off-by: Neill Kapron <nkapron(a)google.com> --- tools/testing/selftests/seccomp/seccomp_bpf.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index b2f76a52215a..53bf6a9c801f 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -3166,12 +3166,15 @@ TEST(syscall_restart) ret = get_syscall(_metadata, child_pid); #if defined(__arm__) /* - * FIXME: * - native ARM registers do NOT expose true syscall. * - compat ARM registers on ARM64 DO expose true syscall. + * - values of utsbuf.machine include 'armv8l' or 'armb8b' + * for ARM64 running in compat mode. */ ASSERT_EQ(0, uname(&utsbuf)); - if (strncmp(utsbuf.machine, "arm", 3) == 0) { + if ((strncmp(utsbuf.machine, "arm", 3) == 0) && + (strncmp(utsbuf.machine, "armv8l", 6) != 0) && + (strncmp(utsbuf.machine, "armv8b", 6) != 0)) { EXPECT_EQ(__NR_nanosleep, ret); } else #endif -- 2.49.0.850.g28803427d3-goog

8 months, 2 weeks

2
1
0 0

[PATCH v1 1/3] selftests: pidfd: add missing sys/mount.h include in pidfd_fdinfo_test.c

by Peter Seiderer

Fix compile on openSUSE Tumbleweed (gcc-14.2.1, glibc-2.40): - add missing sys/mount.h include Fixes: pidfd_fdinfo_test.c: In function ‘child_fdinfo_nspid_test’: pidfd_fdinfo_test.c:230:13: error: implicit declaration of function ‘mount’ [-Wimplicit-function-declaration] 230 | r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0); | ^~~~~ Signed-off-by: Peter Seiderer <ps.report(a)gmx.net> --- tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c index f062a986e382..f718aac75068 100644 --- a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c +++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c @@ -13,6 +13,7 @@ #include <syscall.h> #include <sys/wait.h> #include <sys/mman.h> +#include <sys/mount.h> #include "pidfd.h" #include "../kselftest.h" -- 2.47.1

8 months, 2 weeks

2
5
0 0

[PATCH] kselftest: cpufreq: Get rid of double suspend in rtcwake case

by Nícolas F. R. A. Prado

Commit 0b631ed3ce92 ("kselftest: cpufreq: Add RTC wakeup alarm") added support for automatic wakeup in the suspend routine of the cpufreq kselftest by using rtcwake, however it left the manual power state change in the common path. The end result is that when running the cpufreq kselftest with '-t suspend_rtc' or '-t hibernate_rtc', the system will go to sleep and be woken up by the RTC, but then immediately go to sleep again with no wakeup programmed, so it will sleep forever in an automated testing setup. Fix this by moving the manual power state change so that it only happens when not using rtcwake. Fixes: 0b631ed3ce92 ("kselftest: cpufreq: Add RTC wakeup alarm") Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com> --- tools/testing/selftests/cpufreq/cpufreq.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/cpufreq/cpufreq.sh b/tools/testing/selftests/cpufreq/cpufreq.sh index e350c521b4675080943d2c74dc31533979410316..3aad9db921b5339585da5711a08775ebd965aaf3 100755 --- a/tools/testing/selftests/cpufreq/cpufreq.sh +++ b/tools/testing/selftests/cpufreq/cpufreq.sh @@ -244,9 +244,10 @@ do_suspend() printf "Failed to suspend using RTC wake alarm\n" return 1 fi + else + echo $filename > $SYSFS/power/state fi - echo $filename > $SYSFS/power/state printf "Came out of $1\n" printf "Do basic tests after finishing $1 to verify cpufreq state\n\n" --- base-commit: 8a2d53ce3c5f82683ad3df9a9a55822816fe64e7 change-id: 20250429-ksft-cpufreq-suspend-rtc-double-fix-75c560e1a30f Best regards, -- Nícolas F. R. A. Prado <nfraprado(a)collabora.com>

8 months, 2 weeks

1
0
0 0

[PATCH v4 0/5] Add support for the Bus Lock Threshold

by Manali Shukla

Misbehaving guests can cause bus locks to degrade the performance of a system. Non-WB (write-back) and misaligned locked RMW (read-modify-write) instructions are referred to as "bus locks" and require system wide synchronization among all processors to guarantee the atomicity. The bus locks can impose notable performance penalties for all processors within the system. Support for the Bus Lock Threshold is indicated by CPUID Fn8000_000A_EDX[29] BusLockThreshold=1, the VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count. VMCB intercept bit VMCB Offset Bits Function 14h 5 Intercept bus lock operations Bus lock threshold count VMCB Offset Bits Function 120h 15:0 Bus lock counter During VMRUN, the bus lock threshold count is fetched and stored in an internal count register. Prior to executing a bus lock within the guest, the processor verifies the count in the bus lock register. If the count is greater than zero, the processor executes the bus lock, reducing the count. However, if the count is zero, the bus lock operation is not performed, and instead, a Bus Lock Threshold #VMEXIT is triggered to transfer control to the Virtual Machine Monitor (VMM). A Bus Lock Threshold #VMEXIT is reported to the VMM with VMEXIT code 0xA5h, VMEXIT_BUSLOCK. EXITINFO1 and EXITINFO2 are set to 0 on a VMEXIT_BUSLOCK. On a #VMEXIT, the processor writes the current value of the Bus Lock Threshold Counter to the VMCB. Note: Currently, virtualizing the Bus Lock Threshold feature for L1 guest is not supported. More details about the Bus Lock Threshold feature can be found in AMD APM [1]. v3 -> v4 - Incorporated Sean's review comments - Added a preparatory patch to move linear_rip out of kvm_pio_request, so that it can be used by the bus lock threshold patches. - Added complete_userspace_buslock() function to reload bus_lock_counter to '1' only if the usespace has not changed the RIP. - Added changes to continue running bus_lock_counter accross the nested transitions. v2 -> v3 - Drop parch to add virt tag in /proc/cpuinfo. - Incorporated Tom's review comments. v1 -> v2 - Incorporated misc review comments from Sean. - Removed bus_lock_counter module parameter. - Set the value of bus_lock_counter to zero by default and reload the value by 1 in bus lock exit handler. - Add documentation for the behavioral difference for KVM_EXIT_BUS_LOCK. - Improved selftest for buslock to work on SVM and VMX. - Rewrite the commit messages. Patches are prepared on kvm-next/next (c9ea48bb6ee6). Testing done: - Tested the Bus Lock Threshold functionality on normal, SEV, SEV-ES and SEV-SNP guests. - Tested the Bus Lock Threshold functionality on nested guests. v1: https://lore.kernel.org/kvm/20240709175145.9986-4-manali.shukla@amd.com/T/ v2: https://lore.kernel.org/kvm/20241001063413.687787-4-manali.shukla@amd.com/T/ v3: https://lore.kernel.org/kvm/20241004053341.5726-1-manali.shukla@amd.com/T/ [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.14.5 Bus Lock Threshold. https://bugzilla.kernel.org/attachment.cgi?id=306250 Manali Shukla (3): KVM: x86: Preparatory patch to move linear_rip out of kvm_pio_request x86/cpufeatures: Add CPUID feature bit for the Bus Lock Threshold KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs Nikunj A Dadhania (2): KVM: SVM: Enable Bus lock threshold exit KVM: selftests: Add bus lock exit test Documentation/virt/kvm/api.rst | 19 +++ arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/include/asm/svm.h | 5 +- arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kvm/svm/nested.c | 42 ++++++ arch/x86/kvm/svm/svm.c | 38 +++++ arch/x86/kvm/svm/svm.h | 2 + arch/x86/kvm/x86.c | 8 +- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/x86/kvm_buslock_test.c | 135 ++++++++++++++++++ 11 files changed, 249 insertions(+), 6 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86/kvm_buslock_test.c base-commit: c9ea48bb6ee6b28bbc956c1e8af98044618fed5e -- 2.34.1

8 months, 2 weeks

4
21
0 0

[PATCH 0/4] selftest: can: Start importing selftests from can-tests

by Felix Maurer

This is the initial import of a CAN selftest from can-tests[1] into the tree. For now, it is just a single test but when agreed on the structure, we intend to import more tests from can-tests and add additional test cases. The goal of moving the CAN selftests into the tree is to align the tests more closely with the kernel, improve testing of CAN in general, and to simplify running the tests automatically in the various kernel CI systems. I have cc'ed netdev and its reviewers and maintainers to make sure they are okay with the location of the tests and the changes to the paths in MAINTAINERS. The changes should be merged through linux-can-next and subsequent changes will not go to netdev anymore. [1]: https://github.com/linux-can/can-tests Felix Maurer (4): selftests: can: Import tst-filter from can-tests selftests: can: use kselftest harness in test_raw_filter selftests: can: Use fixtures in test_raw_filter selftests: can: Document test_raw_filter test cases MAINTAINERS | 2 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/net/can/.gitignore | 2 + tools/testing/selftests/net/can/Makefile | 11 + .../selftests/net/can/test_raw_filter.c | 338 ++++++++++++++++++ .../selftests/net/can/test_raw_filter.sh | 37 ++ 6 files changed, 391 insertions(+) create mode 100644 tools/testing/selftests/net/can/.gitignore create mode 100644 tools/testing/selftests/net/can/Makefile create mode 100644 tools/testing/selftests/net/can/test_raw_filter.c create mode 100755 tools/testing/selftests/net/can/test_raw_filter.sh -- 2.49.0

8 months, 2 weeks

3
18
0 0

[PATCH v2 0/3] RISC-V KVM selftests improvements

by Atish Patra

This series improves the following tests. 1. Get-reg-list : Adds vector support 2. SBI PMU test : Distinguish between different types of illegal exception The first patch is just helper patch that adds stval support during exception handling. Signed-off-by: Atish Patra <atishp(a)rivosinc.com> --- Changes in v2: - Rebased on top of Linux 6.15-rc4 - Changed from ex_regs to pt_regs based on Drew's suggestion. - Dropped Anup's review on PATCH1 as it is significantly changed from last review. - Moved the instruction decoding macros to a common header file. - Improved the vector reg list test as per the feedback. - Link to v1: https://lore.kernel.org/r/20250324-kvm_selftest_improve-v1-0-583620219d4f@r… --- Atish Patra (3): KVM: riscv: selftests: Align the trap information wiht pt_regs KVM: riscv: selftests: Decode stval to identify exact exception type KVM: riscv: selftests: Add vector extension tests .../selftests/kvm/include/riscv/processor.h | 23 ++- tools/testing/selftests/kvm/lib/riscv/handlers.S | 164 ++++++++++++--------- tools/testing/selftests/kvm/lib/riscv/processor.c | 2 +- tools/testing/selftests/kvm/riscv/arch_timer.c | 2 +- tools/testing/selftests/kvm/riscv/ebreak_test.c | 2 +- tools/testing/selftests/kvm/riscv/get-reg-list.c | 133 +++++++++++++++++ tools/testing/selftests/kvm/riscv/sbi_pmu_test.c | 24 ++- 7 files changed, 270 insertions(+), 80 deletions(-) --- base-commit: f15d97df5afae16f40ecef942031235d1c6ba14f change-id: 20250324-kvm_selftest_improve-9bedb9f0a6d3 -- Regards, Atish patra

8 months, 2 weeks

3
9
0 0

[PATCH v14 00/27] riscv control-flow integrity for usermode

by Deepak Gupta

Basics and overview =================== Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory. To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with - `sspush` instruction to push return address on a shadow stack - `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack) - `sspopchk` to raise software check exception if comparision above was a mismatch - Protection mechanism using which shadow stack is not writeable via regular store instructions More information an details can be found at extensions github repo [1]. Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack. x86 and arm64 support for user mode shadow stack is already in mainline. Kernel awareness for user control flow integrity ================================================ This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series. Enabling: In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s). clone/fork: On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call. map_shadow_stack: x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well. signal management: If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers. In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive) config and compilation: Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime. To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst How to test this series ======================= Toolchain --------- $ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc) Qemu ---- Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc) Opensbi ------- $ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic Linux ----- Running defconfig is fine. CFI is enabled by default if the toolchain supports it. $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) In case you're building your own rootfs using toolchain, please make sure you pick following patch to ensure that vDSO compiled with lpad and shadow stack. "arch/riscv: compile vdso with landing pad" Branch where above patch can be picked https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1 Running ------- Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true vDSO related Opens (in the flux) ================================= I am listing these opens for laying out plan and what to expect in future patch sets. And of course for the sake of discussion. Shadow stack and landing pad enabling in vDSO ---------------------------------------------- vDSO must have shadow stack and landing pad support compiled in for task to have shadow stack and landing pad support. This patch series doesn't enable that (yet). Enabling shadow stack support in vDSO should be straight forward (intend to do that in next versions of patch set). Enabling landing pad support in vDSO requires some collaboration with toolchain folks to follow a single label scheme for all object binaries. This is necessary to ensure that all indirect call-sites are setting correct label and target landing pads are decorated with same label scheme. How many vDSOs --------------- Shadow stack instructions are carved out of zimop (may be operations) and if CPU doesn't implement zimop, they're illegal instructions. Kernel could be running on a CPU which may or may not implement zimop. And thus kernel will have to carry 2 different vDSOs and expose the appropriate one depending on whether CPU implements zimop or not. References ========== [1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c… [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific… [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i… [6] - https://lwn.net/Articles/940403/ To: Thomas Gleixner <tglx(a)linutronix.de> To: Ingo Molnar <mingo(a)redhat.com> To: Borislav Petkov <bp(a)alien8.de> To: Dave Hansen <dave.hansen(a)linux.intel.com> To: x86(a)kernel.org To: H. Peter Anvin <hpa(a)zytor.com> To: Andrew Morton <akpm(a)linux-foundation.org> To: Liam R. Howlett <Liam.Howlett(a)oracle.com> To: Vlastimil Babka <vbabka(a)suse.cz> To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> To: Paul Walmsley <paul.walmsley(a)sifive.com> To: Palmer Dabbelt <palmer(a)dabbelt.com> To: Albert Ou <aou(a)eecs.berkeley.edu> To: Conor Dooley <conor(a)kernel.org> To: Rob Herring <robh(a)kernel.org> To: Krzysztof Kozlowski <krzk+dt(a)kernel.org> To: Arnd Bergmann <arnd(a)arndb.de> To: Christian Brauner <brauner(a)kernel.org> To: Peter Zijlstra <peterz(a)infradead.org> To: Oleg Nesterov <oleg(a)redhat.com> To: Eric Biederman <ebiederm(a)xmission.com> To: Kees Cook <kees(a)kernel.org> To: Jonathan Corbet <corbet(a)lwn.net> To: Shuah Khan <shuah(a)kernel.org> To: Jann Horn <jannh(a)google.com> To: Conor Dooley <conor+dt(a)kernel.org> To: Miguel Ojeda <ojeda(a)kernel.org> To: Alex Gaynor <alex.gaynor(a)gmail.com> To: Boqun Feng <boqun.feng(a)gmail.com> To: Gary Guo <gary(a)garyguo.net> To: Björn Roy Baron <bjorn3_gh(a)protonmail.com> To: Benno Lossin <benno.lossin(a)proton.me> To: Andreas Hindborg <a.hindborg(a)kernel.org> To: Alice Ryhl <aliceryhl(a)google.com> To: Trevor Gross <tmgross(a)umich.edu> Cc: linux-kernel(a)vger.kernel.org Cc: linux-fsdevel(a)vger.kernel.org Cc: linux-mm(a)kvack.org Cc: linux-riscv(a)lists.infradead.org Cc: devicetree(a)vger.kernel.org Cc: linux-arch(a)vger.kernel.org Cc: linux-doc(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: alistair.francis(a)wdc.com Cc: richard.henderson(a)linaro.org Cc: jim.shu(a)sifive.com Cc: andybnac(a)gmail.com Cc: kito.cheng(a)sifive.com Cc: charlie(a)rivosinc.com Cc: atishp(a)rivosinc.com Cc: evan(a)rivosinc.com Cc: cleger(a)rivosinc.com Cc: alexghiti(a)rivosinc.com Cc: samitolvanen(a)google.com Cc: broonie(a)kernel.org Cc: rick.p.edgecombe(a)intel.com Cc: rust-for-linux(a)vger.kernel.org changelog --------- v14: - rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants. - Took Radim's suggestions on bitfields. - Placed cfi_state at the end of thread_info block so that current situation is not disturbed with respect to member fields of thread_info in single cacheline. v13: - cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses riscv_has_extension_unlikely() - uses nops(count) to create nop slide - RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it - changed ternaries to simply use implicit casting to convert to bool. - kernel command line allows to disable zicfilp and zicfiss independently. updated kernel-parameters.txt. - ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest. - cosmetic and grammatical changes to documentation. v12: - It seems like I had accidently squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again. - set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li. - Some minor clean up in kselftests as suggested by Zong Li. v11: - patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to to `lpad 0`. v10: - dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree. - Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config. - Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects. v9: - rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion") - dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs) - dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs) v8: - rebased on palmer/for-next - dropped samuel holland's `envcfg` context switch patches. they are in parlmer/for-next v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.… - Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE" - Lock interfaces for shadow stack and indirect branch tracking expect arg == 0 Any future evolution of this interface should accordingly define how arg should be setup. - `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv… v6: - Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info` - fixed unaligned newline escapes in kselftest - cleaned up messages in kselftest and included test output in commit message - fixed a bug in clone path reported by Zong Li - fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code) v5: - rebased on v6.12-rc1 - Fixed schema related issues in device tree file - Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index) - added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected. - Adopted context header based signal handling as proposed by Andy Chiu - Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…) - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches) v4: - rebased on 6.11-rc6 - envcfg: Converged with Samuel Holland's patches for envcfg management on per- thread basis. - vma_is_shadow_stack is renamed to is_vma_shadow_stack - picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch - signal context: using extended context management to maintain compatibility. - fixed `-Wmissing-prototypes` compiler warnings for prctl functions - Documentation fixes and amending typos. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/ v3: - envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series. - dt-bindings As suggested, split into separate commit. fixed the messaging that spec is in public review - arch_is_shadow_stack change arch_is_shadow_stack changed to vma_is_shadow_stack - hwprobe zicfiss / zicfilp if present will get enumerated in hwprobe - selftests As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit. - Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ v2: - Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel. - Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Changes in v14: - changelog posted just below cover letter - Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri… Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri… Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Deepak Gupta (25): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv mm: manufacture shadow stack pte riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs riscv mmu: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 20 + arch/riscv/Makefile | 5 +- arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 25 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 2 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/asm-offsets.c | 8 + arch/riscv/kernel/cpufeature.c | 13 + arch/riscv/kernel/entry.S | 33 +- arch/riscv/kernel/head.S | 23 + arch/riscv/kernel/process.c | 26 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 43 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso/Makefile | 6 + arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 17 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 10 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 171 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 54 files changed, 2331 insertions(+), 29 deletions(-) --- base-commit: 4181f8ad7a1061efed0219951d608d4988302af7 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug

8 months, 2 weeks

1
27
0 0

[PATCH AUTOSEL 6.1 08/10] selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure

by Sasha Levin

From: Ihor Solodrai <ihor.solodrai(a)linux.dev> [ Upstream commit f2858f308131a09e33afb766cd70119b5b900569 ] "sockmap_ktls disconnect_after_delete" test has been failing on BPF CI after recent merges from netdev: * https://github.com/kernel-patches/bpf/actions/runs/14458537639 * https://github.com/kernel-patches/bpf/actions/runs/14457178732 It happens because disconnect has been disabled for TLS [1], and it renders the test case invalid. Removing all the test code creates a conflict between bpf and bpf-next, so for now only remove the offending assert [2]. The test will be removed later on bpf-next. [1] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/ [2] https://lore.kernel.org/bpf/cfc371285323e1a3f3b006bfcf74e6cf7ad65258@linux.… Signed-off-by: Ihor Solodrai <ihor.solodrai(a)linux.dev> Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> Link: https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c index 2d0796314862a..0a99fd404f6dc 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c @@ -68,7 +68,6 @@ static void test_sockmap_ktls_disconnect_after_delete(int family, int map) goto close_cli; err = disconnect(cli); - ASSERT_OK(err, "disconnect"); close_cli: close(cli); -- 2.39.5

8 months, 2 weeks

1
0
0 0

[PATCH bpf-next v1 0/3] bpf, sockmap: Improve performance with CPU affinity

by Jiayuan Chen

Abstract === This patchset improves the performance of sockmap by providing CPU affinity, resulting in a 1-10x increase in throughput. Motivation === Traditional user-space reverse proxy: Reserve Proxy _________________ client -> | fd1 <-> fd2 | -> server |_________________| Using sockmap for reverse proxy: Reserve Proxy _________________ client -> | fd1 <-> fd2 | -> server | |_________________| | | | | | | _________ | | | sockmap | | --> |_________| --> By adding fds to sockmap and using a BPF program, we can quickly forward data and avoid data copying between user space and kernel space. Mainstream multi-process reverse proxy applications, such as Nginx and HAProxy, support CPU affinity settings, which allow each process to be pinned to a specific CPU, avoiding conflicts between data plane processes and other processes, especially in multi-tenant environments. Current Issues === The current design of sockmap uses a workqueue to forward ingress_skb and wakes up the workqueue without specifying a CPU (by calling schedule_delayed_work()). In the current implementation of schedule_delayed_work, it tends to run the workqueue on the current CPU. This approach has a high probability of running on the current CPU, which is the same CPU that handles the net rx soft interrupt, especially for programs that access each other using local interfaces. The loopback driver's transmit interface, loopback_xmit(), directly calls __netif_rx() on the current CPU, which means that the CPU handling sockmap's workqueue and the client's sending CPU are the same, resulting in contention. For a TCP flow, if the request or response is very large, the psock->ingress_skb queue can become very long. When the workqueue traverses this queue to forward the data, it can consume a significant amount of CPU time. Solution === Configuring RPS on a loopback interface can be useful, but it will trigger additional softirq, and furthermore, it fails to achieve our expected effect of CPU isolation from other processes. Instead, we provide a kfunc that allow users to specify the CPU on which the workqueue runs through a BPF program. We can use the existing benchmark to test the performance, which allows us to evaluate the effectiveness of this optimization. Because we use local interfaces for communication and the client consumes a significant amount of CPU when sending data, this prevents the workqueue from processing ingress_skb in a timely manner, ultimately causing the server to fail to read data quickly. Without cpu-affinity: ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --no-verify Setting up benchmark 'sockmap'... create socket fd c1:14 p1:15 c2:16 p2:17 Benchmark 'sockmap' started. Iter 0 ( 36.031us): Send Speed 1143.693 MB/s ... Rcv Speed 109.572 MB/s Iter 1 ( 0.608us): Send Speed 1320.550 MB/s ... Rcv Speed 48.103 MB/s Iter 2 ( -5.448us): Send Speed 1314.790 MB/s ... Rcv Speed 47.842 MB/s Iter 3 ( -0.613us): Send Speed 1320.158 MB/s ... Rcv Speed 46.531 MB/s Iter 4 ( -3.441us): Send Speed 1319.375 MB/s ... Rcv Speed 46.662 MB/s Iter 5 ( 3.764us): Send Speed 1166.667 MB/s ... Rcv Speed 42.467 MB/s Iter 6 ( -4.404us): Send Speed 1319.508 MB/s ... Rcv Speed 47.973 MB/s Summary: total trans 7758 MB ± 1293.506 MB/s Without cpu-affinity(RPS enabled): ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --no-verify Setting up benchmark 'sockmap'... create socket fd c1:14 p1:15 c2:16 p2:17 Benchmark 'sockmap' started. Iter 0 ( 28.925us): Send Speed 1630.357 MB/s ... Rcv Speed 850.960 MB/s Iter 1 ( -2.042us): Send Speed 1644.564 MB/s ... Rcv Speed 822.478 MB/s Iter 2 ( 0.754us): Send Speed 1644.297 MB/s ... Rcv Speed 850.787 MB/s Iter 3 ( 0.159us): Send Speed 1644.429 MB/s ... Rcv Speed 850.198 MB/s Iter 4 ( -2.898us): Send Speed 1646.924 MB/s ... Rcv Speed 830.867 MB/s Iter 5 ( -0.210us): Send Speed 1649.410 MB/s ... Rcv Speed 824.246 MB/s Iter 6 ( -1.448us): Send Speed 1650.723 MB/s ... Rcv Speed 808.256 MB/s With cpu-affinity(RPS disabled): ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --no-verify --cpu-affinity Setting up benchmark 'sockmap'... create socket fd c1:14 p1:15 c2:16 p2:17 Benchmark 'sockmap' started. Iter 0 ( 36.051us): Send Speed 1883.437 MB/s ... Rcv Speed 1865.087 MB/s Iter 1 ( 1.246us): Send Speed 1900.542 MB/s ... Rcv Speed 1761.737 MB/s Iter 2 ( -8.595us): Send Speed 1883.128 MB/s ... Rcv Speed 1860.714 MB/s Iter 3 ( 7.033us): Send Speed 1890.831 MB/s ... Rcv Speed 1806.684 MB/s Iter 4 ( -8.397us): Send Speed 1884.700 MB/s ... Rcv Speed 1973.568 MB/s Iter 5 ( -1.822us): Send Speed 1894.125 MB/s ... Rcv Speed 1775.046 MB/s Iter 6 ( 4.936us): Send Speed 1877.597 MB/s ... Rcv Speed 1959.320 MB/s Summary: total trans 11328 MB ± 1888.507 MB/s Appendix === Through this optimization, we discovered that sk_mem_charge() and sk_mem_uncharge() have concurrency issues. The performance improvement brought by this optimization has made these concurrency issues more evident. This concurrency issue can cause the WARN_ON_ONCE(sk->sk_forward_alloc) check to be triggered when the socket is released. Since this patch is a feature-type patch and does not intend to fix this bug, I will provide additional patches to fix this issue later. Jiayuan Chen (3): bpf, sockmap: Introduce a new kfunc for sockmap bpf, sockmap: Affinitize workqueue to a specific CPU selftest/bpf/benchs: Add cpu-affinity for sockmap bench Documentation/bpf/map_sockmap.rst | 14 +++++++ include/linux/skmsg.h | 15 +++++++ kernel/bpf/btf.c | 3 ++ net/core/skmsg.c | 10 +++-- net/core/sock_map.c | 39 +++++++++++++++++++ .../selftests/bpf/benchs/bench_sockmap.c | 35 +++++++++++++++-- tools/testing/selftests/bpf/bpf_kfuncs.h | 6 +++ .../selftests/bpf/progs/bench_sockmap_prog.c | 7 ++++ 8 files changed, 122 insertions(+), 7 deletions(-) -- 2.47.1

8 months, 2 weeks

3
8
0 0

[PATCH AUTOSEL 6.6 19/21] selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure

by Sasha Levin

From: Ihor Solodrai <ihor.solodrai(a)linux.dev> [ Upstream commit f2858f308131a09e33afb766cd70119b5b900569 ] "sockmap_ktls disconnect_after_delete" test has been failing on BPF CI after recent merges from netdev: * https://github.com/kernel-patches/bpf/actions/runs/14458537639 * https://github.com/kernel-patches/bpf/actions/runs/14457178732 It happens because disconnect has been disabled for TLS [1], and it renders the test case invalid. Removing all the test code creates a conflict between bpf and bpf-next, so for now only remove the offending assert [2]. The test will be removed later on bpf-next. [1] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/ [2] https://lore.kernel.org/bpf/cfc371285323e1a3f3b006bfcf74e6cf7ad65258@linux.… Signed-off-by: Ihor Solodrai <ihor.solodrai(a)linux.dev> Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> Link: https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c index 2d0796314862a..0a99fd404f6dc 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c @@ -68,7 +68,6 @@ static void test_sockmap_ktls_disconnect_after_delete(int family, int map) goto close_cli; err = disconnect(cli); - ASSERT_OK(err, "disconnect"); close_cli: close(cli); -- 2.39.5

8 months, 2 weeks

1
0
0 0

[PATCH AUTOSEL 6.12 33/37] selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure

by Sasha Levin

From: Ihor Solodrai <ihor.solodrai(a)linux.dev> [ Upstream commit f2858f308131a09e33afb766cd70119b5b900569 ] "sockmap_ktls disconnect_after_delete" test has been failing on BPF CI after recent merges from netdev: * https://github.com/kernel-patches/bpf/actions/runs/14458537639 * https://github.com/kernel-patches/bpf/actions/runs/14457178732 It happens because disconnect has been disabled for TLS [1], and it renders the test case invalid. Removing all the test code creates a conflict between bpf and bpf-next, so for now only remove the offending assert [2]. The test will be removed later on bpf-next. [1] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/ [2] https://lore.kernel.org/bpf/cfc371285323e1a3f3b006bfcf74e6cf7ad65258@linux.… Signed-off-by: Ihor Solodrai <ihor.solodrai(a)linux.dev> Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> Link: https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c index 2d0796314862a..0a99fd404f6dc 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c @@ -68,7 +68,6 @@ static void test_sockmap_ktls_disconnect_after_delete(int family, int map) goto close_cli; err = disconnect(cli); - ASSERT_OK(err, "disconnect"); close_cli: close(cli); -- 2.39.5

8 months, 2 weeks

1
0
0 0

[PATCH AUTOSEL 6.14 34/39] selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure

by Sasha Levin

From: Ihor Solodrai <ihor.solodrai(a)linux.dev> [ Upstream commit f2858f308131a09e33afb766cd70119b5b900569 ] "sockmap_ktls disconnect_after_delete" test has been failing on BPF CI after recent merges from netdev: * https://github.com/kernel-patches/bpf/actions/runs/14458537639 * https://github.com/kernel-patches/bpf/actions/runs/14457178732 It happens because disconnect has been disabled for TLS [1], and it renders the test case invalid. Removing all the test code creates a conflict between bpf and bpf-next, so for now only remove the offending assert [2]. The test will be removed later on bpf-next. [1] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/ [2] https://lore.kernel.org/bpf/cfc371285323e1a3f3b006bfcf74e6cf7ad65258@linux.… Signed-off-by: Ihor Solodrai <ihor.solodrai(a)linux.dev> Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> Link: https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c index 2d0796314862a..0a99fd404f6dc 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c @@ -68,7 +68,6 @@ static void test_sockmap_ktls_disconnect_after_delete(int family, int map) goto close_cli; err = disconnect(cli); - ASSERT_OK(err, "disconnect"); close_cli: close(cli); -- 2.39.5

8 months, 2 weeks

1
0
0 0

[PATCH net 0/4] Fix Felix DSA taprio gates after clock jump

by Vladimir Oltean

Richie Pearn presented a reproducible situation where traffic would get blocked on the NXP LS1028A switch if a certain taprio schedule was applied, and stepping the PTP clock would take place. The latter event is an expected initial occurrence, but also at runtime, for example when transitioning from one grandmaster to another. The issue is completely described in patch 1/4, which also contains the fix, but it has left me with some doubts regarding the need for vsc9959_tas_clock_adjust() in general. In order to prove to myself that vsc9959_tas_clock_adjust() is needed in general, I have written a selftest for the tc-taprio data path in patch 4/4. On the LS1028A, we can clearly see the following failures without that function: INFO: Forcing a backward clock jump TEST: ping [FAIL] INFO: Setting up taprio after PTP TEST: In band with gate [FAIL] Reception of 100 packets failed TEST: Out of band with gate [FAIL] Reception of 100 packets failed As for testing my fix from patch 1/4, that was quite a bit more complex to do automatically. In fact, I couldn't find any other schedule that would fail to be updated by vsc9959_tas_clock_adjust() as cleanly as the schedule from Richie, so I've added that specific schedule as the test_clock_jump_backward() test. The test ordering is also (unfortunately) very strategic. Running the selftest to the end dirties the GCL RAM, and when running test_clock_jump_backward() once again, the GCL entries won't be all zeroes as they were the first time around. They will contain bits and pieces of old schedules, making it very challenging to make it fail. Thus, test_clock_jump_backward() is the first in the test suite, and without patch 1/4, it is only supposed to fail the _first_ time when running after a clean boot. Vladimir Oltean (4): net: dsa: felix: fix broken taprio gate states after clock jump selftests: net: tsn_lib: create common helper for counting received packets selftests: net: tsn_lib: add window_size argument to isochron_do() selftests: net: tc_taprio: new test drivers/net/dsa/ocelot/felix_vsc9959.c | 5 +- .../selftests/drivers/net/dsa/tc_taprio.sh | 1 + .../selftests/drivers/net/ocelot/psfp.sh | 8 +- .../selftests/net/forwarding/tc_taprio.sh | 421 ++++++++++++++++++ .../selftests/net/forwarding/tsn_lib.sh | 26 ++ 5 files changed, 454 insertions(+), 7 deletions(-) create mode 120000 tools/testing/selftests/drivers/net/dsa/tc_taprio.sh create mode 100755 tools/testing/selftests/net/forwarding/tc_taprio.sh -- 2.43.0

8 months, 2 weeks

2
5
0 0

[PATCH bpf-next] selftests/bpf: Fix compilation errors

by Feng Yang

From: Feng Yang <yangfeng(a)kylinos.cn> If the CONFIG_NET_SCH_BPF configuration is not enabled, the BPF test compilation will report the following error: In file included from progs/bpf_qdisc_fq.c:39: progs/bpf_qdisc_common.h:17:51: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] 17 | void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym; | ^ progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] 309 | struct bpf_sk_buff_ptr *to_free) | ^ progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] progs/bpf_qdisc_fq.c:308:5: error: conflicting types for '____bpf_fq_enqueue' Fixes: 11c701639ba9 ("selftests/bpf: Add a basic fifo qdisc test") Signed-off-by: Feng Yang <yangfeng(a)kylinos.cn> --- tools/testing/selftests/bpf/progs/bpf_qdisc_common.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h index 65a2c561c0bb..7e7f2fe04f22 100644 --- a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h @@ -12,6 +12,8 @@ #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8))) +struct bpf_sk_buff_ptr; + u32 bpf_skb_get_hash(struct sk_buff *p) __ksym; void bpf_kfree_skb(struct sk_buff *p) __ksym; void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym; -- 2.43.0

8 months, 2 weeks

2
1
0 0

[PATCH] selftests/bpf: Fix kmem_cache iterator draining

by T.J. Mercier

The closing parentheses around the read syscall is misplaced, causing single byte reads from the iterator instead of buf sized reads. While the end result is the same, many more read calls than necessary are performed. $ tools/testing/selftests/bpf/vmtest.sh "./test_progs -t kmem_cache_iter" 145/1 kmem_cache_iter/check_task_struct:OK 145/2 kmem_cache_iter/check_slabinfo:OK 145/3 kmem_cache_iter/open_coded_iter:OK 145 kmem_cache_iter:OK Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED Fixes: a496d0cdc84d ("selftests/bpf: Add a test for kmem_cache_iter") Signed-off-by: T.J. Mercier <tjmercier(a)google.com> --- tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c index 8e13a3416a21..1de14b111931 100644 --- a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c @@ -104,7 +104,7 @@ void test_kmem_cache_iter(void) goto destroy; memset(buf, 0, sizeof(buf)); - while (read(iter_fd, buf, sizeof(buf) > 0)) { + while (read(iter_fd, buf, sizeof(buf)) > 0) { /* Read out all contents */ printf("%s", buf); } base-commit: b4432656b36e5cc1d50a1f2dc15357543add530e -- 2.49.0.906.g1f30a19c02-goog

8 months, 2 weeks

4
4
0 0

[PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout

by Jakub Kicinski

ksft runner sends 2 SIGTERMs in a row if a test runs out of time. Handle this in a similar way we handle SIGINT - cleanup and stop running further tests. Because we get 2 signals we need a bit of logic to ignore the subsequent one, they come immediately one after the other (due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM to runner child")). This change makes sure we run cleanup (scheduled defer()s) and also print a stack trace on SIGTERM, which doesn't happen by default. Tests occasionally hang in NIPA and it's impossible to tell what they are waiting from or doing. Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: petrm(a)nvidia.com CC: willemb(a)google.com CC: sdf(a)fomichev.me CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/net/lib/py/ksft.py | 27 +++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py index 3cfad0fd4570..73710634d457 100644 --- a/tools/testing/selftests/net/lib/py/ksft.py +++ b/tools/testing/selftests/net/lib/py/ksft.py @@ -3,6 +3,7 @@ import builtins import functools import inspect +import signal import sys import time import traceback @@ -26,6 +27,10 @@ KSFT_DISRUPTIVE = True pass +class KsftTerminate(KeyboardInterrupt): + pass + + def ksft_pr(*objs, **kwargs): print("#", *objs, **kwargs) @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True return env +term_cnt = 0 + +def _ksft_intr(signum, frame): + # ksft runner.sh sends 2 SIGTERMs in a row on a timeout + # if we don't ignore the second one it will stop us from handling cleanup + global term_cnt + term_cnt += 1 + if term_cnt == 1: + raise KsftTerminate() + else: + ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...") + + def ksft_run(cases=None, globs=None, case_pfx=None, args=()): cases = cases or [] @@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True cases.append(value) break + global term_cnt + term_cnt = 0 + prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr) + totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0} print("TAP version 13") @@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True cnt_key = 'xfail' except BaseException as e: stop |= isinstance(e, KeyboardInterrupt) + stop |= isinstance(e, KsftTerminate) tb = traceback.format_exc() for line in tb.strip().split('\n'): ksft_pr("Exception|", line) if stop: - ksft_pr("Stopping tests due to KeyboardInterrupt.") + ksft_pr(f"Stopping tests due to {type(e).__name__}.") KSFT_RESULT = False cnt_key = 'fail' @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True if stop: break + signal.signal(signal.SIGTERM, prev_sigterm) + print( f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0" ) -- 2.49.0

8 months, 2 weeks

3
5
0 0

[PATCH v2 0/6] Add support for FEAT_{LS64, LS64_V} and related tests

by Yicong Yang

From: Yicong Yang <yangyicong(a)hisilicon.com> Armv8.7 introduces single-copy atomic 64-byte loads and stores instructions and its variants named under FEAT_{LS64, LS64_V}. Add support for Armv8.7 FEAT_{LS64, LS64_V}: - Add identifying and enabling in the cpufeature list - Expose the support of these features to userspace through HWCAP3 and cpuinfo - Add related hwcap test - Handle the trap of unsupported memory (normal/uncacheable) access in a VM A real scenario for this feature is that the userspace driver can make use of this to implement direct WQE (workqueue entry) - a mechanism to fill WQE directly into the hardware. This patchset also complement with Marc's patchset v2[1] for handling LS64* trapped if not advertised for a VM. [1] https://lore.kernel.org/linux-arm-kernel/20250310122505.2857610-1-maz@kerne… Tested with updated hwcap test: On host: root@localhost:/tmp# dmesg | grep "All CPU(s) started" [ 0.504846] CPU: All CPU(s) started at EL2 root@localhost:/tmp# ./hwcap [...] # LS64 present ok 217 cpuinfo_match_LS64 ok 218 sigill_LS64 ok 219 # SKIP sigbus_LS64 # LS64_V present ok 220 cpuinfo_match_LS64_V ok 221 sigill_LS64_V ok 222 # SKIP sigbus_LS64_V # 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0 On guest: root@localhost:/# dmesg | grep "All CPU(s) started" [ 0.205580] CPU: All CPU(s) started at EL1 root@localhost:/mnt# ./hwcap [...] # LS64 present ok 217 cpuinfo_match_LS64 ok 218 sigill_LS64 ok 219 # SKIP sigbus_LS64 # LS64_V present ok 220 cpuinfo_match_LS64_V ok 221 sigill_LS64_V ok 222 # SKIP sigbus_LS64_V # 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0 Change since v1: - Drop the suppport for LS64_ACCDATA - handle the DABT of unsupported memory type after checking the memory attributes Link: https://lore.kernel.org/linux-arm-kernel/20241202135504.14252-1-yangyicong@… Yicong Yang (6): arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1 arm64: Add support for FEAT_{LS64, LS64_V} KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest kselftest/arm64: Add HWCAP test for FEAT_{LS64, LS64_V} arm64: Add ESR.DFSC definition of unsupported exclusive or atomic access KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory Documentation/arch/arm64/booting.rst | 12 +++ Documentation/arch/arm64/elf_hwcaps.rst | 6 ++ arch/arm64/include/asm/el2_setup.h | 12 ++- arch/arm64/include/asm/esr.h | 8 ++ arch/arm64/include/asm/hwcap.h | 2 + arch/arm64/include/asm/kvm_emulate.h | 7 ++ arch/arm64/include/uapi/asm/hwcap.h | 2 + arch/arm64/kernel/cpufeature.c | 51 +++++++++++++ arch/arm64/kernel/cpuinfo.c | 2 + arch/arm64/kvm/inject_fault.c | 35 +++++++++ arch/arm64/kvm/mmu.c | 37 +++++++++- arch/arm64/tools/cpucaps | 2 + tools/testing/selftests/arm64/abi/hwcap.c | 90 +++++++++++++++++++++++ 13 files changed, 264 insertions(+), 2 deletions(-) -- 2.24.0

8 months, 2 weeks

4
16
0 0

[PATCH 0/3] RISC-V KVM selftests improvements

by Atish Patra

This series improves the following tests. 1. Get-reg-list : Adds vector support 2. SBI PMU test : Distinguish between different types of illegal exception The first patch is just helper patch that adds stval support during exception handling. Signed-off-by: Atish Patra <atishp(a)rivosinc.com> --- Atish Patra (3): KVM: riscv: selftests: Add stval to exception handling KVM: riscv: selftests: Decode stval to identify exact exception type KVM: riscv: selftests: Add vector extension tests .../selftests/kvm/include/riscv/processor.h | 1 + tools/testing/selftests/kvm/lib/riscv/handlers.S | 2 + tools/testing/selftests/kvm/riscv/get-reg-list.c | 111 ++++++++++++++++++++- tools/testing/selftests/kvm/riscv/sbi_pmu_test.c | 32 ++++++ 4 files changed, 145 insertions(+), 1 deletion(-) --- base-commit: b3f263a98d30fe2e33eefea297598c590ee3560e change-id: 20250324-kvm_selftest_improve-9bedb9f0a6d3 -- Regards, Atish patra

8 months, 2 weeks

4
14
0 0

[PATCH] kunit: executor: Remove const from kunit_filter_suites() allocation type

by Kees Cook

In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "struct kunit_suite **" but the returned type will be "struct kunit_suite * const *". Since it isn't generally possible to remove the const qualifier, adjust the allocation type to match the assignment. Signed-off-by: Kees Cook <kees(a)kernel.org> --- Cc: Brendan Higgins <brendan.higgins(a)linux.dev> Cc: David Gow <davidgow(a)google.com> Cc: Rae Moar <rmoar(a)google.com> Cc: <linux-kselftest(a)vger.kernel.org> Cc: <kunit-dev(a)googlegroups.com> --- lib/kunit/executor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c index 3f39955cb0f1..0061d4c7e351 100644 --- a/lib/kunit/executor.c +++ b/lib/kunit/executor.c @@ -177,7 +177,7 @@ kunit_filter_suites(const struct kunit_suite_set *suite_set, const size_t max = suite_set->end - suite_set->start; - copy = kcalloc(max, sizeof(*filtered.start), GFP_KERNEL); + copy = kcalloc(max, sizeof(*copy), GFP_KERNEL); if (!copy) { /* won't be able to run anything, return an empty set */ return filtered; } -- 2.34.1

8 months, 2 weeks

2
1
0 0

[PATCH] fix(script): improve safety and formatting for WRITES_STRICT restore

by Alexander Shatalin

From: Alexander Shatalin <sashatalin03(a)gmail.com> This patch improves the safety when restoring WRITES_STRICT by ensuring: - The variable is not empty (`-n`) - The target file is writable (`-w`) Also improves formatting by quoting variables and following shell best practices. Signed-off-by: Alexander Shatalin <sashatalin03(a)gmail.com>

8 months, 2 weeks

2
1
0 0

[PATCH 1/2] selftests/mm: Fix build break when compiling pkey_util.c

by Nysal Jan K.A.

From: Madhavan Srinivasan <maddy(a)linux.ibm.com> Commit 50910acd6f615 ("selftests/mm: use sys_pkey helpers consistently") added a pkey_util.c to refactor some of the protection_keys functions accessible by other tests. But this broken the build in powerpc in two ways, pkey-powerpc.h: In function ‘arch_is_powervm’: pkey-powerpc.h:73:21: error: storage size of ‘buf’ isn’t known 73 | struct stat buf; | ^~~ pkey-powerpc.h:75:14: error: implicit declaration of function ‘stat’; did you mean ‘strcat’? [-Wimplicit-function-declaration] 75 | if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) && | ^~~~ | strcat Since pkey_util.c includes pkeys-helper.h, which in turn includes pkeys-powerpc.h, stat.h including is missing for "struct stat". This is fixed by adding "sys/stat.h" in pkeys-powerpc.h Secondly, pkey-powerpc.h:55:18: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘u64’ {aka ‘long unsigned int’} [-Wformat=] 55 | dprintf4("%s() changing %016llx to %016llx\n", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 56 | __func__, __read_pkey_reg(), pkey_reg); | ~~~~~~~~~~~~~~~~~ | | | u64 {aka long unsigned int} pkey-helpers.h:63:32: note: in definition of macro ‘dprintf_level’ 63 | sigsafe_printf(args); \ | ^~~~ These format specifier related warning are removed by adding "__SANE_USERSPACE_TYPES__" to pkeys_utils.c. Fixes: 50910acd6f615 ("selftests/mm: use sys_pkey helpers consistently") Signed-off-by: Madhavan Srinivasan <maddy(a)linux.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal(a)linux.ibm.com> --- tools/testing/selftests/mm/pkey-powerpc.h | 2 ++ tools/testing/selftests/mm/pkey_util.c | 1 + 2 files changed, 3 insertions(+) diff --git a/tools/testing/selftests/mm/pkey-powerpc.h b/tools/testing/selftests/mm/pkey-powerpc.h index 1bad310d282a..d8ec906b8120 100644 --- a/tools/testing/selftests/mm/pkey-powerpc.h +++ b/tools/testing/selftests/mm/pkey-powerpc.h @@ -3,6 +3,8 @@ #ifndef _PKEYS_POWERPC_H #define _PKEYS_POWERPC_H +#include <sys/stat.h> + #ifndef SYS_pkey_alloc # define SYS_pkey_alloc 384 # define SYS_pkey_free 385 diff --git a/tools/testing/selftests/mm/pkey_util.c b/tools/testing/selftests/mm/pkey_util.c index ca4ad0d44ab2..255b332f7a08 100644 --- a/tools/testing/selftests/mm/pkey_util.c +++ b/tools/testing/selftests/mm/pkey_util.c @@ -1,4 +1,5 @@ // SPDX-License-Identifier: GPL-2.0-only +#define __SANE_USERSPACE_TYPES__ #include <sys/syscall.h> #include <unistd.h> -- 2.47.0

8 months, 2 weeks

3
4
0 0

[PATCH v2 00/15] tools/nolibc: various new functions

by Thomas Weißschuh

A few functions used by different selftests. Adding them now avoids later conflicts between different selftest serieses. Also add full support for nolibc-test.c on riscv32. All unsupported syscalls have been replaced. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Changes in v2: - Rebase onto latest nolibc next branch - Move #include "nolibc.h" to the top of the headers - Include linux/random.h from sys/random.h - Don't block on missing entropy in getrandom() selftest - Also test negative result of difftime() - Simplify fopen() - Link to v1: https://lore.kernel.org/r/20250423-nolibc-misc-v1-0-a925bf40297b@linutronix… --- Thomas Weißschuh (15): tools/nolibc: add strstr() tools/nolibc: add %m printf format tools/nolibc: add more stat() variants tools/nolibc: add mremap() tools/nolibc: add getrandom() tools/nolibc: add abs() and friends tools/nolibc: add support for access() and faccessat() tools/nolibc: add clock_getres(), clock_gettime() and clock_settime() tools/nolibc: add timer functions tools/nolibc: add timerfd functionality tools/nolibc: add difftime() tools/nolibc: add namespace functionality tools/nolibc: add fopen() tools/nolibc: fall back to sys_clock_gettime() in gettimeofday() tools/nolibc: implement wait() in terms of waitpid() tools/include/nolibc/Makefile | 4 + tools/include/nolibc/math.h | 31 ++++ tools/include/nolibc/nolibc.h | 4 + tools/include/nolibc/sched.h | 50 +++++ tools/include/nolibc/stdio.h | 27 +++ tools/include/nolibc/stdlib.h | 18 ++ tools/include/nolibc/string.h | 20 ++ tools/include/nolibc/sys/mman.h | 19 ++ tools/include/nolibc/sys/random.h | 34 ++++ tools/include/nolibc/sys/stat.h | 25 ++- tools/include/nolibc/sys/time.h | 15 +- tools/include/nolibc/sys/timerfd.h | 87 +++++++++ tools/include/nolibc/sys/wait.h | 12 +- tools/include/nolibc/time.h | 185 ++++++++++++++++++ tools/include/nolibc/types.h | 3 + tools/include/nolibc/unistd.h | 28 +++ tools/testing/selftests/nolibc/Makefile | 2 + tools/testing/selftests/nolibc/nolibc-test.c | 268 ++++++++++++++++++++++++++- 18 files changed, 820 insertions(+), 12 deletions(-) --- base-commit: 2051d3b830c0889ae55e37e9e8ff0d43a4acd482 change-id: 20250415-nolibc-misc-f2548ccc5ce1 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

8 months, 2 weeks

1
15
0 0

[RFC PATCH 0/3] KVM: arm64: Don't claim MTE_ASYNC if not supported

by Ben Horgan

The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 indicates that MTE_ASYNC is supported. On a host with ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the MTE capability enabled will incorrectly see MTE_ASYNC advertised as supported. This series fixes that. This was found by inspection and the current behaviour is not known to break anything. Linux doesn't check MTE_frac, and wrongly, assumes MTE async faults can be generated whenever MTE is supported. This is a separate problem and not addressed here. I am looking for feedback on whether this change is valuable or otherwise. Ben Horgan (3): arm64/sysreg: Expose MTE_frac so that it is visible to KVM KVM: arm64: Make MTE_frac masking conditional on MTE capability KVM: selftests: Confirm exposing MTE_frac does not break migration arch/arm64/kernel/cpufeature.c | 1 + arch/arm64/kvm/sys_regs.c | 26 ++++++- .../testing/selftests/kvm/arm64/set_id_regs.c | 77 ++++++++++++++++++- 3 files changed, 101 insertions(+), 3 deletions(-) base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444 -- 2.43.0

8 months, 2 weeks

2
5
0 0

[PATCH 00/15] tools/nolibc: various new functions

by Thomas Weißschuh

A few functions used by different selftests. Adding them now avoids later conflicts between different selftest serieses. Also add full support for nolibc-test.c on riscv32. All unsupported syscalls have been replaced. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Thomas Weißschuh (15): tools/nolibc: add strstr() tools/nolibc: add %m printf format tools/nolibc: add more stat() variants tools/nolibc: add mremap() tools/nolibc: add getrandom() tools/nolibc: add abs() and friends tools/nolibc: add support for access() and faccessat() tools/nolibc: add clock_getres(), clock_gettime() and clock_settime() tools/nolibc: add timer functions tools/nolibc: add timerfd functionality tools/nolibc: add difftime() tools/nolibc: add namespace functionality tools/nolibc: add fopen() tools/nolibc: implement fall back to sys_clock_gettime() in gettimeofday() tools/nolibc: implement wait() in terms of waitpid() tools/include/nolibc/Makefile | 4 + tools/include/nolibc/math.h | 31 ++++ tools/include/nolibc/nolibc.h | 4 + tools/include/nolibc/sched.h | 50 ++++++ tools/include/nolibc/stdio.h | 34 ++++ tools/include/nolibc/stdlib.h | 18 ++ tools/include/nolibc/string.h | 20 +++ tools/include/nolibc/sys/mman.h | 19 ++ tools/include/nolibc/sys/random.h | 32 ++++ tools/include/nolibc/sys/stat.h | 25 ++- tools/include/nolibc/sys/time.h | 15 +- tools/include/nolibc/sys/timerfd.h | 87 +++++++++ tools/include/nolibc/sys/wait.h | 12 +- tools/include/nolibc/time.h | 185 +++++++++++++++++++ tools/include/nolibc/types.h | 3 + tools/include/nolibc/unistd.h | 28 +++ tools/testing/selftests/nolibc/Makefile | 2 + tools/testing/selftests/nolibc/nolibc-test.c | 254 ++++++++++++++++++++++++++- 18 files changed, 811 insertions(+), 12 deletions(-) --- base-commit: e90ce42e81381665dbcedc5fa12e74759ee89639 change-id: 20250415-nolibc-misc-f2548ccc5ce1 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

8 months, 2 weeks

2
23
0 0

Support for .kernelci.yaml file for in-tree configuration

by Krzysztof Wilczyński

Hello everyone, A heads up! Work on moving some aspects of the KernelCI configuration into an in-tree (or per-project, if you wish) file called ".kernelci.yaml" has started and is currently in the early design and implementation stage. This is why we need your feedback to ensure that the feature implementation moves in the right direction and will meet the user's needs. Please have a look at the following GitHub issue: https://github.com/kernelci/kernelci-pipeline/issues/1126 The linked issue is where anyone interested in this feature can get involved to discuss implementation details, ask questions, etc. You can, of course, ask questions here, too. This new feature will eventually pave the way for the introduction of a push model to KernelCI, along with the pull model we currently have, allowing for the integration of different Git hosting platforms, such as GitHub, GitLab, etc., where events sent to us from these platforms will trigger an appropriate behaviour on the KernelCI's side. However, this new functionality will be added as a mid-term deliverable, where the ability to configure KernelCI via the ".kernelci.yaml" file is one of the crucial dependencies. Please don't hesitate to get involved and provide feedback. Thank you! On behalf of the KernelCI maintainers, Krzysztof

8 months, 2 weeks

1
0
0 0

[PATCH 0/2] Fix compile-time and run-time issues for pid_namespace

by Tiezhu Yang

Tiezhu Yang (2): selftests: pid_namespace: Include sys/mount.h selftests: pid_namespace: Skip tests if not root tools/testing/selftests/pid_namespace/pid_max.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) -- 2.42.0

8 months, 2 weeks

1
2
0 0

[PATCH] selftests/mm: use long for dwRegionSize

by Siddarth G

Change the type of 'dwRegionSize' in wp_init() and wp_free() from int to long to match callers that pass long or unsigned long long values. wp_addr_range function is left unchanged because it passes 'dwRegionSize' parameter directly to pagemap_ioctl, which expects an int. Signed-off-by: Siddarth G <siddarthsgml(a)gmail.com> --- tools/testing/selftests/mm/pagemap_ioctl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/pagemap_ioctl.c b/tools/testing/selftests/mm/pagemap_ioctl.c index 57b4bba2b45f..5773666f07ea 100644 --- a/tools/testing/selftests/mm/pagemap_ioctl.c +++ b/tools/testing/selftests/mm/pagemap_ioctl.c @@ -112,7 +112,7 @@ int init_uffd(void) return 0; } -int wp_init(void *lpBaseAddress, int dwRegionSize) +int wp_init(void *lpBaseAddress, long dwRegionSize) { struct uffdio_register uffdio_register; struct uffdio_writeprotect wp; @@ -136,7 +136,7 @@ int wp_init(void *lpBaseAddress, int dwRegionSize) return 0; } -int wp_free(void *lpBaseAddress, int dwRegionSize) +int wp_free(void *lpBaseAddress, long dwRegionSize) { struct uffdio_register uffdio_register; -- 2.43.0

8 months, 2 weeks

2
1
0 0

[PATCH] kunit: fix longest symbol length test

by Sergio González Collado

The kunit test that checks the longests symbol length [1], has triggered warnings in some CI pilelines when symbol prefixes are used [2]. The test is adjusted to depend on !CONFIG_PREFIX_SYMBOLS as sujested in [3]. [1] https://lore.kernel.org/rust-for-linux/CABVgOSm=5Q0fM6neBhxSbOUHBgNzmwf2V22… [2] https://lore.kernel.org/all/20250328112156.2614513-1-arnd@kernel.org/T/#u [3] https://lore.kernel.org/all/ycgbf7jcq7nc62ndqiynogt6hkabgl3hld4uyelgo7rksyl… Fixes: c104c16073b7 ("Kunit to check the longest symbol length") Signed-off-by: Sergio González Collado <sergio.collado(a)gmail.com> --- lib/Kconfig.debug | 2 +- lib/tests/longest_symbol_kunit.c | 3 +-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index f9051ab610d5..6937dedce04d 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2885,7 +2885,7 @@ config FORTIFY_KUNIT_TEST config LONGEST_SYM_KUNIT_TEST tristate "Test the longest symbol possible" if !KUNIT_ALL_TESTS - depends on KUNIT && KPROBES + depends on KUNIT && KPROBES && !CONFIG_PREFIX_SYMBOLS default KUNIT_ALL_TESTS help Tests the longest symbol possible diff --git a/lib/tests/longest_symbol_kunit.c b/lib/tests/longest_symbol_kunit.c index e3c28ff1807f..b183fb92d1b2 100644 --- a/lib/tests/longest_symbol_kunit.c +++ b/lib/tests/longest_symbol_kunit.c @@ -3,8 +3,7 @@ * Test the longest symbol length. Execute with: * ./tools/testing/kunit/kunit.py run longest-symbol * --arch=x86_64 --kconfig_add CONFIG_KPROBES=y --kconfig_add CONFIG_MODULES=y - * --kconfig_add CONFIG_RETPOLINE=n --kconfig_add CONFIG_CFI_CLANG=n - * --kconfig_add CONFIG_MITIGATION_RETPOLINE=n + * --kconfig_add CONFIG_CPU_MITIGATIONS=n */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt base-commit: f1a3944c860b0615d0513110d8cf62bb94adbb41 -- 2.39.2

8 months, 2 weeks

2
1
0 0

[PATCH] selftests/seccomp: fix syscall_restart test for arm compat

by FirstName LastName

From: Neill Kapron <nkapron(a)google.com> The inconsistencies in the systcall ABI between arm and arm-compat can can cause a failure in the syscall_restart test due to the logic attempting to work around the differences. The 'machine' field for an ARM64 device running in compat mode can report 'armv8l' or 'armv8b' which matches with the string 'arm' when only examining the first three characters of the string. This change adds additional validation to the workaround logic to make sure we only take the arm path when running natively, not in arm-compat. Fixes: 256d0afb11d6 ("selftests/seccomp: build and pass on arm64") Signed-off-by: Neill Kapron <nkapron(a)google.com> --- tools/testing/selftests/seccomp/seccomp_bpf.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index b2f76a52215a..53bf6a9c801f 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -3166,12 +3166,15 @@ TEST(syscall_restart) ret = get_syscall(_metadata, child_pid); #if defined(__arm__) /* - * FIXME: * - native ARM registers do NOT expose true syscall. * - compat ARM registers on ARM64 DO expose true syscall. + * - values of utsbuf.machine include 'armv8l' or 'armb8b' + * for ARM64 running in compat mode. */ ASSERT_EQ(0, uname(&utsbuf)); - if (strncmp(utsbuf.machine, "arm", 3) == 0) { + if ((strncmp(utsbuf.machine, "arm", 3) == 0) && + (strncmp(utsbuf.machine, "armv8l", 6) != 0) && + (strncmp(utsbuf.machine, "armv8b", 6) != 0)) { EXPECT_EQ(__NR_nanosleep, ret); } else #endif -- 2.49.0.850.g28803427d3-goog

8 months, 2 weeks

1
0
0 0

[PATCH v10 0/5] KVM: selftests: Add LoongArch support

by Bibo Mao

This patchset adds KVM selftests for LoongArch system, currently only some common test cases are supported and pass to run. These test cases are listed as following: coalesced_io_test demand_paging_test dirty_log_perf_test dirty_log_test guest_print_test hardware_disable_test kvm_binary_stats_test kvm_create_max_vcpus kvm_page_table_test memslot_modification_stress_test memslot_perf_test set_memory_region_test --- Changes in v10: 1. Add PS_64K and remove PS_8K in file include/loongarch/processor.h 2. Fix a typo issue in file lib/loongarch/processor.c 3. Update file MAINTAINERS about LoongArch KVM selftests Changes in v9: 1. Add vm mode VM_MODE_P47V47_16K, LoongArch VM uses this mode by default, rather than VM_MODE_P36V47_16K. 2. Refresh some spelling issues in changelog. Changes in v8: 1. Porting patch based on the latest version. 2. For macro PC_OFFSET_EXREGS, offsetof() method is used for C header file, still hardcoded definition for assemble language. Changes in v7: 1. Refine code to add LoongArch support in test case set_memory_region_test. Changes in v6: 1. Refresh the patch based on latest kernel 6.8-rc1, add LoongArch support about testcase set_memory_region_test. 2. Add hardware_disable_test test case. 3. Drop modification about macro DEFAULT_GUEST_TEST_MEM, it is problem of LoongArch binutils, this issue is raised to LoongArch binutils owners. Changes in v5: 1. In LoongArch kvm self tests, the DEFAULT_GUEST_TEST_MEM could be 0x130000000, it is different from the default value in memstress.h. So we Move the definition of DEFAULT_GUEST_TEST_MEM into LoongArch ucall.h, and add 'ifndef' condition for DEFAULT_GUEST_TEST_MEM in memstress.h. Changes in v4: 1. Remove the based-on flag, as the LoongArch KVM patch series have been accepted by Linux kernel, so this can be applied directly in kernel. Changes in v3: 1. Improve implementation of LoongArch VM page walk. 2. Add exception handler for LoongArch. 3. Add dirty_log_test, dirty_log_perf_test, guest_print_test test cases for LoongArch. 4. Add __ASSEMBLER__ macro to distinguish asm file and c file. 5. Move ucall_arch_do_ucall to the header file and make it as static inline to avoid function calls. 6. Change the DEFAULT_GUEST_TEST_MEM base addr for LoongArch. Changes in v2: 1. We should use ".balign 4096" to align the assemble code with 4K in exception.S instead of "align 12". 2. LoongArch only supports 3 or 4 levels page tables, so we remove the hanlders for 2-levels page table. 3. Remove the DEFAULT_LOONGARCH_GUEST_STACK_VADDR_MIN and use the common DEFAULT_GUEST_STACK_VADDR_MIN to allocate stack memory in guest. 4. Reorganize the test cases supported by LoongArch. 5. Fix some code comments. 6. Add kvm_binary_stats_test test case into LoongArch KVM selftests. --- Bibo Mao (5): KVM: selftests: Add VM_MODE_P47V47_16K vm mode KVM: selftests: Add KVM selftests header files for LoongArch KVM: selftests: Add core KVM selftests support for LoongArch KVM: selftests: Add ucall test support for LoongArch KVM: selftests: Add test cases for LoongArch MAINTAINERS | 2 + tools/testing/selftests/kvm/Makefile | 2 +- tools/testing/selftests/kvm/Makefile.kvm | 18 + .../testing/selftests/kvm/include/kvm_util.h | 6 + .../kvm/include/loongarch/kvm_util_arch.h | 7 + .../kvm/include/loongarch/processor.h | 141 ++++++++ .../selftests/kvm/include/loongarch/ucall.h | 20 + tools/testing/selftests/kvm/lib/kvm_util.c | 3 + .../selftests/kvm/lib/loongarch/exception.S | 59 +++ .../selftests/kvm/lib/loongarch/processor.c | 342 ++++++++++++++++++ .../selftests/kvm/lib/loongarch/ucall.c | 38 ++ .../selftests/kvm/set_memory_region_test.c | 2 +- 12 files changed, 638 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h create mode 100644 tools/testing/selftests/kvm/include/loongarch/processor.h create mode 100644 tools/testing/selftests/kvm/include/loongarch/ucall.h create mode 100644 tools/testing/selftests/kvm/lib/loongarch/exception.S create mode 100644 tools/testing/selftests/kvm/lib/loongarch/processor.c create mode 100644 tools/testing/selftests/kvm/lib/loongarch/ucall.c base-commit: 9d7a0577c9db35c4cc52db90bc415ea248446472 -- 2.39.3

8 months, 2 weeks

3
7
0 0

[PATCH v6 iproute2-next 0/1] DUALPI2 iproute2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find DUALPI2 iproute2 patch v6. v6 (26-Apr-25) - Update JSON file output due to spec modifiocation in tc.yaml of net-next v5 (25-Mar-25) - Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>) - Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>) - Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>) v4 (16-Mar-25) - Add min_qlen_step to dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step amrking. v3 (21-Feb-25) - Add memlimit to dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update manual to align latest implementation and clarify the queue naming and default unit - Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2 v2 (23-Oct-24) - Rename get_float in dualpi2 to get_float_min_max in utils.c - Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>) For more details of DualPI2, plesae refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best Regards, Chia-Yu Chia-Yu Chang (1): tc: add dualpi2 scheduler module bash-completion/tc | 11 +- include/uapi/linux/pkt_sched.h | 39 +++ include/utils.h | 2 + ip/iplink_can.c | 14 - lib/utils.c | 30 ++ man/man8/tc-dualpi2.8 | 249 +++++++++++++++ tc/Makefile | 1 + tc/q_dualpi2.c | 538 +++++++++++++++++++++++++++++++++ 8 files changed, 869 insertions(+), 15 deletions(-) create mode 100644 man/man8/tc-dualpi2.8 create mode 100644 tc/q_dualpi2.c -- 2.34.1

8 months, 2 weeks

1
1
0 0

[PATCH net-next v12 0/9] Device memory TCP TX

by Mina Almasry

v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c… ==== No changes in v12, just restored the selftests patch I accidentally dropped in v11 v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c… ==== Addressed a couple of nits and collected Acked-by from Harshitha (thanks!) v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.… ==== Addressed comments following conversations with Pavel, Stan, and Harshitha. Thank you guys for the reviews again. Overall minor changes: Changelog: - Check for !niov->pp in io_zcrx_recv_frag, just in case we end up with a TX niov in that path (Pavel). - Fix locking case in !netif_device_present (Jakub/Stan). v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c… === Changelog: - Use priv->bindings list instead of sock_bindings_list. This was missed during the rebase as the bindings have been updated to use priv->bindings recently (thanks Stan!) v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.… === Only address minor comments on V7 Changelog: - Use netdev locking instead of rtnl_locking to match rx path. - Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of NET_IOV_UNSPECIFIED. - Post send binding to net_devmem_dmabuf_bindings after it's been fully initialized (Stan). v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.… === Changelog: - Check the dmabuf net_iov binding belongs to the device the TX is going out on. (Jakub) - Provide detailed inspection of callsites of __skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub) v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c… === v6 has no major changes. Addressed a few issues from Paolo and David, and collected Acks from Stan. Thank you everyone for the review! Changes: - retain behavior to process MSG_FASTOPEN even if the provided cmsg is invalid (Paolo). - Rework the freeing of tx_vec slightly (it now has its own err label). (Paolo). - Squash the commit that makes dmabuf unbinding scheduled work into the same one which implements the TX path so we don't run into future errors on bisecting (Paolo). - Fix/add comments to explain how dmabuf binding refcounting works (David). v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c… === v5 has no major changes; it clears up the relatively minor issues pointed out to in v4, and rebases the series on top of net-next to resolve the conflict with a patch that raced to the tree. It also collects the review tags from v4. Changes: - Rebase to net-next - Fix issues in selftest (Stan). - Address comments in the devmem and netmem driver docs (Stan and Bagas) - Fix zerocopy_fill_skb_from_devmem return error code (Stan). v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.… === v4 mainly addresses the critical driver support issue surfaced in v3 by Paolo and Stan. Drivers aiming to support netmem_tx should make sure not to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs may come from dma-bufs. Additionally other feedback from v3 is addressed. Major changes: - Add helpers to handle netmem dma-addrs. Add GVE support for netmem_tx. - Fix binding->tx_vec not being freed on error paths during the tx binding. - Add a minimal devmem_tx test to devmem.py. - Clean up everything obsolete from the cover letter (Paolo). v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=* === Address minor comments from RFCv2 and fix a few build warnings and ynl-regen issues. No major changes. RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=* ======= RFC v2 addresses much of the feedback from RFC v1. I plan on sending something close to this as net-next reopens, sending it slightly early to get feedback if any. Major changes: -------------- - much improved UAPI as suggested by Stan. We now interpret the iov_base of the passed in iov from userspace as the offset into the dmabuf to send from. This removes the need to set iov.iov_base = NULL which may be confusing to users, and enables us to send multiple iovs in the same sendmsg() call. ncdevmem and the docs show a sample use of that. - Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think this is good improvment as it was confusing to keep track of 2 iterators for the same sendmsg, and mistracking both iterators caused a couple of bugs reported in the last iteration that are now resolved with this streamlining. - Improved test coverage in ncdevmem. Now multiple sendmsg() are tested, and sending multiple iovs in the same sendmsg() is tested. - Fixed issue where dmabuf unmapping was happening in invalid context (Stan). ==================================================================== The TX path had been dropped from the Device Memory TCP patch series post RFCv1 [1], to make that series slightly easier to review. This series rebases the implementation of the TX path on top of the net_iov/netmem framework agreed upon and merged. The motivation for the feature is thoroughly described in the docs & cover letter of the original proposal, so I don't repeat the lengthy descriptions here, but they are available in [1]. Full outline on usage of the TX path is detailed in the documentation included with this series. Test example is available via the kselftest included in the series as well. The series is relatively small, as the TX path for this feature largely piggybacks on the existing MSG_ZEROCOPY implementation. Patch Overview: --------------- 1. Documentation & tests to give high level overview of the feature being added. 1. Add netmem refcounting needed for the TX path. 2. Devmem TX netlink API. 3. Devmem TX net stack implementation. 4. Make dma-buf unbinding scheduled work to handle TX cases where it gets freed from contexts where we can't sleep. 5. Add devmem TX documentation. 6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver feature flag, and docs to enable drivers to declare netmem_tx support. 7. Guard netmem_tx against being enabled against drivers that don't support it. 8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to devmem.py. Testing: -------- Testing is very similar to devmem TCP RX path. The ncdevmem test used for the RX path is now augemented with client functionality to test TX path. * Test Setup: Kernel: net-next with this RFC and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Performance results are not included with this version, unfortunately. I'm having issues running the dma-buf exporter driver against the upstream kernel on my test setup. The issues are specific to that dma-buf exporter and do not affect this patch series. I plan to follow up this series with perf fixes if the tests point to issues once they're up and running. Special thanks to Stan who took a stab at rebasing the TX implementation on top of the netmem/net_iov framework merged. Parts of his proposal [2] that are reused as-is are forked off into their own patches to give full credit. [1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.… [2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#… Cc: sdf(a)fomichev.me Cc: asml.silence(a)gmail.com Cc: dw(a)davidwei.uk Cc: Jamal Hadi Salim <jhs(a)mojatatu.com> Cc: Victor Nogueira <victor(a)mojatatu.com> Cc: Pedro Tammela <pctammela(a)mojatatu.com> Cc: Samiullah Khawaja <skhawaja(a)google.com> Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com> Mina Almasry (8): netmem: add niov->type attribute to distinguish different net_iov types net: add get_netmem/put_netmem support net: devmem: Implement TX path net: add devmem TCP TX documentation net: enable driver support for netmem TX gve: add netmem TX support to GVE DQO-RDA mode net: check for driver support in netmem TX selftests: ncdevmem: Implement devmem TCP TX Stanislav Fomichev (1): net: devmem: TCP tx netlink api Documentation/netlink/specs/netdev.yaml | 12 + Documentation/networking/devmem.rst | 150 ++++++++- .../networking/net_cachelines/net_device.rst | 1 + Documentation/networking/netdev-features.rst | 5 + Documentation/networking/netmem.rst | 23 +- drivers/net/ethernet/google/gve/gve_main.c | 3 + drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +- include/linux/netdevice.h | 2 + include/linux/skbuff.h | 17 +- include/linux/skbuff_ref.h | 4 +- include/net/netmem.h | 34 +- include/net/sock.h | 1 + include/uapi/linux/netdev.h | 1 + io_uring/zcrx.c | 3 +- net/core/datagram.c | 48 ++- net/core/dev.c | 34 +- net/core/devmem.c | 133 ++++++-- net/core/devmem.h | 83 ++++- net/core/netdev-genl-gen.c | 13 + net/core/netdev-genl-gen.h | 1 + net/core/netdev-genl.c | 80 ++++- net/core/skbuff.c | 48 ++- net/core/sock.c | 6 + net/ipv4/ip_output.c | 3 +- net/ipv4/tcp.c | 50 ++- net/ipv6/ip6_output.c | 3 +- net/vmw_vsock/virtio_transport_common.c | 5 +- tools/include/uapi/linux/netdev.h | 1 + .../selftests/drivers/net/hw/devmem.py | 26 +- .../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++- 30 files changed, 1010 insertions(+), 88 deletions(-) base-commit: 4acf6d4f6afc3478753e49c495132619667549d9 -- 2.49.0.850.g28803427d3-goog

8 months, 2 weeks

2
11
0 0

Re: [PATCH bpf-next v2 06/11] bpf, arm64, powerpc: Change nospec to include v1 barrier

by Luis Gerhorst

kernel test robot <lkp(a)intel.com> writes: > All errors (new ones prefixed by >>): > > arch/powerpc/net/bpf_jit_comp64.c: In function 'bpf_jit_build_body': >>> arch/powerpc/net/bpf_jit_comp64.c:814:4: error: a label can only be part of a statement and a declaration is not a statement > 814 | bool sync_emitted = false; > | ^~~~ >>> arch/powerpc/net/bpf_jit_comp64.c:815:4: error: expected expression before 'bool' > 815 | bool ori31_emitted = false; > | ^~~~ >>> arch/powerpc/net/bpf_jit_comp64.c:833:6: error: 'ori31_emitted' undeclared (first use in this function) > 833 | ori31_emitted = true; > | ^~~~~~~~~~~~~ > arch/powerpc/net/bpf_jit_comp64.c:833:6: note: each undeclared identifier is reported only once for each function it appears in Fixed this for v3. For the other archs, the patches also don't add declarations in a switch/case. I also checked that there are no new W=2 warnings for the touched C files on x86 with the vmtest bpf config. I have not checked that all files that include a touched header don't have new warnings. When doing -j $(nproc) the diff does not work and with -j 1 it takes forever (e.g., because bpf.h is touched). If you think this is required just let me know (and if you have a tip on how to do it more quickly that would be great too).

8 months, 2 weeks

1
0
0 0

[PATCH v4 00/14] Add support for suppressing warning backtraces

by Alessandro Carminati

Some unit tests intentionally trigger warning backtraces by passing bad parameters to kernel API functions. Such unit tests typically check the return value from such calls, not the existence of the warning backtrace. Such intentionally generated warning backtraces are neither desirable nor useful for a number of reasons. - They can result in overlooked real problems. - A warning that suddenly starts to show up in unit tests needs to be investigated and has to be marked to be ignored, for example by adjusting filter scripts. Such filters are ad-hoc because there is no real standard format for warnings. On top of that, such filter scripts would require constant maintenance. One option to address problem would be to add messages such as "expected warning backtraces start / end here" to the kernel log. However, that would again require filter scripts, it might result in missing real problematic warning backtraces triggered while the test is running, and the irrelevant backtrace(s) would still clog the kernel log. Solve the problem by providing a means to identify and suppress specific warning backtraces while executing test code. Support suppressing multiple backtraces while at the same time limiting changes to generic code to the absolute minimum. Architecture specific changes are kept at minimum by retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and CONFIG_KUNIT are enabled. The first patch of the series introduces the necessary infrastructure. The second patch introduces support for counting suppressed backtraces. This capability is used in patch three to implement unit tests. Patch four documents the new API. The next two patches add support for suppressing backtraces in drm_rect and dev_addr_lists unit tests. These patches are intended to serve as examples for the use of the functionality introduced with this series. The remaining patches implement the necessary changes for all architectures with GENERIC_BUG support. With CONFIG_KUNIT enabled, image size increase with this series applied is approximately 1%. The image size increase (and with it the functionality introduced by this series) can be avoided by disabling CONFIG_KUNIT_SUPPRESS_BACKTRACE. This series is based on the RFC patch and subsequent discussion at https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b… and offers a more comprehensive solution of the problem discussed there. Design note: Function pointers are only added to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image size increases if CONFIG_KUNIT is disabled. There would be some benefits to adding those pointers all the time (reduced complexity, ability to display function names in BUG/WARNING messages). That change, if desired, can be made later. Checkpatch note: Remaining checkpatch errors and warnings were deliberately ignored. Some are triggered by matching coding style or by comments interpreted as code, others by assembler macros which are disliked by checkpatch. Suggestions for improvements are welcome. Changes since RFC: - Introduced CONFIG_KUNIT_SUPPRESS_BACKTRACE - Minor cleanups and bug fixes - Added support for all affected architectures - Added support for counting suppressed warnings - Added unit tests using those counters - Added patch to suppress warning backtraces in dev_addr_lists tests Changes since v1: - Rebased to v6.9-rc1 - Added Tested-by:, Acked-by:, and Reviewed-by: tags [I retained those tags since there have been no functional changes] - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option, enabled by default. Changes since v2: - Rebased to v6.9-rc2 - Added comments to drm warning suppression explaining why it is needed. - Added patch to move conditional code in arch/sh/include/asm/bug.h to avoid kerneldoc warning - Added architecture maintainers to Cc: for architecture specific patches - No functional changes Changes since v3: - Rebased to v6.14-rc6 - Dropped net: "kunit: Suppress lock warning noise at end of dev_addr_lists tests" since 3db3b62955cd6d73afde05a17d7e8e106695c3b9 - Added __kunit_ and KUNIT_ prefixes. - Tested on interessed architectures. ---- Guenter Roeck (14): bug/kunit: Core support for suppressing warning backtraces kunit: bug: Count suppressed warning backtraces kunit: Add test cases for backtrace warning suppression kunit: Add documentation for warning backtrace suppression API drm: Suppress intentional warning backtraces in scaling unit tests x86: Add support for suppressing warning backtraces arm64: Add support for suppressing warning backtraces loongarch: Add support for suppressing warning backtraces parisc: Add support for suppressing warning backtraces s390: Add support for suppressing warning backtraces sh: Add support for suppressing warning backtraces sh: Move defines needed for suppressing warning backtraces riscv: Add support for suppressing warning backtraces powerpc: Add support for suppressing warning backtraces Documentation/dev-tools/kunit/usage.rst | 30 ++++++- arch/arm64/include/asm/asm-bug.h | 27 ++++-- arch/arm64/include/asm/bug.h | 8 +- arch/loongarch/include/asm/bug.h | 42 +++++++--- arch/parisc/include/asm/bug.h | 29 +++++-- arch/powerpc/include/asm/bug.h | 37 +++++++-- arch/riscv/include/asm/bug.h | 38 ++++++--- arch/s390/include/asm/bug.h | 17 +++- arch/sh/include/asm/bug.h | 28 ++++++- arch/x86/include/asm/bug.h | 21 +++-- drivers/gpu/drm/tests/drm_rect_test.c | 16 ++++ include/asm-generic/bug.h | 16 +++- include/kunit/bug.h | 56 +++++++++++++ include/kunit/test.h | 1 + include/linux/bug.h | 13 +++ lib/bug.c | 51 +++++++++++- lib/kunit/Kconfig | 9 ++ lib/kunit/Makefile | 7 +- lib/kunit/backtrace-suppression-test.c | 104 ++++++++++++++++++++++++ lib/kunit/bug.c | 42 ++++++++++ 20 files changed, 519 insertions(+), 73 deletions(-) create mode 100644 include/kunit/bug.h create mode 100644 lib/kunit/backtrace-suppression-test.c create mode 100644 lib/kunit/bug.c -- 2.34.1

8 months, 3 weeks

15
44
0 0

[PATCH net 1/2] net: mscc: ocelot: delete PVID VLAN when readding it as non-PVID

by Vladimir Oltean

The following set of commands: ip link add br0 type bridge vlan_filtering 1 # vlan_default_pvid 1 is implicit ip link set swp0 master br0 bridge vlan add dev swp0 vid 1 should result in the dropping of untagged and 802.1p-tagged traffic, but we see that it continues to be accepted. Whereas, had we deleted VID 1 instead, the aforementioned dropping would have worked This is because the ANA_PORT_DROP_CFG update logic doesn't run, because ocelot_vlan_add() only calls ocelot_port_set_pvid() if the new VLAN has the BRIDGE_VLAN_INFO_PVID flag. Similar to other drivers like mt7530_port_vlan_add() which handle this case correctly, we need to test whether the VLAN we're changing used to have the BRIDGE_VLAN_INFO_PVID flag, but lost it now. That amounts to a PVID deletion and should be treated as such. Regarding blame attribution: this never worked properly since the introduction of bridge VLAN filtering in commit 7142529f1688 ("net: mscc: ocelot: add VLAN filtering"). However, there was a significant paradigm shift which aligned the ANA_PORT_DROP_CFG register with the PVID concept rather than with the native VLAN concept, and that change wasn't targeted for 'stable'. Realistically, that is as far as this fix needs to be propagated to. Fixes: be0576fed6d3 ("net: mscc: ocelot: move the logic to drop 802.1p traffic to the pvid deletion") Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com> --- drivers/net/ethernet/mscc/ocelot.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index ef93df520887..08bee56aea35 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -830,6 +830,7 @@ EXPORT_SYMBOL(ocelot_vlan_prepare); int ocelot_vlan_add(struct ocelot *ocelot, int port, u16 vid, bool pvid, bool untagged) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; int err; /* Ignore VID 0 added to our RX filter by the 8021q module, since @@ -849,6 +850,11 @@ int ocelot_vlan_add(struct ocelot *ocelot, int port, u16 vid, bool pvid, ocelot_bridge_vlan_find(ocelot, vid)); if (err) return err; + } else if (ocelot_port->pvid_vlan && + ocelot_bridge_vlan_find(ocelot, vid) == ocelot_port->pvid_vlan) { + err = ocelot_port_set_pvid(ocelot, port, NULL); + if (err) + return err; } /* Untagged egress vlan clasification */ -- 2.43.0

8 months, 3 weeks

3
3
0 0

[PATCH v2 security-next 0/4] Introducing Hornet LSM

by Blaise Boscaccy

This patch series introduces the Hornet LSM. The goal of Hornet is to provide a signature verification mechanism for eBPF programs. eBPF has similar requirements to that of modules when it comes to loading: find symbol addresses, fix up ELF relocations, some struct field offset handling stuff called CO-RE (compile-once run-anywhere), and some other miscellaneous bookkeeping. During eBPF program compilation, pseudo-values get written to the immediate operands of instructions. During loading, those pseudo-values get rewritten with concrete addresses or data applicable to the currently running system, e.g., a kallsyms address or an fd for a map. This needs to happen before the instructions for a bpf program are loaded into the kernel via the bpf() syscall. Unlike modules, an in-kernel loader unfortunately doesn't exist. Typically, the instruction rewriting is done dynamically in userspace via libbpf. Since the relocations and instruction modifications are happening in userspace, and their values may change depending upon the running system, this breaks known signature verification mechanisms. Light skeleton programs were introduced in order to support early loading of eBPF programs along with user-mode drivers. They utilize a separate eBPF program that can load a target eBPF program and perform all necessary relocations in-kernel without needing a working userspace. Light skeletons were mentioned as a possible path forward for signature verification. Hornet takes a simple approach to light-skeleton-based eBPF signature verification. A PKCS#7 signature of a data buffer containing the raw instructions of an eBPF program, followed by the initial values of any maps used by the program is used. A utility script is provided to parse and extract the contents of autogenerated header files created via bpftool. That payload can then be signed and appended to the light skeleton executable. Maps are frozen to prevent TOCTOU bugs where a sufficiently privileged user could rewrite map data between the calls to BPF_PROG_LOAD and BPF_PROG_RUN. Additionally, both sparse-array-based and fd_array_cnt-based map fd arrays are supported for signature verification. References: [1] https://lore.kernel.org/bpf/20220209054315.73833-1-alexei.starovoitov@gmail… [2] https://lore.kernel.org/bpf/CAADnVQ+wPK1KKZhCgb-Nnf0Xfjk8M1UpX5fnXC=cBzdEYb… Change list: - v1 -> v2 - Jargon clarification, maintainer entry and a few cosmetic fixes Revisions: - v1 https://lore.kernel.org/bpf/20250321164537.16719-1-bboscaccy@linux.microsof… Blaise Boscaccy (4): security: Hornet LSM hornet: Introduce sign-ebpf hornet: Add a light-skeleton data extactor script selftests/hornet: Add a selftest for the Hornet LSM Documentation/admin-guide/LSM/Hornet.rst | 53 +++ Documentation/admin-guide/LSM/index.rst | 1 + MAINTAINERS | 9 + crypto/asymmetric_keys/pkcs7_verify.c | 10 + include/linux/kernel_read_file.h | 1 + include/linux/verification.h | 1 + include/uapi/linux/lsm.h | 1 + scripts/Makefile | 1 + scripts/hornet/Makefile | 5 + scripts/hornet/extract-skel.sh | 29 ++ scripts/hornet/sign-ebpf.c | 411 +++++++++++++++++++ security/Kconfig | 3 +- security/Makefile | 1 + security/hornet/Kconfig | 11 + security/hornet/Makefile | 4 + security/hornet/hornet_lsm.c | 239 +++++++++++ tools/testing/selftests/Makefile | 1 + tools/testing/selftests/hornet/Makefile | 51 +++ tools/testing/selftests/hornet/loader.c | 21 + tools/testing/selftests/hornet/trivial.bpf.c | 33 ++ 20 files changed, 885 insertions(+), 1 deletion(-) create mode 100644 Documentation/admin-guide/LSM/Hornet.rst create mode 100644 scripts/hornet/Makefile create mode 100755 scripts/hornet/extract-skel.sh create mode 100644 scripts/hornet/sign-ebpf.c create mode 100644 security/hornet/Kconfig create mode 100644 security/hornet/Makefile create mode 100644 security/hornet/hornet_lsm.c create mode 100644 tools/testing/selftests/hornet/Makefile create mode 100644 tools/testing/selftests/hornet/loader.c create mode 100644 tools/testing/selftests/hornet/trivial.bpf.c -- 2.48.1

8 months, 3 weeks

7
37
0 0

[PATCH v12 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v12. v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu Chia-Yu Chang (4): Documentation: netlink: specs: tc: Add DualPI2 specification selftests/tc-testing: Add selftests for qdisc DualPI2 sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 144 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 39 + net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1096 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 149 +++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1449 insertions(+) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

8 months, 3 weeks

3
10
0 0

[PATCH net-next v4 0/3] Fix netdevim to correctly mark NAPI IDs

by Joe Damato

Greetings: Welcome to v4. This series fixes netdevsim to correctly set the NAPI ID on the skb. This is helpful for writing tests around features that use SO_INCOMING_NAPI_ID. In addition to the netdevsim fix in patch 1, patches 2 & 3 do some self test refactoring and add a test for NAPI IDs. The test itself (patch 3) introduces a C helper because apparently python doesn't have socket.SO_INCOMING_NAPI_ID. Thanks, Joe v4: - Updated the macro guard in patch 2 - Removed the remote deploy from patch 3 v3: https://lore.kernel.org/netdev/20250418013719.12094-1-jdamato@fastly.com/ - Dropped patch 3 from v2 as it is no longer necessary. - Patch 3 from this series (which was patch 4 in the v2) - Sorted .gitignore alphabetically - added cfg.remote_deploy so the test supports real remote machines - Dropped the NetNSEnter as it is unnecessary - Fixed a string interpolation issue that Paolo hit with his Python version v2: https://lore.kernel.org/netdev/20250417013301.39228-1-jdamato@fastly.com/ - No longer an RFC - Minor whitespace change in patch 1 (no functional change). - Patches 2-4 new in v2 rfcv1: https://lore.kernel.org/netdev/20250329000030.39543-1-jdamato@fastly.com/ Joe Damato (3): netdevsim: Mark NAPI ID on skb in nsim_rcv selftests: drv-net: Factor out ksft C helpers selftests: drv-net: Test that NAPI ID is non-zero drivers/net/netdevsim/netdev.c | 2 + .../testing/selftests/drivers/net/.gitignore | 1 + tools/testing/selftests/drivers/net/Makefile | 6 +- tools/testing/selftests/drivers/net/ksft.h | 56 +++++++++++++ .../testing/selftests/drivers/net/napi_id.py | 23 +++++ .../selftests/drivers/net/napi_id_helper.c | 83 +++++++++++++++++++ .../selftests/drivers/net/xdp_helper.c | 49 +---------- 7 files changed, 172 insertions(+), 48 deletions(-) create mode 100644 tools/testing/selftests/drivers/net/ksft.h create mode 100755 tools/testing/selftests/drivers/net/napi_id.py create mode 100644 tools/testing/selftests/drivers/net/napi_id_helper.c base-commit: cd7276ecac9c64c80433fbcff2e35aceaea6f477 -- 2.43.0

8 months, 3 weeks

3
5
0 0

[PATCH v12 00/28] riscv control-flow integrity for usermode

by Deepak Gupta

Basics and overview =================== Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory. To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with - `sspush` instruction to push return address on a shadow stack - `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack) - `sspopchk` to raise software check exception if comparision above was a mismatch - Protection mechanism using which shadow stack is not writeable via regular store instructions More information an details can be found at extensions github repo [1]. Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack. x86 and arm64 support for user mode shadow stack is already in mainline. Kernel awareness for user control flow integrity ================================================ This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series. Enabling: In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s). clone/fork: On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call. map_shadow_stack: x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well. signal management: If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers. In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive) config and compilation: Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime. To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst How to test this series ======================= Toolchain --------- $ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc) Qemu ---- Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc) Opensbi ------- $ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic Linux ----- Running defconfig is fine. CFI is enabled by default if the toolchain supports it. $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) In case you're building your own rootfs using toolchain, please make sure you pick following patch to ensure that vDSO compiled with lpad and shadow stack. "arch/riscv: compile vdso with landing pad" Branch where above patch can be picked https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1 Running ------- Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true vDSO related Opens (in the flux) ================================= I am listing these opens for laying out plan and what to expect in future patch sets. And of course for the sake of discussion. Shadow stack and landing pad enabling in vDSO ---------------------------------------------- vDSO must have shadow stack and landing pad support compiled in for task to have shadow stack and landing pad support. This patch series doesn't enable that (yet). Enabling shadow stack support in vDSO should be straight forward (intend to do that in next versions of patch set). Enabling landing pad support in vDSO requires some collaboration with toolchain folks to follow a single label scheme for all object binaries. This is necessary to ensure that all indirect call-sites are setting correct label and target landing pads are decorated with same label scheme. How many vDSOs --------------- Shadow stack instructions are carved out of zimop (may be operations) and if CPU doesn't implement zimop, they're illegal instructions. Kernel could be running on a CPU which may or may not implement zimop. And thus kernel will have to carry 2 different vDSOs and expose the appropriate one depending on whether CPU implements zimop or not. References ========== [1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c… [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific… [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i… [6] - https://lwn.net/Articles/940403/ --- changelog --------- v12: - It seems like I had accidently squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again. - set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li. - Some minor clean up in kselftests as suggested by Zong Li. v11: - patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to to `lpad 0`. v10: - dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree. - Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config. - Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects. v9: - rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion") - dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs) - dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs) v8: - rebased on palmer/for-next - dropped samuel holland's `envcfg` context switch patches. they are in parlmer/for-next v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.… - Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE" - Lock interfaces for shadow stack and indirect branch tracking expect arg == 0 Any future evolution of this interface should accordingly define how arg should be setup. - `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv… v6: - Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info` - fixed unaligned newline escapes in kselftest - cleaned up messages in kselftest and included test output in commit message - fixed a bug in clone path reported by Zong Li - fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code) v5: - rebased on v6.12-rc1 - Fixed schema related issues in device tree file - Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index) - added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected. - Adopted context header based signal handling as proposed by Andy Chiu - Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…) - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches) v4: - rebased on 6.11-rc6 - envcfg: Converged with Samuel Holland's patches for envcfg management on per- thread basis. - vma_is_shadow_stack is renamed to is_vma_shadow_stack - picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch - signal context: using extended context management to maintain compatibility. - fixed `-Wmissing-prototypes` compiler warnings for prctl functions - Documentation fixes and amending typos. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/ v3: - envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series. - dt-bindings As suggested, split into separate commit. fixed the messaging that spec is in public review - arch_is_shadow_stack change arch_is_shadow_stack changed to vma_is_shadow_stack - hwprobe zicfiss / zicfilp if present will get enumerated in hwprobe - selftests As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit. - Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ v2: - Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel. - Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- --- Changes in v12: - EDITME: describe what is new in this series revision. - EDITME: use bulletpoints and terse descriptions. - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - EDITME: describe what is new in this series revision. - EDITME: use bulletpoints and terse descriptions. - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Clément Léger (1): riscv: Add Firmware Feature SBI extensions definitions Deepak Gupta (25): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv mm: manufacture shadow stack pte riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs riscv mmu: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: enable kernel access to shadow stack memory via FWFT sbi call riscv: kernel command line option to opt out of user cfi riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 176 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 20 + arch/riscv/Makefile | 5 +- arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 13 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 25 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 2 + arch/riscv/include/asm/sbi.h | 26 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 89 ++++ arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 22 + arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/asm-offsets.c | 8 + arch/riscv/kernel/cpufeature.c | 13 + arch/riscv/kernel/entry.S | 31 +- arch/riscv/kernel/head.S | 12 + arch/riscv/kernel/process.c | 26 +- arch/riscv/kernel/ptrace.c | 83 ++++ arch/riscv/kernel/signal.c | 142 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 43 ++ arch/riscv/kernel/usercfi.c | 530 +++++++++++++++++++++ arch/riscv/kernel/vdso/Makefile | 12 + arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 17 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 1 + include/uapi/linux/prctl.h | 27 ++ kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 10 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 78 +++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 ++ 54 files changed, 2195 insertions(+), 29 deletions(-) --- base-commit: 39a803b754d5224a3522016b564113ee1e4091b2 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug

8 months, 3 weeks

5
82
0 0

[PATCH bpf-next v2] bpf: Allow XDP dev-bound programs to perform XDP_REDIRECT into maps

by Lorenzo Bianconi

In the current implementation if the program is dev-bound to a specific device, it will not be possible to perform XDP_REDIRECT into a DEVMAP or CPUMAP even if the program is running in the driver NAPI context and it is not attached to any map entry. This seems in contrast with the explanation available in bpf_prog_map_compatible routine. Fix the issue introducing __bpf_prog_map_compatible utility routine in order to avoid bpf_prog_is_dev_bound() check running bpf_check_tail_call() at program load time (bpf_prog_select_runtime()). Continue forbidding to attach a dev-bound program to XDP maps (BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_CPUMAP). Fixes: 3d76a4d3d4e59 ("bpf: XDP metadata RX kfuncs") Signed-off-by: Lorenzo Bianconi <lorenzo(a)kernel.org> --- Changes in v2: - Introduce __bpf_prog_map_compatible() utility routine in order to skip bpf_prog_is_dev_bound check in bpf_check_tail_call() - Extend xdp_metadata selftest - Link to v1: https://lore.kernel.org/r/20250422-xdp-prog-bound-fix-v1-1-0b581fa186fe@ker… --- kernel/bpf/core.c | 27 +++++++++++++--------- .../selftests/bpf/prog_tests/xdp_metadata.c | 22 +++++++++++++++++- tools/testing/selftests/bpf/progs/xdp_metadata.c | 13 +++++++++++ 3 files changed, 50 insertions(+), 12 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ba6b6118cf504041278d05417c4212d57be6fca0..a3e571688421196c3ceaed62b3b59b62a0258a8c 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2358,8 +2358,8 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx, return 0; } -bool bpf_prog_map_compatible(struct bpf_map *map, - const struct bpf_prog *fp) +static bool __bpf_prog_map_compatible(struct bpf_map *map, + const struct bpf_prog *fp) { enum bpf_prog_type prog_type = resolve_prog_type(fp); bool ret; @@ -2368,14 +2368,6 @@ bool bpf_prog_map_compatible(struct bpf_map *map, if (fp->kprobe_override) return false; - /* XDP programs inserted into maps are not guaranteed to run on - * a particular netdev (and can run outside driver context entirely - * in the case of devmap and cpumap). Until device checks - * are implemented, prohibit adding dev-bound programs to program maps. - */ - if (bpf_prog_is_dev_bound(aux)) - return false; - spin_lock(&map->owner.lock); if (!map->owner.type) { /* There's no owner yet where we could check for @@ -2409,6 +2401,19 @@ bool bpf_prog_map_compatible(struct bpf_map *map, return ret; } +bool bpf_prog_map_compatible(struct bpf_map *map, const struct bpf_prog *fp) +{ + /* XDP programs inserted into maps are not guaranteed to run on + * a particular netdev (and can run outside driver context entirely + * in the case of devmap and cpumap). Until device checks + * are implemented, prohibit adding dev-bound programs to program maps. + */ + if (bpf_prog_is_dev_bound(fp->aux)) + return false; + + return __bpf_prog_map_compatible(map, fp); +} + static int bpf_check_tail_call(const struct bpf_prog *fp) { struct bpf_prog_aux *aux = fp->aux; @@ -2421,7 +2426,7 @@ static int bpf_check_tail_call(const struct bpf_prog *fp) if (!map_type_contains_progs(map)) continue; - if (!bpf_prog_map_compatible(map, fp)) { + if (!__bpf_prog_map_compatible(map, fp)) { ret = -EINVAL; goto out; } diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c index 3d47878ef6bfb55c236dc9df2c100fcc449f8de3..19f92affc2daa23fdd869554e7a0475b86350a4f 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c @@ -351,9 +351,10 @@ void test_xdp_metadata(void) struct xdp_metadata2 *bpf_obj2 = NULL; struct xdp_metadata *bpf_obj = NULL; struct bpf_program *new_prog, *prog; + struct bpf_devmap_val devmap_e = {}; + struct bpf_map *prog_arr, *devmap; struct nstoken *tok = NULL; __u32 queue_id = QUEUE_ID; - struct bpf_map *prog_arr; struct xsk tx_xsk = {}; struct xsk rx_xsk = {}; __u32 val, key = 0; @@ -409,6 +410,13 @@ void test_xdp_metadata(void) bpf_program__set_ifindex(prog, rx_ifindex); bpf_program__set_flags(prog, BPF_F_XDP_DEV_BOUND_ONLY); + /* Make sure we can load a dev-bound program that performs + * XDP_REDIRECT into a devmap. + */ + new_prog = bpf_object__find_program_by_name(bpf_obj->obj, "redirect"); + bpf_program__set_ifindex(new_prog, rx_ifindex); + bpf_program__set_flags(new_prog, BPF_F_XDP_DEV_BOUND_ONLY); + if (!ASSERT_OK(xdp_metadata__load(bpf_obj), "load skeleton")) goto out; @@ -423,6 +431,18 @@ void test_xdp_metadata(void) "update prog_arr")) goto out; + /* Make sure we can't add dev-bound programs to devmaps. */ + devmap = bpf_object__find_map_by_name(bpf_obj->obj, "dev_map"); + if (!ASSERT_OK_PTR(devmap, "no dev_map found")) + goto out; + + devmap_e.bpf_prog.fd = val; + if (!ASSERT_ERR(bpf_map__update_elem(devmap, &key, sizeof(key), + &devmap_e, sizeof(devmap_e), + BPF_ANY), + "update dev_map")) + goto out; + /* Attach BPF program to RX interface. */ ret = bpf_xdp_attach(rx_ifindex, diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c index 31ca229bb3c0f182483334a4ab0bd204acae20f3..09bb8a038d528cf26c5b314cc927915ac2796bf0 100644 --- a/tools/testing/selftests/bpf/progs/xdp_metadata.c +++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c @@ -19,6 +19,13 @@ struct { __type(value, __u32); } prog_arr SEC(".maps"); +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(struct bpf_devmap_val)); + __uint(max_entries, 1); +} dev_map SEC(".maps"); + extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, __u64 *timestamp) __ksym; extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash, @@ -95,4 +102,10 @@ int rx(struct xdp_md *ctx) return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS); } +SEC("xdp") +int redirect(struct xdp_md *ctx) +{ + return bpf_redirect_map(&dev_map, ctx->rx_queue_index, XDP_PASS); +} + char _license[] SEC("license") = "GPL"; --- base-commit: 5cffad0a5c8f0cc53ce9fe7cff7bc67c3a97c406 change-id: 20250422-xdp-prog-bound-fix-9f30f3e134aa Best regards, -- Lorenzo Bianconi <lorenzo(a)kernel.org>

8 months, 3 weeks

5
10
0 0

[PATCH net 1/2] net/devmem: Reject insufficiently large dmabuf pools

by Cosmin Ratiu

Drivers that are told to allocate RX buffers from pools of DMA memory should have enough memory in the pool to satisfy projected allocation requests (a function of ring size, MTU & other parameters). If there's not enough memory, RX ring refill might fail later at inconvenient times (e.g. during NAPI poll). This commit adds a check at dmabuf pool init time that compares the amount of memory in the underlying chunk pool (configured by the user space application providing dmabuf memory) with the desired pool size (previously set by the driver) and fails with an error message if chunk memory isn't enough. Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider") Signed-off-by: Cosmin Ratiu <cratiu(a)nvidia.com> --- net/core/devmem.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/net/core/devmem.c b/net/core/devmem.c index 6e27a47d0493..651cd55ebb28 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -299,6 +299,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, int mp_dmabuf_devmem_init(struct page_pool *pool) { struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + size_t size; if (!binding) return -EINVAL; @@ -312,6 +313,16 @@ int mp_dmabuf_devmem_init(struct page_pool *pool) if (pool->p.order != 0) return -E2BIG; + /* Validate that the underlying dmabuf has enough memory to satisfy + * requested pool size. + */ + size = gen_pool_size(binding->chunk_pool) >> PAGE_SHIFT; + if (size < pool->p.pool_size) { + pr_warn("%s: Insufficient dmabuf memory (%zu pages) to satisfy pool_size (%u pages)\n", + __func__, size, pool->p.pool_size); + return -ENOMEM; + } + net_devmem_dmabuf_binding_get(binding); return 0; } -- 2.45.0

8 months, 3 weeks

3
14
0 0

[PATCH v1] selftests: iou-zcrx: Get the page size at runtime

by Haiyue Wang

Use the API `sysconf()` to query page size at runtime, instead of using hard code number 4096. And use `posix_memalign` to allocate the page size aligned momory. Signed-off-by: Haiyue Wang <haiyuewa(a)163.com> --- .../selftests/drivers/net/hw/iou-zcrx.c | 23 ++++++++++++------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c index c26b4180eddd..8aa426014c87 100644 --- a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c +++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c @@ -37,8 +37,8 @@ #include <liburing.h> -#define PAGE_SIZE (4096) -#define AREA_SIZE (8192 * PAGE_SIZE) +static long page_size; +#define AREA_SIZE (8192 * page_size) #define SEND_SIZE (512 * 4096) #define min(a, b) \ ({ \ @@ -66,7 +66,7 @@ static int cfg_oneshot_recvs; static int cfg_send_size = SEND_SIZE; static struct sockaddr_in6 cfg_addr; -static char payload[SEND_SIZE] __attribute__((aligned(PAGE_SIZE))); +static char *payload; static void *area_ptr; static void *ring_ptr; static size_t ring_size; @@ -114,8 +114,8 @@ static inline size_t get_refill_ring_size(unsigned int rq_entries) ring_size = rq_entries * sizeof(struct io_uring_zcrx_rqe); /* add space for the header (head/tail/etc.) */ - ring_size += PAGE_SIZE; - return ALIGN_UP(ring_size, 4096); + ring_size += page_size; + return ALIGN_UP(ring_size, page_size); } static void setup_zcrx(struct io_uring *ring) @@ -219,7 +219,7 @@ static void process_accept(struct io_uring *ring, struct io_uring_cqe *cqe) connfd = cqe->res; if (cfg_oneshot) - add_recvzc_oneshot(ring, connfd, PAGE_SIZE); + add_recvzc_oneshot(ring, connfd, page_size); else add_recvzc(ring, connfd); } @@ -245,7 +245,7 @@ static void process_recvzc(struct io_uring *ring, struct io_uring_cqe *cqe) if (cfg_oneshot) { if (cqe->res == 0 && cqe->flags == 0 && cfg_oneshot_recvs) { - add_recvzc_oneshot(ring, connfd, PAGE_SIZE); + add_recvzc_oneshot(ring, connfd, page_size); cfg_oneshot_recvs--; } } else if (!(cqe->flags & IORING_CQE_F_MORE)) { @@ -370,7 +370,7 @@ static void usage(const char *filepath) static void parse_opts(int argc, char **argv) { - const int max_payload_len = sizeof(payload) - + const int max_payload_len = SEND_SIZE - sizeof(struct ipv6hdr) - sizeof(struct tcphdr) - 40 /* max tcp options */; @@ -443,6 +443,13 @@ int main(int argc, char **argv) const char *cfg_test = argv[argc - 1]; int i; + page_size = sysconf(_SC_PAGESIZE); + if (page_size < 0) + return 1; + + if (posix_memalign((void **)&payload, page_size, SEND_SIZE)) + return 1; + parse_opts(argc, argv); for (i = 0; i < SEND_SIZE; i++) -- 2.49.0

8 months, 3 weeks

6
8
0 0

Call for Participation: Automated Testing Summit 2025 @ OSS NA

by Gustavo Padovan

The Automated Testing Summit (ATS) 2025 will be held as a co-located event at the Open Source Summit North America, and we’re now accepting talk proposals! https://events.linuxfoundation.org/open-source-summit-north-america/feature… 📅 Date: June 26, 2025 📍 Location: Denver, CO, USA Hosted by KernelCI, ATS is a technical summit focused on the challenges of testing and quality assurance in the Linux ecosystem — especially in upstream kernel development, embedded systems, cloud environments, and CI integration. This is a great opportunity to share your work on: * Kernel and userspace test frameworks * Lab infrastructure and automation * CI/CD pipelines for Linux * Fuzzing, performance testing, and debugging tools * Sharing and standardizing test results across systems Whether you’re working on kernel testing, running tests on hardware labs, developing QA tools, or building infrastructure that scales across projects, ATS is the place to collaborate and move the ecosystem forward. Submit your talk by May 18, 2025: 👉 Call for Proposals (CFP): https://sessionize.com/atsna2025 We hope to see you in Denver! — The KernelCI Team

8 months, 3 weeks

1
0
0 0

[PATCH v13 00/28] riscv control-flow integrity for usermode

by Deepak Gupta

Basics and overview =================== Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory. To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with - `sspush` instruction to push return address on a shadow stack - `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack) - `sspopchk` to raise software check exception if comparision above was a mismatch - Protection mechanism using which shadow stack is not writeable via regular store instructions More information an details can be found at extensions github repo [1]. Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack. x86 and arm64 support for user mode shadow stack is already in mainline. Kernel awareness for user control flow integrity ================================================ This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series. Enabling: In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s). clone/fork: On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call. map_shadow_stack: x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well. signal management: If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers. In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive) config and compilation: Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime. To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst How to test this series ======================= Toolchain --------- $ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc) Qemu ---- Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc) Opensbi ------- $ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic Linux ----- Running defconfig is fine. CFI is enabled by default if the toolchain supports it. $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) In case you're building your own rootfs using toolchain, please make sure you pick following patch to ensure that vDSO compiled with lpad and shadow stack. "arch/riscv: compile vdso with landing pad" Branch where above patch can be picked https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1 Running ------- Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true vDSO related Opens (in the flux) ================================= I am listing these opens for laying out plan and what to expect in future patch sets. And of course for the sake of discussion. Shadow stack and landing pad enabling in vDSO ---------------------------------------------- vDSO must have shadow stack and landing pad support compiled in for task to have shadow stack and landing pad support. This patch series doesn't enable that (yet). Enabling shadow stack support in vDSO should be straight forward (intend to do that in next versions of patch set). Enabling landing pad support in vDSO requires some collaboration with toolchain folks to follow a single label scheme for all object binaries. This is necessary to ensure that all indirect call-sites are setting correct label and target landing pads are decorated with same label scheme. How many vDSOs --------------- Shadow stack instructions are carved out of zimop (may be operations) and if CPU doesn't implement zimop, they're illegal instructions. Kernel could be running on a CPU which may or may not implement zimop. And thus kernel will have to carry 2 different vDSOs and expose the appropriate one depending on whether CPU implements zimop or not. References ========== [1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c… [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific… [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i… [6] - https://lwn.net/Articles/940403/ To: Thomas Gleixner <tglx(a)linutronix.de> To: Ingo Molnar <mingo(a)redhat.com> To: Borislav Petkov <bp(a)alien8.de> To: Dave Hansen <dave.hansen(a)linux.intel.com> To: x86(a)kernel.org To: H. Peter Anvin <hpa(a)zytor.com> To: Andrew Morton <akpm(a)linux-foundation.org> To: Liam R. Howlett <Liam.Howlett(a)oracle.com> To: Vlastimil Babka <vbabka(a)suse.cz> To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> To: Paul Walmsley <paul.walmsley(a)sifive.com> To: Palmer Dabbelt <palmer(a)dabbelt.com> To: Albert Ou <aou(a)eecs.berkeley.edu> To: Conor Dooley <conor(a)kernel.org> To: Rob Herring <robh(a)kernel.org> To: Krzysztof Kozlowski <krzk+dt(a)kernel.org> To: Arnd Bergmann <arnd(a)arndb.de> To: Christian Brauner <brauner(a)kernel.org> To: Peter Zijlstra <peterz(a)infradead.org> To: Oleg Nesterov <oleg(a)redhat.com> To: Eric Biederman <ebiederm(a)xmission.com> To: Kees Cook <kees(a)kernel.org> To: Jonathan Corbet <corbet(a)lwn.net> To: Shuah Khan <shuah(a)kernel.org> To: Jann Horn <jannh(a)google.com> To: Conor Dooley <conor+dt(a)kernel.org> To: Miguel Ojeda <ojeda(a)kernel.org> To: Alex Gaynor <alex.gaynor(a)gmail.com> To: Boqun Feng <boqun.feng(a)gmail.com> To: Gary Guo <gary(a)garyguo.net> To: Björn Roy Baron <bjorn3_gh(a)protonmail.com> To: Benno Lossin <benno.lossin(a)proton.me> To: Andreas Hindborg <a.hindborg(a)kernel.org> To: Alice Ryhl <aliceryhl(a)google.com> To: Trevor Gross <tmgross(a)umich.edu> Cc: linux-kernel(a)vger.kernel.org Cc: linux-fsdevel(a)vger.kernel.org Cc: linux-mm(a)kvack.org Cc: linux-riscv(a)lists.infradead.org Cc: devicetree(a)vger.kernel.org Cc: linux-arch(a)vger.kernel.org Cc: linux-doc(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: alistair.francis(a)wdc.com Cc: richard.henderson(a)linaro.org Cc: jim.shu(a)sifive.com Cc: andybnac(a)gmail.com Cc: kito.cheng(a)sifive.com Cc: charlie(a)rivosinc.com Cc: atishp(a)rivosinc.com Cc: evan(a)rivosinc.com Cc: cleger(a)rivosinc.com Cc: alexghiti(a)rivosinc.com Cc: samitolvanen(a)google.com Cc: broonie(a)kernel.org Cc: rick.p.edgecombe(a)intel.com Cc: rust-for-linux(a)vger.kernel.org changelog --------- v13: - cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses riscv_has_extension_unlikely() - uses nops(count) to create nop slide - RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it - changed ternaries to simply use implicit casting to convert to bool. - kernel command line allows to disable zicfilp and zicfiss independently. updated kernel-parameters.txt. - ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest. - cosmetic and grammatical changes to documentation. v12: - It seems like I had accidently squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again. - set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li. - Some minor clean up in kselftests as suggested by Zong Li. v11: - patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to to `lpad 0`. v10: - dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree. - Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config. - Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects. v9: - rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion") - dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs) - dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs) v8: - rebased on palmer/for-next - dropped samuel holland's `envcfg` context switch patches. they are in parlmer/for-next v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.… - Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE" - Lock interfaces for shadow stack and indirect branch tracking expect arg == 0 Any future evolution of this interface should accordingly define how arg should be setup. - `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv… v6: - Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info` - fixed unaligned newline escapes in kselftest - cleaned up messages in kselftest and included test output in commit message - fixed a bug in clone path reported by Zong Li - fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code) v5: - rebased on v6.12-rc1 - Fixed schema related issues in device tree file - Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index) - added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected. - Adopted context header based signal handling as proposed by Andy Chiu - Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…) - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches) v4: - rebased on 6.11-rc6 - envcfg: Converged with Samuel Holland's patches for envcfg management on per- thread basis. - vma_is_shadow_stack is renamed to is_vma_shadow_stack - picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch - signal context: using extended context management to maintain compatibility. - fixed `-Wmissing-prototypes` compiler warnings for prctl functions - Documentation fixes and amending typos. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/ v3: - envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series. - dt-bindings As suggested, split into separate commit. fixed the messaging that spec is in public review - arch_is_shadow_stack change arch_is_shadow_stack changed to vma_is_shadow_stack - hwprobe zicfiss / zicfilp if present will get enumerated in hwprobe - selftests As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit. - Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ v2: - Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel. - Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri… Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Clément Léger (1): riscv: Add Firmware Feature SBI extensions definitions Deepak Gupta (25): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv mm: manufacture shadow stack pte riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs riscv mmu: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 20 + arch/riscv/Makefile | 5 +- arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 25 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 2 + arch/riscv/include/asm/sbi.h | 26 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 96 ++++ arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/asm-offsets.c | 8 + arch/riscv/kernel/cpufeature.c | 13 + arch/riscv/kernel/entry.S | 28 +- arch/riscv/kernel/head.S | 23 + arch/riscv/kernel/process.c | 26 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 142 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 43 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso/Makefile | 6 + arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 17 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 1 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 10 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 171 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 55 files changed, 2346 insertions(+), 29 deletions(-) --- base-commit: 39a803b754d5224a3522016b564113ee1e4091b2 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug

8 months, 3 weeks

2
31
0 0

[PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)

by Nicolin Chen

The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature. A vCMDQ introduced by this series as a part of the vIOMMU infrastructure represents a HW supported queue/buffer for VM to use exclusively, e.g. - NVIDIA's virtual command queue - AMD vIOMMU's command buffer either of which is an IOMMU HW feature to directly load and execute cache invalidation commands issued by a guest kernel, to shoot down TLB entries that HW cached for guest-owned stage-1 page table entries. This is a big improvement since there is no VM Exit during an invalidation, compared to the traditional invalidation pathway by trapping a guest-own invalidation queue and forwarding those commands/requests to the host kernel that will eventually fill a HW-owned queue to execute those commands. Thus, a vCMDQ object, as an initial use case, is all about a guest-owned HW command queue that VMM can allocate/configure depending on the request from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its allocator IOMMUFD_CMD_VCMDQ_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, and etc. Meanwhile, a guest-owned command queue needs the kernel (a command queue driver) to control the queue by reading/writing its consumer and producer indexes, which means the command queue HW allows the guest kernel to get a direct R/W access to those registers. Introduce an mmap infrastructure to the iommufd core so as to support pass through a piece of MMIO region from the host physical address space to the guest physical address space. The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during the IOMMUFD_CMD_VCMDQ_ALLOC and given those info to the user space as an output driver-data by the IOMMUFD_CMD_VCMDQ_ALLOC. So, this requires a driver-specific user data support by a vIOMMU object. As a real-world use case, this series implements a vCMDQ support to the tegra241-cmdqv driver for the vCMDQ on NVIDIA Grace CPU. In another word, this is also the Tegra CMDQV series Part-2 (user-space support), reworked from Previous RFCv1: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/ This is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_vcmdq-v1 Paring QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vcmdq-v1 Thanks Nicolin Nicolin Chen (16): iommu: Pass in a driver-level user data structure to viommu_alloc op iommufd/viommu: Allow driver-specific user data for a vIOMMU object iommu: Add iommu_copy_struct_to_user helper iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc iommufd/selftest: Support user_data in mock_viommu_alloc iommufd/selftest: Add covearge for viommu data iommufd/viommu: Add driver-allocated vDEVICE support iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl iommufd: Add mmap interface iommufd/selftest: Add coverage for the new mmap interface Documentation: userspace-api: iommufd: Update vCMDQ iommu/tegra241-cmdqv: Use request_threaded_irq iommu/arm-smmu-v3: Add vsmmu_alloc impl op iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 +- drivers/iommu/iommufd/iommufd_private.h | 20 +- drivers/iommu/iommufd/iommufd_test.h | 17 + include/linux/iommu.h | 43 ++- include/linux/iommufd.h | 93 +++++ include/uapi/linux/iommufd.h | 87 +++++ tools/testing/selftests/iommu/iommufd_utils.h | 21 +- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 26 +- .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 349 +++++++++++++++++- drivers/iommu/iommufd/driver.c | 54 +++ drivers/iommu/iommufd/main.c | 54 ++- drivers/iommu/iommufd/selftest.c | 58 ++- drivers/iommu/iommufd/viommu.c | 78 +++- tools/testing/selftests/iommu/iommufd.c | 34 +- .../selftests/iommu/iommufd_fail_nth.c | 5 +- Documentation/userspace-api/iommufd.rst | 11 + 16 files changed, 912 insertions(+), 62 deletions(-) -- 2.43.0

8 months, 3 weeks

6
51
0 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror