- Linux-kselftest-mirror - lists.linaro.org

[PATCH RESEND v4 0/7] futex: Create set_robust_list2

by André Almeida

This patch adds a new robust_list() syscall. The current syscall can't be expanded to cover the following use case, so a new one is needed. This new syscall allows users to set multiple robust lists per process and to have either 32bit or 64bit pointers in the list. * Use case FEX-Emu[1] is an application that runs x86 and x86-64 binaries on an AArch64 Linux host. One of the tasks of FEX-Emu is to translate syscalls from one platform to another. Existing set_robust_list() can't be easily translated because of two limitations: 1) x86 apps can have 32bit pointers robust lists. For a x86-64 kernel this is not a problem, because of the compat entry point. But there's no such compat entry point for AArch64, so the kernel would do the pointer arithmetic wrongly. Is also unviable to userspace to keep track every addition/removal to the robust list and keep a 64bit version of it somewhere else to feed the kernel. Thus, the new interface has an option of telling the kernel if the list is filled with 32bit or 64bit pointers. 2) Apps can set just one robust list (in theory, x86-64 can set two if they also use the compat entry point). That means that when a x86 app asks FEX-Emu to call set_robust_list(), FEX have two options: to overwrite their own robust list pointer and make the app robust, or to ignore the app robust list and keep the emulator robust. The new interface allows for multiple robust lists per application, solving this. * Interface This is the proposed interface: long set_robust_list2(void *head, int index, unsigned int flags) `head` is the head of the userspace struct robust_list_head, just as old set_robust_list(). It needs to be a void pointer since it can point to a normal robust_list_head or a compat_robust_list_head. `flags` can be used for defining the list type: enum robust_list_type { ROBUST_LIST_32BIT, ROBUST_LIST_64BIT, }; `index` is the index in the internal robust_list's linked list (the naming starts to get confusing, I reckon). If `index == -1`, that means that user wants to set a new robust_list, and the kernel will append it in the end of the list, assign a new index and return this index to the user. If `index >= 0`, that means that user wants to re-set `*head` of an already existing list (similarly to what happens when you call set_robust_list() twice with different `*head`). If `index` is out of range, or it points to a non-existing robust_list, or if the internal list is full, an error is returned. * Implementation The implementation re-uses most of the existing robust list interface as possible. The new task_struct member `struct list_head robust_list2` is just a linked list where new lists are appended as the user requests more lists, and by futex_cleanup(), the kernel walks through the internal list feeding exit_robust_list() with the robust_list's. This implementation supports up to 10 lists (defined at ROBUST_LISTS_PER_TASK), but it was an arbitrary number for this RFC. For the described use case above, 4 should be enough, I'm not sure which should be the limit. It doesn't support list removal (should it support?). It doesn't have a proper get_robust_list2() yet as well, but I can add it in a next revision. We could also have a generic robust_list() syscall that can be used to set/get and be controlled by flags. The new interface has a `unsigned int flags` argument, making it extensible for future use cases as well. It refuses unaligned `head` addresses. It doesn't have a limit for elements in a single list (like ROBUST_LIST_LIMIT), it destroys the list as it is parsed to be safe against circular lists. * Testing This patcheset has a selftest patch that expands this one: https://lore.kernel.org/lkml/20250212131123.37431-1-andrealmeid@igalia.com/ Also, FEX-Emu added support for this interface to validate it: https://github.com/FEX-Emu/FEX/pull/3966 Feedback is very welcomed! Thanks, André [1] https://github.com/FEX-Emu/FEX Changelog: - Rebased on top of new futex work (private hash) v4: https://lore.kernel.org/lkml/20250225183531.682556-1-andrealmeid@igalia.com/ - Refuse unaligned head pointers - Ignore ROBUST_LIST_LIMIT for lists created with this interface and make it robust against circular lists - Fix a get_robust_list() syscall bug for getting the list from another thread - Adapt selftest to use the new interface v3: https://lore.kernel.org/lkml/20241217174958.477692-1-andrealmeid@igalia.com/ - Old syscall set_robust_list() adds new head to the internal linked list of robust lists pointers, instead of having a field just for them. Remove tsk->robust_list and use only tsk->robust_list2 v2: https://lore.kernel.org/lkml/20241101162147.284993-1-andrealmeid@igalia.com/ - Added a patch to properly deal with exit_robust_list() in 64bit vs 32bit - Wired-up syscall for all archs - Added more of the cover letter to the commit message v1: https://lore.kernel.org/lkml/20241024145735.162090-1-andrealmeid@igalia.com/ --- André Almeida (7): selftests/futex: Add ASSERT_ macros selftests/futex: Create test for robust list futex: Use explicit sizes for compat_exit_robust_list futex: Create set_robust_list2 futex: Wire up set_robust_list2 syscall futex: Remove the limit of elements for sys_set_robust_list2 lists selftests: futex: Expand robust list test for the new interface arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + include/linux/compat.h | 12 +- include/linux/futex.h | 16 +- include/linux/sched.h | 5 +- include/uapi/asm-generic/unistd.h | 2 + include/uapi/linux/futex.h | 24 + kernel/futex/core.c | 165 ++++- kernel/futex/futex.h | 5 + kernel/futex/syscalls.c | 85 ++- kernel/sys_ni.c | 1 + scripts/syscall.tbl | 1 + .../testing/selftests/futex/functional/.gitignore | 1 + tools/testing/selftests/futex/functional/Makefile | 3 +- .../selftests/futex/functional/robust_list.c | 706 +++++++++++++++++++++ tools/testing/selftests/futex/include/logging.h | 38 ++ 29 files changed, 1026 insertions(+), 53 deletions(-) --- base-commit: 3ee84e3dd88e39b55b534e17a7b9a181f1d46809 change-id: 20250225-tonyk-robust_futex-60adeedac695 Best regards, -- André Almeida <andrealmeid(a)igalia.com>

2 weeks, 3 days

3
11
0 0

[PATCH] selftests: vDSO: fix -Wunitialized in powerpc VDSO_CALL() wrapper

by Thomas Weißschuh

The _rval register variable is meant to be an output operand of the asm statement but is instead used as input operand. clang 20.1 notices this and triggers -Wuninitialized warnings: tools/testing/selftests/timers/auxclock.c:154:10: error: variable '_rval' is uninitialized when used here [-Werror,-Wuninitialized] 154 | return VDSO_CALL(self->vdso_clock_gettime64, 2, clockid, ts); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ tools/testing/selftests/timers/../vDSO/vdso_call.h:59:10: note: expanded from macro 'VDSO_CALL' 59 | : "r" (_rval) \ | ^~~~~ tools/testing/selftests/timers/auxclock.c:154:10: note: variable '_rval' is declared here tools/testing/selftests/timers/../vDSO/vdso_call.h:47:2: note: expanded from macro 'VDSO_CALL' 47 | register long _rval asm ("r3"); \ | ^ It seems the list of input and output operands have been switched around. However as the argument registers are not always initialized they can not be marked as pure inputs as that would trigger -Wuninitialized warnings. Adding _rval as another input and output operand does also not work as it would collide with the existing _r3 variable. Instead reuse _r3 for both the argument and the return value. Reported-by: kernel test robot <lkp(a)intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506180223.BOOk5jDK-lkp@intel.com/ Fixes: 6eda706a535c ("selftests: vDSO: fix the way vDSO functions are called for powerpc") Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- tools/testing/selftests/vDSO/vdso_call.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/vDSO/vdso_call.h b/tools/testing/selftests/vDSO/vdso_call.h index bb237d771051bd4103367fc60b54b505b7586965..e7205584cbdca5e10c13c1e9425d2023b02cda7f 100644 --- a/tools/testing/selftests/vDSO/vdso_call.h +++ b/tools/testing/selftests/vDSO/vdso_call.h @@ -44,7 +44,6 @@ register long _r6 asm ("r6"); \ register long _r7 asm ("r7"); \ register long _r8 asm ("r8"); \ - register long _rval asm ("r3"); \ \ LOADARGS_##nr(fn, args); \ \ @@ -54,13 +53,13 @@ " bns+ 1f\n" \ " neg 3, 3\n" \ "1:" \ - : "+r" (_r0), "=r" (_r3), "+r" (_r4), "+r" (_r5), \ + : "+r" (_r0), "+r" (_r3), "+r" (_r4), "+r" (_r5), \ "+r" (_r6), "+r" (_r7), "+r" (_r8) \ - : "r" (_rval) \ + : \ : "r9", "r10", "r11", "r12", "cr0", "cr1", "cr5", \ "cr6", "cr7", "xer", "lr", "ctr", "memory" \ ); \ - _rval; \ + _r3; \ }) #else --- base-commit: 52da431bf03b5506203bca27fe14a97895c80faf change-id: 20250618-vdso-vdso_call-uninit-ccc33be00568 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

2 weeks, 3 days

1
0
0 0

[PATCH rc 0/4] Fix iommufd selftest FAIL and warnings with v6.16

by Nicolin Chen

A few selftest harness changes being merged to v6.16, which exposed some bugs and vulnerabilities in the iommufd selftest code. Fix them properly. Note that the patch fixing the build warnings at mfd is not ideal, as it has possibly hit some corner case in the gcc: https://lore.kernel.org/all/aEi8DV+ReF3v3Rlf@nvidia.com/ This is on github: https://github.com/nicolinc/iommufd/commits/iommufd_selftest_fixes-v6.16 Thanks Nicolin Nicolin Chen (4): iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes iommufd/selftest: Add missing close(mfd) in memfd_mmap() iommufd/selftest: Add asserts testing global mfd iommufd/selftest: Fix build warnings due to uninitialized mfd tools/testing/selftests/iommu/iommufd_utils.h | 9 ++++- tools/testing/selftests/iommu/iommufd.c | 38 +++++++++++++++---- 2 files changed, 38 insertions(+), 9 deletions(-) -- 2.43.0

2 weeks, 3 days

2
14
0 0

[PATCH net-next, v3] selftest: Add selftest for multicast address notifications

by Yuyang Huang

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses. The test relies on the iproute2 version to be 6.13+. Tested by the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink_notification.sh \ TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze(a)google.com> Cc: Lorenzo Colitti <lorenzo(a)google.com> Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com> --- Changelog since v2: - Move the test cases to a separate file. Changelog since v1: - Skip the test if the iproute2 is too old. tools/testing/selftests/net/Makefile | 1 + .../selftests/net/rtnetlink_notification.sh | 159 ++++++++++++++++++ 2 files changed, 160 insertions(+) create mode 100755 tools/testing/selftests/net/rtnetlink_notification.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 70a38f485d4d..ad258b25bc9d 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -40,6 +40,7 @@ TEST_PROGS += netns-name.sh TEST_PROGS += link_netns.py TEST_PROGS += nl_netdev.py TEST_PROGS += rtnetlink.py +TEST_PROGS += rtnetlink_notification.sh TEST_PROGS += srv6_end_dt46_l3vpn_test.sh TEST_PROGS += srv6_end_dt4_l3vpn_test.sh TEST_PROGS += srv6_end_dt6_l3vpn_test.sh diff --git a/tools/testing/selftests/net/rtnetlink_notification.sh b/tools/testing/selftests/net/rtnetlink_notification.sh new file mode 100755 index 000000000000..a2c1afed5023 --- /dev/null +++ b/tools/testing/selftests/net/rtnetlink_notification.sh @@ -0,0 +1,159 @@ +#!/bin/bash +# +# This test is for checking rtnetlink notification callpaths, and get as much +# coverage as possible. +# +# set -e + +ALL_TESTS=" + kci_test_mcast_addr_notification +" + +VERBOSE=0 +PAUSE=no +PAUSE_ON_FAIL=no + +source lib.sh + +# set global exit status, but never reset nonzero one. +check_err() +{ + if [ $ret -eq 0 ]; then + ret=$1 + fi + [ -n "$2" ] && echo "$2" +} + +run_cmd_common() +{ + local cmd="$*" + local out + if [ "$VERBOSE" = "1" ]; then + echo "COMMAND: ${cmd}" + fi + out=$($cmd 2>&1) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + return $rc +} + +run_cmd() { + run_cmd_common "$@" + rc=$? + check_err $rc + return $rc +} + +end_test() +{ + echo "$*" + [ "${VERBOSE}" = "1" ] && echo + + if [[ $ret -ne 0 ]] && [[ "${PAUSE_ON_FAIL}" = "yes" ]]; then + echo "Hit enter to continue" + read a + fi; + + if [ "${PAUSE}" = "yes" ]; then + echo "Hit enter to continue" + read a + fi + +} + +kci_test_mcast_addr_notification() +{ + local tmpfile + local monitor_pid + local match_result + + tmpfile=$(mktemp) + + ip monitor maddr > $tmpfile & + monitor_pid=$! + sleep 1 + if [ ! -e "/proc/$monitor_pid" ]; then + end_test "SKIP: mcast addr notification: iproute2 too old" + rm $tmpfile + return $ksft_skip + fi + + run_cmd ip link add name test-dummy1 type dummy + run_cmd ip link set test-dummy1 up + run_cmd ip link del dev test-dummy1 + sleep 1 + + match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile) + + kill $monitor_pid + rm $tmpfile + # There should be 4 line matches as follows. + # 13: test-dummy1 inet6 mcast ff02::1 scope global + # 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global + if [ $match_result -ne 4 ];then + end_test "FAIL: mcast addr notification" + return 1 + fi + end_test "PASS: mcast addr notification" +} + +kci_test_rtnl() +{ + local current_test + local ret=0 + + for current_test in ${TESTS:-$ALL_TESTS}; do + $current_test + check_err $? + done + + return $ret +} + +usage() +{ + cat <<EOF +usage: ${0##*/} OPTS + + -t <test> Test(s) to run (default: all) + (options: $(echo $ALL_TESTS)) + -v Verbose mode (show commands and output) + -P Pause after every test + -p Pause after every failing test before cleanup (for debugging) +EOF +} + +#check for needed privileges +if [ "$(id -u)" -ne 0 ];then + end_test "SKIP: Need root privileges" + exit $ksft_skip +fi + +for x in ip;do + $x -Version 2>/dev/null >/dev/null + if [ $? -ne 0 ];then + end_test "SKIP: Could not run test without the $x tool" + exit $ksft_skip + fi +done + +while getopts t:hvpP o; do + case $o in + t) TESTS=$OPTARG;; + v) VERBOSE=1;; + p) PAUSE_ON_FAIL=yes;; + P) PAUSE=yes;; + h) usage; exit 0;; + *) usage; exit 1;; + esac +done + +[ $PAUSE = "yes" ] && PAUSE_ON_FAIL="no" + +kci_test_rtnl + +exit $? -- 2.50.0.rc1.591.g9c95f17f64-goog

2 weeks, 3 days

1
1
0 0

[PATCH net-next v3 0/5] netconsole: Add support for msgid in sysdata

by Gustavo Luiz Duarte

This patch series introduces a new feature to netconsole which allows appending a message ID to the userdata dictionary. If the msgid feature is enabled, the message ID is built from a per-target 32 bit counter that is incremented and appended to every message sent to the target. Example:: echo 1 > "/sys/kernel/config/netconsole/cmdline0/userdata/msgid_enabled" echo "This is message #1" > /dev/kmsg echo "This is message #2" > /dev/kmsg 13,434,54928466,-;This is message #1 msgid=1 13,435,54934019,-;This is message #2 msgid=2 This feature can be used by the target to detect if messages were dropped or reordered before reaching the target. This allows system administrators to assess the reliability of their netconsole pipeline and detect loss of messages due to network contention or temporary unavailability. Suggested-by: Breno Leitao <leitao(a)debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com> --- Changes in v3: - Add kdoc documentation for msgcounter. - Link to v2: https://lore.kernel.org/r/20250612-netconsole-msgid-v2-0-d4c1abc84bac@gmail… Changes in v2: - Use wrapping_assign_add() to avoid warnings in UBSAN and friends. - Improve documentation to clarify wrapping and distinguish msgid from sequnum. - Rebase and fix conflict in prepare_extradata(). - Link to v1: https://lore.kernel.org/r/20250611-netconsole-msgid-v1-0-1784a51feb1e@gmail… --- Gustavo Luiz Duarte (5): netconsole: introduce 'msgid' as a new sysdata field netconsole: implement configfs for msgid_enabled netconsole: append msgid to sysdata selftests: netconsole: Add tests for 'msgid' feature in sysdata docs: netconsole: document msgid feature Documentation/networking/netconsole.rst | 32 +++++++++++ drivers/net/netconsole.c | 66 ++++++++++++++++++++++ .../selftests/drivers/net/netcons_sysdata.sh | 30 ++++++++++ 3 files changed, 128 insertions(+) --- base-commit: 08207f42d3ffee43c97f16baf03d7426a3c353ca change-id: 20250609-netconsole-msgid-b93c6f8e9c60 Best regards, -- Gustavo Luiz Duarte <gustavold(a)gmail.com>

2 weeks, 3 days

4
10
0 0

[PATCH 0/3] Add kselftest harness selftest with variant

by Wei Yang

We already have a selftest for harness, while there is not usage of FIXTURE_VARIANT. Patch 3 add FIXTURE_VARIANT usage in the selftest. Patch 1/2 are trivial fix. Wei Yang (3): selftests: correct one typo in comment selftests: print 0 if no test is chosen selftests: harness: Add kselftest harness selftest with variant tools/testing/selftests/kselftest.h | 2 +- tools/testing/selftests/kselftest_harness.h | 2 +- .../kselftest_harness/harness-selftest.c | 34 +++++++++++++++++++ .../harness-selftest.expected | 22 +++++++++--- 4 files changed, 54 insertions(+), 6 deletions(-) -- 2.34.1

2 weeks, 3 days

3
14
0 0

[PATCH net-next v2 1/3] netmem: fix netmem comments

by Mina Almasry

Trivial fix to a couple of outdated netmem comments. No code changes, just more accurately describing current code. Signed-off-by: Mina Almasry <almasrymina(a)google.com> --- v2: https://lore.kernel.org/netdev/20250613042804.3259045-2-almasrymina@google.… - Adjust comment for clearing lsb as (Jakub) --- include/net/netmem.h | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/include/net/netmem.h b/include/net/netmem.h index 386164fb9c18..850869b45b45 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -89,8 +89,7 @@ static inline unsigned int net_iov_idx(const struct net_iov *niov) * typedef netmem_ref - a nonexistent type marking a reference to generic * network memory. * - * A netmem_ref currently is always a reference to a struct page. This - * abstraction is introduced so support for new memory types can be added. + * A netmem_ref can be a struct page* or a struct net_iov* underneath. * * Use the supplied helpers to obtain the underlying memory pointer and fields. */ @@ -117,9 +116,6 @@ static inline struct page *__netmem_to_page(netmem_ref netmem) return (__force struct page *)netmem; } -/* This conversion fails (returns NULL) if the netmem_ref is not struct page - * backed. - */ static inline struct page *netmem_to_page(netmem_ref netmem) { if (WARN_ON_ONCE(netmem_is_net_iov(netmem))) @@ -178,6 +174,21 @@ static inline unsigned long netmem_pfn_trace(netmem_ref netmem) return page_to_pfn(netmem_to_page(netmem)); } +/* __netmem_clear_lsb - convert netmem_ref to struct net_iov * for access to + * common fields. + * @netmem: netmem reference to extract as net_iov. + * + * All the sub types of netmem_ref (page, net_iov) have the same pp, pp_magic, + * dma_addr, and pp_ref_count fields at the same offsets. Thus, we can access + * these fields without a type check to make sure that the underlying mem is + * net_iov or page. + * + * The resulting value of this function can only be used to access the fields + * that are NET_IOV_ASSERT_OFFSET'd. Accessing any other fields will result in + * undefined behavior. + * + * Return: the netmem_ref cast to net_iov* regardless of its underlying type. + */ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem) { return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV); base-commit: 8909f5f4ecd551c2299b28e05254b77424c8c7dc -- 2.50.0.rc1.591.g9c95f17f64-goog

2 weeks, 3 days

3
6
0 0

[PATCH net-next, v4] selftest: Add selftest for multicast address notifications

by Yuyang Huang

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses. The test relies on the iproute2 version to be 6.13+. Tested by the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink_notification.sh \ TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze(a)google.com> Cc: Lorenzo Colitti <lorenzo(a)google.com> Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com> --- Changelog since v3: - Refactor the test to use utilities provided by lib.h. - Fix shellcheck warnings. Changelog since v2: - Move the test case to a separate file. Changelog since v1: - Skip the test if the iproute2 is too old. tools/testing/selftests/net/Makefile | 1 + .../selftests/net/rtnetlink_notification.sh | 70 +++++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100755 tools/testing/selftests/net/rtnetlink_notification.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index ab996bd22a5f..3abb74d563a7 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -41,6 +41,7 @@ TEST_PROGS += netns-name.sh TEST_PROGS += link_netns.py TEST_PROGS += nl_netdev.py TEST_PROGS += rtnetlink.py +TEST_PROGS += rtnetlink_notification.sh TEST_PROGS += srv6_end_dt46_l3vpn_test.sh TEST_PROGS += srv6_end_dt4_l3vpn_test.sh TEST_PROGS += srv6_end_dt6_l3vpn_test.sh diff --git a/tools/testing/selftests/net/rtnetlink_notification.sh b/tools/testing/selftests/net/rtnetlink_notification.sh new file mode 100755 index 000000000000..39c1b815bbe4 --- /dev/null +++ b/tools/testing/selftests/net/rtnetlink_notification.sh @@ -0,0 +1,70 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# This test is for checking rtnetlink notification callpaths, and get as much +# coverage as possible. +# +# set -e + +ALL_TESTS=" + kci_test_mcast_addr_notification +" + +source lib.sh + +kci_test_mcast_addr_notification() +{ + RET=0 + local tmpfile + local monitor_pid + local match_result + local test_dev="test-dummy1" + + tmpfile=$(mktemp) + defer rm "$tmpfile" + + ip monitor maddr > $tmpfile & + monitor_pid=$! + defer kill_process "$monitor_pid" + + sleep 1 + + if [ ! -e "/proc/$monitor_pid" ]; then + RET=$ksft_skip + log_test "mcast addr notification: iproute2 too old" + return $RET + fi + + ip link add name "$test_dev" type dummy + check_err $? "failed to add dummy interface" + ip link set "$test_dev" up + check_err $? "failed to set dummy interface up" + ip link del dev "$test_dev" + check_err $? "Failed to delete dummy interface" + sleep 1 + + # There should be 4 line matches as follows. + # 13: test-dummy1 inet6 mcast ff02::1 scope global + # 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global + match_result=$(grep -cE "$test_dev.*(224.0.0.1|ff02::1)" "$tmpfile") + if [ "$match_result" -ne 4 ]; then + RET=$ksft_fail + fi + log_test "mcast addr notification: Expected 4 matches, got $match_result" + return $RET +} + +#check for needed privileges +if [ "$(id -u)" -ne 0 ];then + RET=$ksft_skip + log_test "need root privileges" + exit $RET +fi + +require_command ip + +tests_run + +exit $EXIT_STATUS -- 2.50.0.rc1.591.g9c95f17f64-goog

2 weeks, 3 days

2
1
0 0

[PATCH v2] selftests: nettest: Fix typo in log and error messages for clarity

by Alok Tiwari

This patch corrects several logging and error message in nettest.c: - Corrects function name in log messages "setsockopt" -> "getsockopt". - Closes missing parentheses in "setsockopt(IPV6_FREEBIND)". - Replaces misleading error text ("Invalid port") with the correct description ("Invalid prefix length"). - remove Redundant wording like "status from status" and clarifies context in IPC error messages. These changes improve readability and aid in debugging test output. Signed-off-by: Alok Tiwari <alok.a.tiwari(a)oracle.com> --- v1 ->v2 remove extra space --- tools/testing/selftests/net/nettest.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/net/nettest.c b/tools/testing/selftests/net/nettest.c index cd8a58097448..1f5227f3d64d 100644 --- a/tools/testing/selftests/net/nettest.c +++ b/tools/testing/selftests/net/nettest.c @@ -385,7 +385,7 @@ static int get_bind_to_device(int sd, char *name, size_t len) name[0] = '\0'; rc = getsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, name, &optlen); if (rc < 0) - log_err_errno("setsockopt(SO_BINDTODEVICE)"); + log_err_errno("getsockopt(SO_BINDTODEVICE)"); return rc; } @@ -535,7 +535,7 @@ static int set_freebind(int sd, int version) break; case AF_INET6: if (setsockopt(sd, SOL_IPV6, IPV6_FREEBIND, &one, sizeof(one))) { - log_err_errno("setsockopt(IPV6_FREEBIND"); + log_err_errno("setsockopt(IPV6_FREEBIND)"); rc = -1; } break; @@ -812,7 +812,7 @@ static int convert_addr(struct sock_args *args, const char *_str, sep++; if (str_to_uint(sep, 1, pfx_len_max, &args->prefix_len) != 0) { - fprintf(stderr, "Invalid port\n"); + fprintf(stderr, "Invalid prefix length\n"); return 1; } } else { @@ -1272,7 +1272,7 @@ static int msg_loop(int client, int sd, void *addr, socklen_t alen, } } - nfds = interactive ? MAX(fileno(stdin), sd) + 1 : sd + 1; + nfds = interactive ? MAX(fileno(stdin), sd) + 1 : sd + 1; while (1) { FD_ZERO(&rfds); FD_SET(sd, &rfds); @@ -1492,7 +1492,7 @@ static int lsock_init(struct sock_args *args) sd = socket(args->version, args->type, args->protocol); if (sd < 0) { log_err_errno("Error opening socket"); - return -1; + return -1; } if (set_reuseaddr(sd) != 0) @@ -1912,7 +1912,7 @@ static int ipc_parent(int cpid, int fd, struct sock_args *args) * waiting to be told when to continue */ if (read(fd, &buf, sizeof(buf)) <= 0) { - log_err_errno("Failed to read IPC status from status"); + log_err_errno("Failed to read IPC status from pipe"); return 1; } if (!buf) { -- 2.47.1

2 weeks, 3 days

3
2
0 0

[PATCH v4 13/15] selftests/sched_ext: Add test for sched_ext dl_server

by Joel Fernandes

From: Andrea Righi <arighi(a)nvidia.com> Add a selftest to validate the correct behavior of the deadline server for the ext_sched_class. [ Joel: Replaced occurences of CFS in the test with EXT. ] Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> Signed-off-by: Andrea Righi <arighi(a)nvidia.com> --- tools/testing/selftests/sched_ext/Makefile | 1 + .../selftests/sched_ext/rt_stall.bpf.c | 23 ++ tools/testing/selftests/sched_ext/rt_stall.c | 213 ++++++++++++++++++ 3 files changed, 237 insertions(+) create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile index 9d9d6b4c38b0..f0a8cba3a99f 100644 --- a/tools/testing/selftests/sched_ext/Makefile +++ b/tools/testing/selftests/sched_ext/Makefile @@ -182,6 +182,7 @@ auto-test-targets := \ select_cpu_dispatch_bad_dsq \ select_cpu_dispatch_dbl_dsp \ select_cpu_vtime \ + rt_stall \ test_example \ testcase-targets := $(addsuffix .o,$(addprefix $(SCXOBJ_DIR)/,$(auto-test-targets))) diff --git a/tools/testing/selftests/sched_ext/rt_stall.bpf.c b/tools/testing/selftests/sched_ext/rt_stall.bpf.c new file mode 100644 index 000000000000..80086779dd1e --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.bpf.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * A scheduler that verified if RT tasks can stall SCHED_EXT tasks. + * + * Copyright (c) 2025 NVIDIA Corporation. + */ + +#include <scx/common.bpf.h> + +char _license[] SEC("license") = "GPL"; + +UEI_DEFINE(uei); + +void BPF_STRUCT_OPS(rt_stall_exit, struct scx_exit_info *ei) +{ + UEI_RECORD(uei, ei); +} + +SEC(".struct_ops.link") +struct sched_ext_ops rt_stall_ops = { + .exit = (void *)rt_stall_exit, + .name = "rt_stall", +}; diff --git a/tools/testing/selftests/sched_ext/rt_stall.c b/tools/testing/selftests/sched_ext/rt_stall.c new file mode 100644 index 000000000000..d4cb545ebfd8 --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.c @@ -0,0 +1,213 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2025 NVIDIA Corporation. + */ +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <sched.h> +#include <sys/prctl.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <time.h> +#include <linux/sched.h> +#include <signal.h> +#include <bpf/bpf.h> +#include <scx/common.h> +#include <sys/wait.h> +#include <unistd.h> +#include "rt_stall.bpf.skel.h" +#include "scx_test.h" +#include "../kselftest.h" + +#define CORE_ID 0 /* CPU to pin tasks to */ +#define RUN_TIME 5 /* How long to run the test in seconds */ + +/* Simple busy-wait function for test tasks */ +static void process_func(void) +{ + while (1) { + /* Busy wait */ + for (volatile unsigned long i = 0; i < 10000000UL; i++); + } +} + +/* Set CPU affinity to a specific core */ +static void set_affinity(int cpu) +{ + cpu_set_t mask; + + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); + if (sched_setaffinity(0, sizeof(mask), &mask) != 0) { + perror("sched_setaffinity"); + exit(EXIT_FAILURE); + } +} + +/* Set task scheduling policy and priority */ +static void set_sched(int policy, int priority) +{ + struct sched_param param; + + param.sched_priority = priority; + if (sched_setscheduler(0, policy, &param) != 0) { + perror("sched_setscheduler"); + exit(EXIT_FAILURE); + } +} + +/* Get process runtime from /proc/<pid>/stat */ +static float get_process_runtime(int pid) +{ + char path[256]; + FILE *file; + long utime, stime; + int fields; + + snprintf(path, sizeof(path), "/proc/%d/stat", pid); + file = fopen(path, "r"); + if (file == NULL) { + perror("Failed to open stat file"); + return -1; + } + + /* Skip the first 13 fields and read the 14th and 15th */ + fields = fscanf(file, + "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu", + &utime, &stime); + fclose(file); + + if (fields != 2) { + fprintf(stderr, "Failed to read stat file\n"); + return -1; + } + + /* Calculate the total time spent in the process */ + long total_time = utime + stime; + long ticks_per_second = sysconf(_SC_CLK_TCK); + float runtime_seconds = total_time * 1.0 / ticks_per_second; + + return runtime_seconds; +} + +static enum scx_test_status setup(void **ctx) +{ + struct rt_stall *skel; + + skel = rt_stall__open(); + SCX_FAIL_IF(!skel, "Failed to open"); + SCX_ENUM_INIT(skel); + SCX_FAIL_IF(rt_stall__load(skel), "Failed to load skel"); + + *ctx = skel; + + return SCX_TEST_PASS; +} + +static bool sched_stress_test(void) +{ + float cfs_runtime, rt_runtime; + int cfs_pid, rt_pid; + float expected_min_ratio = 0.04; /* 4% */ + + ksft_print_header(); + ksft_set_plan(1); + + /* Create and set up a EXT task */ + cfs_pid = fork(); + if (cfs_pid == 0) { + set_affinity(CORE_ID); + process_func(); + exit(0); + } else if (cfs_pid < 0) { + perror("fork for EXT task"); + ksft_exit_fail(); + } + + /* Create an RT task */ + rt_pid = fork(); + if (rt_pid == 0) { + set_affinity(CORE_ID); + set_sched(SCHED_FIFO, 50); + process_func(); + exit(0); + } else if (rt_pid < 0) { + perror("fork for RT task"); + ksft_exit_fail(); + } + + /* Let the processes run for the specified time */ + sleep(RUN_TIME); + + /* Get runtime for the EXT task */ + cfs_runtime = get_process_runtime(cfs_pid); + if (cfs_runtime != -1) + ksft_print_msg("Runtime of EXT task (PID %d) is %f seconds\n", cfs_pid, cfs_runtime); + else + ksft_exit_fail_msg("Error getting runtime for EXT task (PID %d)\n", cfs_pid); + + /* Get runtime for the RT task */ + rt_runtime = get_process_runtime(rt_pid); + if (rt_runtime != -1) + ksft_print_msg("Runtime of RT task (PID %d) is %f seconds\n", rt_pid, rt_runtime); + else + ksft_exit_fail_msg("Error getting runtime for RT task (PID %d)\n", rt_pid); + + /* Kill the processes */ + kill(cfs_pid, SIGKILL); + kill(rt_pid, SIGKILL); + waitpid(cfs_pid, NULL, 0); + waitpid(rt_pid, NULL, 0); + + /* Verify that the scx task got enough runtime */ + float actual_ratio = cfs_runtime / (cfs_runtime + rt_runtime); + ksft_print_msg("EXT task got %.2f%% of total runtime\n", actual_ratio * 100); + + if (actual_ratio >= expected_min_ratio) { + ksft_test_result_pass("PASS: EXT task got more than %.2f%% of runtime\n", + expected_min_ratio * 100); + return true; + } else { + ksft_test_result_fail("FAIL: EXT task got less than %.2f%% of runtime\n", + expected_min_ratio * 100); + return false; + } +} + +static enum scx_test_status run(void *ctx) +{ + struct rt_stall *skel = ctx; + struct bpf_link *link; + bool res; + + link = bpf_map__attach_struct_ops(skel->maps.rt_stall_ops); + SCX_FAIL_IF(!link, "Failed to attach scheduler"); + + res = sched_stress_test(); + + SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_NONE)); + bpf_link__destroy(link); + + if (!res) + ksft_exit_fail(); + + return SCX_TEST_PASS; +} + +static void cleanup(void *ctx) +{ + struct rt_stall *skel = ctx; + + rt_stall__destroy(skel); +} + +struct scx_test rt_stall = { + .name = "rt_stall", + .description = "Verify that RT tasks cannot stall SCHED_EXT tasks", + .setup = setup, + .run = run, + .cleanup = cleanup, +}; +REGISTER_SCX_TEST(&rt_stall) -- 2.43.0

2 weeks, 4 days

1
0
0 0

[PATCH net-next v3 0/4] netdevsim: implement RX statistics using NETDEV_PCPU_STAT_DSTATS

by Breno Leitao

The netdevsim driver previously lacked RX statistics support, which prevented its use with the GenerateTraffic() test framework, as this framework verifies traffic flow by checking RX byte counts. This patch migrates netdevsim from its custom statistics collection to the NETDEV_PCPU_STAT_DSTATS framework, as suggested by Jakub. This change not only standardizes the statistics handling but also adds the necessary RX statistics support required by the test framework. Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v3: - Rely on netdev from caller instead of napi->dev in nsim_queue_free(). - Link to v2: https://lore.kernel.org/r/20250613-netdevsim_stat-v2-0-98fa38836c48@debian.… Changes in v2: - Changed the RX collection place from nsim_napi_rx() to nsim_rcv (Joe Damato) - Collect RX dropped packets statistic in nsim_queue_free() (Jakub) - Added a helper in dstat to add values to RX dropped packets - Link to v1: https://lore.kernel.org/r/20250611-netdevsim_stat-v1-0-c11b657d96bf@debian.… --- Breno Leitao (4): netdevsim: migrate to dstats stats collection netdevsim: collect statistics at RX side net: add dev_dstats_rx_dropped_add() helper netdevsim: account dropped packet length in stats on queue free drivers/net/netdevsim/netdev.c | 54 +++++++++++++++------------------------ drivers/net/netdevsim/netdevsim.h | 5 ---- include/linux/netdevice.h | 10 ++++++++ 3 files changed, 31 insertions(+), 38 deletions(-) --- base-commit: 3b5b1c428260152e47c9584bc176f358b87ca82d change-id: 20250610-netdevsim_stat-95995921e03e Best regards, -- Breno Leitao <leitao(a)debian.org>

2 weeks, 4 days

3
9
0 0

[PATCH 0/4] selftests: cgroup: Add support for named v1 hierarchies in test_core

by Michal Koutný

This comes useful when using selftests from mainline on older kernels/setups that still rely on cgroup v1 while maintains single variant of selftests. The core tests that rely on v2 specific features are skipped, the remaining ones are adjusted to work with a v1 hierarchy. The expected output on v1 system: ok 1 # SKIP test_cgcore_internal_process_constraint ok 2 # SKIP test_cgcore_top_down_constraint_enable ok 3 # SKIP test_cgcore_top_down_constraint_disable ok 4 # SKIP test_cgcore_no_internal_process_constraint_on_threads ok 5 # SKIP test_cgcore_parent_becomes_threaded ok 6 # SKIP test_cgcore_invalid_domain ok 7 # SKIP test_cgcore_populated ok 8 test_cgcore_proc_migration ok 9 test_cgcore_thread_migration ok 10 test_cgcore_destroy ok 11 test_cgcore_lesser_euid_open ok 12 # SKIP test_cgcore_lesser_ns_open Michal Koutný (4): selftests: cgroup_util: Add helpers for testing named v1 hierarchies selftests: cgroup: Add support for named v1 hierarchies in test_core selftests: cgroup: Optionally set up v1 environment selftests: cgroup: Fix compilation on pre-cgroupns kernels .../selftests/cgroup/lib/cgroup_util.c | 4 +- .../cgroup/lib/include/cgroup_util.h | 5 ++ tools/testing/selftests/cgroup/test_core.c | 84 ++++++++++++++++--- 3 files changed, 82 insertions(+), 11 deletions(-) base-commit: 9afe652958c3ee88f24df1e4a97f298afce89407 -- 2.49.0

2 weeks, 4 days

2
5
0 0

[PATCH v6 0/4] mm: introduce THP deferred setting

by Nico Pache

This series is a follow-up to [1], which adds mTHP support to khugepaged. mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl configs to make sense. Without it global="defer" and mTHP="inherit" case is "undefined" behavior. We've seen cases were customers switching from RHEL7 to RHEL8 see a significant increase in the memory footprint for the same workloads. Through our investigations we found that a large contributing factor to the increase in RSS was an increase in THP usage. For workloads like MySQL, or when using allocators like jemalloc, it is often recommended to set /transparent_hugepages/enabled=never. This is in part due to performance degradations and increased memory waste. This series introduces enabled=defer, this setting acts as a middle ground between always and madvise. If the mapping is MADV_HUGEPAGE, the page fault handler will act normally, making a hugepage if possible. If the allocation is not MADV_HUGEPAGE, then the page fault handler will default to the base size allocation. The caveat is that khugepaged can still operate on pages that are not MADV_HUGEPAGE. This allows for three things... one, applications specifically designed to use hugepages will get them, and two, applications that don't use hugepages can still benefit from them without aggressively inserting THPs at every possible chance. This curbs the memory waste, and defers the use of hugepages to khugepaged. Khugepaged can then scan the memory for eligible collapsing. Lastly there is the added benefit for those who want THPs but experience higher latency PFs. Now you can get base page performance at the PF handler and Hugepage performance for those mappings after they collapse. Admins may want to lower max_ptes_none, if not, khugepaged may aggressively collapse single allocations into hugepages. TESTING: - Built for x86_64, aarch64, ppc64le, and s390x - selftests mm - In [1] I provided a script [2] that has multiple access patterns - lots of general use. - redis testing. This test was my original case for the defer mode. What I was able to prove was that THP=always leads to increased max_latency cases; hence why it is recommended to disable THPs for redis servers. However with 'defer' we dont have the max_latency spikes and can still get the system to utilize THPs. I further tested this with the mTHP defer setting and found that redis (and probably other jmalloc users) can utilize THPs via defer (+mTHP defer) without a large latency penalty and some potential gains. I uploaded some mmtest results here[3] which compares: stock+thp=never stock+(m)thp=always khugepaged-mthp + defer (max_ptes_none=64) The results show that (m)THPs can cause some throughput regression in some cases, but also has gains in other cases. The mTHP+defer results have more gains and less losses over the (m)THP=always case. V6 Changes: - nits - rebased dependent series and added review tags V5 Changes: - rebased dependent series - added reviewed-by tag on 2/4 V4 Changes: - Minor Documentation fixes - rebased the dependent series [1] onto mm-unstable commit 0e68b850b1d3 ("vmalloc: use atomic_long_add_return_relaxed()") V3 Changes: - Combined the documentation commits into one, and moved a section to the khugepaged mthp patchset V2 Changes: - base changes on mTHP khugepaged support - Fix selftests parsing issue - add mTHP defer option - add mTHP defer Documentation [1] - https://lore.kernel.org/all/20250515032226.128900-1-npache@redhat.com/ [2] - https://gitlab.com/npache/khugepaged_mthp_test [3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.h… Nico Pache (4): mm: defer THP insertion to khugepaged mm: document (m)THP defer usage khugepaged: add defer option to mTHP options selftests: mm: add defer to thp setting parser Documentation/admin-guide/mm/transhuge.rst | 31 +++++++--- include/linux/huge_mm.h | 18 +++++- mm/huge_memory.c | 69 +++++++++++++++++++--- mm/khugepaged.c | 8 +-- tools/testing/selftests/mm/thp_settings.c | 1 + tools/testing/selftests/mm/thp_settings.h | 1 + 6 files changed, 106 insertions(+), 22 deletions(-) -- 2.49.0

2 weeks, 4 days

5
19
0 0

[PATCH RFC net-next v4 00/11] vsock: add namespace support to vhost-vsock

by Bobby Eshleman

This series adds namespace support to vhost-vsock. It does not add namespaces to any of the guest transports (virtio-vsock, hyperv, or vmci). The current revision only supports two modes: local or global. Local mode is complete isolation of namespaces, while global mode is complete sharing between namespaces of CIDs (the original behavior). If it is deemed necessary to add mixed mode up front, it is doable but at the cost of more complexity than local and global modes. Mixed will require adding the notion of allocation to the socket lookup functions (like vhost_vsock_get()) and also more logic will be necessary for controlling or using lookups differently based on mixed-to-global or global-to-mixed scenarios. The current implementation takes into consideration the future need for mixed mode and makes sure it is possible by making vsock_ns_mode per-namespace, as for mixed mode we need at least one "global" namespace and one "mixed" namespace for it to work. Is it feasible to support local and global modes only initially? I've demoted this series to RFC, as I haven't been able to re-run the tests after rebasing onto the upstreamed vmtest.sh, some of the code is still pretty messy, there are still some TODOs, stale comments, and other work to do. I thought reviewers might want to see the current state even though unfinished, since I'll be OoO until the second week of July and that just feels like a long time of silence given we've already all done work on this together. Thanks again for everyone's help and reviews! Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com> --- Changes in v3: - add notion of "modes" - add procfs /proc/net/vsock_ns_mode - local and global modes only - no /dev/vhost-vsock-netns - vmtest.sh already merged, so new patch just adds new tests for NS - Link to v2: https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com Changes in v2: - only support vhost-vsock namespaces - all g2h namespaces retain old behavior, only common API changes impacted by vhost-vsock changes - add /dev/vhost-vsock-netns for "opt-in" - leave /dev/vhost-vsock to old behavior - removed netns module param - Link to v1: https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com Changes in v1: - added 'netns' module param to vsock.ko to enable the network namespace support (disabled by default) - added 'vsock_net_eq()' to check the "net" assigned to a socket only when 'netns' support is enabled - Link to RFC: https://patchwork.ozlabs.org/cover/1202235/ --- Bobby Eshleman (11): selftests/vsock: add NS tests to vmtest.sh vsock: a per-net vsock NS mode state vsock: add vsock net ns helpers vsock: add net to vsock skb cb vsock: add common code for vsock NS support virtio-vsock: add netns to common code vhost/vsock: add netns support vsock/virtio: add netns hooks hv_sock: add netns hooks vsock/vmci: add netns hooks vsock/loopback: add netns support MAINTAINERS | 1 + drivers/vhost/vsock.c | 48 ++- include/linux/virtio_vsock.h | 12 + include/net/af_vsock.h | 53 ++- include/net/net_namespace.h | 4 + include/net/netns/vsock.h | 19 ++ net/vmw_vsock/af_vsock.c | 203 +++++++++++- net/vmw_vsock/hyperv_transport.c | 2 +- net/vmw_vsock/virtio_transport.c | 5 +- net/vmw_vsock/virtio_transport_common.c | 14 +- net/vmw_vsock/vmci_transport.c | 4 +- net/vmw_vsock/vsock_loopback.c | 4 +- tools/testing/selftests/vsock/vmtest.sh | 555 +++++++++++++++++++++++++++++--- 13 files changed, 843 insertions(+), 81 deletions(-) --- base-commit: 8909f5f4ecd551c2299b28e05254b77424c8c7dc change-id: 20250325-vsock-vmtest-b3a21d2102c2 Best regards, -- Bobby Eshleman <bobbyeshleman(a)meta.com>

2 weeks, 4 days

2
12
0 0

selftests ublk UBLK_IO_F_NEED_REG_BUF undeclared

by Naresh Kamboju

The following build warnings / errors noticed while building the selftest/ublk with gcc-13 and clang-nightly toolchains on Linux next tree. Please suggest if I am missing something in my build setup. Regressions found on arm arm64 x86_64 - selftests ublk Regression Analysis: - New regression? Yes - Reproducibility? Yes Build regression: selftests ublk UBLK_IO_F_NEED_REG_BUF undeclared Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> ## Build log CC kublk In file included from kublk.c:6: kublk.h: In function 'ublk_io_auto_zc_fallback': kublk.h:240:35: error: 'UBLK_IO_F_NEED_REG_BUF' undeclared (first use in this function); did you mean 'UBLKSRV_NEED_REG_BUF'? 240 | return !!(iod->op_flags & UBLK_IO_F_NEED_REG_BUF); | ^~~~~~~~~~~~~~~~~~~~~~ | UBLKSRV_NEED_REG_BUF kublk.h:240:35: note: each undeclared identifier is reported only once for each function it appears in kublk.c: In function 'ublk_ctrl_update_size': kublk.c:223:27: error: 'UBLK_U_CMD_UPDATE_SIZE' undeclared (first use in this function) 223 | .cmd_op = UBLK_U_CMD_UPDATE_SIZE, | ^~~~~~~~~~~~~~~~~~~~~~ kublk.c: In function 'ublk_ctrl_quiesce_dev': kublk.c:235:27: error: 'UBLK_U_CMD_QUIESCE_DEV' undeclared (first use in this function); did you mean 'UBLK_U_CMD_DEL_DEV'? 235 | .cmd_op = UBLK_U_CMD_QUIESCE_DEV, | ^~~~~~~~~~~~~~~~~~~~~~ | UBLK_U_CMD_DEL_DEV kublk.c: In function 'ublk_queue_init': kublk.c:447:63: error: 'UBLK_F_AUTO_BUF_REG' undeclared (first use in this function); did you mean 'UBLKSRV_AUTO_BUF_REG'? 447 | if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_BUF_REG)) { | ^~~~~~~~~~~~~~~~~~~ | UBLKSRV_AUTO_BUF_REG kublk.c: In function 'ublk_thread_init': kublk.c:507:63: error: 'UBLK_F_AUTO_BUF_REG' undeclared (first use in this function); did you mean 'UBLKSRV_AUTO_BUF_REG'? 507 | if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_BUF_REG)) { | ^~~~~~~~~~~~~~~~~~~ | UBLKSRV_AUTO_BUF_REG kublk.c: In function 'ublk_set_auto_buf_reg': kublk.c:579:16: error: variable 'buf' has initializer but incomplete type 579 | struct ublk_auto_buf_reg buf = {}; | ^~~~~~~~~~~~~~~~~ kublk.c:579:34: error: storage size of 'buf' isn't known 579 | struct ublk_auto_buf_reg buf = {}; | ^~~ kublk.c:587:29: error: 'UBLK_AUTO_BUF_REG_FALLBACK' undeclared (first use in this function); did you mean 'UBLKSRV_AUTO_BUF_REG_FALLBACK'? 587 | buf.flags = UBLK_AUTO_BUF_REG_FALLBACK; | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | UBLKSRV_AUTO_BUF_REG_FALLBACK kublk.c:589:21: error: implicit declaration of function 'ublk_auto_buf_reg_to_sqe_addr' [-Werror=implicit-function-declaration] 589 | sqe->addr = ublk_auto_buf_reg_to_sqe_addr(&buf); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kublk.c:579:34: error: unused variable 'buf' [-Werror=unused-variable] 579 | struct ublk_auto_buf_reg buf = {}; | ^~~ kublk.c: In function '__cmd_dev_add': kublk.c:1178:25: error: 'UBLK_F_QUIESCE' undeclared (first use in this function) 1178 | if ((features & UBLK_F_QUIESCE) && | ^~~~~~~~~~~~~~ kublk.c: In function 'cmd_dev_get_features': kublk.c:1381:30: error: 'UBLK_F_UPDATE_SIZE' undeclared (first use in this function) 1381 | [const_ilog2(UBLK_F_UPDATE_SIZE)] = "UPDATE_SIZE", | ^~~~~~~~~~~~~~~~~~ kublk.c:1369:46: note: in definition of macro 'const_ilog2' 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1369:24: error: array index in initializer not of integer type 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1381:18: note: in expansion of macro 'const_ilog2' 1381 | [const_ilog2(UBLK_F_UPDATE_SIZE)] = "UPDATE_SIZE", | ^~~~~~~~~~~ kublk.c:1369:24: note: (near initialization for 'feat_map') 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1381:18: note: in expansion of macro 'const_ilog2' 1381 | [const_ilog2(UBLK_F_UPDATE_SIZE)] = "UPDATE_SIZE", | ^~~~~~~~~~~ kublk.c:1382:30: error: 'UBLK_F_AUTO_BUF_REG' undeclared (first use in this function); did you mean 'UBLKSRV_AUTO_BUF_REG'? 1382 | [const_ilog2(UBLK_F_AUTO_BUF_REG)] = "AUTO_BUF_REG", | ^~~~~~~~~~~~~~~~~~~ kublk.c:1369:46: note: in definition of macro 'const_ilog2' 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1369:24: error: array index in initializer not of integer type 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1382:18: note: in expansion of macro 'const_ilog2' 1382 | [const_ilog2(UBLK_F_AUTO_BUF_REG)] = "AUTO_BUF_REG", | ^~~~~~~~~~~ kublk.c:1369:24: note: (near initialization for 'feat_map') 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1382:18: note: in expansion of macro 'const_ilog2' 1382 | [const_ilog2(UBLK_F_AUTO_BUF_REG)] = "AUTO_BUF_REG", | ^~~~~~~~~~~ kublk.c:1383:30: error: 'UBLK_F_QUIESCE' undeclared (first use in this function) 1383 | [const_ilog2(UBLK_F_QUIESCE)] = "QUIESCE", | ^~~~~~~~~~~~~~ kublk.c:1369:46: note: in definition of macro 'const_ilog2' 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1369:24: error: array index in initializer not of integer type 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1383:18: note: in expansion of macro 'const_ilog2' 1383 | [const_ilog2(UBLK_F_QUIESCE)] = "QUIESCE", | ^~~~~~~~~~~ kublk.c:1369:24: note: (near initialization for 'feat_map') 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1383:18: note: in expansion of macro 'const_ilog2' 1383 | [const_ilog2(UBLK_F_QUIESCE)] = "QUIESCE", | ^~~~~~~~~~~ kublk.c:1384:30: error: 'UBLK_F_PER_IO_DAEMON' undeclared (first use in this function) 1384 | [const_ilog2(UBLK_F_PER_IO_DAEMON)] = "PER_IO_DAEMON", | ^~~~~~~~~~~~~~~~~~~~ kublk.c:1369:46: note: in definition of macro 'const_ilog2' 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1369:24: error: array index in initializer not of integer type 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1384:18: note: in expansion of macro 'const_ilog2' 1384 | [const_ilog2(UBLK_F_PER_IO_DAEMON)] = "PER_IO_DAEMON", | ^~~~~~~~~~~ kublk.c:1369:24: note: (near initialization for 'feat_map') 1369 | #define const_ilog2(x) (63 - __builtin_clzll(x)) | ^ kublk.c:1384:18: note: in expansion of macro 'const_ilog2' 1384 | [const_ilog2(UBLK_F_PER_IO_DAEMON)] = "PER_IO_DAEMON", | ^~~~~~~~~~~ kublk.c: In function 'main': kublk.c:1613:46: error: 'UBLK_F_AUTO_BUF_REG' undeclared (first use in this function); did you mean 'UBLKSRV_AUTO_BUF_REG'? 1613 | ctx.flags |= UBLK_F_AUTO_BUF_REG; | ^~~~~~~~~~~~~~~~~~~ | UBLKSRV_AUTO_BUF_REG cc1: all warnings being treated as errors ## Source * Kernel version: 6.16.0-rc2 * Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git * Git sha: 050f8ad7b58d9079455af171ac279c4b9b828c11 * Git describe: next-20250616 * Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250616/ * Architectures: arm, arm64, x86_64 * Toolchains: gcc-13, clang nightly * Kconfigs: selftest/*/config+defconfig+ ## Build arm64 * Build log clang: https://regressions.linaro.org/lkft/linux-next-master/next-20250617/log-par… * Build log gcc: https://regressions.linaro.org/lkft/linux-next-master/next-20250617/log-par… * Build details: https://regressions.linaro.org/lkft/linux-next-master/next-20250617/log-par… * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2ycmBy2r4aPm6emlo7FX… * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2ycmBy2r4aPm6emlo7FX… ## Steps to reproduce - tuxmake \ --runtime podman \ --target-arch arm64 \ --toolchain gcc-13 \ --kconfig defconfig \ --kconfig-add https://gitlab.com/Linaro/lkft/kernel-fragments/-/raw/main/netdev.config \ --kconfig-add https://gitlab.com/Linaro/lkft/kernel-fragments/-/raw/main/systemd.config \ --kconfig-add CONFIG_SYN_COOKIES=y \ --kconfig-add CONFIG_SCHEDSTATS=y debugkernel dtbs dtbs-legacy headers kernel kselftest modules -- Linaro LKFT https://lkft.linaro.org

2 weeks, 4 days

2
1
0 0

Re: [syzbot] [cgroups?] general protection fault in __cgroup_rstat_lock

by syzbot

syzbot suspects this issue was fixed by commit: commit a97915559f5c5ff1972d678b94fd460c72a3b5f2 Author: JP Kobryn <inwardvessel(a)gmail.com> Date: Fri Apr 4 01:10:48 2025 +0000 cgroup: change rstat function signatures from cgroup-based to css-based bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=12ca4c82580000 start commit: 932fc2f19b74 Merge branch 'irq-save-restore' git tree: bpf-next kernel config: https://syzkaller.appspot.com/x/.config?x=50c7a61469ce77e7 dashboard link: https://syzkaller.appspot.com/bug?extid=31eb4d4e7d9bc1fc1312 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=161cdfc0580000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12dfc8df980000 If the result looks correct, please mark the issue as fixed by replying with: #syz fix: cgroup: change rstat function signatures from cgroup-based to css-based For information about bisection process see: https://goo.gl/tpsmEJ#bisection

2 weeks, 4 days

2
2
0 0

clang: selftests/mm gup_longterm error while loading shared libraries liburing.so.2 cannot open shared object file No such file or directory

by Naresh Kamboju

The following test regressions noticed while running selftests/mm gup_longterm test cases on Dragonboard-845c, Dragonboard-410c, rock-pi-4, qemu-arm64 and qemu-x86_64 this build have required selftest/mm/configs included and toolchain is clang nightly. Regressions found on Dragonboard-845c, Dragonboard-410c, rock-pi-4, qemu-arm64 and qemu-x86_64 - selftests mm gup_longterm fails Regression Analysis: - New regression? Yes - Reproducibility? Yes Test regression: selftests mm gup_longterm error while loading shared libraries liburing.so.2 cannot open shared object file No such file or directory Test regression: selftests mm cow error while loading shared libraries liburing.so.2 cannot open shared object file No such file or directory Test regression: selftests mm mlock-random-test exit=139 Test regression: selftests mm pagemap_ioctl exit=1 Test regression: selftests mm guard_regions file hole_punch Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> ## Test log Linux version 6.15.0-next-20250606 (tuxmake@tuxmake) (Debian clang version 21.0.0 (++20250602112323+c5a56f74fef7-1~exp1~20250602112342.1487), Debian LLD 21.0.0) #1 SMP PREEMPT @1749190532 running ./gup_longterm ---------------------- ./gup_longterm: error while loading shared libraries: liburing.so.2: cannot open shared object file: No such file or directory [FAIL] not ok 14 gup_longterm # exit=127 ./cow: error while loading shared libraries: liburing.so.2: cannot open shared object file: No such file or directory [FAIL] not ok 50 cow # exit=127 running ./mlock-random-test --------------------------- TAP version 13 1..2 [ 311.408456] traps: mlock-random-te[21661] general protection fault ip:7f63210dbf0f sp:7ffdff6fca28 error:0 in libc.so.6[adf0f,7f6321056000+165000] [FAIL] not ok 23 mlock-random-test # exit=139 running ./pagemap_ioctl ... ok 53 Huge page testing: only two middle pages dirty ok 54 # SKIP Hugetlb shmem testing: all new pages must not be written (dirty) ok 55 # SKIP Hugetlb shmem testing: all pages must be written (dirty) ok 56 # SKIP Hugetlb shmem testing: all pages dirty other than first and the last one ok 57 # SKIP Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC ok 58 # SKIP Hugetlb shmem testing: only middle page dirty ok 59 # SKIP Hugetlb shmem testing: only two middle pages dirty ok 60 # SKIP Hugetlb mem testing: all new pages must not be written (dirty) ok 61 # SKIP Hugetlb mem testing: all pages must be written (dirty) ok 62 # SKIP Hugetlb mem testing: all pages dirty other than first and the last one ok 63 # SKIP Hugetlb mem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC[ 241.731600] run_vmtests.sh (456): drop_caches: 3 ok 64 # SKIP Hugetlb mem testing: only middle page dirty ok 65 # SKIP Hugetlb mem testing: only two middle pages dirty Bail out! uffd-test creation failed 12 Cannot allocate memory 12 skipped test(s) detected. Consider enabling relevant config options to improve coverage. Planned tests != run tests (115 != 65) Totals: pass:53 fail:0 xfail:0 xpass:0 skip:12 error:0 [FAIL] # not ok 48 pagemap_ioctl # exit=1 running ./guard-regions ... RUN guard_regions.file.hole_punch ... guard-regions.c:1905:hole_punch:Expected madvise(&ptr[3 * page_size], 4 * page_size, MADV_REMOVE) (-1) == 0 (0) hole_punch: Test terminated by assertion FAIL guard_regions.file.hole_punch not ok 80 guard_regions.file.hole_punch ## Source * Kernel version: 6.16.0-rc2 * Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git * Git sha: 050f8ad7b58d9079455af171ac279c4b9b828c11 * Git describe: next-20250616 * Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250616/ * Architectures: arm64, x86_64 * Test environments: Dragonboard-845c, Dragonboard-410c, rock-pi-4, qemu-arm64, qemu-x86_64 and x86 * Toolchains: clang nightly * Kconfigs: selftest/mm/config+defconfig+ ## Test * Test log: https://qa-reports.linaro.org/api/testruns/28766026/log_file/ * Test log 2: https://qa-reports.linaro.org/api/testruns/28743077/log_file/ * Build details: https://regressions.linaro.org/lkft/linux-next-master/next-20250616/kselfte… * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2ya0viPHafKAe0u89drI… * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2ya0viPHafKAe0u89drI… ## Steps to reproduce - tuxrun \ --runtime podman \ --device qemu-x86_64 \ --boot-args rw \ --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2ya0wmVl0eHb9koWyQYC… \ --rootfs https://storage.tuxboot.com/debian/20250605/trixie/amd64/rootfs.ext4.xz \ --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2ya0wmVl0eHb9koWyQYC… /usr/ \ --parameters MODULES_PATH=/usr/ \ --parameters SQUAD_URL=https://qa-reports.linaro.org//api/submit/lkft/linux-next-master/… \ --parameters SKIPFILE=skipfile-lkft.yaml \ --parameters KSELFTEST=https://storage.tuxsuite.com/public/linaro/lkft/builds/2ya0wmVl0e… \ --image docker.io/linaro/tuxrun-dispatcher:v1.2.2 \ --tests kselftest-mm \ --timeouts boot=15 -- Linaro LKFT https://lkft.linaro.org

2 weeks, 4 days

3
3
0 0

[PATCH] Documentation: kunit: Correct MODULE_IMPORT_NS() syntax

by Brian Norris

The argument should be the string "EXPORTED_FOR_KUNIT_TESTING", not a bare identifier. Signed-off-by: Brian Norris <briannorris(a)chromium.org> --- Documentation/dev-tools/kunit/usage.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst index 038f480074fd..066ecda1dd98 100644 --- a/Documentation/dev-tools/kunit/usage.rst +++ b/Documentation/dev-tools/kunit/usage.rst @@ -699,7 +699,7 @@ the template below. #include <kunit/visibility.h> #include <my_file.h> ... - MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); + MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING"); ... // Use do_interesting_thing() in tests -- 2.50.0.rc2.692.g299adb8693-goog

2 weeks, 4 days

2
1
0 0

[PATCH net-next v3 15/15] selftests: forwarding: Add a test for verifying VXLAN MC underlay

by Petr Machata

Add tests for MC-routing underlay VXLAN traffic. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: v3: - Disable shellchecking apparently-unreachable commands. They are actually reachable through tests_run, in_ns, etc. - Run setup_wait with an explicit argument to avoid shellcheck citation. v2: - Adjust as per shellcheck citations --- CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org .../testing/selftests/net/forwarding/Makefile | 1 + .../net/forwarding/vxlan_bridge_1q_mc_ul.sh | 771 ++++++++++++++++++ 2 files changed, 772 insertions(+) create mode 100755 tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile index 00bde7b6f39e..d7bb2e80e88c 100644 --- a/tools/testing/selftests/net/forwarding/Makefile +++ b/tools/testing/selftests/net/forwarding/Makefile @@ -102,6 +102,7 @@ TEST_PROGS = bridge_fdb_learning_limit.sh \ vxlan_bridge_1d_port_8472.sh \ vxlan_bridge_1d.sh \ vxlan_bridge_1q_ipv6.sh \ + vxlan_bridge_1q_mc_ul.sh \ vxlan_bridge_1q_port_8472_ipv6.sh \ vxlan_bridge_1q_port_8472.sh \ vxlan_bridge_1q.sh \ diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh new file mode 100755 index 000000000000..7ec58b6b1128 --- /dev/null +++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh @@ -0,0 +1,771 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# +-----------------------------------------+ +# | + $h1.10 + $h1.20 | +# | | 192.0.2.1/28 | 2001:db8:1::1/64 | +# | \________ ________/ | +# | \ / | +# | + $h1 H1 (vrf) | +# +-----------|-----------------------------+ +# | +# +-----------|----------------------------------------------------------------+ +# | +---------|--------------------------------------+ SWITCH (main vrf) | +# | | + $swp1 BR1 (802.1q) | | +# | | vid 10 20 | | +# | | | | +# | | + vx10 (vxlan) + vx20 (vxlan) | + lo10 (dummy) | +# | | local 192.0.2.100 local 2001:db8:4::1 | 192.0.2.100/28 | +# | | group 233.252.0.1 group ff0e::1:2:3 | 2001:db8:4::1/64 | +# | | id 1000 id 2000 | | +# | | vid 10 pvid untagged vid 20 pvid untagged | | +# | +------------------------------------------------+ | +# | | +# | + $swp2 $swp3 + | +# | | 192.0.2.33/28 192.0.2.65/28 | | +# | | 2001:db8:2::1/64 2001:db8:3::1/64 | | +# | | | | +# +---|--------------------------------------------------------------------|---+ +# | | +# +---|--------------------------------+ +--------------------------------|---+ +# | | H2 (vrf) | | H3 (vrf) | | +# | +-|----------------------------+ | | +-----------------------------|-+ | +# | | + $h2 BR2 (802.1d) | | | | BR3 (802.1d) $h3 + | | +# | | | | | | | | +# | | + v1$h2 (veth) | | | | v1$h3 (veth) + | | +# | +-|----------------------------+ | | +-----------------------------|-+ | +# | | | | | | +# +---|--------------------------------+ +--------------------------------|---+ +# | | +# +---|--------------------------------+ +--------------------------------|---+ +# | + v2$h2 (veth) NS2 (netns) | | NS3 (netns) v2$h3 (veth) + | +# | 192.0.2.34/28 | | 192.0.2.66/28 | +# | 2001:db8:2::2/64 | | 2001:db8:3::2/64 | +# | | | | +# | +--------------------------------+ | | +--------------------------------+ | +# | | BR1 (802.1q) | | | | BR1 (802.1q) | | +# | | + vx10 (vxlan) | | | | + vx10 (vxlan) | | +# | | local 192.0.2.34 | | | | local 192.0.2.50 | | +# | | group 233.252.0.1 dev v2$h2 | | | | group 233.252.0.1 dev v2$h3 | | +# | | id 1000 dstport $VXPORT | | | | id 1000 dstport $VXPORT | | +# | | vid 10 pvid untagged | | | | vid 10 pvid untagged | | +# | | | | | | | | +# | | + vx20 (vxlan) | | | | + vx20 (vxlan) | | +# | | local 2001:db8:2::2 | | | | local 2001:db8:3::2 | | +# | | group ff0e::1:2:3 dev v2$h2 | | | | group ff0e::1:2:3 dev v2$h3 | | +# | | id 2000 dstport $VXPORT | | | | id 2000 dstport $VXPORT | | +# | | vid 20 pvid untagged | | | | vid 20 pvid untagged | | +# | | | | | | | | +# | | + w1 (veth) | | | | + w1 (veth) | | +# | | | vid 10 20 | | | | | vid 10 20 | | +# | +--|-----------------------------+ | | +--|-----------------------------+ | +# | | | | | | +# | +--|-----------------------------+ | | +--|-----------------------------+ | +# | | + w2 (veth) VW2 (vrf) | | | | + w2 (veth) VW2 (vrf) | | +# | | |\ | | | | |\ | | +# | | | + w2.10 | | | | | + w2.10 | | +# | | | 192.0.2.3/28 | | | | | 192.0.2.4/28 | | +# | | | | | | | | | | +# | | + w2.20 | | | | + w2.20 | | +# | | 2001:db8:1::3/64 | | | | 2001:db8:1::4/64 | | +# | +--------------------------------+ | | +--------------------------------+ | +# +------------------------------------+ +------------------------------------+ +# +#shellcheck disable=SC2317 # SC doesn't see our uses of functions. + +: "${VXPORT:=4789}" +export VXPORT + +: "${GROUP4:=233.252.0.1}" +export GROUP4 + +: "${GROUP6:=ff0e::1:2:3}" +export GROUP6 + +: "${IPMR:=lo10}" + +ALL_TESTS=" + ipv4_nomcroute + ipv4_mcroute + ipv4_mcroute_changelink + ipv4_mcroute_starg + ipv4_mcroute_noroute + ipv4_mcroute_fdb + ipv4_mcroute_fdb_oif0 + ipv4_mcroute_fdb_oif0_sep + + ipv6_nomcroute + ipv6_mcroute + ipv6_mcroute_changelink + ipv6_mcroute_starg + ipv6_mcroute_noroute + ipv6_mcroute_fdb + ipv6_mcroute_fdb_oif0 + + ipv4_nomcroute_rx + ipv4_mcroute_rx + ipv4_mcroute_starg_rx + ipv4_mcroute_fdb_oif0_sep_rx + ipv4_mcroute_fdb_sep_rx + + ipv6_nomcroute_rx + ipv6_mcroute_rx + ipv6_mcroute_starg_rx + ipv6_mcroute_fdb_sep_rx +" + +NUM_NETIFS=6 +source lib.sh + +h1_create() +{ + simple_if_init "$h1" + defer simple_if_fini "$h1" + + ip_link_add "$h1.10" master "v$h1" link "$h1" type vlan id 10 + ip_link_set_up "$h1.10" + ip_addr_add "$h1.10" 192.0.2.1/28 + + ip_link_add "$h1.20" master "v$h1" link "$h1" type vlan id 20 + ip_link_set_up "$h1.20" + ip_addr_add "$h1.20" 2001:db8:1::1/64 +} + +install_capture() +{ + local dev=$1; shift + + tc qdisc add dev "$dev" clsact + defer tc qdisc del dev "$dev" clsact + + tc filter add dev "$dev" ingress proto ip pref 104 \ + flower skip_hw ip_proto udp dst_port "$VXPORT" \ + action pass + defer tc filter del dev "$dev" ingress proto ip pref 104 + + tc filter add dev "$dev" ingress proto ipv6 pref 106 \ + flower skip_hw ip_proto udp dst_port "$VXPORT" \ + action pass + defer tc filter del dev "$dev" ingress proto ipv6 pref 106 +} + +h2_create() +{ + # $h2 + ip_link_set_up "$h2" + + # H2 + vrf_create "v$h2" + defer vrf_destroy "v$h2" + + ip_link_set_up "v$h2" + + # br2 + ip_link_add br2 type bridge vlan_filtering 0 mcast_snooping 0 + ip_link_set_master br2 "v$h2" + ip_link_set_up br2 + + # $h2 + ip_link_set_master "$h2" br2 + install_capture "$h2" + + # v1$h2 + ip_link_set_up "v1$h2" + ip_link_set_master "v1$h2" br2 +} + +h3_create() +{ + # $h3 + ip_link_set_up "$h3" + + # H3 + vrf_create "v$h3" + defer vrf_destroy "v$h3" + + ip_link_set_up "v$h3" + + # br3 + ip_link_add br3 type bridge vlan_filtering 0 mcast_snooping 0 + ip_link_set_master br3 "v$h3" + ip_link_set_up br3 + + # $h3 + ip_link_set_master "$h3" br3 + install_capture "$h3" + + # v1$h3 + ip_link_set_up "v1$h3" + ip_link_set_master "v1$h3" br3 +} + +switch_create() +{ + local swp1_mac + + # br1 + swp1_mac=$(mac_get "$swp1") + ip_link_add br1 type bridge vlan_filtering 1 \ + vlan_default_pvid 0 mcast_snooping 0 + ip_link_set_addr br1 "$swp1_mac" + ip_link_set_up br1 + + # A dummy to force the IPv6 OIF=0 test to install a suitable MC route on + # $IPMR to be deterministic. Also used for the IPv6 RX!=TX ping test. + ip_link_add "X$IPMR" up type dummy + + # IPMR + ip_link_add "$IPMR" up type dummy + ip_addr_add "$IPMR" 192.0.2.100/28 + ip_addr_add "$IPMR" 2001:db8:4::1/64 + + # $swp1 + ip_link_set_up "$swp1" + ip_link_set_master "$swp1" br1 + bridge_vlan_add vid 10 dev "$swp1" + bridge_vlan_add vid 20 dev "$swp1" + + # $swp2 + ip_link_set_up "$swp2" + ip_addr_add "$swp2" 192.0.2.33/28 + ip_addr_add "$swp2" 2001:db8:2::1/64 + + # $swp3 + ip_link_set_up "$swp3" + ip_addr_add "$swp3" 192.0.2.65/28 + ip_addr_add "$swp3" 2001:db8:3::1/64 +} + +vx_create() +{ + local name=$1; shift + local vid=$1; shift + + ip_link_add "$name" up type vxlan dstport "$VXPORT" \ + nolearning noudpcsum tos inherit ttl 16 \ + "$@" + ip_link_set_master "$name" br1 + bridge_vlan_add vid "$vid" dev "$name" pvid untagged +} +export -f vx_create + +vx_wait() +{ + # Wait for all the ARP, IGMP etc. noise to settle down so that the + # tunnel is clear for measurements. + sleep 10 +} + +vx10_create() +{ + vx_create vx10 10 id 1000 "$@" +} +export -f vx10_create + +vx20_create() +{ + vx_create vx20 20 id 2000 "$@" +} +export -f vx20_create + +vx10_create_wait() +{ + vx10_create "$@" + vx_wait +} + +vx20_create_wait() +{ + vx20_create "$@" + vx_wait +} + +ns_init_common() +{ + local ns=$1; shift + local if_in=$1; shift + local ipv4_in=$1; shift + local ipv6_in=$1; shift + local ipv4_host=$1; shift + local ipv6_host=$1; shift + + # v2$h2 / v2$h3 + ip_link_set_up "$if_in" + ip_addr_add "$if_in" "$ipv4_in" + ip_addr_add "$if_in" "$ipv6_in" + + # br1 + ip_link_add br1 type bridge vlan_filtering 1 \ + vlan_default_pvid 0 mcast_snooping 0 + ip_link_set_up br1 + + # vx10, vx20 + vx10_create local "${ipv4_in%/*}" group "$GROUP4" dev "$if_in" + vx20_create local "${ipv6_in%/*}" group "$GROUP6" dev "$if_in" + + # w1 + ip_link_add w1 type veth peer name w2 + ip_link_set_master w1 br1 + ip_link_set_up w1 + bridge_vlan_add vid 10 dev w1 + bridge_vlan_add vid 20 dev w1 + + # w2 + simple_if_init w2 + defer simple_if_fini w2 + + # w2.10 + ip_link_add w2.10 master vw2 link w2 type vlan id 10 + ip_link_set_up w2.10 + ip_addr_add w2.10 "$ipv4_host" + + # w2.20 + ip_link_add w2.20 master vw2 link w2 type vlan id 20 + ip_link_set_up w2.20 + ip_addr_add w2.20 "$ipv6_host" +} +export -f ns_init_common + +ns2_create() +{ + # NS2 + ip netns add ns2 + defer ip netns del ns2 + + # v2$h2 + ip link set dev "v2$h2" netns ns2 + defer ip -n ns2 link set dev "v2$h2" netns 1 + + in_ns ns2 \ + ns_init_common ns2 "v2$h2" \ + 192.0.2.34/28 2001:db8:2::2/64 \ + 192.0.2.3/28 2001:db8:1::3/64 +} + +ns3_create() +{ + # NS3 + ip netns add ns3 + defer ip netns del ns3 + + # v2$h3 + ip link set dev "v2$h3" netns ns3 + defer ip -n ns3 link set dev "v2$h3" netns 1 + + ip -n ns3 link set dev "v2$h3" up + + in_ns ns3 \ + ns_init_common ns3 "v2$h3" \ + 192.0.2.66/28 2001:db8:3::2/64 \ + 192.0.2.4/28 2001:db8:1::4/64 +} + +setup_prepare() +{ + h1=${NETIFS[p1]} + swp1=${NETIFS[p2]} + + swp2=${NETIFS[p3]} + h2=${NETIFS[p4]} + + swp3=${NETIFS[p5]} + h3=${NETIFS[p6]} + + vrf_prepare + defer vrf_cleanup + + forwarding_enable + defer forwarding_restore + + ip_link_add "v1$h2" type veth peer name "v2$h2" + ip_link_add "v1$h3" type veth peer name "v2$h3" + + h1_create + h2_create + h3_create + switch_create + ns2_create + ns3_create +} + +adf_install_broken_sg() +{ + adf_mcd_start "$IPMR" || exit "$EXIT_STATUS" + + mc_cli add "$swp2" 192.0.2.100 "$GROUP4" "$swp1" "$swp3" + defer mc_cli remove "$swp2" 192.0.2.100 "$GROUP4" "$swp1" "$swp3" + + mc_cli add "$swp2" 2001:db8:4::1 "$GROUP6" "$swp1" "$swp3" + defer mc_cli remove "$swp2" 2001:db8:4::1 "$GROUP6" "$swp1" "$swp3" +} + +adf_install_rx() +{ + mc_cli add "$swp2" 0.0.0.0 "$GROUP4" "$IPMR" + defer mc_cli remove "$swp2" 0.0.0.0 "$GROUP4" lo10 + + mc_cli add "$swp3" 0.0.0.0 "$GROUP4" "$IPMR" + defer mc_cli remove "$swp3" 0.0.0.0 "$GROUP4" lo10 + + mc_cli add "$swp2" :: "$GROUP6" "$IPMR" + defer mc_cli remove "$swp2" :: "$GROUP6" lo10 + + mc_cli add "$swp3" :: "$GROUP6" "$IPMR" + defer mc_cli remove "$swp3" :: "$GROUP6" lo10 +} + +adf_install_sg() +{ + adf_mcd_start "$IPMR" || exit "$EXIT_STATUS" + + mc_cli add "$IPMR" 192.0.2.100 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" 192.0.2.33 "$GROUP4" "$swp2" "$swp3" + + mc_cli add "$IPMR" 2001:db8:4::1 "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" 2001:db8:4::1 "$GROUP6" "$swp2" "$swp3" + + adf_install_rx +} + +adf_install_sg_sep() +{ + adf_mcd_start lo || exit "$EXIT_STATUS" + + mc_cli add lo 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove lo 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + + mc_cli add lo 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove lo 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" +} + +adf_install_sg_sep_rx() +{ + local lo=$1; shift + + adf_mcd_start "$IPMR" "$lo" || exit "$EXIT_STATUS" + + mc_cli add "$lo" 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove "$lo" 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + + mc_cli add "$lo" 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove "$lo" 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" + + adf_install_rx +} + +adf_install_starg() +{ + adf_mcd_start "$IPMR" || exit "$EXIT_STATUS" + + mc_cli add "$IPMR" 0.0.0.0 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" 0.0.0.0 "$GROUP4" "$swp2" "$swp3" + + mc_cli add "$IPMR" :: "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" :: "$GROUP6" "$swp2" "$swp3" + + adf_install_rx +} + +do_packets_v4() +{ + local mac + + mac=$(mac_get "$h2") + "$MZ" "$h1" -Q 10 -c 10 -d 100msec -p 64 -a own -b "$mac" \ + -A 192.0.2.1 -B 192.0.2.2 -t udp sp=1234,dp=2345 -q +} + +do_packets_v6() +{ + local mac + + mac=$(mac_get "$h2") + "$MZ" -6 "$h1" -Q 20 -c 10 -d 100msec -p 64 -a own -b "$mac" \ + -A 2001:db8:1::1 -B 2001:db8:1::2 -t udp sp=1234,dp=2345 -q +} + +do_test() +{ + local ipv=$1; shift + local expect_h2=$1; shift + local expect_h3=$1; shift + local what=$1; shift + + local pref=$((100 + ipv)) + local t0_h2 + local t0_h3 + local t1_h2 + local t1_h3 + local d_h2 + local d_h3 + + RET=0 + + t0_h2=$(tc_rule_stats_get "$h2" "$pref" ingress) + t0_h3=$(tc_rule_stats_get "$h3" "$pref" ingress) + + "do_packets_v$ipv" + sleep 1 + + t1_h2=$(tc_rule_stats_get "$h2" "$pref" ingress) + t1_h3=$(tc_rule_stats_get "$h3" "$pref" ingress) + + d_h2=$((t1_h2 - t0_h2)) + d_h3=$((t1_h3 - t0_h3)) + + ((d_h2 == expect_h2)) + check_err $? "Expected $expect_h2 packets on H2, got $d_h2" + + ((d_h3 == expect_h3)) + check_err $? "Expected $expect_h3 packets on H3, got $d_h3" + + log_test "VXLAN MC flood $what" +} + +ipv4_do_test_rx() +{ + local h3_should_fail=$1; shift + local what=$1; shift + + RET=0 + + ping_do "$h1.10" 192.0.2.3 + check_err $? "H2 should respond" + + ping_do "$h1.10" 192.0.2.4 + check_err_fail "$h3_should_fail" $? "H3 responds" + + log_test "VXLAN MC flood $what" +} + +ipv6_do_test_rx() +{ + local h3_should_fail=$1; shift + local what=$1; shift + + RET=0 + + ping6_do "$h1.20" 2001:db8:1::3 + check_err $? "H2 should respond" + + ping6_do "$h1.20" 2001:db8:1::4 + check_err_fail "$h3_should_fail" $? "H3 responds" + + log_test "VXLAN MC flood $what" +} + +ipv4_nomcroute() +{ + # Install a misleading (S,G) rule to attempt to trick the system into + # pushing the packets elsewhere. + adf_install_broken_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$swp2" + do_test 4 10 0 "IPv4 nomcroute" +} + +ipv6_nomcroute() +{ + # Like for IPv4, install a misleading (S,G). + adf_install_broken_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$swp2" + do_test 6 10 0 "IPv6 nomcroute" +} + +ipv4_nomcroute_rx() +{ + vx10_create local 192.0.2.100 group "$GROUP4" dev "$swp2" + ipv4_do_test_rx 1 "IPv4 nomcroute ping" +} + +ipv6_nomcroute_rx() +{ + vx20_create local 2001:db8:4::1 group "$GROUP6" dev "$swp2" + ipv6_do_test_rx 1 "IPv6 nomcroute ping" +} + +ipv4_mcroute() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + do_test 4 10 10 "IPv4 mcroute" +} + +ipv6_mcroute() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + do_test 6 10 10 "IPv6 mcroute" +} + +ipv4_mcroute_rx() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + ipv4_do_test_rx 0 "IPv4 mcroute ping" +} + +ipv6_mcroute_rx() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + ipv6_do_test_rx 0 "IPv6 mcroute ping" +} + +ipv4_mcroute_changelink() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" + ip link set dev vx10 type vxlan mcroute + sleep 1 + do_test 4 10 10 "IPv4 mcroute changelink" +} + +ipv6_mcroute_changelink() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + ip link set dev vx20 type vxlan mcroute + sleep 1 + do_test 6 10 10 "IPv6 mcroute changelink" +} + +ipv4_mcroute_starg() +{ + adf_install_starg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + do_test 4 10 10 "IPv4 mcroute (*,G)" +} + +ipv6_mcroute_starg() +{ + adf_install_starg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + do_test 6 10 10 "IPv6 mcroute (*,G)" +} + +ipv4_mcroute_starg_rx() +{ + adf_install_starg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + ipv4_do_test_rx 0 "IPv4 mcroute (*,G) ping" +} + +ipv6_mcroute_starg_rx() +{ + adf_install_starg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + ipv6_do_test_rx 0 "IPv6 mcroute (*,G) ping" +} + +ipv4_mcroute_noroute() +{ + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + do_test 4 0 0 "IPv4 mcroute, no route" +} + +ipv6_mcroute_noroute() +{ + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + do_test 6 0 0 "IPv6 mcroute, no route" +} + +ipv4_mcroute_fdb() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 dev "$IPMR" mcroute + bridge fdb add dev vx10 \ + 00:00:00:00:00:00 self static dst "$GROUP4" via "$IPMR" + do_test 4 10 10 "IPv4 mcroute FDB" +} + +ipv6_mcroute_fdb() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 dev "$IPMR" mcroute + bridge -6 fdb add dev vx20 \ + 00:00:00:00:00:00 self static dst "$GROUP6" via "$IPMR" + do_test 6 10 10 "IPv6 mcroute FDB" +} + +# Use FDB to configure VXLAN in a way where oif=0 for purposes of FIB lookup. +ipv4_mcroute_fdb_oif0() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" + do_test 4 10 10 "IPv4 mcroute oif=0" +} + +ipv6_mcroute_fdb_oif0() +{ + # The IPv6 tunnel lookup does not fall back to selection by source + # address. Instead it just does a FIB match, and that would find one of + # the several ff00::/8 multicast routes -- each device has one. In order + # to reliably force the $IPMR device, add a /128 route for the + # destination group address. + ip -6 route add table local multicast "$GROUP6/128" dev "$IPMR" + defer ip -6 route del table local multicast "$GROUP6/128" dev "$IPMR" + + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + bridge -6 fdb del dev vx20 00:00:00:00:00:00 + bridge -6 fdb add dev vx20 00:00:00:00:00:00 self static dst "$GROUP6" + do_test 6 10 10 "IPv6 mcroute oif=0" +} + +# In oif=0 test as above, have FIB lookup resolve to loopback instead of IPMR. +# This doesn't work with IPv6 -- a MC route on lo would be marked as RTF_REJECT. +ipv4_mcroute_fdb_oif0_sep() +{ + adf_install_sg_sep + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" + do_test 4 10 10 "IPv4 mcroute TX!=RX oif=0" +} + +ipv4_mcroute_fdb_oif0_sep_rx() +{ + adf_install_sg_sep_rx lo + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" + ipv4_do_test_rx 0 "IPv4 mcroute TX!=RX oif=0 ping" +} + +ipv4_mcroute_fdb_sep_rx() +{ + adf_install_sg_sep_rx lo + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add \ + dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" via lo + ipv4_do_test_rx 0 "IPv4 mcroute TX!=RX ping" +} + +ipv6_mcroute_fdb_sep_rx() +{ + adf_install_sg_sep_rx "X$IPMR" + + ip_addr_add "X$IPMR" 2001:db8:5::1/64 + vx20_create_wait local 2001:db8:5::1 group "$GROUP6" dev "$IPMR" mcroute + bridge -6 fdb del dev vx20 00:00:00:00:00:00 + bridge -6 fdb add dev vx20 00:00:00:00:00:00 \ + self static dst "$GROUP6" via "X$IPMR" + ipv6_do_test_rx 0 "IPv6 mcroute TX!=RX ping" +} + +trap cleanup EXIT + +setup_prepare +setup_wait "$NUM_NETIFS" +tests_run + +exit "$EXIT_STATUS" -- 2.49.0

2 weeks, 4 days

1
0
0 0

[PATCH net-next v3 14/15] selftests: forwarding: adf_mcd_start(): Allow configuring custom interfaces

by Petr Machata

Tests may wish to add other interfaces to listen on. Notably locally generated traffic uses dummy interfaces. The multicast daemon needs to know about these so that it allows forming rules that involve these interfaces, and so that net.ipv4.conf.X.mc_forwarding is set for the interfaces. To that end, allow passing in a list of interfaces to configure in addition to all the physical ones. Signed-off-by: Petr Machata <petrm(a)nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org> --- Notes: v2: - Adjust as per shellcheck citations - Retain Nik's R-b, the changes were very minor. --- CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/forwarding/lib.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index 253847372062..83ee6a07e072 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -1760,9 +1760,12 @@ mc_send() adf_mcd_start() { + local ifs=("$@") + local table_name="$MCD_TABLE_NAME" local smcroutedir local pid + local if local i check_command "$MCD" || return 1 @@ -1776,6 +1779,16 @@ adf_mcd_start() "$smcroutedir/$table_name.conf" done + for if in "${ifs[@]}"; do + if ! ip_link_has_flag "$if" MULTICAST; then + ip link set dev "$if" multicast on + defer ip link set dev "$if" multicast off + fi + + echo "phyint $if enable" >> \ + "$smcroutedir/$table_name.conf" + done + "$MCD" -N -I "$table_name" -f "$smcroutedir/$table_name.conf" \ -P "$smcroutedir/$table_name.pid" busywait "$BUSYWAIT_TIMEOUT" test -e "$smcroutedir/$table_name.pid" -- 2.49.0

2 weeks, 4 days

1
0
0 0

[PATCH net-next v3 13/15] selftests: net: lib: Add ip_link_has_flag()

by Petr Machata

Add a helper to determine whether a given netdevice has a given flag. Rewrite ip_link_is_up() in terms of the new helper. Signed-off-by: Petr Machata <petrm(a)nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org> --- Notes: CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/lib.sh | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh index 006fdadcc4b9..ff0dbe23e8e0 100644 --- a/tools/testing/selftests/net/lib.sh +++ b/tools/testing/selftests/net/lib.sh @@ -547,13 +547,19 @@ ip_link_set_addr() defer ip link set dev "$name" address "$old_addr" } -ip_link_is_up() +ip_link_has_flag() { local name=$1; shift + local flag=$1; shift local state=$(ip -j link show "$name" | - jq -r '(.[].flags[] | select(. == "UP")) // "DOWN"') - [[ $state == "UP" ]] + jq --arg flag "$flag" 'any(.[].flags.[]; . == $flag)') + [[ $state == true ]] +} + +ip_link_is_up() +{ + ip_link_has_flag "$1" UP } ip_link_set_up() -- 2.49.0

2 weeks, 4 days

1
0
0 0

[PATCH net-next v3 12/15] selftests: forwarding: lib: Move smcrouted helpers here

by Petr Machata

router_multicast.sh has several helpers for work with smcrouted. Extract them to lib.sh so that other selftests can use them as well. Convert the helpers to defer in the process, because that simplifies the interface quite a bit. Therefore have router_multicast.sh invoke defer_scopes_cleanup() in its cleanup() function. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: v2: - Adjust as per shellcheck citations --- CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/forwarding/lib.sh | 33 +++++++++++++++++ .../net/forwarding/router_multicast.sh | 35 ++++--------------- 2 files changed, 39 insertions(+), 29 deletions(-) diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index 508f3c700d71..253847372062 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -37,6 +37,7 @@ declare -A NETIFS=( : "${TEAMD:=teamd}" : "${MCD:=smcrouted}" : "${MC_CLI:=smcroutectl}" +: "${MCD_TABLE_NAME:=selftests}" # Constants for netdevice bring-up: # Default time in seconds to wait for an interface to come up before giving up @@ -1757,6 +1758,38 @@ mc_send() msend -g $groups -I $if_name -c 1 > /dev/null 2>&1 } +adf_mcd_start() +{ + local table_name="$MCD_TABLE_NAME" + local smcroutedir + local pid + local i + + check_command "$MCD" || return 1 + check_command "$MC_CLI" || return 1 + + smcroutedir=$(mktemp -d) + defer rm -rf "$smcroutedir" + + for ((i = 1; i <= NUM_NETIFS; ++i)); do + echo "phyint ${NETIFS[p$i]} enable" >> \ + "$smcroutedir/$table_name.conf" + done + + "$MCD" -N -I "$table_name" -f "$smcroutedir/$table_name.conf" \ + -P "$smcroutedir/$table_name.pid" + busywait "$BUSYWAIT_TIMEOUT" test -e "$smcroutedir/$table_name.pid" + pid=$(cat "$smcroutedir/$table_name.pid") + defer kill_process "$pid" +} + +mc_cli() +{ + local table_name="$MCD_TABLE_NAME" + + "$MC_CLI" -I "$table_name" "$@" +} + start_ip_monitor() { local mtype=$1; shift diff --git a/tools/testing/selftests/net/forwarding/router_multicast.sh b/tools/testing/selftests/net/forwarding/router_multicast.sh index 5a58b1ec8aef..83e52abdbc2e 100755 --- a/tools/testing/selftests/net/forwarding/router_multicast.sh +++ b/tools/testing/selftests/net/forwarding/router_multicast.sh @@ -33,10 +33,6 @@ NUM_NETIFS=6 source lib.sh source tc_common.sh -require_command $MCD -require_command $MC_CLI -table_name=selftests - h1_create() { simple_if_init $h1 198.51.100.2/28 2001:db8:1::2/64 @@ -149,25 +145,6 @@ router_destroy() ip link set dev $rp1 down } -start_mcd() -{ - SMCROUTEDIR="$(mktemp -d)" - - for ((i = 1; i <= $NUM_NETIFS; ++i)); do - echo "phyint ${NETIFS[p$i]} enable" >> \ - $SMCROUTEDIR/$table_name.conf - done - - $MCD -N -I $table_name -f $SMCROUTEDIR/$table_name.conf \ - -P $SMCROUTEDIR/$table_name.pid -} - -kill_mcd() -{ - pkill $MCD - rm -rf $SMCROUTEDIR -} - setup_prepare() { h1=${NETIFS[p1]} @@ -179,7 +156,7 @@ setup_prepare() rp3=${NETIFS[p5]} h3=${NETIFS[p6]} - start_mcd + adf_mcd_start || exit "$EXIT_STATUS" vrf_prepare @@ -206,7 +183,7 @@ cleanup() vrf_cleanup - kill_mcd + defer_scopes_cleanup } create_mcast_sg() @@ -214,9 +191,9 @@ create_mcast_sg() local if_name=$1; shift local s_addr=$1; shift local mcast=$1; shift - local dest_ifs=${@} + local dest_ifs=("${@}") - $MC_CLI -I $table_name add $if_name $s_addr $mcast $dest_ifs + mc_cli add "$if_name" "$s_addr" "$mcast" "${dest_ifs[@]}" } delete_mcast_sg() @@ -224,9 +201,9 @@ delete_mcast_sg() local if_name=$1; shift local s_addr=$1; shift local mcast=$1; shift - local dest_ifs=${@} + local dest_ifs=("${@}") - $MC_CLI -I $table_name remove $if_name $s_addr $mcast $dest_ifs + mc_cli remove "$if_name" "$s_addr" "$mcast" "${dest_ifs[@]}" } mcast_v4() -- 2.49.0

2 weeks, 4 days

1
0
0 0

[PATCH net-next v3 0/8] netpoll: Untangle netconsole and netpoll

by Breno Leitao

Initially netpoll and netconsole were created together, and some functions are in the wrong file. Seperate netconsole-only functions in netconsole, avoiding exports. 1. Expose netpoll logging macros in the public header to enable consistent log formatting across netpoll consumers. 2. Relocate netconsole-specific functions from netpoll to the netconsole module where they are actually used, reducing unnecessary coupling. 3. Remove unnecessary function exports 4. Rename netpoll parsing functions in netconsole to better reflect their specific usage. 5. Create a test to check that cmdline works fine. This was in my todo list since [1], this was a good time to add it here to make sure this patchset doesn't regress. PS: The code was split in a way that it is easy to review. When copying the functions from netpoll to netconsole, I do not change than other than adding `static`. This will make checkpatch unhappy, but, further patches will address the issues. It is done this way to make it easy for reviewers. Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorag… [1] Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v3: - The cleanup on the netcons_cmdline.sh test was not cleaning the netdevsim. Clean it at the end of the test. (Jakub) - Link to v2: https://lore.kernel.org/r/20250611-rework-v2-0-ab1d92b458ca@debian.org Changes in v2: - No change in the code. Just rebased the patches onto netnext/main - Link to v1: https://lore.kernel.org/r/20250610-rework-v1-0-7cfde283f246@debian.org --- Breno Leitao (8): netpoll: remove __netpoll_cleanup from exported API netpoll: expose netpoll logging macros in public header netpoll: relocate netconsole-specific functions to netconsole module netpoll: move netpoll_print_options to netconsole netconsole: rename functions to better reflect their purpose netconsole: improve code style in parser function selftests: net: Refactor cleanup logic in lib_netcons.sh selftests: net: add netconsole test for cmdline configuration drivers/net/netconsole.c | 137 ++++++++++++++++++++- include/linux/netpoll.h | 10 +- net/core/netpoll.c | 136 +------------------- tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/lib/sh/lib_netcons.sh | 59 ++++++--- .../selftests/drivers/net/netcons_cmdline.sh | 52 ++++++++ 6 files changed, 240 insertions(+), 155 deletions(-) --- base-commit: 6d4e01d29d87356924f1521ca6df7a364e948f13 change-id: 20250603-rework-c175cad8d22e Best regards, -- Breno Leitao <leitao(a)debian.org>

2 weeks, 4 days

2
9
0 0

[PATCH net-next v2 14/14] selftests: forwarding: Add a test for verifying VXLAN MC underlay

by Petr Machata

Add tests for MC-routing underlay VXLAN traffic. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: v2: - Adjust as per shellcheck citations --- CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org .../testing/selftests/net/forwarding/Makefile | 1 + .../net/forwarding/vxlan_bridge_1q_mc_ul.sh | 769 ++++++++++++++++++ 2 files changed, 770 insertions(+) create mode 100755 tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile index 00bde7b6f39e..d7bb2e80e88c 100644 --- a/tools/testing/selftests/net/forwarding/Makefile +++ b/tools/testing/selftests/net/forwarding/Makefile @@ -102,6 +102,7 @@ TEST_PROGS = bridge_fdb_learning_limit.sh \ vxlan_bridge_1d_port_8472.sh \ vxlan_bridge_1d.sh \ vxlan_bridge_1q_ipv6.sh \ + vxlan_bridge_1q_mc_ul.sh \ vxlan_bridge_1q_port_8472_ipv6.sh \ vxlan_bridge_1q_port_8472.sh \ vxlan_bridge_1q.sh \ diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh new file mode 100755 index 000000000000..c62744e1521d --- /dev/null +++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh @@ -0,0 +1,769 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# +-----------------------------------------+ +# | + $h1.10 + $h1.20 | +# | | 192.0.2.1/28 | 2001:db8:1::1/64 | +# | \________ ________/ | +# | \ / | +# | + $h1 H1 (vrf) | +# +-----------|-----------------------------+ +# | +# +-----------|----------------------------------------------------------------+ +# | +---------|--------------------------------------+ SWITCH (main vrf) | +# | | + $swp1 BR1 (802.1q) | | +# | | vid 10 20 | | +# | | | | +# | | + vx10 (vxlan) + vx20 (vxlan) | + lo10 (dummy) | +# | | local 192.0.2.100 local 2001:db8:4::1 | 192.0.2.100/28 | +# | | group 233.252.0.1 group ff0e::1:2:3 | 2001:db8:4::1/64 | +# | | id 1000 id 2000 | | +# | | vid 10 pvid untagged vid 20 pvid untagged | | +# | +------------------------------------------------+ | +# | | +# | + $swp2 $swp3 + | +# | | 192.0.2.33/28 192.0.2.65/28 | | +# | | 2001:db8:2::1/64 2001:db8:3::1/64 | | +# | | | | +# +---|--------------------------------------------------------------------|---+ +# | | +# +---|--------------------------------+ +--------------------------------|---+ +# | | H2 (vrf) | | H3 (vrf) | | +# | +-|----------------------------+ | | +-----------------------------|-+ | +# | | + $h2 BR2 (802.1d) | | | | BR3 (802.1d) $h3 + | | +# | | | | | | | | +# | | + v1$h2 (veth) | | | | v1$h3 (veth) + | | +# | +-|----------------------------+ | | +-----------------------------|-+ | +# | | | | | | +# +---|--------------------------------+ +--------------------------------|---+ +# | | +# +---|--------------------------------+ +--------------------------------|---+ +# | + v2$h2 (veth) NS2 (netns) | | NS3 (netns) v2$h3 (veth) + | +# | 192.0.2.34/28 | | 192.0.2.66/28 | +# | 2001:db8:2::2/64 | | 2001:db8:3::2/64 | +# | | | | +# | +--------------------------------+ | | +--------------------------------+ | +# | | BR1 (802.1q) | | | | BR1 (802.1q) | | +# | | + vx10 (vxlan) | | | | + vx10 (vxlan) | | +# | | local 192.0.2.34 | | | | local 192.0.2.50 | | +# | | group 233.252.0.1 dev v2$h2 | | | | group 233.252.0.1 dev v2$h3 | | +# | | id 1000 dstport $VXPORT | | | | id 1000 dstport $VXPORT | | +# | | vid 10 pvid untagged | | | | vid 10 pvid untagged | | +# | | | | | | | | +# | | + vx20 (vxlan) | | | | + vx20 (vxlan) | | +# | | local 2001:db8:2::2 | | | | local 2001:db8:3::2 | | +# | | group ff0e::1:2:3 dev v2$h2 | | | | group ff0e::1:2:3 dev v2$h3 | | +# | | id 2000 dstport $VXPORT | | | | id 2000 dstport $VXPORT | | +# | | vid 20 pvid untagged | | | | vid 20 pvid untagged | | +# | | | | | | | | +# | | + w1 (veth) | | | | + w1 (veth) | | +# | | | vid 10 20 | | | | | vid 10 20 | | +# | +--|-----------------------------+ | | +--|-----------------------------+ | +# | | | | | | +# | +--|-----------------------------+ | | +--|-----------------------------+ | +# | | + w2 (veth) VW2 (vrf) | | | | + w2 (veth) VW2 (vrf) | | +# | | |\ | | | | |\ | | +# | | | + w2.10 | | | | | + w2.10 | | +# | | | 192.0.2.3/28 | | | | | 192.0.2.4/28 | | +# | | | | | | | | | | +# | | + w2.20 | | | | + w2.20 | | +# | | 2001:db8:1::3/64 | | | | 2001:db8:1::4/64 | | +# | +--------------------------------+ | | +--------------------------------+ | +# +------------------------------------+ +------------------------------------+ + +: "${VXPORT:=4789}" +export VXPORT + +: "${GROUP4:=233.252.0.1}" +export GROUP4 + +: "${GROUP6:=ff0e::1:2:3}" +export GROUP6 + +: "${IPMR:=lo10}" + +ALL_TESTS=" + ipv4_nomcroute + ipv4_mcroute + ipv4_mcroute_changelink + ipv4_mcroute_starg + ipv4_mcroute_noroute + ipv4_mcroute_fdb + ipv4_mcroute_fdb_oif0 + ipv4_mcroute_fdb_oif0_sep + + ipv6_nomcroute + ipv6_mcroute + ipv6_mcroute_changelink + ipv6_mcroute_starg + ipv6_mcroute_noroute + ipv6_mcroute_fdb + ipv6_mcroute_fdb_oif0 + + ipv4_nomcroute_rx + ipv4_mcroute_rx + ipv4_mcroute_starg_rx + ipv4_mcroute_fdb_oif0_sep_rx + ipv4_mcroute_fdb_sep_rx + + ipv6_nomcroute_rx + ipv6_mcroute_rx + ipv6_mcroute_starg_rx + ipv6_mcroute_fdb_sep_rx +" + +NUM_NETIFS=6 +source lib.sh + +h1_create() +{ + simple_if_init "$h1" + defer simple_if_fini "$h1" + + ip_link_add "$h1.10" master "v$h1" link "$h1" type vlan id 10 + ip_link_set_up "$h1.10" + ip_addr_add "$h1.10" 192.0.2.1/28 + + ip_link_add "$h1.20" master "v$h1" link "$h1" type vlan id 20 + ip_link_set_up "$h1.20" + ip_addr_add "$h1.20" 2001:db8:1::1/64 +} + +install_capture() +{ + local dev=$1; shift + + tc qdisc add dev "$dev" clsact + defer tc qdisc del dev "$dev" clsact + + tc filter add dev "$dev" ingress proto ip pref 104 \ + flower skip_hw ip_proto udp dst_port "$VXPORT" \ + action pass + defer tc filter del dev "$dev" ingress proto ip pref 104 + + tc filter add dev "$dev" ingress proto ipv6 pref 106 \ + flower skip_hw ip_proto udp dst_port "$VXPORT" \ + action pass + defer tc filter del dev "$dev" ingress proto ipv6 pref 106 +} + +h2_create() +{ + # $h2 + ip_link_set_up "$h2" + + # H2 + vrf_create "v$h2" + defer vrf_destroy "v$h2" + + ip_link_set_up "v$h2" + + # br2 + ip_link_add br2 type bridge vlan_filtering 0 mcast_snooping 0 + ip_link_set_master br2 "v$h2" + ip_link_set_up br2 + + # $h2 + ip_link_set_master "$h2" br2 + install_capture "$h2" + + # v1$h2 + ip_link_set_up "v1$h2" + ip_link_set_master "v1$h2" br2 +} + +h3_create() +{ + # $h3 + ip_link_set_up "$h3" + + # H3 + vrf_create "v$h3" + defer vrf_destroy "v$h3" + + ip_link_set_up "v$h3" + + # br3 + ip_link_add br3 type bridge vlan_filtering 0 mcast_snooping 0 + ip_link_set_master br3 "v$h3" + ip_link_set_up br3 + + # $h3 + ip_link_set_master "$h3" br3 + install_capture "$h3" + + # v1$h3 + ip_link_set_up "v1$h3" + ip_link_set_master "v1$h3" br3 +} + +switch_create() +{ + local swp1_mac + + # br1 + swp1_mac=$(mac_get "$swp1") + ip_link_add br1 type bridge vlan_filtering 1 \ + vlan_default_pvid 0 mcast_snooping 0 + ip_link_set_addr br1 "$swp1_mac" + ip_link_set_up br1 + + # A dummy to force the IPv6 OIF=0 test to install a suitable MC route on + # $IPMR to be deterministic. Also used for the IPv6 RX!=TX ping test. + ip_link_add "X$IPMR" up type dummy + + # IPMR + ip_link_add "$IPMR" up type dummy + ip_addr_add "$IPMR" 192.0.2.100/28 + ip_addr_add "$IPMR" 2001:db8:4::1/64 + + # $swp1 + ip_link_set_up "$swp1" + ip_link_set_master "$swp1" br1 + bridge_vlan_add vid 10 dev "$swp1" + bridge_vlan_add vid 20 dev "$swp1" + + # $swp2 + ip_link_set_up "$swp2" + ip_addr_add "$swp2" 192.0.2.33/28 + ip_addr_add "$swp2" 2001:db8:2::1/64 + + # $swp3 + ip_link_set_up "$swp3" + ip_addr_add "$swp3" 192.0.2.65/28 + ip_addr_add "$swp3" 2001:db8:3::1/64 +} + +vx_create() +{ + local name=$1; shift + local vid=$1; shift + + ip_link_add "$name" up type vxlan dstport "$VXPORT" \ + nolearning noudpcsum tos inherit ttl 16 \ + "$@" + ip_link_set_master "$name" br1 + bridge_vlan_add vid "$vid" dev "$name" pvid untagged +} +export -f vx_create + +vx_wait() +{ + # Wait for all the ARP, IGMP etc. noise to settle down so that the + # tunnel is clear for measurements. + sleep 10 +} + +vx10_create() +{ + vx_create vx10 10 id 1000 "$@" +} +export -f vx10_create + +vx20_create() +{ + vx_create vx20 20 id 2000 "$@" +} +export -f vx20_create + +vx10_create_wait() +{ + vx10_create "$@" + vx_wait +} + +vx20_create_wait() +{ + vx20_create "$@" + vx_wait +} + +ns_init_common() +{ + local ns=$1; shift + local if_in=$1; shift + local ipv4_in=$1; shift + local ipv6_in=$1; shift + local ipv4_host=$1; shift + local ipv6_host=$1; shift + + # v2$h2 / v2$h3 + ip_link_set_up "$if_in" + ip_addr_add "$if_in" "$ipv4_in" + ip_addr_add "$if_in" "$ipv6_in" + + # br1 + ip_link_add br1 type bridge vlan_filtering 1 \ + vlan_default_pvid 0 mcast_snooping 0 + ip_link_set_up br1 + + # vx10, vx20 + vx10_create local "${ipv4_in%/*}" group "$GROUP4" dev "$if_in" + vx20_create local "${ipv6_in%/*}" group "$GROUP6" dev "$if_in" + + # w1 + ip_link_add w1 type veth peer name w2 + ip_link_set_master w1 br1 + ip_link_set_up w1 + bridge_vlan_add vid 10 dev w1 + bridge_vlan_add vid 20 dev w1 + + # w2 + simple_if_init w2 + defer simple_if_fini w2 + + # w2.10 + ip_link_add w2.10 master vw2 link w2 type vlan id 10 + ip_link_set_up w2.10 + ip_addr_add w2.10 "$ipv4_host" + + # w2.20 + ip_link_add w2.20 master vw2 link w2 type vlan id 20 + ip_link_set_up w2.20 + ip_addr_add w2.20 "$ipv6_host" +} +export -f ns_init_common + +ns2_create() +{ + # NS2 + ip netns add ns2 + defer ip netns del ns2 + + # v2$h2 + ip link set dev "v2$h2" netns ns2 + defer ip -n ns2 link set dev "v2$h2" netns 1 + + in_ns ns2 \ + ns_init_common ns2 "v2$h2" \ + 192.0.2.34/28 2001:db8:2::2/64 \ + 192.0.2.3/28 2001:db8:1::3/64 +} + +ns3_create() +{ + # NS3 + ip netns add ns3 + defer ip netns del ns3 + + # v2$h3 + ip link set dev "v2$h3" netns ns3 + defer ip -n ns3 link set dev "v2$h3" netns 1 + + ip -n ns3 link set dev "v2$h3" up + + in_ns ns3 \ + ns_init_common ns3 "v2$h3" \ + 192.0.2.66/28 2001:db8:3::2/64 \ + 192.0.2.4/28 2001:db8:1::4/64 +} + +setup_prepare() +{ + h1=${NETIFS[p1]} + swp1=${NETIFS[p2]} + + swp2=${NETIFS[p3]} + h2=${NETIFS[p4]} + + swp3=${NETIFS[p5]} + h3=${NETIFS[p6]} + + vrf_prepare + defer vrf_cleanup + + forwarding_enable + defer forwarding_restore + + ip_link_add "v1$h2" type veth peer name "v2$h2" + ip_link_add "v1$h3" type veth peer name "v2$h3" + + h1_create + h2_create + h3_create + switch_create + ns2_create + ns3_create +} + +adf_install_broken_sg() +{ + adf_mcd_start "$IPMR" || exit "$EXIT_STATUS" + + mc_cli add "$swp2" 192.0.2.100 "$GROUP4" "$swp1" "$swp3" + defer mc_cli remove "$swp2" 192.0.2.100 "$GROUP4" "$swp1" "$swp3" + + mc_cli add "$swp2" 2001:db8:4::1 "$GROUP6" "$swp1" "$swp3" + defer mc_cli remove "$swp2" 2001:db8:4::1 "$GROUP6" "$swp1" "$swp3" +} + +adf_install_rx() +{ + mc_cli add "$swp2" 0.0.0.0 "$GROUP4" "$IPMR" + defer mc_cli remove "$swp2" 0.0.0.0 "$GROUP4" lo10 + + mc_cli add "$swp3" 0.0.0.0 "$GROUP4" "$IPMR" + defer mc_cli remove "$swp3" 0.0.0.0 "$GROUP4" lo10 + + mc_cli add "$swp2" :: "$GROUP6" "$IPMR" + defer mc_cli remove "$swp2" :: "$GROUP6" lo10 + + mc_cli add "$swp3" :: "$GROUP6" "$IPMR" + defer mc_cli remove "$swp3" :: "$GROUP6" lo10 +} + +adf_install_sg() +{ + adf_mcd_start "$IPMR" || exit "$EXIT_STATUS" + + mc_cli add "$IPMR" 192.0.2.100 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" 192.0.2.33 "$GROUP4" "$swp2" "$swp3" + + mc_cli add "$IPMR" 2001:db8:4::1 "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" 2001:db8:4::1 "$GROUP6" "$swp2" "$swp3" + + adf_install_rx +} + +adf_install_sg_sep() +{ + adf_mcd_start lo || exit "$EXIT_STATUS" + + mc_cli add lo 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove lo 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + + mc_cli add lo 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove lo 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" +} + +adf_install_sg_sep_rx() +{ + local lo=$1; shift + + adf_mcd_start "$IPMR" "$lo" || exit "$EXIT_STATUS" + + mc_cli add "$lo" 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove "$lo" 192.0.2.120 "$GROUP4" "$swp2" "$swp3" + + mc_cli add "$lo" 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove "$lo" 2001:db8:5::1 "$GROUP6" "$swp2" "$swp3" + + adf_install_rx +} + +adf_install_starg() +{ + adf_mcd_start "$IPMR" || exit "$EXIT_STATUS" + + mc_cli add "$IPMR" 0.0.0.0 "$GROUP4" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" 0.0.0.0 "$GROUP4" "$swp2" "$swp3" + + mc_cli add "$IPMR" :: "$GROUP6" "$swp2" "$swp3" + defer mc_cli remove "$IPMR" :: "$GROUP6" "$swp2" "$swp3" + + adf_install_rx +} + +do_packets_v4() +{ + local mac + + mac=$(mac_get "$h2") + "$MZ" "$h1" -Q 10 -c 10 -d 100msec -p 64 -a own -b "$mac" \ + -A 192.0.2.1 -B 192.0.2.2 -t udp sp=1234,dp=2345 -q +} + +do_packets_v6() +{ + local mac + + mac=$(mac_get "$h2") + "$MZ" -6 "$h1" -Q 20 -c 10 -d 100msec -p 64 -a own -b "$mac" \ + -A 2001:db8:1::1 -B 2001:db8:1::2 -t udp sp=1234,dp=2345 -q +} + +do_test() +{ + local ipv=$1; shift + local expect_h2=$1; shift + local expect_h3=$1; shift + local what=$1; shift + + local pref=$((100 + ipv)) + local t0_h2 + local t0_h3 + local t1_h2 + local t1_h3 + local d_h2 + local d_h3 + + RET=0 + + t0_h2=$(tc_rule_stats_get "$h2" "$pref" ingress) + t0_h3=$(tc_rule_stats_get "$h3" "$pref" ingress) + + "do_packets_v$ipv" + sleep 1 + + t1_h2=$(tc_rule_stats_get "$h2" "$pref" ingress) + t1_h3=$(tc_rule_stats_get "$h3" "$pref" ingress) + + d_h2=$((t1_h2 - t0_h2)) + d_h3=$((t1_h3 - t0_h3)) + + ((d_h2 == expect_h2)) + check_err $? "Expected $expect_h2 packets on H2, got $d_h2" + + ((d_h3 == expect_h3)) + check_err $? "Expected $expect_h3 packets on H3, got $d_h3" + + log_test "VXLAN MC flood $what" +} + +ipv4_do_test_rx() +{ + local h3_should_fail=$1; shift + local what=$1; shift + + RET=0 + + ping_do "$h1.10" 192.0.2.3 + check_err $? "H2 should respond" + + ping_do "$h1.10" 192.0.2.4 + check_err_fail "$h3_should_fail" $? "H3 responds" + + log_test "VXLAN MC flood $what" +} + +ipv6_do_test_rx() +{ + local h3_should_fail=$1; shift + local what=$1; shift + + RET=0 + + ping6_do "$h1.20" 2001:db8:1::3 + check_err $? "H2 should respond" + + ping6_do "$h1.20" 2001:db8:1::4 + check_err_fail "$h3_should_fail" $? "H3 responds" + + log_test "VXLAN MC flood $what" +} + +ipv4_nomcroute() +{ + # Install a misleading (S,G) rule to attempt to trick the system into + # pushing the packets elsewhere. + adf_install_broken_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$swp2" + do_test 4 10 0 "IPv4 nomcroute" +} + +ipv6_nomcroute() +{ + # Like for IPv4, install a misleading (S,G). + adf_install_broken_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$swp2" + do_test 6 10 0 "IPv6 nomcroute" +} + +ipv4_nomcroute_rx() +{ + vx10_create local 192.0.2.100 group "$GROUP4" dev "$swp2" + ipv4_do_test_rx 1 "IPv4 nomcroute ping" +} + +ipv6_nomcroute_rx() +{ + vx20_create local 2001:db8:4::1 group "$GROUP6" dev "$swp2" + ipv6_do_test_rx 1 "IPv6 nomcroute ping" +} + +ipv4_mcroute() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + do_test 4 10 10 "IPv4 mcroute" +} + +ipv6_mcroute() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + do_test 6 10 10 "IPv6 mcroute" +} + +ipv4_mcroute_rx() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + ipv4_do_test_rx 0 "IPv4 mcroute ping" +} + +ipv6_mcroute_rx() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + ipv6_do_test_rx 0 "IPv6 mcroute ping" +} + +ipv4_mcroute_changelink() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" + ip link set dev vx10 type vxlan mcroute + sleep 1 + do_test 4 10 10 "IPv4 mcroute changelink" +} + +ipv6_mcroute_changelink() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + ip link set dev vx20 type vxlan mcroute + sleep 1 + do_test 6 10 10 "IPv6 mcroute changelink" +} + +ipv4_mcroute_starg() +{ + adf_install_starg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + do_test 4 10 10 "IPv4 mcroute (*,G)" +} + +ipv6_mcroute_starg() +{ + adf_install_starg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + do_test 6 10 10 "IPv6 mcroute (*,G)" +} + +ipv4_mcroute_starg_rx() +{ + adf_install_starg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + ipv4_do_test_rx 0 "IPv4 mcroute (*,G) ping" +} + +ipv6_mcroute_starg_rx() +{ + adf_install_starg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + ipv6_do_test_rx 0 "IPv6 mcroute (*,G) ping" +} + +ipv4_mcroute_noroute() +{ + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + do_test 4 0 0 "IPv4 mcroute, no route" +} + +ipv6_mcroute_noroute() +{ + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + do_test 6 0 0 "IPv6 mcroute, no route" +} + +ipv4_mcroute_fdb() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 dev "$IPMR" mcroute + bridge fdb add dev vx10 \ + 00:00:00:00:00:00 self static dst "$GROUP4" via "$IPMR" + do_test 4 10 10 "IPv4 mcroute FDB" +} + +ipv6_mcroute_fdb() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 dev "$IPMR" mcroute + bridge -6 fdb add dev vx20 \ + 00:00:00:00:00:00 self static dst "$GROUP6" via "$IPMR" + do_test 6 10 10 "IPv6 mcroute FDB" +} + +# Use FDB to configure VXLAN in a way where oif=0 for purposes of FIB lookup. +ipv4_mcroute_fdb_oif0() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" + do_test 4 10 10 "IPv4 mcroute oif=0" +} + +ipv6_mcroute_fdb_oif0() +{ + # The IPv6 tunnel lookup does not fall back to selection by source + # address. Instead it just does a FIB match, and that would find one of + # the several ff00::/8 multicast routes -- each device has one. In order + # to reliably force the $IPMR device, add a /128 route for the + # destination group address. + ip -6 route add table local multicast "$GROUP6/128" dev "$IPMR" + defer ip -6 route del table local multicast "$GROUP6/128" dev "$IPMR" + + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group "$GROUP6" dev "$IPMR" mcroute + bridge -6 fdb del dev vx20 00:00:00:00:00:00 + bridge -6 fdb add dev vx20 00:00:00:00:00:00 self static dst "$GROUP6" + do_test 6 10 10 "IPv6 mcroute oif=0" +} + +# In oif=0 test as above, have FIB lookup resolve to loopback instead of IPMR. +# This doesn't work with IPv6 -- a MC route on lo would be marked as RTF_REJECT. +ipv4_mcroute_fdb_oif0_sep() +{ + adf_install_sg_sep + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" + do_test 4 10 10 "IPv4 mcroute TX!=RX oif=0" +} + +ipv4_mcroute_fdb_oif0_sep_rx() +{ + adf_install_sg_sep_rx lo + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" + ipv4_do_test_rx 0 "IPv4 mcroute TX!=RX oif=0 ping" +} + +ipv4_mcroute_fdb_sep_rx() +{ + adf_install_sg_sep_rx lo + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group "$GROUP4" dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add \ + dev vx10 00:00:00:00:00:00 self static dst "$GROUP4" via lo + ipv4_do_test_rx 0 "IPv4 mcroute TX!=RX ping" +} + +ipv6_mcroute_fdb_sep_rx() +{ + adf_install_sg_sep_rx "X$IPMR" + + ip_addr_add "X$IPMR" 2001:db8:5::1/64 + vx20_create_wait local 2001:db8:5::1 group "$GROUP6" dev "$IPMR" mcroute + bridge -6 fdb del dev vx20 00:00:00:00:00:00 + bridge -6 fdb add dev vx20 00:00:00:00:00:00 \ + self static dst "$GROUP6" via "X$IPMR" + ipv6_do_test_rx 0 "IPv6 mcroute TX!=RX ping" +} + +trap cleanup EXIT + +setup_prepare +setup_wait +tests_run + +exit "$EXIT_STATUS" -- 2.49.0

2 weeks, 5 days

3
8
0 0

clang: selftests mm pkey_sighandler_tests.c warning duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier]

by Naresh Kamboju

The following build warnings were noticed while building selftests/mm with clang nightly toolchain for arm64 and x86_64 architectures. Regressions found on arm64 and x86_64 - Build/clang-nightly-lkftconfig-kselftest Regression Analysis: - New regression? Yes - Reproducibility? Yes Build regression: selftests mm pkey_sighandler_tests.c warning duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier] Build regression: selftests mm mremap_test.c warning pointer comparison always evaluates to false [-Wtautological-compare] Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> ## Build log make[4]: Entering directory '/builds/linux/tools/testing/selftests/mm' /bin/sh ./check_config.sh clang --target=aarch64-linux-gnu -fintegrated-as -Werror=unknown-warning-option -Werror=ignored-optimization-argument -Werror=option-ignored -Werror=unused-command-line-argument --target=aarch64-linux-gnu -fintegrated-as CC cow CC compaction_test CC gup_longterm CC gup_test CC hmm-tests CC hugetlb-madvise CC hugetlb-read-hwpoison CC hugetlb-soft-offline CC hugepage-mmap CC hugepage-mremap CC hugepage-shm CC hugepage-vmemmap CC khugepaged CC madv_populate CC map_fixed_noreplace CC map_hugetlb CC map_populate CC memfd_secret CC migration CC mkdirty CC mlock-random-test CC mlock2-tests CC mrelease_test CC mremap_dontunmap CC mremap_test mremap_test.c:425:31: warning: pointer comparison always evaluates to false [-Wtautological-compare] 425 | if (addr + c.dest_alignment < addr) { | ^ 1 warning generated. CC mseal_test CC on-fault-limit CC pagemap_ioctl CC pfnmap CC thuge-gen CC transhuge-stress CC uffd-stress CC uffd-unit-tests CC uffd-wp-mremap CC split_huge_page_test CC ksm_tests CC ksm_functional_tests CC mdwe_test CC hugetlb_fault_after_madv CC hugetlb_madv_vs_map CC hugetlb_dio CC droppable CC guard-regions CC merge CC protection_keys CC pkey_sighandler_tests pkey_sighandler_tests.c:44:15: warning: duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier] 44 | static inline __always_inline | ^ /usr/lib/gcc-cross/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/include/sys/cdefs.h:424:26: note: expanded from macro '__always_inline' 424 | # define __always_inline __inline __attribute__ ((__always_inline__)) | ^ 1 warning generated. CC va_high_addr_switch CC virtual_address_range CC write_to_hugetlbfs Warning: missing Module.symvers, please have the kernel built first. page_frag test will be skipped. make[4]: Leaving directory '/builds/linux/tools/testing/selftests/mm' ## Source * Kernel version: 6.16.0-rc2 * Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git * Git sha: 050f8ad7b58d9079455af171ac279c4b9b828c11 * Git describe: next-20250616 * Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250616/ * Architectures: arm64, x86_64 * Toolchains: clang nightly * Kconfigs: selftest/mm/config+defconfig+ ## Build arm64 * Build log: https://qa-reports.linaro.org/api/testruns/28765515/log_file/ * Build details: https://regressions.linaro.org/lkft/linux-next-master/next-20250616/log-par… * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2ya0viPHafKAe0u89drI… * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2ya0viPHafKAe0u89drI… ## Steps to reproduce on arm64 - tuxmake --runtime podman --target-arch arm64 --toolchain clang-20 \ --kconfig defconfig \ --kconfig-add https://gitlab.com/Linaro/lkft/kernel-fragments/-/raw/main/netdev.config \ --kconfig-add https://gitlab.com/Linaro/lkft/kernel-fragments/-/raw/main/systemd.config \ --kconfig-add CONFIG_SYN_COOKIES=y \ --kconfig-add CONFIG_SCHEDSTATS=y LLVM=1 LLVM_IAS=1 debugkernel dtbs dtbs-legacy headers kernel kselftest modules -- Linaro LKFT https://lkft.linaro.org

2 weeks, 5 days

1
0
0 0

[PATCH net-next v2 0/5] netconsole: Add support for msgid in sysdata

by Gustavo Luiz Duarte

This patch series introduces a new feature to netconsole which allows appending a message ID to the userdata dictionary. If the msgid feature is enabled, the message ID is built from a per-target 32 bit counter that is incremented and appended to every message sent to the target. Example:: echo 1 > "/sys/kernel/config/netconsole/cmdline0/userdata/msgid_enabled" echo "This is message #1" > /dev/kmsg echo "This is message #2" > /dev/kmsg 13,434,54928466,-;This is message #1 msgid=1 13,435,54934019,-;This is message #2 msgid=2 This feature can be used by the target to detect if messages were dropped or reordered before reaching the target. This allows system administrators to assess the reliability of their netconsole pipeline and detect loss of messages due to network contention or temporary unavailability. Suggested-by: Breno Leitao <leitao(a)debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com> --- Changes in v2: - Use wrapping_assign_add() to avoid warnings in UBSAN and friends. - Improve documentation to clarify wrapping and distinguish msgid from sequnum. - Rebase and fix conflict in prepare_extradata(). - Link to v1: https://lore.kernel.org/r/20250611-netconsole-msgid-v1-0-1784a51feb1e@gmail… --- Gustavo Luiz Duarte (5): netconsole: introduce 'msgid' as a new sysdata field netconsole: implement configfs for msgid_enabled netconsole: append msgid to sysdata selftests: netconsole: Add tests for 'msgid' feature in sysdata docs: netconsole: document msgid feature Documentation/networking/netconsole.rst | 32 +++++++++++ drivers/net/netconsole.c | 65 ++++++++++++++++++++++ .../selftests/drivers/net/netcons_sysdata.sh | 30 ++++++++++ 3 files changed, 127 insertions(+) --- base-commit: 535de528015b56e34a40a8e1eb1629fadf809a84 change-id: 20250609-netconsole-msgid-b93c6f8e9c60 Best regards, -- Gustavo Luiz Duarte <gustavold(a)gmail.com>

2 weeks, 5 days

3
8
0 0

[PATCH bpf-next v3 1/2] bpftool: Use appropriate permissions for map access

by Slava Imameev

Modify several functions in tools/bpf/bpftool/common.c to allow specification of requested access for file descriptors, such as read-only access. Update bpftool to request only read access for maps when write access is not required. This fixes errors when reading from maps that are protected from modification via security_bpf_map. Signed-off-by: Slava Imameev <slava.imameev(a)crowdstrike.com> --- Changes in v2: - fix for a test compilation error: "conflicting types for 'bpf_fentry_test1'" Changes in v3: - Addressed review feedback - Converted the check for flags to an assert in map_parse_fds - Modified map_fd_by_name to keep an existing fd where possible - Fixed requested access for map delete command in do_delete - Changed requested access to RDONLY for inner map fd in do_create - Changed requested access to RDONLY for iterator fd in do_pin --- --- tools/bpf/bpftool/btf.c | 3 +- tools/bpf/bpftool/common.c | 58 ++++++++++++++++++++++--------- tools/bpf/bpftool/iter.c | 2 +- tools/bpf/bpftool/link.c | 2 +- tools/bpf/bpftool/main.h | 13 ++++--- tools/bpf/bpftool/map.c | 56 +++++++++++++++++------------ tools/bpf/bpftool/map_perf_ring.c | 3 +- tools/bpf/bpftool/prog.c | 4 +-- 8 files changed, 91 insertions(+), 50 deletions(-) diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c index 6b14cbfa58aa..1ba27cb03348 100644 --- a/tools/bpf/bpftool/btf.c +++ b/tools/bpf/bpftool/btf.c @@ -905,7 +905,8 @@ static int do_dump(int argc, char **argv) return -1; } - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, + BPF_F_RDONLY); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index ecfa790adc13..3bdc65112c0d 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -193,7 +193,8 @@ int mount_tracefs(const char *target) return err; } -int open_obj_pinned(const char *path, bool quiet) +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts) { char *pname; int fd = -1; @@ -205,7 +206,7 @@ int open_obj_pinned(const char *path, bool quiet) goto out_ret; } - fd = bpf_obj_get(pname); + fd = bpf_obj_get_opts(pname, opts); if (fd < 0) { if (!quiet) p_err("bpf obj get (%s): %s", pname, @@ -221,12 +222,13 @@ int open_obj_pinned(const char *path, bool quiet) return fd; } -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type) +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts) { enum bpf_obj_type type; int fd; - fd = open_obj_pinned(path, false); + fd = open_obj_pinned(path, false, opts); if (fd < 0) return -1; @@ -555,7 +557,7 @@ static int do_build_table_cb(const char *fpath, const struct stat *sb, if (typeflag != FTW_F) goto out_ret; - fd = open_obj_pinned(fpath, true); + fd = open_obj_pinned(fpath, true, NULL); if (fd < 0) goto out_ret; @@ -928,7 +930,7 @@ int prog_parse_fds(int *argc, char ***argv, int **fds) path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG, NULL); if ((*fds)[0] < 0) return -1; return 1; @@ -965,7 +967,8 @@ int prog_parse_fd(int *argc, char ***argv) return fd; } -static int map_fd_by_name(char *name, int **fds) +static int map_fd_by_name(char *name, int **fds, + const struct bpf_get_fd_by_id_opts *opts) { unsigned int id = 0; int fd, nb_fds = 0; @@ -973,6 +976,7 @@ static int map_fd_by_name(char *name, int **fds) int err; while (true) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts_ro); struct bpf_map_info info = {}; __u32 len = sizeof(info); @@ -985,7 +989,9 @@ static int map_fd_by_name(char *name, int **fds) return nb_fds; } - fd = bpf_map_get_fd_by_id(id); + /* Request a read-only fd to query the map info */ + opts_ro.open_flags = BPF_F_RDONLY; + fd = bpf_map_get_fd_by_id_opts(id, &opts_ro); if (fd < 0) { p_err("can't get map by id (%u): %s", id, strerror(errno)); @@ -1004,6 +1010,19 @@ static int map_fd_by_name(char *name, int **fds) continue; } + /* Get an fd with the requested options, if they differ + * from the read-only options used to get the fd above. + */ + if (memcmp(opts, &opts_ro, sizeof(opts_ro))) { + close(fd); + fd = bpf_map_get_fd_by_id_opts(id, opts); + if (fd < 0) { + p_err("can't get map by id (%u): %s", id, + strerror(errno)); + goto err_close_fds; + } + } + if (nb_fds > 0) { tmp = realloc(*fds, (nb_fds + 1) * sizeof(int)); if (!tmp) { @@ -1023,8 +1042,13 @@ static int map_fd_by_name(char *name, int **fds) return -1; } -int map_parse_fds(int *argc, char ***argv, int **fds) +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); + + assert((open_flags & ~BPF_F_RDONLY) == 0); + opts.open_flags = open_flags; + if (is_prefix(**argv, "id")) { unsigned int id; char *endptr; @@ -1038,7 +1062,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - (*fds)[0] = bpf_map_get_fd_by_id(id); + (*fds)[0] = bpf_map_get_fd_by_id_opts(id, &opts); if ((*fds)[0] < 0) { p_err("get map by id (%u): %s", id, strerror(errno)); return -1; @@ -1056,16 +1080,18 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - return map_fd_by_name(name, fds); + return map_fd_by_name(name, fds, &opts); } else if (is_prefix(**argv, "pinned")) { char *path; + LIBBPF_OPTS(bpf_obj_get_opts, get_opts); + get_opts.file_flags = open_flags; NEXT_ARGP(); path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP, &get_opts); if ((*fds)[0] < 0) return -1; return 1; @@ -1075,7 +1101,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) return -1; } -int map_parse_fd(int *argc, char ***argv) +int map_parse_fd(int *argc, char ***argv, __u32 open_flags) { int *fds = NULL; int nb_fds, fd; @@ -1085,7 +1111,7 @@ int map_parse_fd(int *argc, char ***argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(argc, argv, &fds); + nb_fds = map_parse_fds(argc, argv, &fds, open_flags); if (nb_fds != 1) { if (nb_fds > 1) { p_err("several maps match this handle"); @@ -1103,12 +1129,12 @@ int map_parse_fd(int *argc, char ***argv) } int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len) + __u32 *info_len, __u32 open_flags) { int err; int fd; - fd = map_parse_fd(argc, argv); + fd = map_parse_fd(argc, argv, open_flags); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/iter.c b/tools/bpf/bpftool/iter.c index 5c39c2ed36a2..df5f0d1e07e8 100644 --- a/tools/bpf/bpftool/iter.c +++ b/tools/bpf/bpftool/iter.c @@ -37,7 +37,7 @@ static int do_pin(int argc, char **argv) return -1; } - map_fd = map_parse_fd(&argc, &argv); + map_fd = map_parse_fd(&argc, &argv, BPF_F_RDONLY); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c index 03513ffffb79..a773e05d5ade 100644 --- a/tools/bpf/bpftool/link.c +++ b/tools/bpf/bpftool/link.c @@ -117,7 +117,7 @@ static int link_parse_fd(int *argc, char ***argv) path = **argv; NEXT_ARGP(); - return open_obj_pinned_any(path, BPF_OBJ_LINK); + return open_obj_pinned_any(path, BPF_OBJ_LINK, NULL); } p_err("expected 'id' or 'pinned', got: '%s'?", **argv); diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 9eb764fe4cc8..6db704fda5c0 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -15,6 +15,7 @@ #include <bpf/hashmap.h> #include <bpf/libbpf.h> +#include <bpf/bpf.h> #include "json_writer.h" @@ -140,8 +141,10 @@ void get_prog_full_name(const struct bpf_prog_info *prog_info, int prog_fd, int get_fd_type(int fd); const char *get_fd_type_name(enum bpf_obj_type type); char *get_fdinfo(int fd, const char *key); -int open_obj_pinned(const char *path, bool quiet); -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type); +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts); +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts); int mount_bpffs_for_file(const char *file_name); int create_and_mount_bpffs_dir(const char *dir_name); int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(int *, char ***)); @@ -167,10 +170,10 @@ int do_iter(int argc, char **argv) __weak; int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what); int prog_parse_fd(int *argc, char ***argv); int prog_parse_fds(int *argc, char ***argv, int **fds); -int map_parse_fd(int *argc, char ***argv); -int map_parse_fds(int *argc, char ***argv, int **fds); +int map_parse_fd(int *argc, char ***argv, __u32 open_flags); +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags); int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len); + __u32 *info_len, __u32 open_flags); struct bpf_prog_linfo; #if defined(HAVE_LLVM_SUPPORT) || defined(HAVE_LIBBFD_SUPPORT) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 81cc668b4b05..c9de44a45778 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -337,9 +337,9 @@ static void fill_per_cpu_value(struct bpf_map_info *info, void *value) memcpy(value + i * step, value, info->value_size); } -static int parse_elem(char **argv, struct bpf_map_info *info, - void *key, void *value, __u32 key_size, __u32 value_size, - __u32 *flags, __u32 **value_fd) +static int parse_elem(char **argv, struct bpf_map_info *info, void *key, + void *value, __u32 key_size, __u32 value_size, + __u32 *flags, __u32 **value_fd, __u32 open_flags) { if (!*argv) { if (!key && !value) @@ -362,7 +362,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; return parse_elem(argv, info, NULL, value, key_size, value_size, - flags, value_fd); + flags, value_fd, open_flags); } else if (is_prefix(*argv, "value")) { int fd; @@ -388,7 +388,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; } - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, open_flags); if (fd < 0) return -1; @@ -424,7 +424,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, } return parse_elem(argv, info, key, NULL, key_size, value_size, - flags, NULL); + flags, NULL, open_flags); } else if (is_prefix(*argv, "any") || is_prefix(*argv, "noexist") || is_prefix(*argv, "exist")) { if (!flags) { @@ -440,7 +440,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, *flags = BPF_EXIST; return parse_elem(argv + 1, info, key, value, key_size, - value_size, NULL, value_fd); + value_size, NULL, value_fd, open_flags); } p_err("expected key or value, got: %s", *argv); @@ -639,7 +639,7 @@ static int do_show_subset(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -672,12 +672,15 @@ static int do_show_subset(int argc, char **argv) static int do_show(int argc, char **argv) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); struct bpf_map_info info = {}; __u32 len = sizeof(info); __u32 id = 0; int err; int fd; + opts.open_flags = BPF_F_RDONLY; + if (show_pinned) { map_table = hashmap__new(hash_fn_for_key_as_id, equal_fn_for_key_as_id, NULL); @@ -707,7 +710,7 @@ static int do_show(int argc, char **argv) break; } - fd = bpf_map_get_fd_by_id(id); + fd = bpf_map_get_fd_by_id_opts(id, &opts); if (fd < 0) { if (errno == ENOENT) continue; @@ -909,7 +912,7 @@ static int do_dump(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -997,7 +1000,7 @@ static int do_update(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1006,7 +1009,7 @@ static int do_update(int argc, char **argv) goto exit_free; err = parse_elem(argv, &info, key, value, info.key_size, - info.value_size, &flags, &value_fd); + info.value_size, &flags, &value_fd, 0); if (err) goto exit_free; @@ -1076,7 +1079,7 @@ static int do_lookup(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1084,7 +1087,8 @@ static int do_lookup(int argc, char **argv) if (err) goto exit_free; - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + BPF_F_RDONLY); if (err) goto exit_free; @@ -1127,7 +1131,7 @@ static int do_getnext(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1140,8 +1144,8 @@ static int do_getnext(int argc, char **argv) } if (argc) { - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, - NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, + NULL, BPF_F_RDONLY); if (err) goto exit_free; } else { @@ -1198,7 +1202,7 @@ static int do_delete(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1209,7 +1213,8 @@ static int do_delete(int argc, char **argv) goto exit_free; } - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + 0); if (err) goto exit_free; @@ -1226,11 +1231,16 @@ static int do_delete(int argc, char **argv) return err; } +static int map_parse_read_only_fd(int *argc, char ***argv) +{ + return map_parse_fd(argc, argv, BPF_F_RDONLY); +} + static int do_pin(int argc, char **argv) { int err; - err = do_pin_any(argc, argv, map_parse_fd); + err = do_pin_any(argc, argv, map_parse_read_only_fd); if (!err && json_output) jsonw_null(json_wtr); return err; @@ -1319,7 +1329,7 @@ static int do_create(int argc, char **argv) if (!REQ_ARGS(2)) usage(); inner_map_fd = map_parse_fd_and_info(&argc, &argv, - &info, &len); + &info, &len, BPF_F_RDONLY); if (inner_map_fd < 0) return -1; attr.inner_map_fd = inner_map_fd; @@ -1368,7 +1378,7 @@ static int do_pop_dequeue(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1407,7 +1417,7 @@ static int do_freeze(int argc, char **argv) if (!REQ_ARGS(2)) return -1; - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c index 552b4ca40c27..bcb767e2d673 100644 --- a/tools/bpf/bpftool/map_perf_ring.c +++ b/tools/bpf/bpftool/map_perf_ring.c @@ -128,7 +128,8 @@ int do_event_pipe(int argc, char **argv) int err, map_fd; map_info_len = sizeof(map_info); - map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len); + map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len, + 0); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 96eea8a67225..deeaa5c1ed7d 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -1062,7 +1062,7 @@ static int parse_attach_detach_args(int argc, char **argv, int *progfd, if (!REQ_ARGS(2)) return -EINVAL; - *mapfd = map_parse_fd(&argc, &argv); + *mapfd = map_parse_fd(&argc, &argv, 0); if (*mapfd < 0) return *mapfd; @@ -1608,7 +1608,7 @@ static int load_with_options(int argc, char **argv, bool first_prog_only) } NEXT_ARG(); - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) goto err_free_reuse_maps; -- 2.34.1

2 weeks, 5 days

2
3
0 0

[PATCH v19 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v18. This patch serise adds DualPI Improved with a Square (DualPI2) with following features: * Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague) * Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332 * Configurable overload strategies * Use of sojourn time to reliably estimate queue delay * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu --- v19 (14-Jun-2025) - Fix one typo in the comment of #1 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Update commit message of #4 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Wrap long lines of Documentation/netlink/specs/tc.yaml to within 80 characters (Jakub Kicinski <kuba(a)kernel.org>) v18 (13-Jun-2025) - Add the num of enum used by DualPI2 and fix name and name-prefix of DualPI2 enum and attribute - Replace from_timer() with timer_container_of() (Pedro Tammela <pctammela(a)mojatatu.com>) v17 (25-May-2025, Resent at 11-Jun-2025) - Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>) - Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>) - Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>) - Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>) - Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>) - Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>) - Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>) v16 (16-MAy-2025) - Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>) - Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>) - Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>) - Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>) v15 (09-May-2025) - Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>) - Fix one typo in comment of #2 - Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h v14 (05-May-2025) - Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun - Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>) - Update dualpi2.json to align with the updated variable order in pkt_sched.h - Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>) v13 (26-Apr-2025) - Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>) - Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>) v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning --- Chia-Yu Chang (4): sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc selftests/tc-testing: Add selftests for qdisc DualPI2 Documentation: netlink: specs: tc: Add DualPI2 specification Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 166 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 70 +- net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1146 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1656 insertions(+), 1 deletion(-) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

2 weeks, 5 days

2
6
0 0

[PATCH 0/1] selftests/memfd: skip hugetlbfs test if not supported

by Po-Hsu Lin

Check hugetlbfs support before starting tests in run_hugetlbfs_test.sh. Otherwise on a system that does not support hugetlbfs the free huge pages availability check will fail with: ./run_hugetlbfs_test.sh: line 47: [: -lt: unary operator expected ./run_hugetlbfs_test.sh: line 60: 12577 Aborted (core dumped) ./memfd_test hugetlbfs Aborted (core dumped) And it will left a fuse_mnt process behind, which may cause some unexpected issues. Patch tested with a kernel that does not have hugetlbfs support enabled and the test was skipped as expected. Po-Hsu Lin (1): selftests/memfd: skip hugetlbfs test if not supported tools/testing/selftests/memfd/run_hugetlbfs_test.sh | 5 +++++ 1 file changed, 5 insertions(+) -- 2.34.1

2 weeks, 5 days

1
1
0 0

[PATCH v11 0/6] rust: reduce `as` casts, enable related lints

by Tamir Duberstein

This started with a patch that enabled `clippy::ptr_as_ptr`. Benno Lossin suggested I also look into `clippy::ptr_cast_constness` and I discovered `clippy::as_ptr_cast_mut`. This series now enables all 3 lints. It also enables `clippy::as_underscore` which ensures other pointer casts weren't missed. As a later addition, `clippy::cast_lossless` and `clippy::ref_as_ptr` are also enabled. This series depends on "rust: retain pointer mut-ness in `container_of!`"[1]. Link: https://lore.kernel.org/all/20250409-container-of-mutness-v1-1-64f472b94534… [1] Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v11: - Rebase on v6.16-rc1. - Replace some `as <integer>` with `as bindings::T` and others with `as ffi::T`. (Miguel Ojeda) - Revert explicit `ffi::c_void` import which is in the prelude. (Miguel Ojeda) - Link to v10: https://lore.kernel.org/r/20250418-ptr-as-ptr-v10-0-3d63d27907aa@gmail.com Changes in v10: - Move fragment from "rust: enable `clippy::ptr_cast_constness` lint" to "rust: enable `clippy::ptr_as_ptr` lint". (Boqun Feng) - Replace `(...).into()` with `T::from(...)` where the destination type isn't obvious in "rust: enable `clippy::cast_lossless` lint". (Boqun Feng) - Link to v9: https://lore.kernel.org/r/20250416-ptr-as-ptr-v9-0-18ec29b1b1f3@gmail.com Changes in v9: - Replace ref-to-ptr coercion using `let` bindings with `core::ptr::from_{ref,mut}`. (Boqun Feng). - Link to v8: https://lore.kernel.org/r/20250409-ptr-as-ptr-v8-0-3738061534ef@gmail.com Changes in v8: - Use coercion to go ref -> ptr. - rustfmt. - Rebase on v6.15-rc1. - Extract first commit to its own series as it is shared with other series. - Link to v7: https://lore.kernel.org/r/20250325-ptr-as-ptr-v7-0-87ab452147b9@gmail.com Changes in v7: - Add patch to enable `clippy::ref_as_ptr`. - Link to v6: https://lore.kernel.org/r/20250324-ptr-as-ptr-v6-0-49d1b7fd4290@gmail.com Changes in v6: - Drop strict provenance patch. - Fix URLs in doc comments. - Add patch to enable `clippy::cast_lossless`. - Rebase on rust-next. - Link to v5: https://lore.kernel.org/r/20250317-ptr-as-ptr-v5-0-5b5f21fa230a@gmail.com Changes in v5: - Use `pointer::addr` in OF. (Boqun Feng) - Add documentation on stubs. (Benno Lossin) - Mark stubs `#[inline]`. - Pick up Alice's RB on a shared commit from https://lore.kernel.org/all/Z9f-3Aj3_FWBZRrm@google.com/. - Link to v4: https://lore.kernel.org/r/20250315-ptr-as-ptr-v4-0-b2d72c14dc26@gmail.com Changes in v4: - Add missing SoB. (Benno Lossin) - Use `without_provenance_mut` in alloc. (Boqun Feng) - Limit strict provenance lints to the `kernel` crate to avoid complex logic in the build system. This can be revisited on MSRV >= 1.84.0. - Rebase on rust-next. - Link to v3: https://lore.kernel.org/r/20250314-ptr-as-ptr-v3-0-e7ba61048f4a@gmail.com Changes in v3: - Fixed clippy warning in rust/kernel/firmware.rs. (kernel test robot) Link: https://lore.kernel.org/all/202503120332.YTCpFEvv-lkp@intel.com/ - s/as u64/as bindings::phys_addr_t/g. (Benno Lossin) - Use strict provenance APIs and enable lints. (Benno Lossin) - Link to v2: https://lore.kernel.org/r/20250309-ptr-as-ptr-v2-0-25d60ad922b7@gmail.com Changes in v2: - Fixed typo in first commit message. - Added additional patches, converted to series. - Link to v1: https://lore.kernel.org/r/20250307-ptr-as-ptr-v1-1-582d06514c98@gmail.com --- Tamir Duberstein (6): rust: enable `clippy::ptr_as_ptr` lint rust: enable `clippy::ptr_cast_constness` lint rust: enable `clippy::as_ptr_cast_mut` lint rust: enable `clippy::as_underscore` lint rust: enable `clippy::cast_lossless` lint rust: enable `clippy::ref_as_ptr` lint Makefile | 6 ++++ drivers/gpu/drm/drm_panic_qr.rs | 4 +-- rust/bindings/lib.rs | 3 ++ rust/kernel/alloc/allocator_test.rs | 2 +- rust/kernel/alloc/kvec.rs | 4 +-- rust/kernel/block/mq/operations.rs | 2 +- rust/kernel/block/mq/request.rs | 11 +++++-- rust/kernel/device.rs | 4 +-- rust/kernel/device_id.rs | 4 +-- rust/kernel/devres.rs | 17 +++++------ rust/kernel/dma.rs | 6 ++-- rust/kernel/drm/device.rs | 6 ++-- rust/kernel/error.rs | 2 +- rust/kernel/firmware.rs | 3 +- rust/kernel/fs/file.rs | 2 +- rust/kernel/io.rs | 18 ++++++------ rust/kernel/kunit.rs | 11 ++++--- rust/kernel/list/impl_list_item_mod.rs | 2 +- rust/kernel/miscdevice.rs | 2 +- rust/kernel/mm/virt.rs | 52 +++++++++++++++++----------------- rust/kernel/net/phy.rs | 4 +-- rust/kernel/of.rs | 6 ++-- rust/kernel/pci.rs | 11 ++++--- rust/kernel/platform.rs | 4 ++- rust/kernel/print.rs | 6 ++-- rust/kernel/seq_file.rs | 2 +- rust/kernel/str.rs | 14 ++++----- rust/kernel/sync/poll.rs | 2 +- rust/kernel/time/hrtimer/pin.rs | 2 +- rust/kernel/time/hrtimer/pin_mut.rs | 2 +- rust/kernel/uaccess.rs | 4 +-- rust/kernel/workqueue.rs | 8 +++--- rust/uapi/lib.rs | 3 ++ 33 files changed, 128 insertions(+), 101 deletions(-) --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250307-ptr-as-ptr-21b1867fc4d4 Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

2 weeks, 6 days

2
8
0 0

[PATCH bpf 0/7] bpf: deny trampoline attachment if args can not be located exactly on stack

by Alexis Lothoré (eBPF Foundation)

Hello, this series follows some discussions started in [1] around bpf trampolines limitations on specific cases. When a trampoline is generated for a target function involving many arguments, it has to properly find and save the arguments that has been passed through stack. While this is doable with basic types (eg: scalars), it brings more uncertainty when dealing with specific types like structs (many ABIs allow to pass structures by value if they fit in a register or a pair of registers). The issue is that those structures layout and location on the stack can be altered (ie with attributes, like packed or aligned(x)), and this kind of alteration is not encoded in dwarf or BTF, making the trampolines clueless about the needed adjustments. Rather than trying to support this specific case, as agreed in [2], this series aims to properly deny it. It targets all the architectures currently implementing arch_prepare_bpf_trampoline (except aarch64, since it has been handled while adding the support for many args): - x86 - s390 - riscv - powerpc A small validation function is added in the JIT compiler for each of those architectures, ensuring that no argument passed on stack is a struct. If so, the trampoline creation is cancelled. Any check on args already implemented in a JIT comp has been moved in this new function. On top of that, it updates the tracing_struct_many_args test, which now merely checks that this case is indeed denied. [1] https://lore.kernel.org/bpf/20250411-many_args_arm64-v1-0-0a32fe72339e@boot… [2] https://lore.kernel.org/bpf/CAADnVQKr3ftNt1uQVrXBE0a2o37ZYRo2PHqCoHUnw6PE5T… Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com> --- Alexis Lothoré (eBPF Foundation) (7): bpf/x86: use define for max regs count used for arguments bpf/x86: prevent trampoline attachment when args location on stack is uncertain bpf/riscv: prevent trampoline attachment when args location on stack is uncertain bpf/s390: prevent trampoline attachment when args location on stack is uncertain bpf/powerpc64: use define for max regs count used for arguments bpf/powerpc64: prevent trampoline attachment when args location on stack is uncertain selftests/bpf: ensure that functions passing structs on stack can not be hooked arch/powerpc/net/bpf_jit_comp.c | 38 ++++++++++-- arch/riscv/net/bpf_jit_comp64.c | 26 +++++++- arch/s390/net/bpf_jit_comp.c | 33 ++++++++-- arch/x86/net/bpf_jit_comp.c | 50 ++++++++++++---- .../selftests/bpf/prog_tests/tracing_struct.c | 37 +----------- .../selftests/bpf/progs/tracing_struct_many_args.c | 70 ---------------------- .../testing/selftests/bpf/test_kmods/bpf_testmod.c | 43 ++----------- 7 files changed, 129 insertions(+), 168 deletions(-) --- base-commit: c4f4f8da70044d8b28fccf73016b4119f3e2fd50 change-id: 20250609-deny_trampoline_structs_on_stack-5bbc7bc20dd1 Best regards, -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com

2 weeks, 6 days

4
14
0 0

[PATCH net-next v1 1/4] net: netmem: fix skb_ensure_writable with unreadable skbs

by Mina Almasry

skb_ensure_writable actually makes sure that the header of the skb is writable, and doesn't touch the payload. It doesn't need an skb_frags_readable check. Removing this check restores DSCP functionality with unreadable skbs as it's called from dscp_tg. Fixes: 65249feb6b3d ("net: add support for skbs with unreadable frags") Signed-off-by: Mina Almasry <almasrymina(a)google.com> --- net/core/skbuff.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 85fc82f72d26..d6420b74ea9c 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -6261,9 +6261,6 @@ int skb_ensure_writable(struct sk_buff *skb, unsigned int write_len) if (!pskb_may_pull(skb, write_len)) return -ENOMEM; - if (!skb_frags_readable(skb)) - return -EFAULT; - if (!skb_cloned(skb) || skb_clone_writable(skb, write_len)) return 0; base-commit: 6d4e01d29d87356924f1521ca6df7a364e948f13 -- 2.50.0.rc1.591.g9c95f17f64-goog

3 weeks

2
5
0 0

[PATCH net-next v3] page_pool: import Jesper's page_pool benchmark

by Mina Almasry

From: Jesper Dangaard Brouer <hawk(a)kernel.org> We frequently consult with Jesper's out-of-tree page_pool benchmark to evaluate page_pool changes. Import the benchmark into the upstream linux kernel tree so that (a) we're all running the same version, (b) pave the way for shared improvements, and (c) maybe one day integrate it with nipa, if possible. Import bench_page_pool_simple from commit 35b1716d0c30 ("Add page_bench06_walk_all"), from this repository: https://github.com/netoptimizer/prototype-kernel.git Changes done during upstreaming: - Fix checkpatch issues. - Remove the tasklet logic not needed. - Move under tools/testing - Create ksft for the benchmark. - Changed slightly how the benchmark gets build. Out of tree, time_bench is built as an independent .ko. Here it is included in bench_page_pool.ko Steps to run: ``` mkdir -p /tmp/run-pp-bench make -C ./tools/testing/selftests/net/bench make -C ./tools/testing/selftests/net/bench install INSTALL_PATH=/tmp/run-pp-bench rsync --delete -avz --progress /tmp/run-pp-bench mina@$SERVER:~/ ssh mina@$SERVER << EOF cd ~/run-pp-bench && sudo ./test_bench_page_pool.sh EOF ``` Output: ``` (benchmrk dmesg logs) Fast path results: no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.368 ns ptr_ring results: no-softirq-page_pool02 Per elem: 527 cycles(tsc) 195.187 ns slow path results: no-softirq-page_pool03 Per elem: 549 cycles(tsc) 203.466 ns ``` Cc: Jesper Dangaard Brouer <hawk(a)kernel.org> Cc: Ilias Apalodimas <ilias.apalodimas(a)linaro.org> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: Toke Høiland-Jørgensen <toke(a)toke.dk> Signed-off-by: Mina Almasry <almasrymina(a)google.com> Acked-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org> Signed-off-by: Jesper Dangaard Brouer <hawk(a)kernel.org> --- v3: - Non RFC - Collect Signed-off-by from Jesper and Acked-by Ilias. - Move test_bench_page_pool.sh to address nipa complaint. - Remove `static inline` in .c files to address nipa complaint. v2: - Move under tools/selftests (Jakub) - Create ksft for it. - Remove the tasklet logic no longer needed (Jesper + Toke) RFC discussion points: - Desirable to import it? - Can the benchmark be imported as-is for an initial version? Or needs lots of modifications? - Code location. I retained the location in Jesper's tree, but a path like net/core/bench/ may make more sense. --- tools/testing/selftests/net/bench/Makefile | 7 + .../selftests/net/bench/page_pool/Makefile | 17 + .../bench/page_pool/bench_page_pool_simple.c | 274 ++++++++++++ .../net/bench/page_pool/time_bench.c | 406 ++++++++++++++++++ .../net/bench/page_pool/time_bench.h | 259 +++++++++++ .../net/bench/test_bench_page_pool.sh | 32 ++ 6 files changed, 995 insertions(+) create mode 100644 tools/testing/selftests/net/bench/Makefile create mode 100644 tools/testing/selftests/net/bench/page_pool/Makefile create mode 100644 tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c create mode 100644 tools/testing/selftests/net/bench/page_pool/time_bench.c create mode 100644 tools/testing/selftests/net/bench/page_pool/time_bench.h create mode 100755 tools/testing/selftests/net/bench/test_bench_page_pool.sh diff --git a/tools/testing/selftests/net/bench/Makefile b/tools/testing/selftests/net/bench/Makefile new file mode 100644 index 000000000000..2546c45e42f7 --- /dev/null +++ b/tools/testing/selftests/net/bench/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +TEST_GEN_MODS_DIR := page_pool + +TEST_PROGS += test_bench_page_pool.sh + +include ../../lib.mk diff --git a/tools/testing/selftests/net/bench/page_pool/Makefile b/tools/testing/selftests/net/bench/page_pool/Makefile new file mode 100644 index 000000000000..0549a16ba275 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/Makefile @@ -0,0 +1,17 @@ +BENCH_PAGE_POOL_SIMPLE_TEST_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST))))) +KDIR ?= /lib/modules/$(shell uname -r)/build + +ifeq ($(V),1) +Q = +else +Q = @ +endif + +obj-m += bench_page_pool.o +bench_page_pool-y += bench_page_pool_simple.o time_bench.o + +all: + +$(Q)make -C $(KDIR) M=$(BENCH_PAGE_POOL_SIMPLE_TEST_DIR) modules + +clean: + +$(Q)make -C $(KDIR) M=$(BENCH_PAGE_POOL_SIMPLE_TEST_DIR) clean diff --git a/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c b/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c new file mode 100644 index 000000000000..44355043b632 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c @@ -0,0 +1,274 @@ +/* + * Benchmark module for page_pool. + * + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/module.h> +#include <linux/mutex.h> + +#include <linux/version.h> +#include <net/page_pool/helpers.h> + +#include <linux/interrupt.h> +#include <linux/limits.h> + +#include "time_bench.h" + +static int verbose = 1; +#define MY_POOL_SIZE 1024 + +static void _page_pool_put_page(struct page_pool *pool, struct page *page, + bool allow_direct) +{ + page_pool_put_page(pool, page, -1, allow_direct); +} + +/* Makes tests selectable. Useful for perf-record to analyze a single test. + * Hint: Bash shells support writing binary number like: $((2#101010) + * + * # modprobe bench_page_pool_simple run_flags=$((2#100)) + */ +static unsigned long run_flags = 0xFFFFFFFF; +module_param(run_flags, ulong, 0); +MODULE_PARM_DESC(run_flags, "Limit which bench test that runs"); +/* Count the bit number from the enum */ +enum benchmark_bit { + bit_run_bench_baseline, + bit_run_bench_no_softirq01, + bit_run_bench_no_softirq02, + bit_run_bench_no_softirq03, +}; +#define bit(b) (1 << (b)) +#define enabled(b) ((run_flags & (bit(b)))) + +/* notice time_bench is limited to U32_MAX nr loops */ +static unsigned long loops = 10000000; +module_param(loops, ulong, 0); +MODULE_PARM_DESC(loops, "Specify loops bench will run"); + +/* Timing at the nanosec level, we need to know the overhead + * introduced by the for loop itself */ +static int time_bench_for_loop(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + int i; + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + } + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +static int time_bench_atomic_inc(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + atomic_t cnt; + int i; + + atomic_set(&cnt, 0); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + atomic_inc(&cnt); + barrier(); /* avoid compiler to optimize this loop */ + } + loops_cnt = atomic_read(&cnt); + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +/* The ptr_ping in page_pool uses a spinlock. We need to know the minimum + * overhead of taking+releasing a spinlock, to know the cycles that can be saved + * by e.g. amortizing this via bulking. + */ +static int time_bench_lock(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + spinlock_t lock; + int i; + + spin_lock_init(&lock); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + spin_lock(&lock); + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + spin_unlock(&lock); + } + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +/* Helper for filling some page's into ptr_ring */ +static void pp_fill_ptr_ring(struct page_pool *pp, int elems) +{ + gfp_t gfp_mask = GFP_ATOMIC; /* GFP_ATOMIC needed when under run softirq */ + struct page **array; + int i; + + array = kzalloc(sizeof(struct page *) * elems, gfp_mask); + + for (i = 0; i < elems; i++) { + array[i] = page_pool_alloc_pages(pp, gfp_mask); + } + for (i = 0; i < elems; i++) { + _page_pool_put_page(pp, array[i], false); + } + + kfree(array); +} + +enum test_type { type_fast_path, type_ptr_ring, type_page_allocator }; + +/* Depends on compile optimizing this function */ +static int time_bench_page_pool(struct time_bench_record *rec, void *data, + enum test_type type, const char *func) +{ + uint64_t loops_cnt = 0; + gfp_t gfp_mask = GFP_ATOMIC; /* GFP_ATOMIC is not really needed */ + int i, err; + + struct page_pool *pp; + struct page *page; + + struct page_pool_params pp_params = { + .order = 0, + .flags = 0, + .pool_size = MY_POOL_SIZE, + .nid = NUMA_NO_NODE, + .dev = NULL, /* Only use for DMA mapping */ + .dma_dir = DMA_BIDIRECTIONAL, + }; + + pp = page_pool_create(&pp_params); + if (IS_ERR(pp)) { + err = PTR_ERR(pp); + pr_warn("%s: Error(%d) creating page_pool\n", func, err); + goto out; + } + pp_fill_ptr_ring(pp, 64); + + if (in_serving_softirq()) + pr_warn("%s(): in_serving_softirq fast-path\n", func); + else + pr_warn("%s(): Cannot use page_pool fast-path\n", func); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + /* Common fast-path alloc, that depend on in_serving_softirq() */ + page = page_pool_alloc_pages(pp, gfp_mask); + if (!page) + break; + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + + /* The benchmarks purpose it to test different return paths. + * Compiler should inline optimize other function calls out + */ + if (type == type_fast_path) { + /* Fast-path recycling e.g. XDP_DROP use-case */ + page_pool_recycle_direct(pp, page); + + } else if (type == type_ptr_ring) { + /* Normal return path */ + _page_pool_put_page(pp, page, false); + + } else if (type == type_page_allocator) { + /* Test if not pages are recycled, but instead + * returned back into systems page allocator + */ + get_page(page); /* cause no-recycling */ + _page_pool_put_page(pp, page, false); + put_page(page); + } else { + BUILD_BUG(); + } + } + time_bench_stop(rec, loops_cnt); +out: + page_pool_destroy(pp); + return loops_cnt; +} + +static int time_bench_page_pool01_fast_path(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_fast_path, __func__); +} + +static int time_bench_page_pool02_ptr_ring(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_ptr_ring, __func__); +} + +static int time_bench_page_pool03_slow(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_page_allocator, __func__); +} + +static int run_benchmark_tests(void) +{ + uint32_t nr_loops = loops; + int passed_count = 0; + + /* Baseline tests */ + if (enabled(bit_run_bench_baseline)) { + time_bench_loop(nr_loops * 10, 0, "for_loop", NULL, + time_bench_for_loop); + time_bench_loop(nr_loops * 10, 0, "atomic_inc", NULL, + time_bench_atomic_inc); + time_bench_loop(nr_loops, 0, "lock", NULL, time_bench_lock); + } + + /* This test cannot activate correct code path, due to no-softirq ctx */ + if (enabled(bit_run_bench_no_softirq01)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool01", NULL, + time_bench_page_pool01_fast_path); + if (enabled(bit_run_bench_no_softirq02)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool02", NULL, + time_bench_page_pool02_ptr_ring); + if (enabled(bit_run_bench_no_softirq03)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool03", NULL, + time_bench_page_pool03_slow); + + return passed_count; +} + +static int __init bench_page_pool_simple_module_init(void) +{ + if (verbose) + pr_info("Loaded\n"); + + if (loops > U32_MAX) { + pr_err("Module param loops(%lu) exceeded U32_MAX(%u)\n", loops, + U32_MAX); + return -ECHRNG; + } + + run_benchmark_tests(); + + return 0; +} +module_init(bench_page_pool_simple_module_init); + +static void __exit bench_page_pool_simple_module_exit(void) +{ + if (verbose) + pr_info("Unloaded\n"); +} +module_exit(bench_page_pool_simple_module_exit); + +MODULE_DESCRIPTION("Benchmark of page_pool simple cases"); +MODULE_AUTHOR("Jesper Dangaard Brouer <netoptimizer(a)brouer.com>"); +MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/net/bench/page_pool/time_bench.c b/tools/testing/selftests/net/bench/page_pool/time_bench.c new file mode 100644 index 000000000000..257b1515c64e --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/time_bench.c @@ -0,0 +1,406 @@ +/* + * Benchmarking code execution time inside the kernel + * + * Copyright (C) 2014, Red Hat, Inc., Jesper Dangaard Brouer + * for licensing details see kernel-base/COPYING + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/module.h> +#include <linux/time.h> + +#include <linux/perf_event.h> /* perf_event_create_kernel_counter() */ + +/* For concurrency testing */ +#include <linux/completion.h> +#include <linux/sched.h> +#include <linux/workqueue.h> +#include <linux/kthread.h> + +#include "time_bench.h" + +static int verbose = 1; + +/** TSC (Time-Stamp Counter) based ** + * See: linux/time_bench.h + * tsc_start_clock() and tsc_stop_clock() + */ + +/** Wall-clock based ** + */ + +/** PMU (Performance Monitor Unit) based ** + */ +#define PERF_FORMAT \ + (PERF_FORMAT_GROUP | PERF_FORMAT_ID | PERF_FORMAT_TOTAL_TIME_ENABLED | \ + PERF_FORMAT_TOTAL_TIME_RUNNING) + +struct raw_perf_event { + uint64_t config; /* event */ + uint64_t config1; /* umask */ + struct perf_event *save; + char *desc; +}; + +/* if HT is enable a maximum of 4 events (5 if one is instructions + * retired can be specified, if HT is disabled a maximum of 8 (9 if + * one is instructions retired) can be specified. + * + * From Table 19-1. Architectural Performance Events + * Architectures Software Developer’s Manual Volume 3: System Programming Guide + */ +struct raw_perf_event perf_events[] = { + { 0x3c, 0x00, NULL, "Unhalted CPU Cycles" }, + { 0xc0, 0x00, NULL, "Instruction Retired" } +}; + +#define NUM_EVTS (sizeof(perf_events) / sizeof(struct raw_perf_event)) + +/* WARNING: PMU config is currently broken! + */ +bool time_bench_PMU_config(bool enable) +{ + int i; + struct perf_event_attr perf_conf; + struct perf_event *perf_event; + int cpu; + + preempt_disable(); + cpu = smp_processor_id(); + pr_info("DEBUG: cpu:%d\n", cpu); + preempt_enable(); + + memset(&perf_conf, 0, sizeof(struct perf_event_attr)); + perf_conf.type = PERF_TYPE_RAW; + perf_conf.size = sizeof(struct perf_event_attr); + perf_conf.read_format = PERF_FORMAT; + perf_conf.pinned = 1; + perf_conf.exclude_user = 1; /* No userspace events */ + perf_conf.exclude_kernel = 0; /* Only kernel events */ + + for (i = 0; i < NUM_EVTS; i++) { + perf_conf.disabled = enable; + //perf_conf.disabled = (i == 0) ? 1 : 0; + perf_conf.config = perf_events[i].config; + perf_conf.config1 = perf_events[i].config1; + if (verbose) + pr_info("%s() enable PMU counter: %s\n", + __func__, perf_events[i].desc); + perf_event = perf_event_create_kernel_counter(&perf_conf, cpu, + NULL /* task */, + NULL /* overflow_handler*/, + NULL /* context */); + if (perf_event) { + perf_events[i].save = perf_event; + pr_info("%s():DEBUG perf_event success\n", __func__); + + perf_event_enable(perf_event); + } else { + pr_info("%s():DEBUG perf_event is NULL\n", __func__); + } + } + + return true; +} + +/** Generic functions ** + */ + +/* Calculate stats, store results in record */ +bool time_bench_calc_stats(struct time_bench_record *rec) +{ +#define NANOSEC_PER_SEC 1000000000 /* 10^9 */ + uint64_t ns_per_call_tmp_rem = 0; + uint32_t ns_per_call_remainder = 0; + uint64_t pmc_ipc_tmp_rem = 0; + uint32_t pmc_ipc_remainder = 0; + uint32_t pmc_ipc_div = 0; + uint32_t invoked_cnt_precision = 0; + uint32_t invoked_cnt = 0; /* 32-bit due to div_u64_rem() */ + + if (rec->flags & TIME_BENCH_LOOP) { + if (rec->invoked_cnt < 1000) { + pr_err("ERR: need more(>1000) loops(%llu) for timing\n", + rec->invoked_cnt); + return false; + } + if (rec->invoked_cnt > ((1ULL << 32) - 1)) { + /* div_u64_rem() can only support div with 32bit*/ + pr_err("ERR: Invoke cnt(%llu) too big overflow 32bit\n", + rec->invoked_cnt); + return false; + } + invoked_cnt = (uint32_t)rec->invoked_cnt; + } + + /* TSC (Time-Stamp Counter) records */ + if (rec->flags & TIME_BENCH_TSC) { + rec->tsc_interval = rec->tsc_stop - rec->tsc_start; + if (rec->tsc_interval == 0) { + pr_err("ABORT: timing took ZERO TSC time\n"); + return false; + } + /* Calculate stats */ + if (rec->flags & TIME_BENCH_LOOP) + rec->tsc_cycles = rec->tsc_interval / invoked_cnt; + else + rec->tsc_cycles = rec->tsc_interval; + } + + /* Wall-clock time calc */ + if (rec->flags & TIME_BENCH_WALLCLOCK) { + rec->time_start = rec->ts_start.tv_nsec + + (NANOSEC_PER_SEC * rec->ts_start.tv_sec); + rec->time_stop = rec->ts_stop.tv_nsec + + (NANOSEC_PER_SEC * rec->ts_stop.tv_sec); + rec->time_interval = rec->time_stop - rec->time_start; + if (rec->time_interval == 0) { + pr_err("ABORT: timing took ZERO wallclock time\n"); + return false; + } + /* Calculate stats */ + /*** Division in kernel it tricky ***/ + /* Orig: time_sec = (time_interval / NANOSEC_PER_SEC); */ + /* remainder only correct because NANOSEC_PER_SEC is 10^9 */ + rec->time_sec = div_u64_rem(rec->time_interval, NANOSEC_PER_SEC, + &rec->time_sec_remainder); + //TODO: use existing struct timespec records instead of div? + + if (rec->flags & TIME_BENCH_LOOP) { + /*** Division in kernel it tricky ***/ + /* Orig: ns = ((double)time_interval / invoked_cnt); */ + /* First get quotient */ + rec->ns_per_call_quotient = + div_u64_rem(rec->time_interval, invoked_cnt, + &ns_per_call_remainder); + /* Now get decimals .xxx precision (incorrect roundup)*/ + ns_per_call_tmp_rem = ns_per_call_remainder; + invoked_cnt_precision = invoked_cnt / 1000; + if (invoked_cnt_precision > 0) { + rec->ns_per_call_decimal = + div_u64_rem(ns_per_call_tmp_rem, + invoked_cnt_precision, + &ns_per_call_remainder); + } + } + } + + /* Performance Monitor Unit (PMU) counters */ + if (rec->flags & TIME_BENCH_PMU) { + //FIXME: Overflow handling??? + rec->pmc_inst = rec->pmc_inst_stop - rec->pmc_inst_start; + rec->pmc_clk = rec->pmc_clk_stop - rec->pmc_clk_start; + + /* Calc Instruction Per Cycle (IPC) */ + /* First get quotient */ + rec->pmc_ipc_quotient = div_u64_rem(rec->pmc_inst, rec->pmc_clk, + &pmc_ipc_remainder); + /* Now get decimals .xxx precision (incorrect roundup)*/ + pmc_ipc_tmp_rem = pmc_ipc_remainder; + pmc_ipc_div = rec->pmc_clk / 1000; + if (pmc_ipc_div > 0) { + rec->pmc_ipc_decimal = div_u64_rem(pmc_ipc_tmp_rem, + pmc_ipc_div, + &pmc_ipc_remainder); + } + } + + return true; +} + +/* Generic function for invoking a loop function and calculating + * execution time stats. The function being called/timed is assumed + * to perform a tight loop, and update the timing record struct. + */ +bool time_bench_loop(uint32_t loops, int step, char *txt, void *data, + int (*func)(struct time_bench_record *record, void *data)) +{ + struct time_bench_record rec; + + /* Setup record */ + memset(&rec, 0, sizeof(rec)); /* zero func might not update all */ + rec.version_abi = 1; + rec.loops = loops; + rec.step = step; + rec.flags = (TIME_BENCH_LOOP|TIME_BENCH_TSC|TIME_BENCH_WALLCLOCK); +// rec.flags = (TIME_BENCH_LOOP|TIME_BENCH_TSC| +// TIME_BENCH_WALLCLOCK|TIME_BENCH_PMU); + //TODO: Add/copy txt to rec + + /*** Loop function being timed ***/ + if (!func(&rec, data)) { + pr_err("ABORT: function being timed failed\n"); + return false; + } + + if (rec.invoked_cnt < loops) + pr_warn("WARNING: Invoke count(%llu) smaller than loops(%d)\n", + rec.invoked_cnt, loops); + + /* Calculate stats */ + time_bench_calc_stats(&rec); + + pr_info("Type:%s Per elem: %llu cycles(tsc) %llu.%03llu ns (step:%d)" + " - (measurement period time:%llu.%09u sec time_interval:%llu)" + " - (invoke count:%llu tsc_interval:%llu)\n", + txt, rec.tsc_cycles, + rec.ns_per_call_quotient, rec.ns_per_call_decimal, rec.step, + rec.time_sec, rec.time_sec_remainder, rec.time_interval, + rec.invoked_cnt, rec.tsc_interval); +/* pr_info("DEBUG check is %llu/%llu == %llu.%03llu ?\n", + rec.time_interval, rec.invoked_cnt, + rec.ns_per_call_quotient, rec.ns_per_call_decimal); +*/ + if (rec.flags & TIME_BENCH_PMU) { + pr_info("Type:%s PMU inst/clock" + "%llu/%llu = %llu.%03llu IPC (inst per cycle)\n", + txt, rec.pmc_inst, rec.pmc_clk, + rec.pmc_ipc_quotient, rec.pmc_ipc_decimal); + } + return true; +} + +/* Function getting invoked by kthread */ +static int invoke_test_on_cpu_func(void *private) +{ + struct time_bench_cpu *cpu = private; + struct time_bench_sync *sync = cpu->sync; + cpumask_t newmask = CPU_MASK_NONE; + void *data = cpu->data; + + /* Restrict CPU */ + cpumask_set_cpu(cpu->rec.cpu, &newmask); + set_cpus_allowed_ptr(current, &newmask); + + /* Synchronize start of concurrency test */ + atomic_inc(&sync->nr_tests_running); + wait_for_completion(&sync->start_event); + + /* Start benchmark function */ + if (!cpu->bench_func(&cpu->rec, data)) { + pr_err("ERROR: function being timed failed on CPU:%d(%d)\n", + cpu->rec.cpu, smp_processor_id()); + } else { + if (verbose) + pr_info("SUCCESS: ran on CPU:%d(%d)\n", cpu->rec.cpu, + smp_processor_id()); + } + cpu->did_bench_run = true; + + /* End test */ + atomic_dec(&sync->nr_tests_running); + /* Wait for kthread_stop() telling us to stop */ + while (!kthread_should_stop()) { + set_current_state(TASK_INTERRUPTIBLE); + schedule(); + } + __set_current_state(TASK_RUNNING); + return 0; +} + +void time_bench_print_stats_cpumask(const char *desc, + struct time_bench_cpu *cpu_tasks, + const struct cpumask *mask) +{ + uint64_t average = 0; + int cpu; + int step = 0; + struct sum { + uint64_t tsc_cycles; + int records; + } sum = { 0 }; + + /* Get stats */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + struct time_bench_record *rec = &c->rec; + + /* Calculate stats */ + time_bench_calc_stats(rec); + + pr_info("Type:%s CPU(%d) %llu cycles(tsc) %llu.%03llu ns" + " (step:%d)" + " - (measurement period time:%llu.%09u sec time_interval:%llu)" + " - (invoke count:%llu tsc_interval:%llu)\n", + desc, cpu, rec->tsc_cycles, rec->ns_per_call_quotient, + rec->ns_per_call_decimal, rec->step, rec->time_sec, + rec->time_sec_remainder, rec->time_interval, + rec->invoked_cnt, rec->tsc_interval); + + /* Collect average */ + sum.records++; + sum.tsc_cycles += rec->tsc_cycles; + step = rec->step; + } + + if (sum.records) /* avoid div-by-zero */ + average = sum.tsc_cycles / sum.records; + pr_info("Sum Type:%s Average: %llu cycles(tsc) CPUs:%d step:%d\n", desc, + average, sum.records, step); +} + +void time_bench_run_concurrent( + uint32_t loops, int step, void *data, + const struct cpumask *mask, /* Support masking outsome CPUs*/ + struct time_bench_sync *sync, struct time_bench_cpu *cpu_tasks, + int (*func)(struct time_bench_record *record, void *data)) +{ + int cpu, running = 0; + + if (verbose) // DEBUG + pr_warn("%s() Started on CPU:%d\n", __func__, + smp_processor_id()); + + /* Reset sync conditions */ + atomic_set(&sync->nr_tests_running, 0); + init_completion(&sync->start_event); + + /* Spawn off jobs on all CPUs */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + + running++; + c->sync = sync; /* Send sync variable along */ + c->data = data; /* Send opaque along */ + + /* Init benchmark record */ + memset(&c->rec, 0, sizeof(struct time_bench_record)); + c->rec.version_abi = 1; + c->rec.loops = loops; + c->rec.step = step; + c->rec.flags = (TIME_BENCH_LOOP|TIME_BENCH_TSC| + TIME_BENCH_WALLCLOCK); + c->rec.cpu = cpu; + c->bench_func = func; + c->task = kthread_run(invoke_test_on_cpu_func, c, + "time_bench%d", cpu); + if (IS_ERR(c->task)) { + pr_err("%s(): Failed to start test func\n", __func__); + return; /* Argh, what about cleanup?! */ + } + } + + /* Wait until all processes are running */ + while (atomic_read(&sync->nr_tests_running) < running) { + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(10); + } + /* Kick off all CPU concurrently on completion event */ + complete_all(&sync->start_event); + + /* Wait for CPUs to finish */ + while (atomic_read(&sync->nr_tests_running)) { + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(10); + } + + /* Stop the kthreads */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + kthread_stop(c->task); + } + + if (verbose) // DEBUG - happens often, finish on another CPU + pr_warn("%s() Finished on CPU:%d\n", __func__, + smp_processor_id()); +} diff --git a/tools/testing/selftests/net/bench/page_pool/time_bench.h b/tools/testing/selftests/net/bench/page_pool/time_bench.h new file mode 100644 index 000000000000..b0e7139179e5 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/time_bench.h @@ -0,0 +1,259 @@ +/* + * Benchmarking code execution time inside the kernel + * + * Copyright (C) 2014, Red Hat, Inc., Jesper Dangaard Brouer + * for licensing details see kernel-base/COPYING + */ +#ifndef _LINUX_TIME_BENCH_H +#define _LINUX_TIME_BENCH_H + +/* Main structure used for recording a benchmark run */ +struct time_bench_record { + uint32_t version_abi; + uint32_t loops; /* Requested loop invocations */ + uint32_t step; /* option for e.g. bulk invocations */ + + uint32_t flags; /* Measurements types enabled */ +#define TIME_BENCH_LOOP (1<<0) +#define TIME_BENCH_TSC (1<<1) +#define TIME_BENCH_WALLCLOCK (1<<2) +#define TIME_BENCH_PMU (1<<3) + + uint32_t cpu; /* Used when embedded in time_bench_cpu */ + + /* Records */ + uint64_t invoked_cnt; /* Returned actual invocations */ + uint64_t tsc_start; + uint64_t tsc_stop; + struct timespec64 ts_start; + struct timespec64 ts_stop; + /** PMU counters for instruction and cycles + * instructions counter including pipelined instructions */ + uint64_t pmc_inst_start; + uint64_t pmc_inst_stop; + /* CPU unhalted clock counter */ + uint64_t pmc_clk_start; + uint64_t pmc_clk_stop; + + /* Result records */ + uint64_t tsc_interval; + uint64_t time_start, time_stop, time_interval; /* in nanosec */ + uint64_t pmc_inst, pmc_clk; + + /* Derived result records */ + uint64_t tsc_cycles; // +decimal? + uint64_t ns_per_call_quotient, ns_per_call_decimal; + uint64_t time_sec; + uint32_t time_sec_remainder; + uint64_t pmc_ipc_quotient, pmc_ipc_decimal; /* inst per cycle */ +}; + +/* For synchronizing parallel CPUs to run concurrently */ +struct time_bench_sync { + atomic_t nr_tests_running; + struct completion start_event; +}; + +/* Keep track of CPUs executing our bench function. + * + * Embed a time_bench_record for storing info per cpu + */ +struct time_bench_cpu { + struct time_bench_record rec; + struct time_bench_sync *sync; /* back ptr */ + struct task_struct *task; + /* "data" opaque could have been placed in time_bench_sync, + * but to avoid any false sharing, place it per CPU + */ + void *data; + /* Support masking outsome CPUs, mark if it ran */ + bool did_bench_run; + /* int cpu; // note CPU stored in time_bench_record */ + int (*bench_func)(struct time_bench_record *record, void *data); +}; + +/* + * Below TSC assembler code is not compatible with other archs, and + * can also fail on guests if cpu-flags are not correct. + * + * The way TSC reading is used, many iterations, does not require as + * high accuracy as described below (in Intel Doc #324264). + * + * Considering changing to use get_cycles() (#include <asm/timex.h>). + */ + +/** TSC (Time-Stamp Counter) based ** + * Recommend reading, to understand details of reading TSC accurately: + * Intel Doc #324264, "How to Benchmark Code Execution Times on Intel" + * + * Consider getting exclusive ownership of CPU by using: + * unsigned long flags; + * preempt_disable(); + * raw_local_irq_save(flags); + * _your_code_ + * raw_local_irq_restore(flags); + * preempt_enable(); + * + * Clobbered registers: "%rax", "%rbx", "%rcx", "%rdx" + * RDTSC only change "%rax" and "%rdx" but + * CPUID clears the high 32-bits of all (rax/rbx/rcx/rdx) + */ +static __always_inline uint64_t tsc_start_clock(void) +{ + /* See: Intel Doc #324264 */ + unsigned hi, lo; + asm volatile("CPUID\n\t" + "RDTSC\n\t" + "mov %%edx, %0\n\t" + "mov %%eax, %1\n\t" + : "=r"(hi), "=r"(lo)::"%rax", "%rbx", "%rcx", "%rdx"); + //FIXME: on 32bit use clobbered %eax + %edx + return ((uint64_t)lo) | (((uint64_t)hi) << 32); +} + +static __always_inline uint64_t tsc_stop_clock(void) +{ + /* See: Intel Doc #324264 */ + unsigned hi, lo; + asm volatile("RDTSCP\n\t" + "mov %%edx, %0\n\t" + "mov %%eax, %1\n\t" + "CPUID\n\t" + : "=r"(hi), "=r"(lo)::"%rax", "%rbx", "%rcx", "%rdx"); + return ((uint64_t)lo) | (((uint64_t)hi) << 32); +} + +/* Notes for RDTSC and RDTSCP + * + * Hannes found out that __builtin_ia32_rdtsc and + * __builtin_ia32_rdtscp are undocumented available in gcc, so there + * is no need to write inline assembler functions for them any more. + * + * unsigned long long __builtin_ia32_rdtscp(unsigned int *foo); + * (where foo is set to: numa_node << 12 | cpu) + * and + * unsigned long long __builtin_ia32_rdtsc(void); + * + * Above we combine the calls with CPUID, thus I don't see how this is + * directly appreciable. + */ + +/* +inline uint64_t rdtsc(void) +{ + uint32_t low, high; + asm volatile("rdtsc" : "=a" (low), "=d" (high)); + return low | (((uint64_t )high ) << 32); +} +*/ + +/** Wall-clock based ** + * + * use: getnstimeofday() + * getnstimeofday(&rec->ts_start); + * getnstimeofday(&rec->ts_stop); + * + * API changed see: Documentation/core-api/timekeeping.rst + * https://www.kernel.org/doc/html/latest/core-api/timekeeping.html#c.getnstim… + * + * We should instead use: ktime_get_real_ts64() is a direct + * replacement, but consider using monotonic time (ktime_get_ts64()) + * and/or a ktime_t based interface (ktime_get()/ktime_get_real()). + */ + +/** PMU (Performance Monitor Unit) based ** + * + * Needed for calculating: Instructions Per Cycle (IPC) + * - The IPC number tell how efficient the CPU pipelining were + */ +//lookup: perf_event_create_kernel_counter() + +bool time_bench_PMU_config(bool enable); + +/* Raw reading via rdpmc() using fixed counters + * + * From: https://github.com/andikleen/simple-pmu + */ +enum { + FIXED_SELECT = (1U << 30), /* == 0x40000000 */ + FIXED_INST_RETIRED_ANY = 0, + FIXED_CPU_CLK_UNHALTED_CORE = 1, + FIXED_CPU_CLK_UNHALTED_REF = 2, +}; + +static __always_inline unsigned long long p_rdpmc(unsigned in) +{ + unsigned d, a; + + asm volatile("rdpmc" : "=d"(d), "=a"(a) : "c"(in) : "memory"); + return ((unsigned long long)d << 32) | a; +} + +/* These PMU counter needs to be enabled, but I don't have the + * configure code implemented. My current hack is running: + * sudo perf stat -e cycles:k -e instructions:k insmod lib/ring_queue_test.ko + */ +/* Reading all pipelined instruction */ +static __always_inline unsigned long long pmc_inst(void) +{ + return p_rdpmc(FIXED_SELECT | FIXED_INST_RETIRED_ANY); +} + +/* Reading CPU clock cycles */ +static __always_inline unsigned long long pmc_clk(void) +{ + return p_rdpmc(FIXED_SELECT | FIXED_CPU_CLK_UNHALTED_CORE); +} + +/* Raw reading via MSR rdmsr() is likely wrong + * FIXME: How can I know which raw MSR registers are conf for what? + */ +#define MSR_IA32_PCM0 0x400000C1 /* PERFCTR0 */ +#define MSR_IA32_PCM1 0x400000C2 /* PERFCTR1 */ +#define MSR_IA32_PCM2 0x400000C3 +static inline uint64_t msr_inst(unsigned long long *msr_result) +{ + return rdmsrq_safe(MSR_IA32_PCM0, msr_result); +} + +/** Generic functions ** + */ +bool time_bench_loop(uint32_t loops, int step, char *txt, void *data, + int (*func)(struct time_bench_record *rec, void *data)); +bool time_bench_calc_stats(struct time_bench_record *rec); + +void time_bench_run_concurrent( + uint32_t loops, int step, void *data, + const struct cpumask *mask, /* Support masking outsome CPUs*/ + struct time_bench_sync *sync, struct time_bench_cpu *cpu_tasks, + int (*func)(struct time_bench_record *record, void *data)); +void time_bench_print_stats_cpumask(const char *desc, + struct time_bench_cpu *cpu_tasks, + const struct cpumask *mask); + +//FIXME: use rec->flags to select measurement, should be MACRO +static __always_inline void time_bench_start(struct time_bench_record *rec) +{ + //getnstimeofday(&rec->ts_start); + ktime_get_real_ts64(&rec->ts_start); + if (rec->flags & TIME_BENCH_PMU) { + rec->pmc_inst_start = pmc_inst(); + rec->pmc_clk_start = pmc_clk(); + } + rec->tsc_start = tsc_start_clock(); +} + +static __always_inline void time_bench_stop(struct time_bench_record *rec, + uint64_t invoked_cnt) +{ + rec->tsc_stop = tsc_stop_clock(); + if (rec->flags & TIME_BENCH_PMU) { + rec->pmc_inst_stop = pmc_inst(); + rec->pmc_clk_stop = pmc_clk(); + } + //getnstimeofday(&rec->ts_stop); + ktime_get_real_ts64(&rec->ts_stop); + rec->invoked_cnt = invoked_cnt; +} + +#endif /* _LINUX_TIME_BENCH_H */ diff --git a/tools/testing/selftests/net/bench/test_bench_page_pool.sh b/tools/testing/selftests/net/bench/test_bench_page_pool.sh new file mode 100755 index 000000000000..5eb48f28b659 --- /dev/null +++ b/tools/testing/selftests/net/bench/test_bench_page_pool.sh @@ -0,0 +1,32 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# + +set -e + +DRIVER="./page_pool/bench_page_pool.ko" +result="" + +function run_test() +{ + rmmod "bench_page_pool.ko" || true + insmod $DRIVER > /dev/null 2>&1 + result=$(dmesg | tail -10) + echo "$result" + + echo + echo "Fast path results:" + echo ${result} | grep -o -E "no-softirq-page_pool01 Per elem: ([0-9]+) cycles$tsc$ ([0-9]+\.[0-9]+) ns" + + echo + echo "ptr_ring results:" + echo ${result} | grep -o -E "no-softirq-page_pool02 Per elem: ([0-9]+) cycles$tsc$ ([0-9]+\.[0-9]+) ns" + + echo + echo "slow path results:" + echo ${result} | grep -o -E "no-softirq-page_pool03 Per elem: ([0-9]+) cycles$tsc$ ([0-9]+\.[0-9]+) ns" +} + +run_test + +exit 0 base-commit: 08207f42d3ffee43c97f16baf03d7426a3c353ca -- 2.50.0.rc1.591.g9c95f17f64-goog

3 weeks

2
1
0 0

[PATCH v18 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v18. This patch serise adds DualPI Improved with a Square (DualPI2) with following features: * Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague) * Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332 * Configurable overload strategies * Use of sojourn time to reliably estimate queue delay * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu --- v18 (13-Jun-2025) - Add the num of enum used by DualPI2 and fix name and name-prefix of DualPI2 enum and attribute - Replace from_timer() with timer_container_of() (Pedro Tammela <pctammela(a)mojatatu.com>) v17 (25-May-2025, Resent at 11-Jun-2025) - Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>) - Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>) - Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>) - Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>) - Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>) - Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>) - Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>) v16 (16-MAy-2025) - Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>) - Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>) - Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>) - Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>) v15 (09-May-2025) - Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>) - Fix one typo in comment of #2 - Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h v14 (05-May-2025) - Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun - Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>) - Update dualpi2.json to align with the updated variable order in pkt_sched.h - Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>) v13 (26-Apr-2025) - Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>) - Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>) v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning --- Chia-Yu Chang (4): sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc selftests/tc-testing: Add selftests for qdisc DualPI2 Documentation: netlink: specs: tc: Add DualPI2 specification Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 161 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 70 +- net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1146 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1651 insertions(+), 1 deletion(-) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

3 weeks

4
9
0 0

[PATCH v18 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v18. This patch serise adds DualPI Improved with a Square (DualPI2) with following features: * Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague) * Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332 * Configurable overload strategies * Use of sojourn time to reliably estimate queue delay * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu --- v18 (13-Jun-2025) - Add the num of enum used by DualPI2 and fix name and name-prefix of DualPI2 enum and attribute - Replace from_timer() with timer_container_of() (Pedro Tammela <pctammela(a)mojatatu.com>) v17 (25-May-2025, Resent at 11-Jun-2025) - Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>) - Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>) - Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>) - Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>) - Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>) - Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>) - Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>) v16 (16-MAy-2025) - Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>) - Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>) - Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>) - Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>) v15 (09-May-2025) - Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>) - Fix one typo in comment of #2 - Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h v14 (05-May-2025) - Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun - Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>) - Update dualpi2.json to align with the updated variable order in pkt_sched.h - Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>) v13 (26-Apr-2025) - Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>) - Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>) v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning --- Chia-Yu Chang (4): sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc selftests/tc-testing: Add selftests for qdisc DualPI2 Documentation: netlink: specs: tc: Add DualPI2 specification Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 161 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 70 +- net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1146 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1651 insertions(+), 1 deletion(-) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

3 weeks

1
5
0 0

[PATCH net-next, v3] selftest: Add selftest for multicast address notifications

by Yuyang Huang

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses. The test relies on the iproute2 version to be 6.13+. Tested by the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink_notification.sh \ TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze(a)google.com> Cc: Lorenzo Colitti <lorenzo(a)google.com> Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com> --- Changelog since v2: - Move the test case to a separate file. Changelog since v1: - Skip the test if the iproute2 is too old. tools/testing/selftests/net/Makefile | 1 + .../selftests/net/rtnetlink_notification.sh | 159 ++++++++++++++++++ 2 files changed, 160 insertions(+) create mode 100755 tools/testing/selftests/net/rtnetlink_notification.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 70a38f485d4d..ad258b25bc9d 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -40,6 +40,7 @@ TEST_PROGS += netns-name.sh TEST_PROGS += link_netns.py TEST_PROGS += nl_netdev.py TEST_PROGS += rtnetlink.py +TEST_PROGS += rtnetlink_notification.sh TEST_PROGS += srv6_end_dt46_l3vpn_test.sh TEST_PROGS += srv6_end_dt4_l3vpn_test.sh TEST_PROGS += srv6_end_dt6_l3vpn_test.sh diff --git a/tools/testing/selftests/net/rtnetlink_notification.sh b/tools/testing/selftests/net/rtnetlink_notification.sh new file mode 100755 index 000000000000..a2c1afed5023 --- /dev/null +++ b/tools/testing/selftests/net/rtnetlink_notification.sh @@ -0,0 +1,159 @@ +#!/bin/bash +# +# This test is for checking rtnetlink notification callpaths, and get as much +# coverage as possible. +# +# set -e + +ALL_TESTS=" + kci_test_mcast_addr_notification +" + +VERBOSE=0 +PAUSE=no +PAUSE_ON_FAIL=no + +source lib.sh + +# set global exit status, but never reset nonzero one. +check_err() +{ + if [ $ret -eq 0 ]; then + ret=$1 + fi + [ -n "$2" ] && echo "$2" +} + +run_cmd_common() +{ + local cmd="$*" + local out + if [ "$VERBOSE" = "1" ]; then + echo "COMMAND: ${cmd}" + fi + out=$($cmd 2>&1) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + return $rc +} + +run_cmd() { + run_cmd_common "$@" + rc=$? + check_err $rc + return $rc +} + +end_test() +{ + echo "$*" + [ "${VERBOSE}" = "1" ] && echo + + if [[ $ret -ne 0 ]] && [[ "${PAUSE_ON_FAIL}" = "yes" ]]; then + echo "Hit enter to continue" + read a + fi; + + if [ "${PAUSE}" = "yes" ]; then + echo "Hit enter to continue" + read a + fi + +} + +kci_test_mcast_addr_notification() +{ + local tmpfile + local monitor_pid + local match_result + + tmpfile=$(mktemp) + + ip monitor maddr > $tmpfile & + monitor_pid=$! + sleep 1 + if [ ! -e "/proc/$monitor_pid" ]; then + end_test "SKIP: mcast addr notification: iproute2 too old" + rm $tmpfile + return $ksft_skip + fi + + run_cmd ip link add name test-dummy1 type dummy + run_cmd ip link set test-dummy1 up + run_cmd ip link del dev test-dummy1 + sleep 1 + + match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile) + + kill $monitor_pid + rm $tmpfile + # There should be 4 line matches as follows. + # 13: test-dummy1 inet6 mcast ff02::1 scope global + # 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global + if [ $match_result -ne 4 ];then + end_test "FAIL: mcast addr notification" + return 1 + fi + end_test "PASS: mcast addr notification" +} + +kci_test_rtnl() +{ + local current_test + local ret=0 + + for current_test in ${TESTS:-$ALL_TESTS}; do + $current_test + check_err $? + done + + return $ret +} + +usage() +{ + cat <<EOF +usage: ${0##*/} OPTS + + -t <test> Test(s) to run (default: all) + (options: $(echo $ALL_TESTS)) + -v Verbose mode (show commands and output) + -P Pause after every test + -p Pause after every failing test before cleanup (for debugging) +EOF +} + +#check for needed privileges +if [ "$(id -u)" -ne 0 ];then + end_test "SKIP: Need root privileges" + exit $ksft_skip +fi + +for x in ip;do + $x -Version 2>/dev/null >/dev/null + if [ $? -ne 0 ];then + end_test "SKIP: Could not run test without the $x tool" + exit $ksft_skip + fi +done + +while getopts t:hvpP o; do + case $o in + t) TESTS=$OPTARG;; + v) VERBOSE=1;; + p) PAUSE_ON_FAIL=yes;; + P) PAUSE=yes;; + h) usage; exit 0;; + *) usage; exit 1;; + esac +done + +[ $PAUSE = "yes" ] && PAUSE_ON_FAIL="no" + +kci_test_rtnl + +exit $? -- 2.50.0.rc1.591.g9c95f17f64-goog

3 weeks

2
2
0 0

[PATCH v3 RESEND] selftests: filesystems: Add functional test for the abort file in fusectl

by Chen Linxuan

This patch add a simple functional test for the "abort" file in fusectlfs (/sys/fs/fuse/connections/ID/about). A simple fuse daemon is added for testing. Related discussion can be found in the link below. Link: https://lore.kernel.org/all/CAOQ4uxjKFXOKQxPpxtS6G_nR0tpw95w0GiO68UcWg_OBhm… Signed-off-by: Chen Linxuan <chenlinxuan(a)uniontech.com> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Reviewed-by: Amir Goldstein <amir73il(a)gmail.com> --- Changes in v3: - Apply changes suggested by Amir Goldstein - Rename the test subdir to filesystems/fuse - Verify errno when connection is aborted - Apply changes suggested by Shuah Khan - Update commit message - Link to v2: https://lore.kernel.org/all/20250517012350.10317-2-chenlinxuan@uniontech.co… Changes in v2: - Apply changes suggested by Amir Goldstein - Check errno - Link to v1: https://lore.kernel.org/all/20250515073449.346774-2-chenlinxuan@uniontech.c… --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + .../selftests/filesystems/fuse/.gitignore | 3 + .../selftests/filesystems/fuse/Makefile | 21 +++ .../selftests/filesystems/fuse/fuse_mnt.c | 146 ++++++++++++++++++ .../selftests/filesystems/fuse/fusectl_test.c | 116 ++++++++++++++ 6 files changed, 288 insertions(+) create mode 100644 tools/testing/selftests/filesystems/fuse/.gitignore create mode 100644 tools/testing/selftests/filesystems/fuse/Makefile create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_mnt.c create mode 100644 tools/testing/selftests/filesystems/fuse/fusectl_test.c diff --git a/MAINTAINERS b/MAINTAINERS index a92290fffa163..04d90432c1841 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9901,6 +9901,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git F: Documentation/filesystems/fuse* F: fs/fuse/ F: include/uapi/linux/fuse.h +F: tools/testing/selftests/filesystems/fuse/ FUTEX SUBSYSTEM M: Thomas Gleixner <tglx(a)linutronix.de> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 339b31e6a6b59..c37a76a8ca214 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -36,6 +36,7 @@ TARGETS += filesystems/fat TARGETS += filesystems/overlayfs TARGETS += filesystems/statmount TARGETS += filesystems/mount-notify +TARGETS += filesystems/fuse TARGETS += firmware TARGETS += fpu TARGETS += ftrace diff --git a/tools/testing/selftests/filesystems/fuse/.gitignore b/tools/testing/selftests/filesystems/fuse/.gitignore new file mode 100644 index 0000000000000..3e72e742d08e8 --- /dev/null +++ b/tools/testing/selftests/filesystems/fuse/.gitignore @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only +fuse_mnt +fusectl_test diff --git a/tools/testing/selftests/filesystems/fuse/Makefile b/tools/testing/selftests/filesystems/fuse/Makefile new file mode 100644 index 0000000000000..612aad69a93aa --- /dev/null +++ b/tools/testing/selftests/filesystems/fuse/Makefile @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-2.0-or-later + +CFLAGS += -Wall -O2 -g $(KHDR_INCLUDES) + +TEST_GEN_PROGS := fusectl_test +TEST_GEN_FILES := fuse_mnt + +include ../../lib.mk + +VAR_CFLAGS := $(shell pkg-config fuse --cflags 2>/dev/null) +ifeq ($(VAR_CFLAGS),) +VAR_CFLAGS := -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse +endif + +VAR_LDLIBS := $(shell pkg-config fuse --libs 2>/dev/null) +ifeq ($(VAR_LDLIBS),) +VAR_LDLIBS := -lfuse -pthread +endif + +$(OUTPUT)/fuse_mnt: CFLAGS += $(VAR_CFLAGS) +$(OUTPUT)/fuse_mnt: LDLIBS += $(VAR_LDLIBS) diff --git a/tools/testing/selftests/filesystems/fuse/fuse_mnt.c b/tools/testing/selftests/filesystems/fuse/fuse_mnt.c new file mode 100644 index 0000000000000..d12b17f30fadc --- /dev/null +++ b/tools/testing/selftests/filesystems/fuse/fuse_mnt.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * fusectl test file-system + * Creates a simple FUSE filesystem with a single read-write file (/test) + */ + +#define FUSE_USE_VERSION 26 + +#include <fuse.h> +#include <stdio.h> +#include <string.h> +#include <errno.h> +#include <fcntl.h> +#include <stdlib.h> +#include <unistd.h> + +#define MAX(a, b) ((a) > (b) ? (a) : (b)) + +static char *content; +static size_t content_size = 0; +static const char test_path[] = "/test"; + +static int test_getattr(const char *path, struct stat *st) +{ + memset(st, 0, sizeof(*st)); + + if (!strcmp(path, "/")) { + st->st_mode = S_IFDIR | 0755; + st->st_nlink = 2; + return 0; + } + + if (!strcmp(path, test_path)) { + st->st_mode = S_IFREG | 0664; + st->st_nlink = 1; + st->st_size = content_size; + return 0; + } + + return -ENOENT; +} + +static int test_readdir(const char *path, void *buf, fuse_fill_dir_t filler, + off_t offset, struct fuse_file_info *fi) +{ + if (strcmp(path, "/")) + return -ENOENT; + + filler(buf, ".", NULL, 0); + filler(buf, "..", NULL, 0); + filler(buf, test_path + 1, NULL, 0); + + return 0; +} + +static int test_open(const char *path, struct fuse_file_info *fi) +{ + if (strcmp(path, test_path)) + return -ENOENT; + + return 0; +} + +static int test_read(const char *path, char *buf, size_t size, off_t offset, + struct fuse_file_info *fi) +{ + if (strcmp(path, test_path) != 0) + return -ENOENT; + + if (!content || content_size == 0) + return 0; + + if (offset >= content_size) + return 0; + + if (offset + size > content_size) + size = content_size - offset; + + memcpy(buf, content + offset, size); + + return size; +} + +static int test_write(const char *path, const char *buf, size_t size, + off_t offset, struct fuse_file_info *fi) +{ + size_t new_size; + + if (strcmp(path, test_path) != 0) + return -ENOENT; + + if(offset > content_size) + return -EINVAL; + + new_size = MAX(offset + size, content_size); + + if (new_size > content_size) + content = realloc(content, new_size); + + content_size = new_size; + + if (!content) + return -ENOMEM; + + memcpy(content + offset, buf, size); + + return size; +} + +static int test_truncate(const char *path, off_t size) +{ + if (strcmp(path, test_path) != 0) + return -ENOENT; + + if (size == 0) { + free(content); + content = NULL; + content_size = 0; + return 0; + } + + content = realloc(content, size); + + if (!content) + return -ENOMEM; + + if (size > content_size) + memset(content + content_size, 0, size - content_size); + + content_size = size; + return 0; +} + +static struct fuse_operations memfd_ops = { + .getattr = test_getattr, + .readdir = test_readdir, + .open = test_open, + .read = test_read, + .write = test_write, + .truncate = test_truncate, +}; + +int main(int argc, char *argv[]) +{ + return fuse_main(argc, argv, &memfd_ops, NULL); +} diff --git a/tools/testing/selftests/filesystems/fuse/fusectl_test.c b/tools/testing/selftests/filesystems/fuse/fusectl_test.c new file mode 100644 index 0000000000000..7050fbe0970e7 --- /dev/null +++ b/tools/testing/selftests/filesystems/fuse/fusectl_test.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +// Copyright (c) 2025 Chen Linxuan <chenlinxuan(a)uniontech.com> + +#define _GNU_SOURCE + +#include <errno.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mount.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> +#include <dirent.h> +#include <linux/limits.h> + +#include "../../kselftest_harness.h" + +#define FUSECTL_MOUNTPOINT "/sys/fs/fuse/connections" +#define FUSE_MOUNTPOINT "/tmp/fuse_mnt_XXXXXX" +#define FUSE_DEVICE "/dev/fuse" +#define FUSECTL_TEST_VALUE "1" + +FIXTURE(fusectl){ + char fuse_mountpoint[sizeof(FUSE_MOUNTPOINT)]; + int connection; +}; + +FIXTURE_SETUP(fusectl) +{ + const char *fuse_mnt_prog = "./fuse_mnt"; + int status, pid; + struct stat statbuf; + + strcpy(self->fuse_mountpoint, FUSE_MOUNTPOINT); + + if (!mkdtemp(self->fuse_mountpoint)) + SKIP(return, + "Failed to create FUSE mountpoint %s", + strerror(errno)); + + if (access(FUSECTL_MOUNTPOINT, F_OK)) + SKIP(return, + "FUSE control filesystem not mounted"); + + pid = fork(); + if (pid < 0) + SKIP(return, + "Failed to fork FUSE daemon process: %s", + strerror(errno)); + + if (pid == 0) { + execlp(fuse_mnt_prog, fuse_mnt_prog, self->fuse_mountpoint, NULL); + exit(errno); + } + + waitpid(pid, &status, 0); + if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) { + SKIP(return, + "Failed to start FUSE daemon %s", + strerror(WEXITSTATUS(status))); + } + + if (stat(self->fuse_mountpoint, &statbuf)) + SKIP(return, + "Failed to stat FUSE mountpoint %s", + strerror(errno)); + + self->connection = statbuf.st_dev; +} + +FIXTURE_TEARDOWN(fusectl) +{ + umount(self->fuse_mountpoint); + rmdir(self->fuse_mountpoint); +} + +TEST_F(fusectl, abort) +{ + char path_buf[PATH_MAX]; + int abort_fd, test_fd, ret; + + sprintf(path_buf, "/sys/fs/fuse/connections/%d/abort", self->connection); + + ASSERT_EQ(0, access(path_buf, F_OK)); + + abort_fd = open(path_buf, O_WRONLY); + ASSERT_GE(abort_fd, 0); + + sprintf(path_buf, "%s/test", self->fuse_mountpoint); + + test_fd = open(path_buf, O_RDWR); + ASSERT_GE(test_fd, 0); + + ret = read(test_fd, path_buf, sizeof(path_buf)); + ASSERT_EQ(ret, 0); + + ret = write(test_fd, "test", sizeof("test")); + ASSERT_EQ(ret, sizeof("test")); + + ret = lseek(test_fd, 0, SEEK_SET); + ASSERT_GE(ret, 0); + + ret = write(abort_fd, FUSECTL_TEST_VALUE, sizeof(FUSECTL_TEST_VALUE)); + ASSERT_GT(ret, 0); + + close(abort_fd); + + ret = read(test_fd, path_buf, sizeof(path_buf)); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, ENOTCONN); +} + +TEST_HARNESS_MAIN -- 2.43.0

3 weeks

2
2
0 0

[PATCH v4 0/7] use per-vma locks for /proc/pid/maps reads and PROCMAP_QUERY

by Suren Baghdasaryan

Reading /proc/pid/maps requires read-locking mmap_lock which prevents any other task from concurrently modifying the address space. This guarantees coherent reporting of virtual address ranges, however it can block important updates from happening. Oftentimes /proc/pid/maps readers are low priority monitoring tasks and them blocking high priority tasks results in priority inversion. Locking the entire address space is required to present fully coherent picture of the address space, however even current implementation does not strictly guarantee that by outputting vmas in page-size chunks and dropping mmap_lock in between each chunk. Address space modifications are possible while mmap_lock is dropped and userspace reading the content is expected to deal with possible concurrent address space modifications. Considering these relaxed rules, holding mmap_lock is not strictly needed as long as we can guarantee that a concurrently modified vma is reported either in its original form or after it was modified. This patchset switches from holding mmap_lock while reading /proc/pid/maps to taking per-vma locks as we walk the vma tree. This reduces the contention with tasks modifying the address space because they would have to contend for the same vma as opposed to the entire address space. Same is done for PROCMAP_QUERY ioctl which locks only the vma that fell into the requested range instead of the entire address space. Previous version of this patchset [1] tried to perform /proc/pid/maps reading under RCU, however its implementation is quite complex and the results are worse than the new version because it still relied on mmap_lock speculation which retries if any part of the address space gets modified. New implementaion is both simpler and results in less contention. Note that similar approach would not work for /proc/pid/smaps reading as it also walks the page table and that's not RCU-safe. Paul McKenney's designed a test [2] to measure mmap/munmap latencies while concurrently reading /proc/pid/maps. The test has a pair of processes scanning /proc/PID/maps, and another process unmapping and remapping 4K pages from a 128MB range of anonymous memory. At the end of each 10 second run, the latency of each mmap() or munmap() operation is measured, and for each run the maximum and mean latency is printed. The map/unmap process is started first, its PID is passed to the scanners, and then the map/unmap process waits until both scanners are running before starting its timed test. The scanners keep scanning until the specified /proc/PID/maps file disappears. This test registered close to 10x improvement in update latencies: Before the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.011 0.008 0.455 0.011 0.008 0.472 0.011 0.008 0.535 0.011 0.009 0.545 ... 0.011 0.014 2.875 0.011 0.014 2.913 0.011 0.014 3.007 0.011 0.015 3.018 After the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.006 0.005 0.036 0.006 0.005 0.039 0.006 0.005 0.039 0.006 0.005 0.039 ... 0.006 0.006 0.403 0.006 0.006 0.474 0.006 0.006 0.479 0.006 0.006 0.498 The patchset also adds a number of tests to check for /proc/pid/maps data coherency. They are designed to detect any unexpected data tearing while performing some common address space modifications (vma split, resize and remap). Even before these changes, reading /proc/pid/maps might have inconsistent data because the file is read page-by-page with mmap_lock being dropped between the pages. An example of user-visible inconsistency can be that the same vma is printed twice: once before it was modified and then after the modifications. For example if vma was extended, it might be found and reported twice. What is not expected is to see a gap where there should have been a vma both before and after modification. This patchset increases the chances of such tearing, therefore it's event more important now to test for unexpected inconsistencies. [1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [2] https://github.com/paulmckrcu/proc-mmap_sem-test Suren Baghdasaryan (7): selftests/proc: add /proc/pid/maps tearing from vma split test selftests/proc: extend /proc/pid/maps tearing test to include vma resizing selftests/proc: extend /proc/pid/maps tearing test to include vma remapping selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified selftests/proc: add verbose more for tests to facilitate debugging mm/maps: read proc/pid/maps under per-vma lock mm/maps: execute PROCMAP_QUERY ioctl under per-vma locks fs/proc/internal.h | 6 + fs/proc/task_mmu.c | 233 +++++- tools/testing/selftests/proc/proc-pid-vm.c | 793 ++++++++++++++++++++- 3 files changed, 1011 insertions(+), 21 deletions(-) base-commit: 2d0c297637e7d59771c1533847c666cdddc19884 -- 2.49.0.1266.g31b7d2e469-goog

3 weeks, 1 day

4
19
0 0

[PATCH net-next v4 0/6] udp_tunnel: remove rtnl_lock dependency

by Stanislav Fomichev

Recently bnxt had to grow back a bunch of rtnl dependencies because of udp_tunnel's infra. Add separate (global) mutext to protect udp_tunnel state. v4: - grab lock in more places, specifically netlink and notifiers (Jakub) - convert geneve and vxlan notifiers to (sleepable) rtnl lock v3: - fix 2 test failures (Jakub NIPA) v2: - move the lock into udp_tunnel_nic (Jakub) - reorder the lock ordering (Jakub) - move udp_ports_sleep removal into separate patch and update the test (Jakub) Cc: Michael Chan <michael.chan(a)broadcom.com> Stanislav Fomichev (6): geneve: rely on rtnl lock in geneve_offload_rx_ports vxlan: drop sock_lock udp_tunnel: remove rtnl_lock dependency net: remove redundant ASSERT_RTNL() in queue setup functions netdevsim: remove udp_ports_sleep Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path" .../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 42 ++------- drivers/net/ethernet/emulex/benet/be_main.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 1 - drivers/net/ethernet/intel/ice/ice_main.c | 1 - .../net/ethernet/mellanox/mlx4/en_netdev.c | 3 +- .../net/ethernet/mellanox/mlx5/core/en_main.c | 3 +- .../ethernet/netronome/nfp/nfp_net_common.c | 3 +- .../net/ethernet/qlogic/qede/qede_filter.c | 3 - .../net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 - drivers/net/ethernet/sfc/ef10.c | 1 - drivers/net/geneve.c | 7 +- drivers/net/netdevsim/netdevsim.h | 2 - drivers/net/netdevsim/udp_tunnels.c | 12 --- drivers/net/vxlan/vxlan_core.c | 34 ++++---- drivers/net/vxlan/vxlan_private.h | 2 +- drivers/net/vxlan/vxlan_vnifilter.c | 18 ++-- include/net/udp_tunnel.h | 87 ++++++++++++++----- net/core/dev.c | 4 +- net/ipv4/udp_tunnel_core.c | 16 ++-- net/ipv4/udp_tunnel_nic.c | 78 +++++++++++++---- .../drivers/net/netdevsim/udp_tunnel_nic.sh | 23 +---- 22 files changed, 177 insertions(+), 170 deletions(-) -- 2.49.0

3 weeks, 1 day

1
6
0 0

[PATCH v7 00/30] TDX KVM selftests

by Sagi Shahar

This is v7 of the TDX selftests now that the base TDX patches have been accepted. This series is based on v6.16-rc1 No major changes from v6 asside from rebasing. Thanks, Changes from v6: - Rebased on top of v6.16-rc1 Ackerley Tng (12): KVM: selftests: Add function to allow one-to-one GVA to GPA mappings KVM: selftests: Expose function that sets up sregs based on VM's mode KVM: selftests: Store initial stack address in struct kvm_vcpu KVM: selftests: Add vCPU descriptor table initialization utility KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs' attribute configuration KVM: selftests: TDX: Update load_td_memory_region() for VM memory backed by guest memfd KVM: selftests: Add functions to allow mapping as shared KVM: selftests: KVM: selftests: Expose new vm_vaddr_alloc_private() KVM: selftests: TDX: Add support for TDG.MEM.PAGE.ACCEPT KVM: selftests: TDX: Add support for TDG.VP.VEINFO.GET KVM: selftests: TDX: Add TDX UPM selftest KVM: selftests: TDX: Add TDX UPM selftests for implicit conversion Erdem Aktas (3): KVM: selftests: Add helper functions to create TDX VMs KVM: selftests: TDX: Add TDX lifecycle test KVM: selftests: TDX: Add TDX HLT exit test Isaku Yamahata (1): KVM: selftests: Update kvm_init_vm_address_properties() for TDX Roger Wang (1): KVM: selftests: TDX: Add TDG.VP.INFO test Ryan Afranji (2): KVM: selftests: TDX: Verify the behavior when host consumes a TD private memory KVM: selftests: TDX: Add shared memory test Sagi Shahar (10): KVM: selftests: TDX: Add report_fatal_error test KVM: selftests: TDX: Adding test case for TDX port IO KVM: selftests: TDX: Add basic TDX CPUID test KVM: selftests: TDX: Add basic TDG.VP.VMCALL<GetTdVmCallInfo> test KVM: selftests: TDX: Add TDX IO writes test KVM: selftests: TDX: Add TDX IO reads test KVM: selftests: TDX: Add TDX MSR read/write tests KVM: selftests: TDX: Add TDX MMIO reads test KVM: selftests: TDX: Add TDX MMIO writes test KVM: selftests: TDX: Add TDX CPUID TDVMCALL test Yan Zhao (1): KVM: selftests: TDX: Test LOG_DIRTY_PAGES flag to a non-GUEST_MEMFD memslot tools/testing/selftests/kvm/Makefile.kvm | 8 + .../testing/selftests/kvm/include/kvm_util.h | 36 + .../selftests/kvm/include/x86/kvm_util_arch.h | 1 + .../selftests/kvm/include/x86/processor.h | 2 + .../selftests/kvm/include/x86/tdx/td_boot.h | 83 ++ .../kvm/include/x86/tdx/td_boot_asm.h | 16 + .../selftests/kvm/include/x86/tdx/tdcall.h | 54 + .../selftests/kvm/include/x86/tdx/tdx.h | 67 + .../selftests/kvm/include/x86/tdx/tdx_util.h | 23 + .../selftests/kvm/include/x86/tdx/test_util.h | 133 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 74 +- .../testing/selftests/kvm/lib/x86/processor.c | 97 +- .../selftests/kvm/lib/x86/tdx/td_boot.S | 100 ++ .../selftests/kvm/lib/x86/tdx/tdcall.S | 163 +++ tools/testing/selftests/kvm/lib/x86/tdx/tdx.c | 243 ++++ .../selftests/kvm/lib/x86/tdx/tdx_util.c | 643 +++++++++ .../selftests/kvm/lib/x86/tdx/test_util.c | 187 +++ .../selftests/kvm/x86/tdx_shared_mem_test.c | 129 ++ .../testing/selftests/kvm/x86/tdx_upm_test.c | 461 ++++++ tools/testing/selftests/kvm/x86/tdx_vm_test.c | 1254 +++++++++++++++++ 20 files changed, 3734 insertions(+), 40 deletions(-) create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/test_util.h create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/test_util.c create mode 100644 tools/testing/selftests/kvm/x86/tdx_shared_mem_test.c create mode 100644 tools/testing/selftests/kvm/x86/tdx_upm_test.c create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c -- 2.50.0.rc2.692.g299adb8693-goog

3 weeks, 1 day

1
30
0 0

[kvm-unit-tests PATCH v5] riscv: sbi: Add SBI Debug Triggers Extension tests

by Jesse Taube

Add tests for the DBTR SBI extension. Signed-off-by: Jesse Taube <jesse(a)rivosinc.com> Reviewed-by: Charlie Jenkins <charlie(a)rivosinc.com> Tested-by: Charlie Jenkins <charlie(a)rivosinc.com> --- V1 -> V2: - Call report_prefix_pop before returning - Disable compressed instructions in exec_call, update related comment - Remove extra "| 1" in dbtr_test_load - Remove extra newlines - Remove extra tabs in check_exec - Remove typedefs from enums - Return when dbtr_install_trigger fails - s/avalible/available/g - s/unistall/uninstall/g V2 -> V3: - Change SBI_DBTR_SHMEM_INVALID_ADDR to -1UL - Move all dbtr functions to sbi-dbtr.c - Move INSN_LEN to processor.h - Update include list - Use C-style comments V3 -> V4: - Include libcflat.h - Remove #define SBI_DBTR_SHMEM_INVALID_ADDR V4 -> V5: - Sort includes - Add kfail for update triggers --- lib/riscv/asm/sbi.h | 1 + riscv/Makefile | 1 + riscv/sbi-dbtr.c | 824 ++++++++++++++++++++++++++++++++++++++++++++ riscv/sbi-tests.h | 1 + riscv/sbi.c | 1 + 5 files changed, 828 insertions(+) create mode 100644 riscv/sbi-dbtr.c diff --git a/lib/riscv/asm/sbi.h b/lib/riscv/asm/sbi.h index a5738a5c..78fd6e2a 100644 --- a/lib/riscv/asm/sbi.h +++ b/lib/riscv/asm/sbi.h @@ -51,6 +51,7 @@ enum sbi_ext_id { SBI_EXT_SUSP = 0x53555350, SBI_EXT_FWFT = 0x46574654, SBI_EXT_SSE = 0x535345, + SBI_EXT_DBTR = 0x44425452, }; enum sbi_ext_base_fid { diff --git a/riscv/Makefile b/riscv/Makefile index 11e68eae..55c7ac93 100644 --- a/riscv/Makefile +++ b/riscv/Makefile @@ -20,6 +20,7 @@ all: $(tests) $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-asm.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-fwft.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-sse.o +$(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-dbtr.o all_deps += $($(TEST_DIR)/sbi-deps) diff --git a/riscv/sbi-dbtr.c b/riscv/sbi-dbtr.c new file mode 100644 index 00000000..ebaa6f59 --- /dev/null +++ b/riscv/sbi-dbtr.c @@ -0,0 +1,824 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SBI DBTR testsuite + * + * Copyright (C) 2025, Rivos Inc., Jesse Taube <jesse(a)rivosinc.com> + */ + + #include <libcflat.h> + #include <bitops.h> + + #include <asm/io.h> + #include <asm/processor.h> + + #include "sbi-tests.h" + +#define RV_MAX_TRIGGERS 32 + +#define SBI_DBTR_TRIG_STATE_MAPPED BIT(0) +#define SBI_DBTR_TRIG_STATE_U BIT(1) +#define SBI_DBTR_TRIG_STATE_S BIT(2) +#define SBI_DBTR_TRIG_STATE_VU BIT(3) +#define SBI_DBTR_TRIG_STATE_VS BIT(4) +#define SBI_DBTR_TRIG_STATE_HAVE_HW_TRIG BIT(5) + +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT 8 +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX(trig_state) (trig_state >> SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT) + +#define SBI_DBTR_TDATA1_TYPE_SHIFT (__riscv_xlen - 4) + +#define SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL6_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL6_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL6_SELECT_BIT BIT(21) +#define SBI_DBTR_TDATA1_MCONTROL6_VS_BIT BIT(23) +#define SBI_DBTR_TDATA1_MCONTROL6_VU_BIT BIT(24) + +#define SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL_SELECT_BIT BIT(19) + +enum McontrolType { + SBI_DBTR_TDATA1_TYPE_NONE = (0UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_LEGACY = (1UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL = (2UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ICOUNT = (3UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ITRIGGER = (4UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ETRIGGER = (5UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL6 = (6UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_TMEXTTRIGGER = (7UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED0 = (8UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED1 = (9UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED2 = (10UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED3 = (11UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM0 = (12UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM1 = (13UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM2 = (14UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_DISABLED = (15UL << SBI_DBTR_TDATA1_TYPE_SHIFT), +}; + +enum Tdata1Value { + VALUE_NONE = 0, + VALUE_LOAD = BIT(0), + VALUE_STORE = BIT(1), + VALUE_EXECUTE = BIT(2), +}; + +enum Tdata1Mode { + MODE_NONE = 0, + MODE_M = BIT(0), + MODE_U = BIT(1), + MODE_S = BIT(2), + MODE_VU = BIT(3), + MODE_VS = BIT(4), +}; + +enum sbi_ext_dbtr_fid { + SBI_EXT_DBTR_NUM_TRIGGERS = 0, + SBI_EXT_DBTR_SETUP_SHMEM, + SBI_EXT_DBTR_TRIGGER_READ, + SBI_EXT_DBTR_TRIGGER_INSTALL, + SBI_EXT_DBTR_TRIGGER_UPDATE, + SBI_EXT_DBTR_TRIGGER_UNINSTALL, + SBI_EXT_DBTR_TRIGGER_ENABLE, + SBI_EXT_DBTR_TRIGGER_DISABLE, +}; + +struct sbi_dbtr_data_msg { + unsigned long tstate; + unsigned long tdata1; + unsigned long tdata2; + unsigned long tdata3; +}; + +struct sbi_dbtr_id_msg { + unsigned long idx; +}; + +/* SBI shared mem messages layout */ +struct sbi_dbtr_shmem_entry { + union { + struct sbi_dbtr_data_msg data; + struct sbi_dbtr_id_msg id; + }; +}; + +static bool dbtr_handled; + +/* Expected to be leaf function as not to disrupt frame-pointer */ +static __attribute__((naked)) void exec_call(void) +{ + /* skip over nop when triggered instead of ret. */ + asm volatile (".option push\n" + ".option arch, -c\n" + "nop\n" + "ret\n" + ".option pop\n"); +} + +static void dbtr_exception_handler(struct pt_regs *regs) +{ + dbtr_handled = true; + + /* Reading *epc may cause a fault, skip over nop */ + if ((void *)regs->epc == exec_call) { + regs->epc += 4; + return; + } + + /* WARNING: Skips over the trapped intruction */ + regs->epc += RV_INSN_LEN(readw((void *)regs->epc)); +} + +static bool do_save(void *tdata2) +{ + bool ret; + + writel(0, tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_load(void *tdata2) +{ + bool ret; + + readl(tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_exec(void) +{ + bool ret; + + exec_call(); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static unsigned long gen_tdata1_mcontrol(enum Tdata1Mode mode, enum Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_S_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1_mcontrol6(enum Tdata1Mode mode, enum Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_S_BIT; + + if (mode & MODE_VU) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VU_BIT; + + if (mode & MODE_VS) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VS_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1(enum McontrolType type, enum Tdata1Value value, enum Tdata1Mode mode) +{ + switch (type) { + case SBI_DBTR_TDATA1_TYPE_MCONTROL: + return gen_tdata1_mcontrol(mode, value); + case SBI_DBTR_TDATA1_TYPE_MCONTROL6: + return gen_tdata1_mcontrol6(mode, value); + default: + return 0; + } +} + +static struct sbiret sbi_debug_num_triggers(unsigned long trig_tdata1) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_NUM_TRIGGERS, trig_tdata1, 0, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_set_shmem_raw(unsigned long shmem_phys_lo, + unsigned long shmem_phys_hi, + unsigned long flags) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_SETUP_SHMEM, shmem_phys_lo, + shmem_phys_hi, flags, 0, 0, 0); +} + +static struct sbiret sbi_debug_set_shmem(void *shmem) +{ + phys_addr_t p = virt_to_phys(shmem); + + return sbi_debug_set_shmem_raw(lower_32_bits(p), upper_32_bits(p), 0); +} + +static struct sbiret sbi_debug_read_triggers(unsigned long trig_idx_base, + unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_READ, trig_idx_base, + trig_count, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_install_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_INSTALL, trig_count, 0, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_update_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UPDATE, trig_count, 0, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_uninstall_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UNINSTALL, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_enable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_ENABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_disable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_DISABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +static bool dbtr_install_trigger(struct sbi_dbtr_shmem_entry *shmem, void *tdata2, + unsigned long tdata1) +{ + struct sbiret sbi_ret; + bool ret; + + shmem->data.tdata1 = tdata1; + shmem->data.tdata2 = (unsigned long)tdata2; + + sbi_ret = sbi_debug_install_triggers(1); + ret = sbiret_report_error(&sbi_ret, SBI_SUCCESS, "sbi_debug_install_triggers"); + if (ret) + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + return ret; +} + +static bool dbtr_uninstall_trigger(void) +{ + struct sbiret ret; + + install_exception_handler(EXC_BREAKPOINT, NULL); + + ret = sbi_debug_uninstall_triggers(0, 1); + return sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static unsigned long dbtr_test_num_triggers(void) +{ + struct sbiret ret; + unsigned long tdata1 = 0; + /* sbi_debug_num_triggers will return trig_max in sbiret.value when trig_tdata1 == 0 */ + + /* should be at least one trigger. */ + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + + if (ret.value == 0) + report_fail("sbi_debug_num_triggers: Returned 0 triggers available"); + else + report_pass("sbi_debug_num_triggers: Returned %lu triggers available", ret.value); + + return ret.value; +} + +static enum McontrolType dbtr_test_type(unsigned long *num_trig) +{ + struct sbiret ret; + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol6 triggers available", + ret.value); + *num_trig = ret.value; + return tdata1; + } + + tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + *num_trig = ret.value; + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol triggers available", + ret.value); + return tdata1; + } + + report_fail("sbi_debug_num_triggers: Returned 0 mcontrol(6) triggers available"); + + return SBI_DBTR_TDATA1_TYPE_NONE; +} + +static struct sbiret dbtr_test_save_install_uninstall(struct sbi_dbtr_shmem_entry *shmem, + enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("save_trigger"); + + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S | MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_install_triggers(1); + if (!sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_install_triggers")) { + report_prefix_pop(); + return ret; + } + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(do_save(&test), "triggered"); + + if (do_load(&test)) + report_fail("triggered by load"); + + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_save(&test)) + report_fail("triggered after uninstall"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); + + return ret; +} + +static void dbtr_test_update(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + bool kfail; + + report_prefix_push("update_trigger"); + + if (!dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE))) { + report_prefix_pop(); + return; + } + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_update_triggers(1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_update_triggers"); + + /* Known broken update_triggers. + * https://lore.kernel.org/opensbi/aDdp1UeUh7GugeHp@ghost/T/#t + */ + kfail = __sbi_get_imp_id() == SBI_IMPL_OPENSBI && + __sbi_get_imp_version() < sbi_impl_opensbi_mk_version(1, 7); + report_kfail(kfail, do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_load(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + + report_prefix_push("load_trigger"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_LOAD, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_load(&test), "triggered"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_enable(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_disable_triggers"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + if (do_save(&test)) { + report_fail("should not trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); + report_skip("sbi_debug_enable_triggers: no disable"); + + return; + } + + report_pass("should not trigger"); + + report_prefix_pop(); + report_prefix_push("sbi_debug_enable_triggers"); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_exec(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + + report_prefix_push("exec_trigger"); + /* check if loads and saves trigger exec */ + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + report_prefix_pop(); + return; + } + + if (do_load(&test)) + report_fail("triggered by load"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + if (!dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + report_prefix_pop(); + return; + } + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_read(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_read_triggers"); + if (!dbtr_install_trigger(shmem, &test, tdata1)) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", ((unsigned long)&test), + shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void check_exec(unsigned long base) +{ + struct sbiret ret; + + report(do_exec(), "exec triggered"); + + ret = sbi_debug_uninstall_triggers(base, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static void dbtr_test_multiple(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type, + unsigned long num_trigs) +{ + static unsigned long test[2]; + struct sbiret ret; + bool have_three = num_trigs > 2; + + if (num_trigs < 2) + return; + + report_prefix_push("test_multiple"); + + if (!dbtr_install_trigger(shmem, &test[0], gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + if (!dbtr_install_trigger(shmem, &test[1], gen_tdata1(type, VALUE_LOAD, MODE_S))) + goto error; + if (have_three && + !dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + ret = sbi_debug_uninstall_triggers(1, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + goto error; + } + + report(do_save(&test[0]), "save triggered"); + + if (do_load(&test[0])) + report_fail("save triggered by load"); + + report(do_load(&test[1]), "load triggered"); + + if (do_save(&test[1])) + report_fail("load triggered by save"); + + if (have_three) + check_exec(2); + + ret = sbi_debug_uninstall_triggers(1, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_load(&test[1])) + report_fail("load triggered after uninstall"); + + report(do_save(&test[0]), "save triggered"); + + if (!have_three && + dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) + check_exec(1); + +error: + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_multiple_types(struct sbi_dbtr_shmem_entry *shmem, unsigned long type) +{ + static unsigned long test; + + report_prefix_push("dbtr_test_multiple_types"); + + /* check if loads and saves trigger exec */ + if (!dbtr_install_trigger(shmem, &test, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_load(&test), "load trigger"); + + report(do_save(&test), "save trigger"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + if (!dbtr_install_trigger(shmem, exec_call, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_uninstall(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable uninstall"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + dbtr_uninstall_trigger(); + + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_uninstall_enable(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("uninstall enable"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + dbtr_uninstall_trigger(); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_uninstall_update(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + bool kfail; + + report_prefix_push("uninstall update"); + if (!dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE))) { + report_prefix_pop(); + return; + } + + dbtr_uninstall_trigger(); + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + /* Known broken update_triggers. + * https://lore.kernel.org/opensbi/aDdp1UeUh7GugeHp@ghost/T/#t + */ + kfail = __sbi_get_imp_id() == SBI_IMPL_OPENSBI && + __sbi_get_imp_version() < sbi_impl_opensbi_mk_version(1, 7); + ret = sbi_debug_update_triggers(1); + sbiret_kfail_error(kfail, &ret, SBI_ERR_FAILURE, "sbi_debug_update_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_disable_read(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_NONE); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable_read"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", + ((unsigned long)&test), shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +void check_dbtr(void) +{ + static struct sbi_dbtr_shmem_entry shmem[RV_MAX_TRIGGERS] = {}; + unsigned long num_trigs; + enum McontrolType trig_type; + struct sbiret ret; + + report_prefix_push("dbtr"); + + if (!sbi_probe(SBI_EXT_DBTR)) { + report_skip("extension not available"); + report_prefix_pop(); + return; + } + + if (__sbi_get_imp_id() == SBI_IMPL_OPENSBI && + __sbi_get_imp_version() < sbi_impl_opensbi_mk_version(1, 6)) { + report_skip("OpenSBI < v1.7 detected, skipping tests"); + report_prefix_pop(); + return; + } + + num_trigs = dbtr_test_num_triggers(); + if (!num_trigs) + goto error; + + trig_type = dbtr_test_type(&num_trigs); + if (trig_type == SBI_DBTR_TDATA1_TYPE_NONE) + goto error; + + ret = sbi_debug_set_shmem(shmem); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_set_shmem"); + + ret = dbtr_test_save_install_uninstall(&shmem[0], trig_type); + /* install or uninstall failed */ + if (ret.error != SBI_SUCCESS) + goto error; + + dbtr_test_load(&shmem[0], trig_type); + dbtr_test_exec(&shmem[0], trig_type); + dbtr_test_read(&shmem[0], trig_type); + dbtr_test_disable_enable(&shmem[0], trig_type); + dbtr_test_update(&shmem[0], trig_type); + dbtr_test_multiple_types(&shmem[0], trig_type); + dbtr_test_multiple(shmem, trig_type, num_trigs); + dbtr_test_disable_uninstall(&shmem[0], trig_type); + dbtr_test_uninstall_enable(&shmem[0], trig_type); + dbtr_test_uninstall_update(&shmem[0], trig_type); + dbtr_test_disable_read(&shmem[0], trig_type); + +error: + report_prefix_pop(); +} diff --git a/riscv/sbi-tests.h b/riscv/sbi-tests.h index d5c4ae70..6a227745 100644 --- a/riscv/sbi-tests.h +++ b/riscv/sbi-tests.h @@ -99,6 +99,7 @@ static inline bool env_enabled(const char *env) void sbi_bad_fid(int ext); void check_sse(void); +void check_dbtr(void); #endif /* __ASSEMBLER__ */ #endif /* _RISCV_SBI_TESTS_H_ */ diff --git a/riscv/sbi.c b/riscv/sbi.c index edb1a6be..5bd496d0 100644 --- a/riscv/sbi.c +++ b/riscv/sbi.c @@ -1561,6 +1561,7 @@ int main(int argc, char **argv) check_susp(); check_sse(); check_fwft(); + check_dbtr(); return report_summary(); } -- 2.43.0

3 weeks, 1 day

2
1
0 0

[PATCH net-next v2 0/4] netdevsim: implement RX statistics using NETDEV_PCPU_STAT_DSTATS

by Breno Leitao

The netdevsim driver previously lacked RX statistics support, which prevented its use with the GenerateTraffic() test framework, as this framework verifies traffic flow by checking RX byte counts. This patch migrates netdevsim from its custom statistics collection to the NETDEV_PCPU_STAT_DSTATS framework, as suggested by Jakub. This change not only standardizes the statistics handling but also adds the necessary RX statistics support required by the test framework. Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v2: - Changed the RX collection place from nsim_napi_rx() to nsim_rcv (Joe Damato) - Collect RX dropped packets statistic in nsim_queue_free() (Jakub) - Added a helper in dstat to add values to RX dropped packets - Link to v1: https://lore.kernel.org/r/20250611-netdevsim_stat-v1-0-c11b657d96bf@debian.… --- Breno Leitao (4): netdevsim: migrate to dstats stats collection netdevsim: collect statistics at RX side net: add dev_dstats_rx_dropped_add() helper netdevsim: account dropped packet length in stats on queue free drivers/net/netdevsim/netdev.c | 48 ++++++++++++++++----------------------- drivers/net/netdevsim/netdevsim.h | 5 ---- include/linux/netdevice.h | 10 ++++++++ 3 files changed, 29 insertions(+), 34 deletions(-) --- base-commit: 6d4e01d29d87356924f1521ca6df7a364e948f13 change-id: 20250610-netdevsim_stat-95995921e03e Best regards, -- Breno Leitao <leitao(a)debian.org>

3 weeks, 1 day

2
7
0 0

[PATCH v9 iproute2-next 0/1] DUALPI2 iproute2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find DUALPI2 iproute2 patch v9. For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best Regards, Chia-Yu --- v9 (13-Jun-25) - Fix space issue and typos (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Change 'rtt_typical' to 'typical_rtt' in tc/q_dualpi2.c (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Add the num of enum used by DualPI2 in pkt_sched.h v8 (09-May-25) - Update pkt_sched.h with the one in nex-next - Correct a typo in the comment within pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) v7 (05-May-25) - Align pkt_sched.h with the v14 version of net-next due to spec modification in tc.yaml - Reorganize dualpi2_print_opt() to match the order in tc.yaml - Remove credit-queue in PRINT_JSON v6 (26-Apr-25) - Update JSON file output due to spec modification in tc.yaml of net-next v5 (25-Mar-25) - Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>) - Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>) - Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>) v4 (16-Mar-25) - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step marking. v3 (21-Feb-25) - Add memlimit to the dualpi2 attribute, and add memory_used, max_memory_used, and memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update the manual to align with the latest implementation and clarify the queue naming and default unit - Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2 v2 (23-Oct-24) - Rename get_float in dualpi2 to get_float_min_max in utils.c - Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>) --- Chia-Yu Chang (1): tc: add dualpi2 scheduler module bash-completion/tc | 11 +- include/uapi/linux/pkt_sched.h | 70 ++++- include/utils.h | 2 + ip/iplink_can.c | 14 - lib/utils.c | 30 ++ man/man8/tc-dualpi2.8 | 249 +++++++++++++++ tc/Makefile | 1 + tc/q_dualpi2.c | 534 +++++++++++++++++++++++++++++++++ 8 files changed, 895 insertions(+), 16 deletions(-) create mode 100644 man/man8/tc-dualpi2.8 create mode 100644 tc/q_dualpi2.c -- 2.34.1

3 weeks, 1 day

1
1
0 0

[kvm-unit-tests PATCH 1/2] riscv: Allow SBI_CONSOLE with no uart in device tree

by Jesse Taube

When CONFIG_SBI_CONSOLE is enabled and there is no uart defined in the device tree kvm-unit-tests fails to start. Only check if uart exists in device tree if SBI_CONSOLE is false. Signed-off-by: Jesse Taube <jesse(a)rivosinc.com> --- lib/riscv/io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/riscv/io.c b/lib/riscv/io.c index fb40adb7..96a3c048 100644 --- a/lib/riscv/io.c +++ b/lib/riscv/io.c @@ -104,6 +104,7 @@ static void uart0_init_acpi(void) void io_init(void) { +#ifndef CONFIG_SBI_CONSOLE if (dt_available()) uart0_init_fdt(); else @@ -114,6 +115,7 @@ void io_init(void) "Found uart at %p, but early base is %p.\n", uart0_base, UART_EARLY_BASE); } +#endif } #ifdef CONFIG_SBI_CONSOLE -- 2.43.0

3 weeks, 1 day

2
3
0 0

[PATCH v2] selftests/mm: Add configs to fix testcase failure

by Dev Jain

If CONFIG_UPROBES is not set, a merge subtest fails: Failure log: 7151 12:46:54.627936 # # # RUN merge.handle_uprobe_upon_merged_vma ... 7152 12:46:54.639014 # # f /sys/bus/event_source/devices/uprobe/type 7153 12:46:54.639306 # # fopen: No such file or directory 7154 12:46:54.650451 # # # merge.c:473:handle_uprobe_upon_merged_vma:Expected read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type) (1) == 0 (0) 7155 12:46:54.650730 # # # handle_uprobe_upon_merged_vma: Test terminated by assertion 7156 12:46:54.661750 # # # FAIL merge.handle_uprobe_upon_merged_vma 7157 12:46:54.662030 # # not ok 8 merge.handle_uprobe_upon_merged_vma CONFIG_UPROBES is enabled by CONFIG_UPROBE_EVENTS, which gets enabled by CONFIG_FTRACE. Therefore add these configs to selftests/mm/config so that CI systems can include this config in the kernel build. To be completely safe, add CONFIG_PROFILING too, to enable the dependency chain PROFILING -> PERF_EVENTS -> UPROBE_EVENTS -> UPROBES. Fixes: efe99fabeb11b ("selftests/mm: add test about uprobe pte be orphan during vma merge") Reported-by: Aishwarya <aishwarya.tcv(a)arm.com> Closes: https://lore.kernel.org/all/20250610103729.72440-1-aishwarya.tcv@arm.com/ Tested-by: Aishwarya TCV <aishwarya.tcv(a)arm.com> Tested-by : Donet Tom <donettom(a)linux.ibm.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Signed-off-by: Dev Jain <dev.jain(a)arm.com> --- v1->v2: - Add CONFIG_UPROBES (Mark Brown) - Add CONFIG_PROFILING (Lorenzo) tools/testing/selftests/mm/config | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/mm/config b/tools/testing/selftests/mm/config index a28baa536332..deba93379c80 100644 --- a/tools/testing/selftests/mm/config +++ b/tools/testing/selftests/mm/config @@ -8,3 +8,6 @@ CONFIG_GUP_TEST=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_MEM_SOFT_DIRTY=y CONFIG_ANON_VMA_NAME=y +CONFIG_FTRACE=y +CONFIG_PROFILING=y +CONFIG_UPROBES=y -- 2.30.2

3 weeks, 1 day

5
4
0 0

[PATCH] selftest/mm: Skip if fallocate() is unsupported in gup_longterm

by Mark Brown

Currently gup_longterm assumes that filesystems support fallocate() and uses that to allocate space in files, however this is an optional feature and is in particular not implemented by NFSv3 which is commonly used in CI systems leading to spurious failures. Check for lack of support and report a skip instead for that case. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- tools/testing/selftests/mm/gup_longterm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c index 8a97ac5176a4..0e99494268ed 100644 --- a/tools/testing/selftests/mm/gup_longterm.c +++ b/tools/testing/selftests/mm/gup_longterm.c @@ -114,7 +114,15 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) } if (fallocate(fd, 0, 0, size)) { - if (size == pagesize) { + /* + * Some filesystems (eg, NFSv3) don't support + * fallocate(), report this as a skip rather than a + * test failure. + */ + if (errno == EOPNOTSUPP) { + ksft_print_msg("fallocate() not supported by filesystem\n"); + result = KSFT_SKIP; + } else if (size == pagesize) { ksft_print_msg("fallocate() failed (%s)\n", strerror(errno)); result = KSFT_FAIL; } else { --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250610-selftest-mm-gup-longterm-fallocate-nfs-21ef54627ef2 Best regards, -- Mark Brown <broonie(a)kernel.org>

3 weeks, 1 day

1
0
0 0

[PATCH net-next v2 0/7] netpoll: Untangle netconsole and netpoll

by Breno Leitao

Initially netpoll and netconsole were created together, and some functions are in the wrong file. Seperate netconsole-only functions in netconsole, avoiding exports. 1. Expose netpoll logging macros in the public header to enable consistent log formatting across netpoll consumers. 2. Relocate netconsole-specific functions from netpoll to the netconsole module where they are actually used, reducing unnecessary coupling. 3. Remove unnecessary function exports 4. Rename netpoll parsing functions in netconsole to better reflect their specific usage. 5. Create a test to check that cmdline works fine. This was in my todo list since [1], this was a good time to add it here to make sure this patchset doesn't regress. PS: The code was split in a way that it is easy to review. When copying the functions from netpoll to netconsole, I do not change than other than adding `static`. This will make checkpatch unhappy, but, further patches will address the issues. It is done this way to make it easy for reviewers. Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorag… [1] Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Changes in v2: - No change in the code. Just rebased the patches onto netnext/main - Link to v1: https://lore.kernel.org/r/20250610-rework-v1-0-7cfde283f246@debian.org --- Breno Leitao (7): netpoll: remove __netpoll_cleanup from exported API netpoll: expose netpoll logging macros in public header netpoll: relocate netconsole-specific functions to netconsole module netpoll: move netpoll_print_options to netconsole netconsole: rename functions to better reflect their purpose netconsole: improve code style in parser function selftest: netconsole: add test for cmdline configuration drivers/net/netconsole.c | 137 ++++++++++++++++++++- include/linux/netpoll.h | 10 +- net/core/netpoll.c | 136 +------------------- tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/lib/sh/lib_netcons.sh | 39 +++++- .../selftests/drivers/net/netcons_cmdline.sh | 52 ++++++++ 6 files changed, 228 insertions(+), 147 deletions(-) --- base-commit: 0097c4195b1d0ca57d15979626c769c74747b5a0 change-id: 20250603-rework-c175cad8d22e Best regards, -- Breno Leitao <leitao(a)debian.org>

3 weeks, 1 day

2
9
0 0

[PATCH v2 1/2] selftests: khugepaged: fix the shmem collapse failure

by Baolin Wang

When running the khugepaged selftest for shmem (./khugepaged all:shmem), I encountered the following test failures: " Run test: collapse_full (khugepaged:shmem) Collapse multiple fully populated PTE table.... Fail ... Run test: collapse_single_pte_entry (khugepaged:shmem) Collapse PTE table with single PTE entry present.... Fail ... Run test: collapse_full_of_compound (khugepaged:shmem) Allocate huge page... OK Split huge page leaving single PTE page table full of compound pages... OK Collapse PTE table full of compound pages.... Fail " The reason for the failure is that, it will set MADV_NOHUGEPAGE to prevent khugepaged from continuing to scan shmem VMA after khugepaged finishes scanning in the wait_for_scan() function. Moreover, shmem requires a refault to establish PMD mappings. However, after commit 2b0f922323cc ("mm: don't install PMD mappings when THPs are disabled by the hw/process/vma"), PMD mappings are prevented if the VMA is set with MADV_NOHUGEPAGE flag, so shmem cannot establish PMD mappings during refault. One way to fix this issue is to move the MADV_NOHUGEPAGE setting after the shmem refault. After shmem refault and check huge, the test case will unmap the shmem immediately. So it seems unnecessary to set the MADV_NOHUGEPAGE. Then we can simply drop the MADV_NOHUGEPAGE setting, and all khugepaged test cases passed. Fixes: 2b0f922323cc ("mm: don't install PMD mappings when THPs are disabled by the hw/process/vma") Reviewed-by: Zi Yan <ziy(a)nvidia.com> Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com> --- Changes from v1: - Add reviewed tag from Zi. Thanks. - Drop the MADV_NOHUGEPAGE setting, per David. --- tools/testing/selftests/mm/khugepaged.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c index 8a4d34cce36b..4341ce6b3b38 100644 --- a/tools/testing/selftests/mm/khugepaged.c +++ b/tools/testing/selftests/mm/khugepaged.c @@ -561,8 +561,6 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages, usleep(TICK); } - madvise(p, nr_hpages * hpage_pmd_size, MADV_NOHUGEPAGE); - return timeout == -1; } -- 2.43.5

3 weeks, 1 day

3
3
0 0

[PATCH v4 00/14] kselftest harness and nolibc compatibility

by Thomas Weißschuh

Nolibc is useful for selftests as the test programs can be very small, and compiled with just a kernel crosscompiler, without userspace support. Currently nolibc is only usable with kselftest.h, not the more convenient to use kselftest_harness.h This series provides this compatibility by removing the usage of problematic libc features from the harness. Based on nolibc/for-next. The series is meant to be merged through the nolibc tree. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Changes in v4: - Drop patches for nolibc which where already applied - Preserve signatures of test functions for tests making assumptions about them drop 'selftests: harness: Always provide "self" and "variant"' add 'selftests: harness: Add "variant" and "self" to test metadata' adapt 'selftests: harness: Stop using setjmp()/longjmp()' - Validate test function signatures in harness selftest - Link to v3: https://lore.kernel.org/r/20250411-nolibc-kselftest-harness-v3-0-4d9c029589… Changes in v3: - Send patches to correct kselftest harness maintainers - Move harness selftest to dedicated directory - Add harness selftest to MAINTAINERS - Integrate harness selftest cleanup with the selftest framework - Consistently use "kselftest harness" in commit messages - Properly propagate kselftest harness failure - Link to v2: https://lore.kernel.org/r/20250407-nolibc-kselftest-harness-v2-0-f8812f76e9… Changes in v2: - Rebase unto v6.15-rc1 - Rename internal nolibc symbols - Handle edge case of waitpid(INT_MIN) == ESRCH - Fix arm configurations for final testing patch - Clean up global getopt.h variable declarations - Add Acks from Willy - Link to v1: https://lore.kernel.org/r/20250304-nolibc-kselftest-harness-v1-0-adca7cd231… --- Thomas Weißschuh (14): selftests: harness: Add kselftest harness selftest selftests: harness: Use C89 comment style selftests: harness: Ignore unused variant argument warning selftests: harness: Mark functions without prototypes static selftests: harness: Remove inline qualifier for wrappers selftests: harness: Remove dependency on libatomic selftests: harness: Implement test timeouts through pidfd selftests: harness: Don't set setup_completed for fixtureless tests selftests: harness: Move teardown conditional into test metadata selftests: harness: Add teardown callback to test metadata selftests: harness: Add "variant" and "self" to test metadata selftests: harness: Stop using setjmp()/longjmp() selftests: harness: Guard includes on nolibc HACK: selftests/nolibc: demonstrate usage of the kselftest harness MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/kselftest_harness.h | 175 +- .../testing/selftests/kselftest_harness/.gitignore | 2 + tools/testing/selftests/kselftest_harness/Makefile | 7 + .../selftests/kselftest_harness/harness-selftest.c | 138 ++ .../kselftest_harness/harness-selftest.expected | 64 + .../kselftest_harness/harness-selftest.sh | 13 + tools/testing/selftests/nolibc/Makefile | 15 +- tools/testing/selftests/nolibc/harness-selftest.c | 1 + tools/testing/selftests/nolibc/nolibc-test.c | 1715 +------------------- tools/testing/selftests/nolibc/run-tests.sh | 2 +- 12 files changed, 313 insertions(+), 1821 deletions(-) --- base-commit: 2051d3b830c0889ae55e37e9e8ff0d43a4acd482 change-id: 20250130-nolibc-kselftest-harness-8b2c8cac43bf Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

3 weeks, 1 day

6
45
0 0

[PATCH v3 10/10] selftests/sched_ext: Add test for sched_ext dl_server

by Joel Fernandes

From: Andrea Righi <arighi(a)nvidia.com> Add a selftest to validate the correct behavior of the deadline server for the ext_sched_class. [ Joel: Replaced occurences of CFS in the test with EXT. ] Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> Signed-off-by: Andrea Righi <arighi(a)nvidia.com> --- tools/testing/selftests/sched_ext/Makefile | 1 + .../selftests/sched_ext/rt_stall.bpf.c | 23 ++ tools/testing/selftests/sched_ext/rt_stall.c | 213 ++++++++++++++++++ 3 files changed, 237 insertions(+) create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile index f4531327b8e7..dcc803eeab39 100644 --- a/tools/testing/selftests/sched_ext/Makefile +++ b/tools/testing/selftests/sched_ext/Makefile @@ -181,6 +181,7 @@ auto-test-targets := \ select_cpu_dispatch_bad_dsq \ select_cpu_dispatch_dbl_dsp \ select_cpu_vtime \ + rt_stall \ test_example \ testcase-targets := $(addsuffix .o,$(addprefix $(SCXOBJ_DIR)/,$(auto-test-targets))) diff --git a/tools/testing/selftests/sched_ext/rt_stall.bpf.c b/tools/testing/selftests/sched_ext/rt_stall.bpf.c new file mode 100644 index 000000000000..80086779dd1e --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.bpf.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * A scheduler that verified if RT tasks can stall SCHED_EXT tasks. + * + * Copyright (c) 2025 NVIDIA Corporation. + */ + +#include <scx/common.bpf.h> + +char _license[] SEC("license") = "GPL"; + +UEI_DEFINE(uei); + +void BPF_STRUCT_OPS(rt_stall_exit, struct scx_exit_info *ei) +{ + UEI_RECORD(uei, ei); +} + +SEC(".struct_ops.link") +struct sched_ext_ops rt_stall_ops = { + .exit = (void *)rt_stall_exit, + .name = "rt_stall", +}; diff --git a/tools/testing/selftests/sched_ext/rt_stall.c b/tools/testing/selftests/sched_ext/rt_stall.c new file mode 100644 index 000000000000..d4cb545ebfd8 --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.c @@ -0,0 +1,213 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2025 NVIDIA Corporation. + */ +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <sched.h> +#include <sys/prctl.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <time.h> +#include <linux/sched.h> +#include <signal.h> +#include <bpf/bpf.h> +#include <scx/common.h> +#include <sys/wait.h> +#include <unistd.h> +#include "rt_stall.bpf.skel.h" +#include "scx_test.h" +#include "../kselftest.h" + +#define CORE_ID 0 /* CPU to pin tasks to */ +#define RUN_TIME 5 /* How long to run the test in seconds */ + +/* Simple busy-wait function for test tasks */ +static void process_func(void) +{ + while (1) { + /* Busy wait */ + for (volatile unsigned long i = 0; i < 10000000UL; i++); + } +} + +/* Set CPU affinity to a specific core */ +static void set_affinity(int cpu) +{ + cpu_set_t mask; + + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); + if (sched_setaffinity(0, sizeof(mask), &mask) != 0) { + perror("sched_setaffinity"); + exit(EXIT_FAILURE); + } +} + +/* Set task scheduling policy and priority */ +static void set_sched(int policy, int priority) +{ + struct sched_param param; + + param.sched_priority = priority; + if (sched_setscheduler(0, policy, &param) != 0) { + perror("sched_setscheduler"); + exit(EXIT_FAILURE); + } +} + +/* Get process runtime from /proc/<pid>/stat */ +static float get_process_runtime(int pid) +{ + char path[256]; + FILE *file; + long utime, stime; + int fields; + + snprintf(path, sizeof(path), "/proc/%d/stat", pid); + file = fopen(path, "r"); + if (file == NULL) { + perror("Failed to open stat file"); + return -1; + } + + /* Skip the first 13 fields and read the 14th and 15th */ + fields = fscanf(file, + "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu", + &utime, &stime); + fclose(file); + + if (fields != 2) { + fprintf(stderr, "Failed to read stat file\n"); + return -1; + } + + /* Calculate the total time spent in the process */ + long total_time = utime + stime; + long ticks_per_second = sysconf(_SC_CLK_TCK); + float runtime_seconds = total_time * 1.0 / ticks_per_second; + + return runtime_seconds; +} + +static enum scx_test_status setup(void **ctx) +{ + struct rt_stall *skel; + + skel = rt_stall__open(); + SCX_FAIL_IF(!skel, "Failed to open"); + SCX_ENUM_INIT(skel); + SCX_FAIL_IF(rt_stall__load(skel), "Failed to load skel"); + + *ctx = skel; + + return SCX_TEST_PASS; +} + +static bool sched_stress_test(void) +{ + float cfs_runtime, rt_runtime; + int cfs_pid, rt_pid; + float expected_min_ratio = 0.04; /* 4% */ + + ksft_print_header(); + ksft_set_plan(1); + + /* Create and set up a EXT task */ + cfs_pid = fork(); + if (cfs_pid == 0) { + set_affinity(CORE_ID); + process_func(); + exit(0); + } else if (cfs_pid < 0) { + perror("fork for EXT task"); + ksft_exit_fail(); + } + + /* Create an RT task */ + rt_pid = fork(); + if (rt_pid == 0) { + set_affinity(CORE_ID); + set_sched(SCHED_FIFO, 50); + process_func(); + exit(0); + } else if (rt_pid < 0) { + perror("fork for RT task"); + ksft_exit_fail(); + } + + /* Let the processes run for the specified time */ + sleep(RUN_TIME); + + /* Get runtime for the EXT task */ + cfs_runtime = get_process_runtime(cfs_pid); + if (cfs_runtime != -1) + ksft_print_msg("Runtime of EXT task (PID %d) is %f seconds\n", cfs_pid, cfs_runtime); + else + ksft_exit_fail_msg("Error getting runtime for EXT task (PID %d)\n", cfs_pid); + + /* Get runtime for the RT task */ + rt_runtime = get_process_runtime(rt_pid); + if (rt_runtime != -1) + ksft_print_msg("Runtime of RT task (PID %d) is %f seconds\n", rt_pid, rt_runtime); + else + ksft_exit_fail_msg("Error getting runtime for RT task (PID %d)\n", rt_pid); + + /* Kill the processes */ + kill(cfs_pid, SIGKILL); + kill(rt_pid, SIGKILL); + waitpid(cfs_pid, NULL, 0); + waitpid(rt_pid, NULL, 0); + + /* Verify that the scx task got enough runtime */ + float actual_ratio = cfs_runtime / (cfs_runtime + rt_runtime); + ksft_print_msg("EXT task got %.2f%% of total runtime\n", actual_ratio * 100); + + if (actual_ratio >= expected_min_ratio) { + ksft_test_result_pass("PASS: EXT task got more than %.2f%% of runtime\n", + expected_min_ratio * 100); + return true; + } else { + ksft_test_result_fail("FAIL: EXT task got less than %.2f%% of runtime\n", + expected_min_ratio * 100); + return false; + } +} + +static enum scx_test_status run(void *ctx) +{ + struct rt_stall *skel = ctx; + struct bpf_link *link; + bool res; + + link = bpf_map__attach_struct_ops(skel->maps.rt_stall_ops); + SCX_FAIL_IF(!link, "Failed to attach scheduler"); + + res = sched_stress_test(); + + SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_NONE)); + bpf_link__destroy(link); + + if (!res) + ksft_exit_fail(); + + return SCX_TEST_PASS; +} + +static void cleanup(void *ctx) +{ + struct rt_stall *skel = ctx; + + rt_stall__destroy(skel); +} + +struct scx_test rt_stall = { + .name = "rt_stall", + .description = "Verify that RT tasks cannot stall SCHED_EXT tasks", + .setup = setup, + .run = run, + .cleanup = cleanup, +}; +REGISTER_SCX_TEST(&rt_stall) -- 2.34.1

3 weeks, 1 day

1
0
0 0

[PATCH V5 2/2] KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t

by Anshuman Khandual

Change MDSCR_EL1 register holding local variables as uint64_t that reflects its true register width as well. Cc: Marc Zyngier <maz(a)kernel.org> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Joey Gouly <joey.gouly(a)arm.com> Cc: kvm(a)vger.kernel.org Cc: kvmarm(a)lists.linux.dev Cc: linux-kernel(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-arm-kernel(a)lists.infradead.org Reviewed-by: Ada Couprie Diaz <ada.coupriediaz(a)arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com> --- tools/testing/selftests/kvm/arm64/debug-exceptions.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/arm64/debug-exceptions.c b/tools/testing/selftests/kvm/arm64/debug-exceptions.c index c7fb55c9135b..e34963956fbc 100644 --- a/tools/testing/selftests/kvm/arm64/debug-exceptions.c +++ b/tools/testing/selftests/kvm/arm64/debug-exceptions.c @@ -140,7 +140,7 @@ static void enable_os_lock(void) static void enable_monitor_debug_exceptions(void) { - uint32_t mdscr; + uint64_t mdscr; asm volatile("msr daifclr, #8"); @@ -223,7 +223,7 @@ void install_hw_bp_ctx(uint8_t addr_bp, uint8_t ctx_bp, uint64_t addr, static void install_ss(void) { - uint32_t mdscr; + uint64_t mdscr; asm volatile("msr daifclr, #8"); -- 2.25.1

3 weeks, 1 day

1
0
0 0

[PATCH net-next] selftests: tcp_ao: fix spelling in seq-ext.c comment

by Ankit Chauhan

Spelling fix: conneciton --> connection This is a non-functional change aimed at improving code clarity. Signed-off-by: Ankit Chauhan <ankitchauhan2065(a)gmail.com> --- tools/testing/selftests/net/tcp_ao/seq-ext.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/tcp_ao/seq-ext.c b/tools/testing/selftests/net/tcp_ao/seq-ext.c index f00245263b20..6478da6a71c3 100644 --- a/tools/testing/selftests/net/tcp_ao/seq-ext.c +++ b/tools/testing/selftests/net/tcp_ao/seq-ext.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Check that after SEQ number wrap-around: * 1. SEQ-extension has upper bytes set - * 2. TCP conneciton is alive and no TCPAOBad segments + * 2. TCP connection is alive and no TCPAOBad segments * In order to test (2), the test doesn't just adjust seq number for a queue * on a connected socket, but migrates it to another sk+port number, so * that there won't be any delayed packets that will fail to verify -- 2.34.1

3 weeks, 1 day

2
1
0 0

[PATCH v17 RESEND net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v17. This patch serise adds DualPI Improved with a Square (DualPI2) with following features: * Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague) * Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332 * Configurable overload strategies * Use of sojourn time to reliably estimate queue delay * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu --- v17 (25-May-2025, Resent at 11-Jun-2025) - Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>) - Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>) - Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>) - Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>) - Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>) - Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>) - Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>) v16 (16-MAy-2025) - Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>) - Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>) - Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>) - Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>) v15 (09-May-2025) - Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>) - Fix one typo in comment of #2 - Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h v14 (05-May-2025) - Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun - Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>) - Update dualpi2.json to align with the updated variable order in pkt_sched.h - Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>) v13 (26-Apr-2025) - Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>) - Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>) v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning --- Chia-Yu Chang (4): sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc selftests/tc-testing: Add selftests for qdisc DualPI2 Documentation: netlink: specs: tc: Add DualPI2 specification Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 156 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 68 + net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1146 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1645 insertions(+) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

3 weeks, 1 day

2
6
0 0

[PATCH net-next v2 13/14] selftests: forwarding: adf_mcd_start(): Allow configuring custom interfaces

by Petr Machata

Tests may wish to add other interfaces to listen on. Notably locally generated traffic uses dummy interfaces. The multicast daemon needs to know about these so that it allows forming rules that involve these interfaces, and so that net.ipv4.conf.X.mc_forwarding is set for the interfaces. To that end, allow passing in a list of interfaces to configure in addition to all the physical ones. Signed-off-by: Petr Machata <petrm(a)nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org> --- Notes: v2: - Adjust as per shellcheck citations - Retain Nik's R-b, the changes were very minor. --- CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/forwarding/lib.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index 253847372062..83ee6a07e072 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -1760,9 +1760,12 @@ mc_send() adf_mcd_start() { + local ifs=("$@") + local table_name="$MCD_TABLE_NAME" local smcroutedir local pid + local if local i check_command "$MCD" || return 1 @@ -1776,6 +1779,16 @@ adf_mcd_start() "$smcroutedir/$table_name.conf" done + for if in "${ifs[@]}"; do + if ! ip_link_has_flag "$if" MULTICAST; then + ip link set dev "$if" multicast on + defer ip link set dev "$if" multicast off + fi + + echo "phyint $if enable" >> \ + "$smcroutedir/$table_name.conf" + done + "$MCD" -N -I "$table_name" -f "$smcroutedir/$table_name.conf" \ -P "$smcroutedir/$table_name.pid" busywait "$BUSYWAIT_TIMEOUT" test -e "$smcroutedir/$table_name.pid" -- 2.49.0

3 weeks, 2 days

1
0
0 0

[PATCH net-next v2 12/14] selftests: net: lib: Add ip_link_has_flag()

by Petr Machata

Add a helper to determine whether a given netdevice has a given flag. Rewrite ip_link_is_up() in terms of the new helper. Signed-off-by: Petr Machata <petrm(a)nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org> --- Notes: CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/lib.sh | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh index 006fdadcc4b9..ff0dbe23e8e0 100644 --- a/tools/testing/selftests/net/lib.sh +++ b/tools/testing/selftests/net/lib.sh @@ -547,13 +547,19 @@ ip_link_set_addr() defer ip link set dev "$name" address "$old_addr" } -ip_link_is_up() +ip_link_has_flag() { local name=$1; shift + local flag=$1; shift local state=$(ip -j link show "$name" | - jq -r '(.[].flags[] | select(. == "UP")) // "DOWN"') - [[ $state == "UP" ]] + jq --arg flag "$flag" 'any(.[].flags.[]; . == $flag)') + [[ $state == true ]] +} + +ip_link_is_up() +{ + ip_link_has_flag "$1" UP } ip_link_set_up() -- 2.49.0

3 weeks, 2 days

1
0
0 0

[PATCH net-next v2 11/14] selftests: forwarding: lib: Move smcrouted helpers here

by Petr Machata

router_multicast.sh has several helpers for work with smcrouted. Extract them to lib.sh so that other selftests can use them as well. Convert the helpers to defer in the process, because that simplifies the interface quite a bit. Therefore have router_multicast.sh invoke defer_scopes_cleanup() in its cleanup() function. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: v2: - Adjust as per shellcheck citations --- CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/forwarding/lib.sh | 33 +++++++++++++++++ .../net/forwarding/router_multicast.sh | 35 ++++--------------- 2 files changed, 39 insertions(+), 29 deletions(-) diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index 508f3c700d71..253847372062 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -37,6 +37,7 @@ declare -A NETIFS=( : "${TEAMD:=teamd}" : "${MCD:=smcrouted}" : "${MC_CLI:=smcroutectl}" +: "${MCD_TABLE_NAME:=selftests}" # Constants for netdevice bring-up: # Default time in seconds to wait for an interface to come up before giving up @@ -1757,6 +1758,38 @@ mc_send() msend -g $groups -I $if_name -c 1 > /dev/null 2>&1 } +adf_mcd_start() +{ + local table_name="$MCD_TABLE_NAME" + local smcroutedir + local pid + local i + + check_command "$MCD" || return 1 + check_command "$MC_CLI" || return 1 + + smcroutedir=$(mktemp -d) + defer rm -rf "$smcroutedir" + + for ((i = 1; i <= NUM_NETIFS; ++i)); do + echo "phyint ${NETIFS[p$i]} enable" >> \ + "$smcroutedir/$table_name.conf" + done + + "$MCD" -N -I "$table_name" -f "$smcroutedir/$table_name.conf" \ + -P "$smcroutedir/$table_name.pid" + busywait "$BUSYWAIT_TIMEOUT" test -e "$smcroutedir/$table_name.pid" + pid=$(cat "$smcroutedir/$table_name.pid") + defer kill_process "$pid" +} + +mc_cli() +{ + local table_name="$MCD_TABLE_NAME" + + "$MC_CLI" -I "$table_name" "$@" +} + start_ip_monitor() { local mtype=$1; shift diff --git a/tools/testing/selftests/net/forwarding/router_multicast.sh b/tools/testing/selftests/net/forwarding/router_multicast.sh index 5a58b1ec8aef..83e52abdbc2e 100755 --- a/tools/testing/selftests/net/forwarding/router_multicast.sh +++ b/tools/testing/selftests/net/forwarding/router_multicast.sh @@ -33,10 +33,6 @@ NUM_NETIFS=6 source lib.sh source tc_common.sh -require_command $MCD -require_command $MC_CLI -table_name=selftests - h1_create() { simple_if_init $h1 198.51.100.2/28 2001:db8:1::2/64 @@ -149,25 +145,6 @@ router_destroy() ip link set dev $rp1 down } -start_mcd() -{ - SMCROUTEDIR="$(mktemp -d)" - - for ((i = 1; i <= $NUM_NETIFS; ++i)); do - echo "phyint ${NETIFS[p$i]} enable" >> \ - $SMCROUTEDIR/$table_name.conf - done - - $MCD -N -I $table_name -f $SMCROUTEDIR/$table_name.conf \ - -P $SMCROUTEDIR/$table_name.pid -} - -kill_mcd() -{ - pkill $MCD - rm -rf $SMCROUTEDIR -} - setup_prepare() { h1=${NETIFS[p1]} @@ -179,7 +156,7 @@ setup_prepare() rp3=${NETIFS[p5]} h3=${NETIFS[p6]} - start_mcd + adf_mcd_start || exit "$EXIT_STATUS" vrf_prepare @@ -206,7 +183,7 @@ cleanup() vrf_cleanup - kill_mcd + defer_scopes_cleanup } create_mcast_sg() @@ -214,9 +191,9 @@ create_mcast_sg() local if_name=$1; shift local s_addr=$1; shift local mcast=$1; shift - local dest_ifs=${@} + local dest_ifs=("${@}") - $MC_CLI -I $table_name add $if_name $s_addr $mcast $dest_ifs + mc_cli add "$if_name" "$s_addr" "$mcast" "${dest_ifs[@]}" } delete_mcast_sg() @@ -224,9 +201,9 @@ delete_mcast_sg() local if_name=$1; shift local s_addr=$1; shift local mcast=$1; shift - local dest_ifs=${@} + local dest_ifs=("${@}") - $MC_CLI -I $table_name remove $if_name $s_addr $mcast $dest_ifs + mc_cli remove "$if_name" "$s_addr" "$mcast" "${dest_ifs[@]}" } mcast_v4() -- 2.49.0

3 weeks, 2 days

1
0
0 0

[PATCH v4 0/2] x86/fred: Prevent single-step upon ERETU completion

by Xin Li (Intel)

IDT event delivery has a debug hole in which it does not generate #DB upon returning to userspace before the first userspace instruction is executed if the Trap Flag (TF) is set. FRED closes this hole by introducing a software event flag, i.e., bit 17 of the augmented SS: if the bit is set and ERETU would result in RFLAGS.TF = 1, a single-step trap will be pending upon completion of ERETU. However I overlooked properly setting and clearing the bit in different situations. Thus when FRED is enabled, if the Trap Flag (TF) is set without an external debugger attached, it can lead to an infinite loop in the SIGTRAP handler. To avoid this, the software event flag in the augmented SS must be cleared, ensuring that no single-step trap remains pending when ERETU completes. This patch set combines the fix [1] and its corresponding selftest [2] (requested by Dave Hansen) into one patch set. [1] https://lore.kernel.org/lkml/20250523050153.3308237-1-xin@zytor.com/ [2] https://lore.kernel.org/lkml/20250530230707.2528916-1-xin@zytor.com/ This patch set is based on tip/x86/urgent branch as of today. Xin Li (Intel) (2): x86/fred/signal: Prevent single-step upon ERETU completion selftests/x86: Add a test to detect infinite sigtrap handler loop arch/x86/include/asm/sighandling.h | 22 +++++ arch/x86/kernel/signal_32.c | 4 + arch/x86/kernel/signal_64.c | 4 + tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/sigtrap_loop.c | 97 ++++++++++++++++++++++ 5 files changed, 128 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c base-commit: dd2922dcfaa3296846265e113309e5f7f138839f -- 2.49.0

3 weeks, 2 days

4
6
0 0

[PATCH bpf-next v2] selftests/bpf: fix signedness bug in redir_partial()

by Fushuai Wang

When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly treats it as success due to unsigned promotion. Explicitly check for -1 first. Fixes: a4b7193d8efd ("selftests/bpf: Add sockmap test for redirecting partial skb data") Signed-off-by: Fushuai Wang <wangfushuai(a)baidu.com> --- tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index 1d98eee7a2c3..f1bdccc7e4e7 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -924,6 +924,8 @@ static void redir_partial(int family, int sotype, int sock_map, int parser_map) goto close; n = xsend(c1, buf, sizeof(buf), 0); + if (n == -1) + goto close; if (n < sizeof(buf)) FAIL("incomplete write"); -- 2.36.1

3 weeks, 2 days

2
1
0 0

[PATCH net v3 0/2] Fix ntuple rules targeting default RSS

by Gal Pressman

This series addresses a regression in ethtool flow steering where rules targeting the default RSS context (context 0) were incorrectly rejected. The default RSS context always exists but is not stored in the rss_ctx xarray like additional contexts. The current validation logic was checking for the existence of context 0 in this array, causing valid flow steering rules to be rejected. This prevented configurations such as: - High priority rules directing specific traffic to the default context - Low priority catch-all rules directing remaining traffic to additional contexts Patch 1 fixes the validation logic to skip the existence check for context 0. Patch 2 adds a selftest that verifies this behavior. Changelog - v2->v3: https://lore.kernel.org/all/20250609120250.1630125-1-gal@nvidia.com/ * Reworded commit message. * Fix pylint warning. v1->v2: https://lore.kernel.org/all/20250225071348.509432-1-gal@nvidia.com/ * Reworded commit message. * Added a selftest. Gal Pressman (2): net: ethtool: Don't check if RSS context exists in case of context 0 selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context net/ethtool/ioctl.c | 3 +- .../selftests/drivers/net/hw/rss_ctx.py | 59 ++++++++++++++++++- 2 files changed, 60 insertions(+), 2 deletions(-) -- 2.40.1

3 weeks, 2 days

3
4
0 0

[PATCH net-next v3 0/4] udp_tunnel: remove rtnl_lock dependency

by Stanislav Fomichev

Recently bnxt had to grow back a bunch of rtnl dependencies because of udp_tunnel's infra. Add separate (global) mutext to protect udp_tunnel state. v3: - fix 2 test failures (Jakub NIPA) v2: - move the lock into udp_tunnel_nic (Jakub) - reorder the lock ordering (Jakub) - move udp_ports_sleep removal into separate patch and update the test (Jakub) Cc: Michael Chan <michael.chan(a)broadcom.com> Stanislav Fomichev (4): udp_tunnel: remove rtnl_lock dependency net: remove redundant ASSERT_RTNL() in queue setup functions netdevsim: remove udp_ports_sleep Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path" .../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 42 ++++--------------- drivers/net/ethernet/emulex/benet/be_main.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 1 - drivers/net/ethernet/intel/ice/ice_main.c | 1 - .../net/ethernet/mellanox/mlx4/en_netdev.c | 3 +- .../net/ethernet/mellanox/mlx5/core/en_main.c | 3 +- .../ethernet/netronome/nfp/nfp_net_common.c | 3 +- .../net/ethernet/qlogic/qede/qede_filter.c | 3 -- .../net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 - drivers/net/ethernet/sfc/ef10.c | 1 - drivers/net/netdevsim/netdevsim.h | 2 - drivers/net/netdevsim/udp_tunnels.c | 12 ------ include/net/udp_tunnel.h | 8 ++-- net/core/dev.c | 2 - net/ipv4/udp_tunnel_nic.c | 30 +++++++------ .../drivers/net/netdevsim/udp_tunnel_nic.sh | 23 +--------- 17 files changed, 32 insertions(+), 109 deletions(-) -- 2.49.0

3 weeks, 2 days

3
8
0 0

[PATCH v8 00/14] riscv: add SBI FWFT misaligned exception delegation support

by Clément Léger

The SBI Firmware Feature extension allows the S-mode to request some specific features (either hardware or software) to be enabled. This series uses this extension to request misaligned access exception delegation to S-mode in order to let the kernel handle it. It also adds support for the KVM FWFT SBI extension based on the misaligned access handling infrastructure. FWFT SBI extension is part of the SBI V3.0 specifications [1]. It can be tested using the qemu provided at [2] which contains the series from [3]. Upstream kvm-unit-tests can be used inside kvm to tests the correct delegation of misaligned exceptions. Upstream OpenSBI can be used. Note: Since SBI V3.0 is not yet ratified, FWFT extension API is split between interface only and implementation, allowing to pick only the interface which do not have hard dependencies on SBI. The tests can be run using the kselftest from series [4]. $ qemu-system-riscv64 \ -cpu rv64,trap-misaligned-access=true,v=true \ -M virt \ -m 1024M \ -bios fw_dynamic.bin \ -kernel Image ... # ./misaligned TAP version 13 1..23 # Starting 23 tests from 1 test cases. # RUN global.gp_load_lh ... # OK global.gp_load_lh ok 1 global.gp_load_lh # RUN global.gp_load_lhu ... # OK global.gp_load_lhu ok 2 global.gp_load_lhu # RUN global.gp_load_lw ... # OK global.gp_load_lw ok 3 global.gp_load_lw # RUN global.gp_load_lwu ... # OK global.gp_load_lwu ok 4 global.gp_load_lwu # RUN global.gp_load_ld ... # OK global.gp_load_ld ok 5 global.gp_load_ld # RUN global.gp_load_c_lw ... # OK global.gp_load_c_lw ok 6 global.gp_load_c_lw # RUN global.gp_load_c_ld ... # OK global.gp_load_c_ld ok 7 global.gp_load_c_ld # RUN global.gp_load_c_ldsp ... # OK global.gp_load_c_ldsp ok 8 global.gp_load_c_ldsp # RUN global.gp_load_sh ... # OK global.gp_load_sh ok 9 global.gp_load_sh # RUN global.gp_load_sw ... # OK global.gp_load_sw ok 10 global.gp_load_sw # RUN global.gp_load_sd ... # OK global.gp_load_sd ok 11 global.gp_load_sd # RUN global.gp_load_c_sw ... # OK global.gp_load_c_sw ok 12 global.gp_load_c_sw # RUN global.gp_load_c_sd ... # OK global.gp_load_c_sd ok 13 global.gp_load_c_sd # RUN global.gp_load_c_sdsp ... # OK global.gp_load_c_sdsp ok 14 global.gp_load_c_sdsp # RUN global.fpu_load_flw ... # OK global.fpu_load_flw ok 15 global.fpu_load_flw # RUN global.fpu_load_fld ... # OK global.fpu_load_fld ok 16 global.fpu_load_fld # RUN global.fpu_load_c_fld ... # OK global.fpu_load_c_fld ok 17 global.fpu_load_c_fld # RUN global.fpu_load_c_fldsp ... # OK global.fpu_load_c_fldsp ok 18 global.fpu_load_c_fldsp # RUN global.fpu_store_fsw ... # OK global.fpu_store_fsw ok 19 global.fpu_store_fsw # RUN global.fpu_store_fsd ... # OK global.fpu_store_fsd ok 20 global.fpu_store_fsd # RUN global.fpu_store_c_fsd ... # OK global.fpu_store_c_fsd ok 21 global.fpu_store_c_fsd # RUN global.fpu_store_c_fsdsp ... # OK global.fpu_store_c_fsdsp ok 22 global.fpu_store_c_fsdsp # RUN global.gen_sigbus ... [12797.988647] misaligned[618]: unhandled signal 7 code 0x1 at 0x0000000000014dc0 in misaligned[4dc0,10000+76000] [12797.988990] CPU: 0 UID: 0 PID: 618 Comm: misaligned Not tainted 6.13.0-rc6-00008-g4ec4468967c9-dirty #51 [12797.989169] Hardware name: riscv-virtio,qemu (DT) [12797.989264] epc : 0000000000014dc0 ra : 0000000000014d00 sp : 00007fffe165d100 [12797.989407] gp : 000000000008f6e8 tp : 0000000000095760 t0 : 0000000000000008 [12797.989544] t1 : 00000000000965d8 t2 : 000000000008e830 s0 : 00007fffe165d160 [12797.989692] s1 : 000000000000001a a0 : 0000000000000000 a1 : 0000000000000002 [12797.989831] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef [12797.989964] a5 : 000000000008ef61 a6 : 626769735f6e0000 a7 : fffffffffffff000 [12797.990094] s2 : 0000000000000001 s3 : 00007fffe165d838 s4 : 00007fffe165d848 [12797.990238] s5 : 000000000000001a s6 : 0000000000010442 s7 : 0000000000010200 [12797.990391] s8 : 000000000000003a s9 : 0000000000094508 s10: 0000000000000000 [12797.990526] s11: 0000555567460668 t3 : 00007fffe165d070 t4 : 00000000000965d0 [12797.990656] t5 : fefefefefefefeff t6 : 0000000000000073 [12797.990756] status: 0000000200004020 badaddr: 000000000008ef61 cause: 0000000000000006 [12797.990911] Code: 8793 8791 3423 fcf4 3783 fc84 c737 dead 0713 eef7 (c398) 0001 # OK global.gen_sigbus ok 23 global.gen_sigbus # PASSED: 23 / 23 tests passed. # Totals: pass:23 fail:0 xfail:0 xpass:0 skip:0 error:0 With kvm-tools: # lkvm run -k sbi.flat -m 128 Info: # lkvm run -k sbi.flat -m 128 -c 1 --name guest-97 Info: Removed ghost socket file "/root/.lkvm//guest-97.sock". ########################################################################## # kvm-unit-tests ########################################################################## ... [test messages elided] PASS: sbi: fwft: FWFT extension probing no error PASS: sbi: fwft: get/set reserved feature 0x6 error == SBI_ERR_DENIED PASS: sbi: fwft: get/set reserved feature 0x3fffffff error == SBI_ERR_DENIED PASS: sbi: fwft: get/set reserved feature 0x80000000 error == SBI_ERR_DENIED PASS: sbi: fwft: get/set reserved feature 0xbfffffff error == SBI_ERR_DENIED PASS: sbi: fwft: misaligned_deleg: Get misaligned deleg feature no error PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 0 PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 1 PASS: sbi: fwft: misaligned_deleg: Verify misaligned load exception trap in supervisor SUMMARY: 50 tests, 2 unexpected failures, 12 skipped This series is available at [5]. Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/vv3.0-rc2/… [1] Link: https://github.com/rivosinc/qemu/tree/dev/cleger/misaligned [2] Link: https://lore.kernel.org/all/20241211211933.198792-3-fkonrad@amd.com/T/ [3] Link: https://lore.kernel.org/linux-riscv/20250414123543.1615478-1-cleger@rivosin… [4] Link: https://github.com/rivosinc/linux/tree/dev/cleger/fwft [5] --- V8: - Move misaligned_access_speed under CONFIG_RISCV_MISALIGNED and add a separate commit for that. V7: - Fix ifdefery build problems - Move sbi_fwft_is_supported with fwft_set_req struct - Added Atish Reviewed-by - Updated KVM vcpu cfg hedeleg value in set_delegation - Changed SBI ETIME error mapping to ETIMEDOUT - Fixed a few typo reported by Alok V6: - Rename FWFT interface to remove "_local" - Fix test for MEDELEG values in KVM FWFT support - Add __init for unaligned_access_init() - Rebased on master V5: - Return ERANGE as mapping for SBI_ERR_BAD_RANGE - Removed unused sbi_fwft_get() - Fix kernel for sbi_fwft_local_set_cpumask() - Fix indentation for sbi_fwft_local_set() - Remove spurious space in kvm_sbi_fwft_ops. - Rebased on origin/master - Remove fixes commits and sent them as a separate series [4] V4: - Check SBI version 3.0 instead of 2.0 for FWFT presence - Use long for kvm_sbi_fwft operation return value - Init KVM sbi extension even if default_disabled - Remove revert_on_fail parameter for sbi_fwft_feature_set(). - Fix comments for sbi_fwft_set/get() - Only handle local features (there are no globals yet in the spec) - Add new SBI errors to sbi_err_map_linux_errno() V3: - Added comment about kvm sbi fwft supported/set/get callback requirements - Move struct kvm_sbi_fwft_feature in kvm_sbi_fwft.c - Add a FWFT interface V2: - Added Kselftest for misaligned testing - Added get_user() usage instead of __get_user() - Reenable interrupt when possible in misaligned access handling - Document that riscv supports unaligned-traps - Fix KVM extension state when an init function is present - Rework SBI misaligned accesses trap delegation code - Added support for CPU hotplugging - Added KVM SBI reset callback - Added reset for KVM SBI FWFT lock - Return SBI_ERR_DENIED_LOCKED when LOCK flag is set Clément Léger (14): riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions riscv: sbi: remove useless parenthesis riscv: sbi: add new SBI error mappings riscv: sbi: add FWFT extension interface riscv: sbi: add SBI FWFT extension calls riscv: misaligned: request misaligned exception from SBI riscv: misaligned: use on_each_cpu() for scalar misaligned access probing riscv: misaligned: declare misaligned_access_speed under CONFIG_RISCV_MISALIGNED riscv: misaligned: move emulated access uniformity check in a function riscv: misaligned: add a function to check misalign trap delegability RISC-V: KVM: add SBI extension init()/deinit() functions RISC-V: KVM: add SBI extension reset callback RISC-V: KVM: add support for FWFT SBI extension RISC-V: KVM: add support for SBI_FWFT_MISALIGNED_DELEG arch/riscv/include/asm/cpufeature.h | 14 +- arch/riscv/include/asm/kvm_host.h | 5 +- arch/riscv/include/asm/kvm_vcpu_sbi.h | 12 + arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 29 +++ arch/riscv/include/asm/sbi.h | 60 +++++ arch/riscv/include/uapi/asm/kvm.h | 1 + arch/riscv/kernel/sbi.c | 81 ++++++- arch/riscv/kernel/traps_misaligned.c | 112 ++++++++- arch/riscv/kernel/unaligned_access_speed.c | 8 +- arch/riscv/kvm/Makefile | 1 + arch/riscv/kvm/vcpu.c | 4 +- arch/riscv/kvm/vcpu_sbi.c | 54 +++++ arch/riscv/kvm/vcpu_sbi_fwft.c | 257 +++++++++++++++++++++ arch/riscv/kvm/vcpu_sbi_sta.c | 3 +- 14 files changed, 620 insertions(+), 21 deletions(-) create mode 100644 arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h create mode 100644 arch/riscv/kvm/vcpu_sbi_fwft.c -- 2.49.0

3 weeks, 2 days

9
37
0 0

[PATCH net-next 0/5] netconsole: Add support for msgid in sysdata

by Gustavo Luiz Duarte

This patch series introduces a new feature to netconsole which allows appending a message ID to the userdata dictionary. If the msgid feature is enabled, the message ID is built from a per-target 32 bit counter that is incremented and appended to every message sent to the target. Example:: echo 1 > "/sys/kernel/config/netconsole/cmdline0/userdata/msgid_enabled" echo "This is message #1" > /dev/kmsg echo "This is message #2" > /dev/kmsg 13,434,54928466,-;This is message #1 msgid=1 13,435,54934019,-;This is message #2 msgid=2 This feature can be used by the target to detect if messages were dropped or reordered before reaching the target. This allows system administrators to assess the reliability of their netconsole pipeline and detect loss of messages due to network contention or temporary unavailability. Suggested-by: Breno Leitao <leitao(a)debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com> Note to maintainer: This will conflict with a fix I sent recently to net: c85bf1975108 netconsole: fix appending sysdata when sysdata_fields == SYSDATA_RELEASE Please let me know if I should rebase at some point and send a v2. --- Gustavo Luiz Duarte (5): netconsole: introduce 'msgid' as a new sysdata field netconsole: implement configfs for msgid_enabled netconsole: append msgid to sysdata selftests: netconsole: Add tests for 'msgid' feature in sysdata docs: netconsole: document msgid feature Documentation/networking/netconsole.rst | 22 +++++++ drivers/net/netconsole.c | 67 +++++++++++++++++++++- .../selftests/drivers/net/netcons_sysdata.sh | 30 ++++++++++ 3 files changed, 118 insertions(+), 1 deletion(-) --- base-commit: 0097c4195b1d0ca57d15979626c769c74747b5a0 change-id: 20250609-netconsole-msgid-b93c6f8e9c60 Best regards, -- Gustavo Luiz Duarte <gustavold(a)gmail.com>

3 weeks, 2 days

3
11
0 0

[PATCH net-next 13/14] selftests: forwarding: adf_mcd_start(): Allow configuring custom interfaces

by Petr Machata

Tests may wish to add other interfaces to listen on. Notably locally generated traffic uses dummy interfaces. The multicast daemon needs to know about these so that it allows forming rules that involve these interfaces, and so that net.ipv4.conf.X.mc_forwarding is set for the interfaces. To that end, allow passing in a list of interfaces to configure in addition to all the physical ones. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/forwarding/lib.sh | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index 88e63562f5c5..5f144d75167a 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -1760,6 +1760,8 @@ mc_send() adf_mcd_start() { + local ifs=("$@") + local if local i check_command $MCD || return 1 @@ -1775,6 +1777,16 @@ adf_mcd_start() $smcroutedir/$table_name.conf done + for if in ${ifs[@]}; do + if ! ip_link_has_flag "$if" MULTICAST; then + ip link set dev "$if" multicast on + defer ip link set dev "$if" multicast off + fi + + echo "phyint $if enable" >> \ + $smcroutedir/$table_name.conf + done + $MCD -N -I $table_name -f $smcroutedir/$table_name.conf \ -P $smcroutedir/$table_name.pid busywait "$BUSYWAIT_TIMEOUT" test -e $smcroutedir/$table_name.pid -- 2.49.0

3 weeks, 2 days

2
1
0 0

[PATCH net-next 12/14] selftests: net: lib: Add ip_link_has_flag()

by Petr Machata

Add a helper to determine whether a given netdevice has a given flag. Rewrite ip_link_is_up() in terms of the new helper. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/lib.sh | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh index 006fdadcc4b9..ff0dbe23e8e0 100644 --- a/tools/testing/selftests/net/lib.sh +++ b/tools/testing/selftests/net/lib.sh @@ -547,13 +547,19 @@ ip_link_set_addr() defer ip link set dev "$name" address "$old_addr" } -ip_link_is_up() +ip_link_has_flag() { local name=$1; shift + local flag=$1; shift local state=$(ip -j link show "$name" | - jq -r '(.[].flags[] | select(. == "UP")) // "DOWN"') - [[ $state == "UP" ]] + jq --arg flag "$flag" 'any(.[].flags.[]; . == $flag)') + [[ $state == true ]] +} + +ip_link_is_up() +{ + ip_link_has_flag "$1" UP } ip_link_set_up() -- 2.49.0

3 weeks, 2 days

2
1
0 0

[PATCH net-next 11/14] selftests: forwarding: lib: Move smcrouted helpers here

by Petr Machata

router_multicast.sh has several helpers for work with smcrouted. Extract them to lib.sh so that other selftests can use them as well. Convert the helpers to defer in the process, because that simplifies the interface quite a bit. Therefore have router_multicast.sh invoke defer_scopes_cleanup() in its cleanup() function. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org tools/testing/selftests/net/forwarding/lib.sh | 31 +++++++++++++++++++ .../net/forwarding/router_multicast.sh | 31 +++---------------- 2 files changed, 35 insertions(+), 27 deletions(-) diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index 508f3c700d71..88e63562f5c5 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -37,6 +37,7 @@ declare -A NETIFS=( : "${TEAMD:=teamd}" : "${MCD:=smcrouted}" : "${MC_CLI:=smcroutectl}" +: "${MCD_TABLE_NAME:=selftests}" # Constants for netdevice bring-up: # Default time in seconds to wait for an interface to come up before giving up @@ -1757,6 +1758,36 @@ mc_send() msend -g $groups -I $if_name -c 1 > /dev/null 2>&1 } +adf_mcd_start() +{ + local i + + check_command $MCD || return 1 + check_command $MC_CLI || return 1 + + local table_name="$MCD_TABLE_NAME" + local smcroutedir="$(mktemp -d)" + + defer rm -rf $smcroutedir + + for ((i = 1; i <= $NUM_NETIFS; ++i)); do + echo "phyint ${NETIFS[p$i]} enable" >> \ + $smcroutedir/$table_name.conf + done + + $MCD -N -I $table_name -f $smcroutedir/$table_name.conf \ + -P $smcroutedir/$table_name.pid + busywait "$BUSYWAIT_TIMEOUT" test -e $smcroutedir/$table_name.pid + defer kill_process $(cat $smcroutedir/$table_name.pid) +} + +mc_cli() +{ + local table_name="$MCD_TABLE_NAME" + + $MC_CLI -I $table_name "$@" +} + start_ip_monitor() { local mtype=$1; shift diff --git a/tools/testing/selftests/net/forwarding/router_multicast.sh b/tools/testing/selftests/net/forwarding/router_multicast.sh index 5a58b1ec8aef..1e2378777b48 100755 --- a/tools/testing/selftests/net/forwarding/router_multicast.sh +++ b/tools/testing/selftests/net/forwarding/router_multicast.sh @@ -33,10 +33,6 @@ NUM_NETIFS=6 source lib.sh source tc_common.sh -require_command $MCD -require_command $MC_CLI -table_name=selftests - h1_create() { simple_if_init $h1 198.51.100.2/28 2001:db8:1::2/64 @@ -149,25 +145,6 @@ router_destroy() ip link set dev $rp1 down } -start_mcd() -{ - SMCROUTEDIR="$(mktemp -d)" - - for ((i = 1; i <= $NUM_NETIFS; ++i)); do - echo "phyint ${NETIFS[p$i]} enable" >> \ - $SMCROUTEDIR/$table_name.conf - done - - $MCD -N -I $table_name -f $SMCROUTEDIR/$table_name.conf \ - -P $SMCROUTEDIR/$table_name.pid -} - -kill_mcd() -{ - pkill $MCD - rm -rf $SMCROUTEDIR -} - setup_prepare() { h1=${NETIFS[p1]} @@ -179,7 +156,7 @@ setup_prepare() rp3=${NETIFS[p5]} h3=${NETIFS[p6]} - start_mcd + adf_mcd_start || exit $EXIT_STATUS vrf_prepare @@ -206,7 +183,7 @@ cleanup() vrf_cleanup - kill_mcd + defer_scopes_cleanup } create_mcast_sg() @@ -216,7 +193,7 @@ create_mcast_sg() local mcast=$1; shift local dest_ifs=${@} - $MC_CLI -I $table_name add $if_name $s_addr $mcast $dest_ifs + mc_cli add $if_name $s_addr $mcast $dest_ifs } delete_mcast_sg() @@ -226,7 +203,7 @@ delete_mcast_sg() local mcast=$1; shift local dest_ifs=${@} - $MC_CLI -I $table_name remove $if_name $s_addr $mcast $dest_ifs + mc_cli remove $if_name $s_addr $mcast $dest_ifs } mcast_v4() -- 2.49.0

3 weeks, 2 days

2
1
0 0

[PATCH] selftests/mm: Add CONFIG_FTRACE to config

by Dev Jain

If CONFIG_UPROBES is not set, a merge subtest fails: Failure log: 7151 12:46:54.627936 # # # RUN merge.handle_uprobe_upon_merged_vma ... 7152 12:46:54.639014 # # f /sys/bus/event_source/devices/uprobe/type 7153 12:46:54.639306 # # fopen: No such file or directory 7154 12:46:54.650451 # # # merge.c:473:handle_uprobe_upon_merged_vma:Expected read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type) (1) == 0 (0) 7155 12:46:54.650730 # # # handle_uprobe_upon_merged_vma: Test terminated by assertion 7156 12:46:54.661750 # # # FAIL merge.handle_uprobe_upon_merged_vma 7157 12:46:54.662030 # # not ok 8 merge.handle_uprobe_upon_merged_vma CONFIG_UPROBES is enabled by CONFIG_UPROBE_EVENTS, which gets enabled by CONFIG_FTRACE. Therefore add this config to selftests/mm/config so that CI systems can include this config in the kernel build. Fixes: efe99fabeb11b ("selftests/mm: add test about uprobe pte be orphan during vma merge") Reported-by: Aishwarya <aishwarya.tcv(a)arm.com> Closes: https://lore.kernel.org/all/20250610103729.72440-1-aishwarya.tcv@arm.com/ Signed-off-by: Dev Jain <dev.jain(a)arm.com> --- tools/testing/selftests/mm/config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/mm/config b/tools/testing/selftests/mm/config index a28baa536332..e600b41030c1 100644 --- a/tools/testing/selftests/mm/config +++ b/tools/testing/selftests/mm/config @@ -8,3 +8,4 @@ CONFIG_GUP_TEST=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_MEM_SOFT_DIRTY=y CONFIG_ANON_VMA_NAME=y +CONFIG_FTRACE=y -- 2.30.2

3 weeks, 2 days

6
9
0 0

[PATCH v2] kunit: Add test for static stub

by Tzung-Bi Shih

__kunit_activate_static_stub() works effectively as kunit_deactivate_static_stub() if `replacement_addr` is NULL. Add a test case to catch the issue discovered in 772e50a76ee6 ("kunit: Fix wrong parameter to kunit_deactivate_static_stub()"). For running the test: $ ./tools/testing/kunit/kunit.py run \ --arch=x86_64 \ kunit_stub Signed-off-by: Tzung-Bi Shih <tzungbi(a)kernel.org> --- Changes from v1: (https://lore.kernel.org/linux-kselftest/20250522054912.2948008-1-tzungbi@ke…) - Update the commit message to point to commit 772e50a76ee6. - Rebase to v6.16-rc1. lib/kunit/kunit-test.c | 46 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 45 insertions(+), 1 deletion(-) diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c index d9c781c859fd..f8f567196ca9 100644 --- a/lib/kunit/kunit-test.c +++ b/lib/kunit/kunit-test.c @@ -8,6 +8,7 @@ #include "linux/gfp_types.h" #include <kunit/test.h> #include <kunit/test-bug.h> +#include <kunit/static_stub.h> #include <linux/device.h> #include <kunit/device.h> @@ -868,10 +869,53 @@ static struct kunit_suite kunit_current_test_suite = { .test_cases = kunit_current_test_cases, }; +static void kunit_stub_test(struct kunit *test) +{ + struct kunit fake_test; + const unsigned long fake_real_fn_addr = 0x1234; + const unsigned long fake_replacement_addr = 0x5678; + struct kunit_resource *res; + struct { + void *real_fn_addr; + void *replacement_addr; + } *stub_ctx; + + kunit_init_test(&fake_test, "kunit_stub_fake_test", NULL); + KUNIT_ASSERT_EQ(test, fake_test.status, KUNIT_SUCCESS); + KUNIT_ASSERT_EQ(test, list_count_nodes(&fake_test.resources), 0); + + __kunit_activate_static_stub(&fake_test, (void *)fake_real_fn_addr, + (void *)fake_replacement_addr); + KUNIT_ASSERT_EQ(test, fake_test.status, KUNIT_SUCCESS); + KUNIT_ASSERT_EQ(test, list_count_nodes(&fake_test.resources), 1); + + res = list_first_entry(&fake_test.resources, struct kunit_resource, node); + KUNIT_EXPECT_NOT_NULL(test, res); + + stub_ctx = res->data; + KUNIT_EXPECT_NOT_NULL(test, stub_ctx); + KUNIT_EXPECT_EQ(test, (unsigned long)stub_ctx->real_fn_addr, fake_real_fn_addr); + KUNIT_EXPECT_EQ(test, (unsigned long)stub_ctx->replacement_addr, fake_replacement_addr); + + __kunit_activate_static_stub(&fake_test, (void *)fake_real_fn_addr, NULL); + KUNIT_ASSERT_EQ(test, fake_test.status, KUNIT_SUCCESS); + KUNIT_ASSERT_EQ(test, list_count_nodes(&fake_test.resources), 0); +} + +static struct kunit_case kunit_stub_test_cases[] = { + KUNIT_CASE(kunit_stub_test), + {} +}; + +static struct kunit_suite kunit_stub_test_suite = { + .name = "kunit_stub", + .test_cases = kunit_stub_test_cases, +}; + kunit_test_suites(&kunit_try_catch_test_suite, &kunit_resource_test_suite, &kunit_log_test_suite, &kunit_status_test_suite, &kunit_current_test_suite, &kunit_device_test_suite, - &kunit_fault_test_suite); + &kunit_fault_test_suite, &kunit_stub_test_suite); MODULE_DESCRIPTION("KUnit test for core test infrastructure"); MODULE_LICENSE("GPL v2"); -- 2.50.0.rc1.591.g9c95f17f64-goog

3 weeks, 2 days

1
0
0 0

[PATCH] selftests/bpf: fix signedness bug in redir_partial()

by wangfushuai

When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly treats it as success due to unsigned promotion. Explicitly check for -1 first. Fixes: a4b7193d8efd ("selftests/bpf: Add sockmap test for redirecting partial skb data") Signed-off-by: wangfushuai <wangfushuai(a)baidu.com> --- tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index 1d98eee7a2c3..f1bdccc7e4e7 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -924,6 +924,8 @@ static void redir_partial(int family, int sotype, int sock_map, int parser_map) goto close; n = xsend(c1, buf, sizeof(buf), 0); + if (n == -1) + goto close; if (n < sizeof(buf)) FAIL("incomplete write"); -- 2.36.1

3 weeks, 2 days

3
2
0 0

[PATCH V4 2/2] KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t

by Anshuman Khandual

Change MDSCR_EL1 register holding local variables as uint64_t that reflects its true register width as well. Cc: Marc Zyngier <maz(a)kernel.org> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Joey Gouly <joey.gouly(a)arm.com> Cc: kvm(a)vger.kernel.org Cc: kvmarm(a)lists.linux.dev Cc: linux-kernel(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-arm-kernel(a)lists.infradead.org Reviewed-by: Ada Couprie Diaz <ada.coupriediaz(a)arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com> --- tools/testing/selftests/kvm/arm64/debug-exceptions.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/arm64/debug-exceptions.c b/tools/testing/selftests/kvm/arm64/debug-exceptions.c index c7fb55c9135b..e34963956fbc 100644 --- a/tools/testing/selftests/kvm/arm64/debug-exceptions.c +++ b/tools/testing/selftests/kvm/arm64/debug-exceptions.c @@ -140,7 +140,7 @@ static void enable_os_lock(void) static void enable_monitor_debug_exceptions(void) { - uint32_t mdscr; + uint64_t mdscr; asm volatile("msr daifclr, #8"); @@ -223,7 +223,7 @@ void install_hw_bp_ctx(uint8_t addr_bp, uint8_t ctx_bp, uint64_t addr, static void install_ss(void) { - uint32_t mdscr; + uint64_t mdscr; asm volatile("msr daifclr, #8"); -- 2.25.1

3 weeks, 2 days

1
0
0 0

[PATCH v4] riscv: sbi: Add SBI Debug Triggers Extension tests

by Jesse Taube

Add tests for the DBTR SBI extension. Signed-off-by: Jesse Taube <jesse(a)rivosinc.com> --- V1 -> V2: - Call report_prefix_pop before returning - Disable compressed instructions in exec_call, update related comment - Remove extra "| 1" in dbtr_test_load - Remove extra newlines - Remove extra tabs in check_exec - Remove typedefs from enums - Return when dbtr_install_trigger fails - s/avalible/available/g - s/unistall/uninstall/g V2 -> V3: - Change SBI_DBTR_SHMEM_INVALID_ADDR to -1UL - Move all dbtr functions to sbi-dbtr.c - Move INSN_LEN to processor.h - Update include list - Use C-style comments V3 -> V4: - Include libcflat.h - Remove #define SBI_DBTR_SHMEM_INVALID_ADDR --- lib/riscv/asm/sbi.h | 1 + riscv/Makefile | 1 + riscv/sbi-dbtr.c | 811 ++++++++++++++++++++++++++++++++++++++++++++ riscv/sbi-tests.h | 1 + riscv/sbi.c | 1 + 5 files changed, 815 insertions(+) create mode 100644 riscv/sbi-dbtr.c diff --git a/lib/riscv/asm/sbi.h b/lib/riscv/asm/sbi.h index a5738a5c..78fd6e2a 100644 --- a/lib/riscv/asm/sbi.h +++ b/lib/riscv/asm/sbi.h @@ -51,6 +51,7 @@ enum sbi_ext_id { SBI_EXT_SUSP = 0x53555350, SBI_EXT_FWFT = 0x46574654, SBI_EXT_SSE = 0x535345, + SBI_EXT_DBTR = 0x44425452, }; enum sbi_ext_base_fid { diff --git a/riscv/Makefile b/riscv/Makefile index 11e68eae..55c7ac93 100644 --- a/riscv/Makefile +++ b/riscv/Makefile @@ -20,6 +20,7 @@ all: $(tests) $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-asm.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-fwft.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-sse.o +$(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-dbtr.o all_deps += $($(TEST_DIR)/sbi-deps) diff --git a/riscv/sbi-dbtr.c b/riscv/sbi-dbtr.c new file mode 100644 index 00000000..b254f84e --- /dev/null +++ b/riscv/sbi-dbtr.c @@ -0,0 +1,811 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SBI DBTR testsuite + * + * Copyright (C) 2025, Rivos Inc., Jesse Taube <jesse(a)rivosinc.com> + */ + +#include <asm/io.h> +#include <bitops.h> +#include <asm/processor.h> +#include <libcflat.h> + +#include "sbi-tests.h" + +#define RV_MAX_TRIGGERS 32 + +#define SBI_DBTR_TRIG_STATE_MAPPED BIT(0) +#define SBI_DBTR_TRIG_STATE_U BIT(1) +#define SBI_DBTR_TRIG_STATE_S BIT(2) +#define SBI_DBTR_TRIG_STATE_VU BIT(3) +#define SBI_DBTR_TRIG_STATE_VS BIT(4) +#define SBI_DBTR_TRIG_STATE_HAVE_HW_TRIG BIT(5) + +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT 8 +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX(trig_state) (trig_state >> SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT) + +#define SBI_DBTR_TDATA1_TYPE_SHIFT (__riscv_xlen - 4) + +#define SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL6_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL6_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL6_SELECT_BIT BIT(21) +#define SBI_DBTR_TDATA1_MCONTROL6_VS_BIT BIT(23) +#define SBI_DBTR_TDATA1_MCONTROL6_VU_BIT BIT(24) + +#define SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL_SELECT_BIT BIT(19) + +enum McontrolType { + SBI_DBTR_TDATA1_TYPE_NONE = (0UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_LEGACY = (1UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL = (2UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ICOUNT = (3UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ITRIGGER = (4UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ETRIGGER = (5UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL6 = (6UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_TMEXTTRIGGER = (7UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED0 = (8UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED1 = (9UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED2 = (10UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED3 = (11UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM0 = (12UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM1 = (13UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM2 = (14UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_DISABLED = (15UL << SBI_DBTR_TDATA1_TYPE_SHIFT), +}; + +enum Tdata1Value { + VALUE_NONE = 0, + VALUE_LOAD = BIT(0), + VALUE_STORE = BIT(1), + VALUE_EXECUTE = BIT(2), +}; + +enum Tdata1Mode { + MODE_NONE = 0, + MODE_M = BIT(0), + MODE_U = BIT(1), + MODE_S = BIT(2), + MODE_VU = BIT(3), + MODE_VS = BIT(4), +}; + +enum sbi_ext_dbtr_fid { + SBI_EXT_DBTR_NUM_TRIGGERS = 0, + SBI_EXT_DBTR_SETUP_SHMEM, + SBI_EXT_DBTR_TRIGGER_READ, + SBI_EXT_DBTR_TRIGGER_INSTALL, + SBI_EXT_DBTR_TRIGGER_UPDATE, + SBI_EXT_DBTR_TRIGGER_UNINSTALL, + SBI_EXT_DBTR_TRIGGER_ENABLE, + SBI_EXT_DBTR_TRIGGER_DISABLE, +}; + +struct sbi_dbtr_data_msg { + unsigned long tstate; + unsigned long tdata1; + unsigned long tdata2; + unsigned long tdata3; +}; + +struct sbi_dbtr_id_msg { + unsigned long idx; +}; + +/* SBI shared mem messages layout */ +struct sbi_dbtr_shmem_entry { + union { + struct sbi_dbtr_data_msg data; + struct sbi_dbtr_id_msg id; + }; +}; + +static bool dbtr_handled; + +/* Expected to be leaf function as not to disrupt frame-pointer */ +static __attribute__((naked)) void exec_call(void) +{ + /* skip over nop when triggered instead of ret. */ + asm volatile (".option push\n" + ".option arch, -c\n" + "nop\n" + "ret\n" + ".option pop\n"); +} + +static void dbtr_exception_handler(struct pt_regs *regs) +{ + dbtr_handled = true; + + /* Reading *epc may cause a fault, skip over nop */ + if ((void *)regs->epc == exec_call) { + regs->epc += 4; + return; + } + + /* WARNING: Skips over the trapped intruction */ + regs->epc += RV_INSN_LEN(readw((void *)regs->epc)); +} + +static bool do_save(void *tdata2) +{ + bool ret; + + writel(0, tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_load(void *tdata2) +{ + bool ret; + + readl(tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_exec(void) +{ + bool ret; + + exec_call(); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static unsigned long gen_tdata1_mcontrol(enum Tdata1Mode mode, enum Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_S_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1_mcontrol6(enum Tdata1Mode mode, enum Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_S_BIT; + + if (mode & MODE_VU) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VU_BIT; + + if (mode & MODE_VS) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VS_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1(enum McontrolType type, enum Tdata1Value value, enum Tdata1Mode mode) +{ + switch (type) { + case SBI_DBTR_TDATA1_TYPE_MCONTROL: + return gen_tdata1_mcontrol(mode, value); + case SBI_DBTR_TDATA1_TYPE_MCONTROL6: + return gen_tdata1_mcontrol6(mode, value); + default: + return 0; + } +} + +static struct sbiret sbi_debug_num_triggers(unsigned long trig_tdata1) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_NUM_TRIGGERS, trig_tdata1, 0, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_set_shmem_raw(unsigned long shmem_phys_lo, + unsigned long shmem_phys_hi, + unsigned long flags) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_SETUP_SHMEM, shmem_phys_lo, + shmem_phys_hi, flags, 0, 0, 0); +} + +static struct sbiret sbi_debug_set_shmem(void *shmem) +{ + phys_addr_t p = virt_to_phys(shmem); + + return sbi_debug_set_shmem_raw(lower_32_bits(p), upper_32_bits(p), 0); +} + +static struct sbiret sbi_debug_read_triggers(unsigned long trig_idx_base, + unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_READ, trig_idx_base, + trig_count, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_install_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_INSTALL, trig_count, 0, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_update_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UPDATE, trig_count, 0, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_uninstall_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UNINSTALL, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_enable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_ENABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +static struct sbiret sbi_debug_disable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_DISABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +static bool dbtr_install_trigger(struct sbi_dbtr_shmem_entry *shmem, void *tdata2, + unsigned long tdata1) +{ + struct sbiret sbi_ret; + bool ret; + + shmem->data.tdata1 = tdata1; + shmem->data.tdata2 = (unsigned long)tdata2; + + sbi_ret = sbi_debug_install_triggers(1); + ret = sbiret_report_error(&sbi_ret, SBI_SUCCESS, "sbi_debug_install_triggers"); + if (ret) + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + return ret; +} + +static bool dbtr_uninstall_trigger(void) +{ + struct sbiret ret; + + install_exception_handler(EXC_BREAKPOINT, NULL); + + ret = sbi_debug_uninstall_triggers(0, 1); + return sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static unsigned long dbtr_test_num_triggers(void) +{ + struct sbiret ret; + unsigned long tdata1 = 0; + /* sbi_debug_num_triggers will return trig_max in sbiret.value when trig_tdata1 == 0 */ + + /* should be at least one trigger. */ + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + + if (ret.value == 0) + report_fail("sbi_debug_num_triggers: Returned 0 triggers available"); + else + report_pass("sbi_debug_num_triggers: Returned %lu triggers available", ret.value); + + return ret.value; +} + +static enum McontrolType dbtr_test_type(unsigned long *num_trig) +{ + struct sbiret ret; + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol6 triggers available", + ret.value); + *num_trig = ret.value; + return tdata1; + } + + tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + *num_trig = ret.value; + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol triggers available", + ret.value); + return tdata1; + } + + report_fail("sbi_debug_num_triggers: Returned 0 mcontrol(6) triggers available"); + + return SBI_DBTR_TDATA1_TYPE_NONE; +} + +static struct sbiret dbtr_test_save_install_uninstall(struct sbi_dbtr_shmem_entry *shmem, + enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("save_trigger"); + + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S | MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_install_triggers(1); + if (!sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_install_triggers")) { + report_prefix_pop(); + return ret; + } + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(do_save(&test), "triggered"); + + if (do_load(&test)) + report_fail("triggered by load"); + + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_save(&test)) + report_fail("triggered after uninstall"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); + + return ret; +} + +static void dbtr_test_update(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("update_trigger"); + + if (!dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE))) { + report_prefix_pop(); + return; + } + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_update_triggers(1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_update_triggers"); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_load(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + + report_prefix_push("load_trigger"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_LOAD, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_load(&test), "triggered"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_enable(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_disable_triggers"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + if (do_save(&test)) { + report_fail("should not trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); + report_skip("sbi_debug_enable_triggers: no disable"); + + return; + } + + report_pass("should not trigger"); + + report_prefix_pop(); + report_prefix_push("sbi_debug_enable_triggers"); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_exec(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + + report_prefix_push("exec_trigger"); + /* check if loads and saves trigger exec */ + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + report_prefix_pop(); + return; + } + + if (do_load(&test)) + report_fail("triggered by load"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + if (!dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + report_prefix_pop(); + return; + } + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_read(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_read_triggers"); + if (!dbtr_install_trigger(shmem, &test, tdata1)) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", ((unsigned long)&test), + shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void check_exec(unsigned long base) +{ + struct sbiret ret; + + report(do_exec(), "exec triggered"); + + ret = sbi_debug_uninstall_triggers(base, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static void dbtr_test_multiple(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type, + unsigned long num_trigs) +{ + static unsigned long test[2]; + struct sbiret ret; + bool have_three = num_trigs > 2; + + if (num_trigs < 2) + return; + + report_prefix_push("test_multiple"); + + if (!dbtr_install_trigger(shmem, &test[0], gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + if (!dbtr_install_trigger(shmem, &test[1], gen_tdata1(type, VALUE_LOAD, MODE_S))) + goto error; + if (have_three && + !dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + ret = sbi_debug_uninstall_triggers(1, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + goto error; + } + + report(do_save(&test[0]), "save triggered"); + + if (do_load(&test[0])) + report_fail("save triggered by load"); + + report(do_load(&test[1]), "load triggered"); + + if (do_save(&test[1])) + report_fail("load triggered by save"); + + if (have_three) + check_exec(2); + + ret = sbi_debug_uninstall_triggers(1, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_load(&test[1])) + report_fail("load triggered after uninstall"); + + report(do_save(&test[0]), "save triggered"); + + if (!have_three && + dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) + check_exec(1); + +error: + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_multiple_types(struct sbi_dbtr_shmem_entry *shmem, unsigned long type) +{ + static unsigned long test; + + report_prefix_push("dbtr_test_multiple_types"); + + /* check if loads and saves trigger exec */ + if (!dbtr_install_trigger(shmem, &test, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_load(&test), "load trigger"); + + report(do_save(&test), "save trigger"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + if (!dbtr_install_trigger(shmem, exec_call, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_uninstall(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable uninstall"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + dbtr_uninstall_trigger(); + + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_uninstall_enable(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("uninstall enable"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + dbtr_uninstall_trigger(); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_uninstall_update(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("uninstall update"); + if (!dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE))) { + report_prefix_pop(); + return; + } + + dbtr_uninstall_trigger(); + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_update_triggers(1); + sbiret_report_error(&ret, SBI_ERR_FAILURE, "sbi_debug_update_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_disable_read(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_NONE); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable_read"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", + ((unsigned long)&test), shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +void check_dbtr(void) +{ + static struct sbi_dbtr_shmem_entry shmem[RV_MAX_TRIGGERS] = {}; + unsigned long num_trigs; + enum McontrolType trig_type; + struct sbiret ret; + + report_prefix_push("dbtr"); + + if (!sbi_probe(SBI_EXT_DBTR)) { + report_skip("extension not available"); + report_prefix_pop(); + return; + } + + if (__sbi_get_imp_id() == SBI_IMPL_OPENSBI && + __sbi_get_imp_version() < sbi_impl_opensbi_mk_version(1, 6)) { + report_skip("OpenSBI < v1.7 detected, skipping tests"); + report_prefix_pop(); + return; + } + + num_trigs = dbtr_test_num_triggers(); + if (!num_trigs) + goto error; + + trig_type = dbtr_test_type(&num_trigs); + if (trig_type == SBI_DBTR_TDATA1_TYPE_NONE) + goto error; + + ret = sbi_debug_set_shmem(shmem); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_set_shmem"); + + ret = dbtr_test_save_install_uninstall(&shmem[0], trig_type); + /* install or uninstall failed */ + if (ret.error != SBI_SUCCESS) + goto error; + + dbtr_test_load(&shmem[0], trig_type); + dbtr_test_exec(&shmem[0], trig_type); + dbtr_test_read(&shmem[0], trig_type); + dbtr_test_disable_enable(&shmem[0], trig_type); + dbtr_test_update(&shmem[0], trig_type); + dbtr_test_multiple_types(&shmem[0], trig_type); + dbtr_test_multiple(shmem, trig_type, num_trigs); + dbtr_test_disable_uninstall(&shmem[0], trig_type); + dbtr_test_uninstall_enable(&shmem[0], trig_type); + dbtr_test_uninstall_update(&shmem[0], trig_type); + dbtr_test_disable_read(&shmem[0], trig_type); + +error: + report_prefix_pop(); +} diff --git a/riscv/sbi-tests.h b/riscv/sbi-tests.h index d5c4ae70..6a227745 100644 --- a/riscv/sbi-tests.h +++ b/riscv/sbi-tests.h @@ -99,6 +99,7 @@ static inline bool env_enabled(const char *env) void sbi_bad_fid(int ext); void check_sse(void); +void check_dbtr(void); #endif /* __ASSEMBLER__ */ #endif /* _RISCV_SBI_TESTS_H_ */ diff --git a/riscv/sbi.c b/riscv/sbi.c index edb1a6be..5bd496d0 100644 --- a/riscv/sbi.c +++ b/riscv/sbi.c @@ -1561,6 +1561,7 @@ int main(int argc, char **argv) check_susp(); check_sse(); check_fwft(); + check_dbtr(); return report_summary(); } -- 2.43.0

3 weeks, 2 days

3
2
0 0

[PATCH net-next, v2] selftest: Add selftest for multicast address notifications

by Yuyang Huang

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses. The test relies on the iproute2 version to be 6.13+. Tested by the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \ TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze(a)google.com> Cc: Lorenzo Colitti <lorenzo(a)google.com> Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com> --- Changelog since v1: - Skip the test if the iproute2 is too old. tools/testing/selftests/net/rtnetlink.sh | 39 ++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 2e8243a65b50..74d4afb55d7c 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -21,6 +21,7 @@ ALL_TESTS=" kci_test_vrf kci_test_encap kci_test_macsec + kci_test_mcast_addr_notification kci_test_ipsec kci_test_ipsec_offload kci_test_fdb_get @@ -1334,6 +1335,44 @@ kci_test_mngtmpaddr() return $ret } +kci_test_mcast_addr_notification() +{ + local tmpfile + local monitor_pid + local match_result + + tmpfile=$(mktemp) + + ip monitor maddr > $tmpfile & + monitor_pid=$! + sleep 1 + if [ ! -e "/proc/$monitor_pid" ]; then + end_test "SKIP: mcast addr notification: iproute2 too old" + rm $tmpfile + return $ksft_skip + fi + + run_cmd ip link add name test-dummy1 type dummy + run_cmd ip link set test-dummy1 up + run_cmd ip link del dev test-dummy1 + sleep 1 + + match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile) + + kill $monitor_pid + rm $tmpfile + # There should be 4 line matches as follows. + # 13: test-dummy1 inet6 mcast ff02::1 scope global + # 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global + if [ $match_result -ne 4 ];then + end_test "FAIL: mcast addr notification" + return 1 + fi + end_test "PASS: mcast addr notification" +} + kci_test_rtnl() { local current_test -- 2.49.0.1204.g71687c7c1d-goog

3 weeks, 2 days

1
0
0 0

[PATCH net v3 1/2] net: clear the dst when changing skb protocol

by Jakub Kicinski

A not-so-careful NAT46 BPF program can crash the kernel if it indiscriminately flips ingress packets from v4 to v6: BUG: kernel NULL pointer dereference, address: 0000000000000000 ip6_rcv_core (net/ipv6/ip6_input.c:190:20) ipv6_rcv (net/ipv6/ip6_input.c:306:8) process_backlog (net/core/dev.c:6186:4) napi_poll (net/core/dev.c:6906:9) net_rx_action (net/core/dev.c:7028:13) do_softirq (kernel/softirq.c:462:3) netif_rx (net/core/dev.c:5326:3) dev_loopback_xmit (net/core/dev.c:4015:2) ip_mc_finish_output (net/ipv4/ip_output.c:363:8) NF_HOOK (./include/linux/netfilter.h:314:9) ip_mc_output (net/ipv4/ip_output.c:400:5) dst_output (./include/net/dst.h:459:9) ip_local_out (net/ipv4/ip_output.c:130:9) ip_send_skb (net/ipv4/ip_output.c:1496:8) udp_send_skb (net/ipv4/udp.c:1040:8) udp_sendmsg (net/ipv4/udp.c:1328:10) The output interface has a 4->6 program attached at ingress. We try to loop the multicast skb back to the sending socket. Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr and changes skb->protocol to v6. We enter ip6_rcv_core which tries to use skb_dst(). But the dst is still an IPv4 one left after IPv4 mcast output. Clear the dst in all BPF helpers which change the protocol. Try to preserve metadata dsts, those may carry non-routing metadata. Cc: stable(a)vger.kernel.org Reviewed-by: Maciej Żenczykowski <maze(a)google.com> Acked-by: Daniel Borkmann <daniel(a)iogearbox.net> Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()") Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow") Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- v3: - go back to v1, the encap / decap which don't change proto will be added in -next - split out the test v2: https://lore.kernel.org/20250607204734.1588964-1-kuba@kernel.org - drop on encap/decap - fix typo (protcol) - add the test to the Makefile v1: https://lore.kernel.org/20250604210604.257036-1-kuba@kernel.org I wonder if we should not skip ingress (tc_skip_classify?) for looped back packets in the first place. But that doesn't seem robust enough vs multiple redirections to solve the crash. Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't work either, since BPF can change pkt_type arbitrarily. CC: martin.lau(a)linux.dev CC: daniel(a)iogearbox.net CC: john.fastabend(a)gmail.com CC: eddyz87(a)gmail.com CC: sdf(a)fomichev.me CC: haoluo(a)google.com CC: willemb(a)google.com CC: william.xuanziyang(a)huawei.com CC: alan.maguire(a)oracle.com CC: bpf(a)vger.kernel.org CC: edumazet(a)google.com CC: maze(a)google.com CC: shuah(a)kernel.org CC: linux-kselftest(a)vger.kernel.org CC: yonghong.song(a)linux.dev --- net/core/filter.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 327ca73f9cd7..7a72f766aacf 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3233,6 +3233,13 @@ static const struct bpf_func_proto bpf_skb_vlan_pop_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto) +{ + skb->protocol = htons(proto); + if (skb_valid_dst(skb)) + skb_dst_drop(skb); +} + static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len) { /* Caller already did skb_cow() with len as headroom, @@ -3329,7 +3336,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb) } } - skb->protocol = htons(ETH_P_IPV6); + bpf_skb_change_protocol(skb, ETH_P_IPV6); skb_clear_hash(skb); return 0; @@ -3359,7 +3366,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb) } } - skb->protocol = htons(ETH_P_IP); + bpf_skb_change_protocol(skb, ETH_P_IP); skb_clear_hash(skb); return 0; @@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, /* Match skb->protocol to new outer l3 protocol */ if (skb->protocol == htons(ETH_P_IP) && flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6) - skb->protocol = htons(ETH_P_IPV6); + bpf_skb_change_protocol(skb, ETH_P_IPV6); else if (skb->protocol == htons(ETH_P_IPV6) && flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4) - skb->protocol = htons(ETH_P_IP); + bpf_skb_change_protocol(skb, ETH_P_IP); } if (skb_is_gso(skb)) { @@ -3606,10 +3613,10 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, /* Match skb->protocol to new outer l3 protocol */ if (skb->protocol == htons(ETH_P_IP) && flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6) - skb->protocol = htons(ETH_P_IPV6); + bpf_skb_change_protocol(skb, ETH_P_IPV6); else if (skb->protocol == htons(ETH_P_IPV6) && flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4) - skb->protocol = htons(ETH_P_IP); + bpf_skb_change_protocol(skb, ETH_P_IP); if (skb_is_gso(skb)) { struct skb_shared_info *shinfo = skb_shinfo(skb); -- 2.49.0

3 weeks, 2 days

3
2
0 0

[PATCH net-next, v2] selftest: Add selftest for multicast address notifications

by Yuyang Huang

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses. The test relies on the iproute2 version to be 6.13+. Tested by the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \ TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze(a)google.com> Cc: Lorenzo Colitti <lorenzo(a)google.com> Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com> --- Changelog since v1: - Skip the test if the iproute2 is too old. tools/testing/selftests/net/rtnetlink.sh | 39 ++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 2e8243a65b50..74d4afb55d7c 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -21,6 +21,7 @@ ALL_TESTS=" kci_test_vrf kci_test_encap kci_test_macsec + kci_test_mcast_addr_notification kci_test_ipsec kci_test_ipsec_offload kci_test_fdb_get @@ -1334,6 +1335,44 @@ kci_test_mngtmpaddr() return $ret } +kci_test_mcast_addr_notification() +{ + local tmpfile + local monitor_pid + local match_result + + tmpfile=$(mktemp) + + ip monitor maddr > $tmpfile & + monitor_pid=$! + sleep 1 + if [ ! -e "/proc/$monitor_pid" ]; then + end_test "SKIP: mcast addr notification: iproute2 too old" + rm $tmpfile + return $ksft_skip + fi + + run_cmd ip link add name test-dummy1 type dummy + run_cmd ip link set test-dummy1 up + run_cmd ip link del dev test-dummy1 + sleep 1 + + match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile) + + kill $monitor_pid + rm $tmpfile + # There should be 4 line matches as follows. + # 13: test-dummy1 inet6 mcast ff02::1 scope global + # 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global + if [ $match_result -ne 4 ];then + end_test "FAIL: mcast addr notification" + return 1 + fi + end_test "PASS: mcast addr notification" +} + kci_test_rtnl() { local current_test -- 2.49.0.1204.g71687c7c1d-goog

3 weeks, 2 days

2
1
0 0

[RFC PATCH v2 00/10] Add TDX intra-host migration support

by Ryan Afranji

Hello, This is RFC v2 for the TDX intra-host migration patch series. It addresses comments in RFC v1 [1] and is rebased onto the latest kvm/next (v6.16-rc1). This patchset was built on top of the latest TDX selftests [2] and gmem linking [3] RFC patch series. Here is the series stitched together for your convenience: https://github.com/googleprodkernel/linux-cc/tree/tdx-copyless-rfc-v2 Changes from RFC v1: + Added patch to prevent deadlock warnings by re-ordering locking order. + Added patch to allow vCPUs to be created for uninitialized VMs. + Minor optimizations to TDX intra-host migration core logic. + Moved lapic state transfer into TDX intra-host migration core logic. + Added logic to handle posted interrupts that are injected during migration. + Added selftests. + Addressed comments from RFC v1. + Various small changes to make patchset compatible with latest version of kvm/next. [1] https://lore.kernel.org/lkml/20230407201921.2703758-2-sagis@google.com [2] https://lore.kernel.org/lkml/20250414214801.2693294-2-sagis@google.com [3] https://lore.kernel.org/all/cover.1747368092.git.afranji@google.com Ackerley Tng (2): KVM: selftests: Add TDX support for ucalls KVM: selftests: Add irqfd/interrupts test for TDX with migration Ryan Afranji (3): KVM: x86: Adjust locking order in move_enc_context_from KVM: TDX: Allow vCPUs to be created for migration KVM: selftests: Refactor userspace_mem_region creation out of vm_mem_add Sagi Shahar (5): KVM: Split tdp_mmu_pages to mirror and direct counters KVM: TDX: Add base implementation for tdx_vm_move_enc_context_from KVM: TDX: Implement moving mirror pages between 2 TDs KVM: TDX: Add core logic for TDX intra-host migration KVM: selftests: TDX: Add tests for TDX in-place migration arch/x86/include/asm/kvm_host.h | 7 +- arch/x86/kvm/mmu.h | 2 + arch/x86/kvm/mmu/mmu.c | 66 ++++ arch/x86/kvm/mmu/tdp_mmu.c | 72 +++- arch/x86/kvm/mmu/tdp_mmu.h | 6 + arch/x86/kvm/svm/sev.c | 13 +- arch/x86/kvm/vmx/main.c | 12 +- arch/x86/kvm/vmx/tdx.c | 236 +++++++++++- arch/x86/kvm/vmx/x86_ops.h | 1 + arch/x86/kvm/x86.c | 14 +- tools/testing/selftests/kvm/Makefile.kvm | 2 + .../testing/selftests/kvm/include/kvm_util.h | 25 ++ .../selftests/kvm/include/x86/tdx/tdx_util.h | 3 + .../selftests/kvm/include/x86/tdx/test_util.h | 5 + .../testing/selftests/kvm/include/x86/ucall.h | 4 +- tools/testing/selftests/kvm/lib/kvm_util.c | 222 ++++++++---- .../testing/selftests/kvm/lib/ucall_common.c | 2 +- .../selftests/kvm/lib/x86/tdx/tdx_util.c | 63 +++- .../selftests/kvm/lib/x86/tdx/test_util.c | 17 + tools/testing/selftests/kvm/lib/x86/ucall.c | 108 ++++-- .../kvm/x86/tdx_irqfd_migrate_test.c | 264 ++++++++++++++ .../selftests/kvm/x86/tdx_migrate_tests.c | 337 ++++++++++++++++++ 22 files changed, 1349 insertions(+), 132 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86/tdx_irqfd_migrate_test.c create mode 100644 tools/testing/selftests/kvm/x86/tdx_migrate_tests.c -- 2.50.0.rc1.591.g9c95f17f64-goog

3 weeks, 3 days

1
10
0 0

[PATCH bpf-next v2 1/2] bpftool: Use appropriate permissions for map access

by Slava Imameev

Modify several functions in tools/bpf/bpftool/common.c to allow specification of requested access for file descriptors, such as read-only access. Update bpftool to request only read access for maps when write access is not required. This fixes errors when reading from maps that are protected from modification via security_bpf_map. Signed-off-by: Slava Imameev <slava.imameev(a)crowdstrike.com> --- Changes in v2: - fix for a test compilation error: "conflicting types for 'bpf_fentry_test1'" --- --- tools/bpf/bpftool/btf.c | 3 +- tools/bpf/bpftool/common.c | 57 ++++++++++++++++++++++--------- tools/bpf/bpftool/iter.c | 2 +- tools/bpf/bpftool/link.c | 2 +- tools/bpf/bpftool/main.h | 13 ++++--- tools/bpf/bpftool/map.c | 56 +++++++++++++++++------------- tools/bpf/bpftool/map_perf_ring.c | 3 +- tools/bpf/bpftool/prog.c | 4 +-- 8 files changed, 90 insertions(+), 50 deletions(-) diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c index 6b14cbfa58aa..1ba27cb03348 100644 --- a/tools/bpf/bpftool/btf.c +++ b/tools/bpf/bpftool/btf.c @@ -905,7 +905,8 @@ static int do_dump(int argc, char **argv) return -1; } - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, + BPF_F_RDONLY); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index ecfa790adc13..ff1c99281beb 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -193,7 +193,8 @@ int mount_tracefs(const char *target) return err; } -int open_obj_pinned(const char *path, bool quiet) +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts) { char *pname; int fd = -1; @@ -205,7 +206,7 @@ int open_obj_pinned(const char *path, bool quiet) goto out_ret; } - fd = bpf_obj_get(pname); + fd = bpf_obj_get_opts(pname, opts); if (fd < 0) { if (!quiet) p_err("bpf obj get (%s): %s", pname, @@ -221,12 +222,13 @@ int open_obj_pinned(const char *path, bool quiet) return fd; } -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type) +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts) { enum bpf_obj_type type; int fd; - fd = open_obj_pinned(path, false); + fd = open_obj_pinned(path, false, opts); if (fd < 0) return -1; @@ -555,7 +557,7 @@ static int do_build_table_cb(const char *fpath, const struct stat *sb, if (typeflag != FTW_F) goto out_ret; - fd = open_obj_pinned(fpath, true); + fd = open_obj_pinned(fpath, true, NULL); if (fd < 0) goto out_ret; @@ -928,7 +930,7 @@ int prog_parse_fds(int *argc, char ***argv, int **fds) path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG, NULL); if ((*fds)[0] < 0) return -1; return 1; @@ -965,7 +967,8 @@ int prog_parse_fd(int *argc, char ***argv) return fd; } -static int map_fd_by_name(char *name, int **fds) +static int map_fd_by_name(char *name, int **fds, + const struct bpf_get_fd_by_id_opts *opts) { unsigned int id = 0; int fd, nb_fds = 0; @@ -973,6 +976,7 @@ static int map_fd_by_name(char *name, int **fds) int err; while (true) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts_ro); struct bpf_map_info info = {}; __u32 len = sizeof(info); @@ -985,7 +989,9 @@ static int map_fd_by_name(char *name, int **fds) return nb_fds; } - fd = bpf_map_get_fd_by_id(id); + /* Request a read-only fd to query the map info */ + opts_ro.open_flags = BPF_F_RDONLY; + fd = bpf_map_get_fd_by_id_opts(id, &opts_ro); if (fd < 0) { p_err("can't get map by id (%u): %s", id, strerror(errno)); @@ -1004,6 +1010,15 @@ static int map_fd_by_name(char *name, int **fds) continue; } + /* Get an fd with the requested options. */ + close(fd); + fd = bpf_map_get_fd_by_id_opts(id, opts); + if (fd < 0) { + p_err("can't get map by id (%u): %s", id, + strerror(errno)); + goto err_close_fds; + } + if (nb_fds > 0) { tmp = realloc(*fds, (nb_fds + 1) * sizeof(int)); if (!tmp) { @@ -1023,8 +1038,16 @@ static int map_fd_by_name(char *name, int **fds) return -1; } -int map_parse_fds(int *argc, char ***argv, int **fds) +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); + + if (open_flags & ~BPF_F_RDONLY) { + p_err("invalid open_flags: %x", open_flags); + return -1; + } + opts.open_flags = open_flags; + if (is_prefix(**argv, "id")) { unsigned int id; char *endptr; @@ -1038,7 +1061,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - (*fds)[0] = bpf_map_get_fd_by_id(id); + (*fds)[0] = bpf_map_get_fd_by_id_opts(id, &opts); if ((*fds)[0] < 0) { p_err("get map by id (%u): %s", id, strerror(errno)); return -1; @@ -1056,16 +1079,18 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - return map_fd_by_name(name, fds); + return map_fd_by_name(name, fds, &opts); } else if (is_prefix(**argv, "pinned")) { char *path; + LIBBPF_OPTS(bpf_obj_get_opts, get_opts); + get_opts.file_flags = open_flags; NEXT_ARGP(); path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP, &get_opts); if ((*fds)[0] < 0) return -1; return 1; @@ -1075,7 +1100,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) return -1; } -int map_parse_fd(int *argc, char ***argv) +int map_parse_fd(int *argc, char ***argv, __u32 open_flags) { int *fds = NULL; int nb_fds, fd; @@ -1085,7 +1110,7 @@ int map_parse_fd(int *argc, char ***argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(argc, argv, &fds); + nb_fds = map_parse_fds(argc, argv, &fds, open_flags); if (nb_fds != 1) { if (nb_fds > 1) { p_err("several maps match this handle"); @@ -1103,12 +1128,12 @@ int map_parse_fd(int *argc, char ***argv) } int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len) + __u32 *info_len, __u32 open_flags) { int err; int fd; - fd = map_parse_fd(argc, argv); + fd = map_parse_fd(argc, argv, open_flags); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/iter.c b/tools/bpf/bpftool/iter.c index 5c39c2ed36a2..ad318a8667a4 100644 --- a/tools/bpf/bpftool/iter.c +++ b/tools/bpf/bpftool/iter.c @@ -37,7 +37,7 @@ static int do_pin(int argc, char **argv) return -1; } - map_fd = map_parse_fd(&argc, &argv); + map_fd = map_parse_fd(&argc, &argv, 0); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c index 3535afc80a49..8523be11dcd9 100644 --- a/tools/bpf/bpftool/link.c +++ b/tools/bpf/bpftool/link.c @@ -117,7 +117,7 @@ static int link_parse_fd(int *argc, char ***argv) path = **argv; NEXT_ARGP(); - return open_obj_pinned_any(path, BPF_OBJ_LINK); + return open_obj_pinned_any(path, BPF_OBJ_LINK, NULL); } p_err("expected 'id' or 'pinned', got: '%s'?", **argv); diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 9eb764fe4cc8..6db704fda5c0 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -15,6 +15,7 @@ #include <bpf/hashmap.h> #include <bpf/libbpf.h> +#include <bpf/bpf.h> #include "json_writer.h" @@ -140,8 +141,10 @@ void get_prog_full_name(const struct bpf_prog_info *prog_info, int prog_fd, int get_fd_type(int fd); const char *get_fd_type_name(enum bpf_obj_type type); char *get_fdinfo(int fd, const char *key); -int open_obj_pinned(const char *path, bool quiet); -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type); +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts); +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts); int mount_bpffs_for_file(const char *file_name); int create_and_mount_bpffs_dir(const char *dir_name); int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(int *, char ***)); @@ -167,10 +170,10 @@ int do_iter(int argc, char **argv) __weak; int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what); int prog_parse_fd(int *argc, char ***argv); int prog_parse_fds(int *argc, char ***argv, int **fds); -int map_parse_fd(int *argc, char ***argv); -int map_parse_fds(int *argc, char ***argv, int **fds); +int map_parse_fd(int *argc, char ***argv, __u32 open_flags); +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags); int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len); + __u32 *info_len, __u32 open_flags); struct bpf_prog_linfo; #if defined(HAVE_LLVM_SUPPORT) || defined(HAVE_LIBBFD_SUPPORT) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 81cc668b4b05..c7dc2eae8ba8 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -337,9 +337,9 @@ static void fill_per_cpu_value(struct bpf_map_info *info, void *value) memcpy(value + i * step, value, info->value_size); } -static int parse_elem(char **argv, struct bpf_map_info *info, - void *key, void *value, __u32 key_size, __u32 value_size, - __u32 *flags, __u32 **value_fd) +static int parse_elem(char **argv, struct bpf_map_info *info, void *key, + void *value, __u32 key_size, __u32 value_size, + __u32 *flags, __u32 **value_fd, __u32 open_flags) { if (!*argv) { if (!key && !value) @@ -362,7 +362,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; return parse_elem(argv, info, NULL, value, key_size, value_size, - flags, value_fd); + flags, value_fd, open_flags); } else if (is_prefix(*argv, "value")) { int fd; @@ -388,7 +388,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; } - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, open_flags); if (fd < 0) return -1; @@ -424,7 +424,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, } return parse_elem(argv, info, key, NULL, key_size, value_size, - flags, NULL); + flags, NULL, open_flags); } else if (is_prefix(*argv, "any") || is_prefix(*argv, "noexist") || is_prefix(*argv, "exist")) { if (!flags) { @@ -440,7 +440,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, *flags = BPF_EXIST; return parse_elem(argv + 1, info, key, value, key_size, - value_size, NULL, value_fd); + value_size, NULL, value_fd, open_flags); } p_err("expected key or value, got: %s", *argv); @@ -639,7 +639,7 @@ static int do_show_subset(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -672,12 +672,15 @@ static int do_show_subset(int argc, char **argv) static int do_show(int argc, char **argv) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); struct bpf_map_info info = {}; __u32 len = sizeof(info); __u32 id = 0; int err; int fd; + opts.open_flags = BPF_F_RDONLY; + if (show_pinned) { map_table = hashmap__new(hash_fn_for_key_as_id, equal_fn_for_key_as_id, NULL); @@ -707,7 +710,7 @@ static int do_show(int argc, char **argv) break; } - fd = bpf_map_get_fd_by_id(id); + fd = bpf_map_get_fd_by_id_opts(id, &opts); if (fd < 0) { if (errno == ENOENT) continue; @@ -909,7 +912,7 @@ static int do_dump(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -997,7 +1000,7 @@ static int do_update(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1006,7 +1009,7 @@ static int do_update(int argc, char **argv) goto exit_free; err = parse_elem(argv, &info, key, value, info.key_size, - info.value_size, &flags, &value_fd); + info.value_size, &flags, &value_fd, 0); if (err) goto exit_free; @@ -1076,7 +1079,7 @@ static int do_lookup(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1084,7 +1087,8 @@ static int do_lookup(int argc, char **argv) if (err) goto exit_free; - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + BPF_F_RDONLY); if (err) goto exit_free; @@ -1127,7 +1131,7 @@ static int do_getnext(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1140,8 +1144,8 @@ static int do_getnext(int argc, char **argv) } if (argc) { - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, - NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, + NULL, BPF_F_RDONLY); if (err) goto exit_free; } else { @@ -1198,7 +1202,7 @@ static int do_delete(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1209,7 +1213,8 @@ static int do_delete(int argc, char **argv) goto exit_free; } - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + 0); if (err) goto exit_free; @@ -1226,11 +1231,16 @@ static int do_delete(int argc, char **argv) return err; } +static int map_parse_read_only_fd(int *argc, char ***argv) +{ + return map_parse_fd(argc, argv, BPF_F_RDONLY); +} + static int do_pin(int argc, char **argv) { int err; - err = do_pin_any(argc, argv, map_parse_fd); + err = do_pin_any(argc, argv, map_parse_read_only_fd); if (!err && json_output) jsonw_null(json_wtr); return err; @@ -1319,7 +1329,7 @@ static int do_create(int argc, char **argv) if (!REQ_ARGS(2)) usage(); inner_map_fd = map_parse_fd_and_info(&argc, &argv, - &info, &len); + &info, &len, 0); if (inner_map_fd < 0) return -1; attr.inner_map_fd = inner_map_fd; @@ -1368,7 +1378,7 @@ static int do_pop_dequeue(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1407,7 +1417,7 @@ static int do_freeze(int argc, char **argv) if (!REQ_ARGS(2)) return -1; - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c index 552b4ca40c27..bcb767e2d673 100644 --- a/tools/bpf/bpftool/map_perf_ring.c +++ b/tools/bpf/bpftool/map_perf_ring.c @@ -128,7 +128,8 @@ int do_event_pipe(int argc, char **argv) int err, map_fd; map_info_len = sizeof(map_info); - map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len); + map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len, + 0); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 96eea8a67225..deeaa5c1ed7d 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -1062,7 +1062,7 @@ static int parse_attach_detach_args(int argc, char **argv, int *progfd, if (!REQ_ARGS(2)) return -EINVAL; - *mapfd = map_parse_fd(&argc, &argv); + *mapfd = map_parse_fd(&argc, &argv, 0); if (*mapfd < 0) return *mapfd; @@ -1608,7 +1608,7 @@ static int load_with_options(int argc, char **argv, bool first_prog_only) } NEXT_ARG(); - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) goto err_free_reuse_maps; -- 2.34.1

3 weeks, 3 days

2
5
0 0

Re: [PATCH bpf-next v2 1/2] bpftool: Use appropriate permissions for map access

by Slava Imameev

> > Modify several functions in tools/bpf/bpftool/common.c to allow > > specification of requested access for file descriptors, such as > > read-only access. > > > > Update bpftool to request only read access for maps when write > > access is not required. This fixes errors when reading from maps > > that are protected from modification via security_bpf_map. > > > > Signed-off-by: Slava Imameev <slava.imameev(a)crowdstrike.com> > > > Thanks for this! > > I think the topic of map access in bpftool has been discussed in the > path, but I can't remember what we said or find it again - maybe I don't > remember correctly. Looks good to me overall. > > One question: How thoroughly have you tested that write permissions are > necessary for the different cases? I'm asking because I'm wondering > whether we could restrict to read-only in a couple more cases, see > below. (At the end of the day it doesn't matter too much, it's fine > being conservative and conserving write permissions for now, we can > always refine later; it's already an improvement to do read-only for the > dump/list cases). The goal of this patch was to fix bpftool errors we experienced on our systems. The efforts were focused only on changes to the affected subset of map commands. > > + /* Get an fd with the requested options. */ > > + close(fd); > > + fd = bpf_map_get_fd_by_id_opts(id, opts); > > + if (fd < 0) { > > + p_err("can't get map by id (%u): %s", id, > > + strerror(errno)); > > + goto err_close_fds; > > + } > > > We could maybe skip this step if the requested options are read-only, no > need to close and re-open a fd in that case? I agree. The change will be submitted with version 3. > > -int map_parse_fds(int *argc, char ***argv, int **fds) > > +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags) > > { > > + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); > > + > > + if (open_flags & ~BPF_F_RDONLY) { > > + p_err("invalid open_flags: %x", open_flags); > > + return -1; > > + } > > > I don't think we need this check, the flag is never passed by users. If > you want to catch a bug, use an assert() instead? I agree. This check is replaced with an assert and will be submitted with v3. > > diff --git a/tools/bpf/bpftool/iter.c b/tools/bpf/bpftool/iter.c > > index 5c39c2ed36a2..ad318a8667a4 100644 > > --- a/tools/bpf/bpftool/iter.c > > +++ b/tools/bpf/bpftool/iter.c > > @@ -37,7 +37,7 @@ static int do_pin(int argc, char **argv) > > return -1; > > } > > > > - map_fd = map_parse_fd(&argc, &argv); > > + map_fd = map_parse_fd(&argc, &argv, 0); > > > Do you need write permissions here? (I don't remember.) Iterator requires only read access. I changed it to BPF_F_RDONLY for v3. An iterator test is added to v3. > > - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); > > + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); > > > This one is surprising, don't you need write permissions to delete an > element from the map? Please double-check if you haven't already, I > wouldn't want to break "bpftool map delete". > > I note you don't test items deletion in your tests, by the way. Right, the delete command requires write access. I changed it and added an item deletion test to v3. > > static int do_pin(int argc, char **argv) > > { > > int err; > > > > - err = do_pin_any(argc, argv, map_parse_fd); > > + err = do_pin_any(argc, argv, map_parse_read_only_fd); > > if (!err && json_output) > > jsonw_null(json_wtr); > > return err; > > @@ -1319,7 +1329,7 @@ static int do_create(int argc, char **argv) > > if (!REQ_ARGS(2)) > > usage(); > > inner_map_fd = map_parse_fd_and_info(&argc, &argv, > > - &info, &len); > > + &info, &len, 0); > > > Do you need write permissions for the inner map's fd? This is something > that could be worth checking in the tests, as well. The inner map fd can be created with read only access. I changed it and added a test for map-of-maps creation to v3. > > @@ -128,7 +128,8 @@ int do_event_pipe(int argc, char **argv) > > int err, map_fd; > > > > map_info_len = sizeof(map_info); > > - map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len); > > + map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len, > > + 0); > > > This one might be worth checking, too. An event pipe map fd requires write access as the map is updated by bpf_map_update_elem inside __perf_buffer__new .

3 weeks, 3 days

1
0
0 0

[PATCH net-next v10] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is included in the config file to be built with the -b option, though not used in the tests. Only tested on x86. To run: $ make -C tools/testing/selftests TARGETS=vsock $ tools/testing/selftests/vsock/vmtest.sh or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs (after make -C tools/testing/selftests TARGETS=vsock): $ ./tools/testing/selftests/vsock/vmtest.sh 1..3 ok 0 vm_server_host_client ok 1 vm_client_host_server ok 2 vm_loopback SUMMARY: PASS=3 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_m7DI.log $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback 1..1 ok 0 vm_loopback SUMMARY: PASS=1 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_a1IO.log $ mkdir -p ~/scratch $ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch [... omitted ...] $ cd ~/scratch $ ./run_kselftest.sh TAP version 13 1..1 # timeout set to 300 # selftests: vsock: vmtest.sh # 1..3 # ok 0 vm_server_host_client # ok 1 vm_client_host_server # ok 2 vm_loopback # SUMMARY: PASS=3 SKIP=0 FAIL=0 # Log: /tmp/vsock_vmtest_svEl.log ok 1 selftests: vsock: vmtest.sh Future work can include vsock_diag_test. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com> --- Changes in v10: - remove dupes in tools/testing/selftests/vsock/config - Link to v9: https://lore.kernel.org/r/20250527-vsock-vmtest-v9-1-24eaeec6fa55@gmail.com Changes in v9: - make kselftest build target depend on tools/testing/vsock sources (Stefano) - add check_vng() for vng version checking (Stefano) - add virtme_ssh_channel=tcp to kernel cmdline (Stefano) - Link to v8: https://lore.kernel.org/r/20250522-vsock-vmtest-v8-1-367619bef134@gmail.com Changes in v8: - remove NIPA comment from commit msg - remove tap_* functions and TAP_PREFIX - add -b for building kernel - Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com Changes in v7: - fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS - updated commit message with updated output - updated commit message with commands for installing/running as kselftest - Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com Changes in v6: - add make cmd in commit message in vmtest.sh example (Stefano) - check nonzero size of QEMU_PIDFILE using -s conditional (Stefano) - display log file path after tests so it is easier to find amongst other random names - cleanup qemu pidfile if qemu is unable to remove it - make oops/warning failures more obvious with 'FAIL' prefix in log (simply saying 'detected' wasn't clear enough to identify failing condition) - Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change 2222 -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change <tab><backslash> to <space><backslash> - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com Changes in v2: - add kernel oops and warnings checker - change testname variable to use FUNCNAME - fix spacing in test_vm_server_host_client - add -s skip build option to vmtest.sh - add test_vm_loopback - pass port to vm_wait_for_listener - fix indentation in vmtest.sh - add vmci and hyperv to config - changed whitespace from tabs to spaces in help string - Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/vsock/.gitignore | 2 + tools/testing/selftests/vsock/Makefile | 17 ++ tools/testing/selftests/vsock/config | 111 +++++++ tools/testing/selftests/vsock/settings | 1 + tools/testing/selftests/vsock/vmtest.sh | 487 +++++++++++++++++++++++++++++++ 6 files changed, 619 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h F: include/uapi/linux/vm_sockets_diag.h F: include/uapi/linux/vsockmon.h F: net/vmw_vsock/ +F: tools/testing/selftests/vsock/ F: tools/testing/vsock/ VMALLOC diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84 --- /dev/null +++ b/tools/testing/selftests/vsock/.gitignore @@ -0,0 +1,2 @@ +vmtest.log +vsock_test diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..c407c0afd9388ee692d59a95f304182f067596e9 --- /dev/null +++ b/tools/testing/selftests/vsock/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 + +CURDIR := $(abspath .) +TOOLSDIR := $(abspath ../../..) +VSOCK_TEST_DIR := $(TOOLSDIR)/testing/vsock +VSOCK_TEST_SRCS := $(wildcard $(VSOCK_TEST_DIR)/*.c $(VSOCK_TEST_DIR)/*.h) + +$(OUTPUT)/vsock_test: $(VSOCK_TEST_DIR)/vsock_test + install -m 755 $< $@ + +$(VSOCK_TEST_DIR)/vsock_test: $(VSOCK_TEST_SRCS) + $(MAKE) -C $(VSOCK_TEST_DIR) vsock_test +TEST_PROGS += vmtest.sh +TEST_GEN_FILES := vsock_test + +include ../lib.mk + diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config new file mode 100644 index 0000000000000000000000000000000000000000..5f0a4f17dfc96c0f4b8decafb01724648495191a --- /dev/null +++ b/tools/testing/selftests/vsock/config @@ -0,0 +1,111 @@ +CONFIG_BLK_DEV_INITRD=y +CONFIG_BPF=y +CONFIG_BPF_SYSCALL=y +CONFIG_BPF_JIT=y +CONFIG_HAVE_EBPF_JIT=y +CONFIG_BPF_EVENTS=y +CONFIG_FTRACE_SYSCALLS=y +CONFIG_FUNCTION_TRACER=y +CONFIG_HAVE_DYNAMIC_FTRACE=y +CONFIG_DYNAMIC_FTRACE=y +CONFIG_HAVE_KPROBES=y +CONFIG_KPROBES=y +CONFIG_KPROBE_EVENTS=y +CONFIG_ARCH_SUPPORTS_UPROBES=y +CONFIG_UPROBES=y +CONFIG_UPROBE_EVENTS=y +CONFIG_DEBUG_FS=y +CONFIG_FW_CFG_SYSFS=y +CONFIG_FW_CFG_SYSFS_CMDLINE=y +CONFIG_DRM=y +CONFIG_DRM_VIRTIO_GPU=y +CONFIG_DRM_VIRTIO_GPU_KMS=y +CONFIG_DRM_BOCHS=y +CONFIG_VIRTIO_IOMMU=y +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_SND_SEQUENCER=y +CONFIG_SND_PCI=y +CONFIG_SND_INTEL8X0=y +CONFIG_SND_HDA_CODEC_REALTEK=y +CONFIG_SECURITYFS=y +CONFIG_CGROUP_BPF=y +CONFIG_SQUASHFS=y +CONFIG_SQUASHFS_XZ=y +CONFIG_SQUASHFS_ZSTD=y +CONFIG_FUSE_FS=y +CONFIG_VIRTIO_FS=y +CONFIG_SERIO=y +CONFIG_PCI=y +CONFIG_INPUT=y +CONFIG_INPUT_KEYBOARD=y +CONFIG_KEYBOARD_ATKBD=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_X86_VERBOSE_BOOTUP=y +CONFIG_VGA_CONSOLE=y +CONFIG_FB=y +CONFIG_FB_VESA=y +CONFIG_FRAMEBUFFER_CONSOLE=y +CONFIG_RTC_CLASS=y +CONFIG_RTC_HCTOSYS=y +CONFIG_RTC_DRV_CMOS=y +CONFIG_HYPERVISOR_GUEST=y +CONFIG_PARAVIRT=y +CONFIG_KVM_GUEST=y +CONFIG_KVM=y +CONFIG_KVM_INTEL=y +CONFIG_KVM_AMD=y +CONFIG_VSOCKETS=y +CONFIG_VSOCKETS_DIAG=y +CONFIG_VSOCKETS_LOOPBACK=y +CONFIG_VMWARE_VMCI_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS_COMMON=y +CONFIG_HYPERV_VSOCKETS=y +CONFIG_VMWARE_VMCI=y +CONFIG_VHOST_VSOCK=y +CONFIG_HYPERV=y +CONFIG_UEVENT_HELPER=n +CONFIG_VIRTIO=y +CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_MMIO=y +CONFIG_VIRTIO_BALLOON=y +CONFIG_NET=y +CONFIG_NET_CORE=y +CONFIG_NETDEVICES=y +CONFIG_NETWORK_FILESYSTEMS=y +CONFIG_INET=y +CONFIG_NET_9P=y +CONFIG_NET_9P_VIRTIO=y +CONFIG_9P_FS=y +CONFIG_VIRTIO_NET=y +CONFIG_CMDLINE_OVERRIDE=n +CONFIG_BINFMT_SCRIPT=y +CONFIG_SHMEM=y +CONFIG_TMPFS=y +CONFIG_UNIX=y +CONFIG_MODULE_SIG_FORCE=n +CONFIG_DEVTMPFS=y +CONFIG_TTY=y +CONFIG_VT=y +CONFIG_UNIX98_PTYS=y +CONFIG_EARLY_PRINTK=y +CONFIG_INOTIFY_USER=y +CONFIG_BLOCK=y +CONFIG_SCSI_LOWLEVEL=y +CONFIG_SCSI=y +CONFIG_SCSI_VIRTIO=y +CONFIG_BLK_DEV_SD=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_WATCHDOG=y +CONFIG_WATCHDOG_CORE=y +CONFIG_I6300ESB_WDT=y +CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y +CONFIG_OVERLAY_FS=y +CONFIG_DAX=y +CONFIG_DAX_DRIVER=y +CONFIG_FS_DAX=y +CONFIG_MEMORY_HOTPLUG=y +CONFIG_MEMORY_HOTREMOVE=y +CONFIG_ZONE_DEVICE=y diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings new file mode 100644 index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019 --- /dev/null +++ b/tools/testing/selftests/vsock/settings @@ -0,0 +1 @@ +timeout=300 diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh new file mode 100755 index 0000000000000000000000000000000000000000..edacebfc163251eee9cd495eb5e704dc7adc958e --- /dev/null +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -0,0 +1,487 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2025 Meta Platforms, Inc. and affiliates +# +# Dependencies: +# * virtme-ng +# * busybox-static (used by virtme-ng) +# * qemu (used by virtme-ng) + +readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" +readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) + +source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh + +readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test +readonly TEST_GUEST_PORT=51000 +readonly TEST_HOST_PORT=50000 +readonly TEST_HOST_PORT_LISTENER=50001 +readonly SSH_GUEST_PORT=22 +readonly SSH_HOST_PORT=2222 +readonly VSOCK_CID=1234 +readonly WAIT_PERIOD=3 +readonly WAIT_PERIOD_MAX=60 +readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) +readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + +# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a +# control port forwarded for vsock_test. Because virtme-ng doesn't support +# adding an additional port to forward to the device created from "--ssh" and +# virtme-init mistakenly sets identical IPs to the ssh device and additional +# devices, we instead opt out of using --ssh, add the device manually, and also +# add the kernel cmdline options that virtme-init uses to setup the interface. +readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" +readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" +readonly QEMU_OPTS="\ + -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ + -device virtio-net-pci,netdev=n0 \ + -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \ + --pidfile ${QEMU_PIDFILE} \ +" +readonly KERNEL_CMDLINE="\ + virtme.dhcp net.ifnames=0 biosdevname=0 \ + virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \ +" +readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) +readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_DESCS=( + "Run vsock_test in server mode on the VM and in client mode on the host." + "Run vsock_test in client mode on the VM and in server mode on the host." + "Run vsock_test using the loopback transport in the VM." +) + +VERBOSE=0 + +usage() { + local name + local desc + local i + + echo + echo "$0 [OPTIONS] [TEST]..." + echo "If no TEST argument is given, all tests will be run." + echo + echo "Options" + echo " -b: build the kernel from the current source tree and use it for guest VMs" + echo " -q: set the path to or name of qemu binary" + echo " -v: verbose output" + echo + echo "Available tests" + + for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do + name=${TEST_NAMES[${i}]} + desc=${TEST_DESCS[${i}]} + printf "\t%-35s%-35s\n" "${name}" "${desc}" + done + echo + + exit 1 +} + +die() { + echo "$*" >&2 + exit "${KSFT_FAIL}" +} + +vm_ssh() { + ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" + return $? +} + +cleanup() { + if [[ -s "${QEMU_PIDFILE}" ]]; then + pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1 + fi + + # If failure occurred during or before qemu start up, then we need + # to clean this up ourselves. + if [[ -e "${QEMU_PIDFILE}" ]]; then + rm "${QEMU_PIDFILE}" + fi +} + +check_args() { + local found + + for arg in "$@"; do + found=0 + for name in "${TEST_NAMES[@]}"; do + if [[ "${name}" = "${arg}" ]]; then + found=1 + break + fi + done + + if [[ "${found}" -eq 0 ]]; then + echo "${arg} is not an available test" >&2 + usage + fi + done + + for arg in "$@"; do + if ! command -v > /dev/null "test_${arg}"; then + echo "Test ${arg} not found" >&2 + usage + fi + done +} + +check_deps() { + for dep in vng ${QEMU} busybox pkill ssh; do + if [[ ! -x $(command -v "${dep}") ]]; then + echo -e "skip: dependency ${dep} not found!\n" + exit "${KSFT_SKIP}" + fi + done + + if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then + printf "skip: %s not found!" "${VSOCK_TEST}" + printf " Please build the kselftest vsock target.\n" + exit "${KSFT_SKIP}" + fi +} + +check_vng() { + local tested_versions + local version + local ok + + tested_versions=("1.33" "1.36") + version="$(vng --version)" + + ok=0 + for tv in "${tested_versions[@]}"; do + if [[ "${version}" == *"${tv}"* ]]; then + ok=1 + break + fi + done + + if [[ ! "${ok}" -eq 1 ]]; then + printf "warning: vng version '%s' has not been tested and may " "${version}" >&2 + printf "not function properly.\n\tThe following versions have been tested: " >&2 + echo "${tested_versions[@]}" >&2 + fi +} + +handle_build() { + if [[ ! "${BUILD}" -eq 1 ]]; then + return + fi + + if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then + echo "-b requires vmtest.sh called from the kernel source tree" >&2 + exit 1 + fi + + pushd "${KERNEL_CHECKOUT}" &>/dev/null + + if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then + die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})" + fi + + if ! make -j$(nproc); then + die "failed to build kernel from source tree (${KERNEL_CHECKOUT})" + fi + + popd &>/dev/null +} + +vm_start() { + local logfile=/dev/null + local verbose_opt="" + local kernel_opt="" + local qemu + + qemu=$(command -v "${QEMU}") + + if [[ "${VERBOSE}" -eq 1 ]]; then + verbose_opt="--verbose" + logfile=/dev/stdout + fi + + if [[ "${BUILD}" -eq 1 ]]; then + kernel_opt="${KERNEL_CHECKOUT}" + fi + + vng \ + --run \ + ${kernel_opt} \ + ${verbose_opt} \ + --qemu-opts="${QEMU_OPTS}" \ + --qemu="${qemu}" \ + --user root \ + --append "${KERNEL_CMDLINE}" \ + --rw &> ${logfile} & + + if ! timeout ${WAIT_TOTAL} \ + bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then + die "failed to boot VM" + fi +} + +vm_wait_for_ssh() { + local i + + i=0 + while true; do + if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then + die "Timed out waiting for guest ssh" + fi + if vm_ssh -- true; then + break + fi + i=$(( i + 1 )) + sleep ${WAIT_PERIOD} + done +} + +# derived from selftests/net/net_helper.sh +wait_for_listener() +{ + local port=$1 + local interval=$2 + local max_intervals=$3 + local protocol=tcp + local pattern + local i + + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ "${protocol}" = "tcp" ] && pattern="${pattern}0A" + for i in $(seq "${max_intervals}"); do + if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \ + grep -q "${pattern}"; then + break + fi + sleep "${interval}" + done +} + +vm_wait_for_listener() { + local port=$1 + + vm_ssh <<EOF +$(declare -f wait_for_listener) +wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} +EOF +} + +host_wait_for_listener() { + wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" +} + +__log_stdin() { + cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +__log_args() { + echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +log() { + local prefix="$1" + + shift + local redirect= + if [[ ${VERBOSE} -eq 0 ]]; then + redirect=/dev/null + else + redirect=/dev/stdout + fi + + if [[ "$#" -eq 0 ]]; then + __log_stdin | tee -a "${LOG}" > ${redirect} + else + __log_args "$@" | tee -a "${LOG}" > ${redirect} + fi +} + +log_setup() { + log "setup" "$@" +} + +log_host() { + local testname=$1 + + shift + log "test:${testname}:host" "$@" +} + +log_guest() { + local testname=$1 + + shift + log "test:${testname}:guest" "$@" +} + +test_vm_server_host_client() { + local testname="${FUNCNAME[0]#test_}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${TEST_GUEST_PORT}" \ + --peer-cid=2 \ + 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${TEST_GUEST_PORT}" + + ${VSOCK_TEST} \ + --mode=client \ + --control-host=127.0.0.1 \ + --peer-cid="${VSOCK_CID}" \ + --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}" + + return $? +} + +test_vm_client_host_server() { + local testname="${FUNCNAME[0]#test_}" + + ${VSOCK_TEST} \ + --mode "server" \ + --control-port "${TEST_HOST_PORT_LISTENER}" \ + --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" & + + host_wait_for_listener + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host=10.0.2.2 \ + --peer-cid=2 \ + --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}" + + return $? +} + +test_vm_loopback() { + local testname="${FUNCNAME[0]#test_}" + local port=60000 # non-forwarded local port + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${port}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host="127.0.0.1" \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" + + return $? +} + +run_test() { + local host_oops_cnt_before + local host_warn_cnt_before + local vm_oops_cnt_before + local vm_warn_cnt_before + local host_oops_cnt_after + local host_warn_cnt_after + local vm_oops_cnt_after + local vm_warn_cnt_after + local name + local rc + + host_oops_cnt_before=$(dmesg | grep -c -i 'Oops') + host_warn_cnt_before=$(dmesg --level=warn | wc -l) + vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops') + vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l) + + name=$(echo "${1}" | awk '{ print $1 }') + eval test_"${name}" + rc=$? + + host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l) + if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + host_warn_cnt_after=$(dmesg --level=warn | wc -l) + if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l) + if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l) + if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + return "${rc}" +} + +QEMU="qemu-system-$(uname -m)" + +while getopts :hvsq:b o +do + case $o in + v) VERBOSE=1;; + b) BUILD=1;; + q) QEMU=$OPTARG;; + h|*) usage;; + esac +done +shift $((OPTIND-1)) + +trap cleanup EXIT + +if [[ ${#} -eq 0 ]]; then + ARGS=("${TEST_NAMES[@]}") +else + ARGS=("$@") +fi + +check_args "${ARGS[@]}" +check_deps +check_vng +handle_build + +echo "1..${#ARGS[@]}" + +log_setup "Booting up VM" +vm_start +vm_wait_for_ssh +log_setup "VM booted up" + +cnt_pass=0 +cnt_fail=0 +cnt_skip=0 +cnt_total=0 +for arg in "${ARGS[@]}"; do + run_test "${arg}" + rc=$? + if [[ ${rc} -eq $KSFT_PASS ]]; then + cnt_pass=$(( cnt_pass + 1 )) + echo "ok ${cnt_total} ${arg}" + elif [[ ${rc} -eq $KSFT_SKIP ]]; then + cnt_skip=$(( cnt_skip + 1 )) + echo "ok ${cnt_total} ${arg} # SKIP" + elif [[ ${rc} -eq $KSFT_FAIL ]]; then + cnt_fail=$(( cnt_fail + 1 )) + echo "not ok ${cnt_total} ${arg} # exit=$rc" + fi + cnt_total=$(( cnt_total + 1 )) +done + +echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}" +echo "Log: ${LOG}" + +if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then + exit "$KSFT_PASS" +else + exit "$KSFT_FAIL" +fi --- base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a change-id: 20250325-vsock-vmtest-b3a21d2102c2 Best regards, -- Bobby Eshleman <bobbyeshleman(a)gmail.com>

3 weeks, 3 days

3
2
0 0

selftests/filesystem: clang warning null passed to a callee that requires a non-null argument [-Wnonnull]

by Naresh Kamboju

Regressions found on arm, arm64 and x86_64 building warnings with clang-20 and clang-nightly started from Linux next-20250603 Regressions found on arm, arm64 and x86_64 - selftests/filesystem Regression Analysis: - New regression? Yes - Reproducible? Yes First seen on the next-20250603 Good: next-20250530 Bad: next-20250603 Test regression: arm arm64 x86_64 clang warning null passed to a callee that requires a non-null argument [-Wnonnull] Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> ## Build warnings make[4]: Entering directory '/builds/linux/tools/testing/selftests/filesystems' CC devpts_pts CC file_stressor CC anon_inode_test anon_inode_test.c:45:37: warning: null passed to a callee that requires a non-null argument [-Wnonnull] 45 | ASSERT_LT(execveat(fd_context, "", NULL, NULL, AT_EMPTY_PATH), 0); | ^~~~ ## Source * Kernel version: 6.15.0-next-20250605 * Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git * Git sha: a0bea9e39035edc56a994630e6048c8a191a99d8 * Toolchain: Debian clang version 21.0.0 (++20250529012636+c474f8f2404d-1~exp1~20250529132821.1479) ## Build * Test log: https://qa-reports.linaro.org/api/testruns/28651387/log_file/ * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2xzM4wMl8SvuLKE3mw3c… * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2xzM4wMl8SvuLKE3mw3c… -- Linaro LKFT https://lkft.linaro.org

3 weeks, 3 days

3
3
0 0

[PATCH v3] KVM: SEV: Disable SEV-SNP support on initialization failure

by Ashish Kalra

From: Ashish Kalra <ashish.kalra(a)amd.com> During platform init, SNP initialization may fail for several reasons, such as firmware command failures and incompatible versions. However, the KVM capability may continue to advertise support for it. The platform may have SNP enabled but if SNP_INIT fails then SNP is not supported by KVM. During KVM module initialization query the SNP platform status to obtain the SNP initialization state and use it as an additional condition to determine support for SEV-SNP. Co-developed-by: Sean Christopherson <seanjc(a)google.com> Signed-off-by: Sean Christopherson <seanjc(a)google.com> Co-developed-by: Pratik R. Sampat <prsampat(a)amd.com> Signed-off-by: Pratik R. Sampat <prsampat(a)amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky(a)amd.com> Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com> --- arch/x86/kvm/svm/sev.c | 44 +++++++++++++++++++++++++++++++++--------- 1 file changed, 35 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index dea9480b9ff6..8c3b12e3de8c 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2935,6 +2935,33 @@ void __init sev_set_cpu_caps(void) } } +static bool is_sev_snp_initialized(void) +{ + struct sev_user_data_snp_status *status; + struct sev_data_snp_addr buf; + bool initialized = false; + int ret, error = 0; + + status = snp_alloc_firmware_page(GFP_KERNEL | __GFP_ZERO); + if (!status) + return false; + + buf.address = __psp_pa(status); + ret = sev_do_cmd(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &error); + if (ret) { + pr_err("SEV: SNP_PLATFORM_STATUS failed ret=%d, fw_error=%d (%#x)\n", + ret, error, error); + goto out; + } + + initialized = !!status->state; + +out: + snp_free_firmware_page(status); + + return initialized; +} + void __init sev_hardware_setup(void) { unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count; @@ -3039,6 +3066,14 @@ void __init sev_hardware_setup(void) sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP); out: + if (sev_enabled) { + init_args.probe = true; + if (sev_platform_init(&init_args)) + sev_supported = sev_es_supported = sev_snp_supported = false; + else if (sev_snp_supported) + sev_snp_supported = is_sev_snp_initialized(); + } + if (boot_cpu_has(X86_FEATURE_SEV)) pr_info("SEV %s (ASIDs %u - %u)\n", sev_supported ? min_sev_asid <= max_sev_asid ? "enabled" : @@ -3065,15 +3100,6 @@ void __init sev_hardware_setup(void) sev_supported_vmsa_features = 0; if (sev_es_debug_swap_enabled) sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP; - - if (!sev_enabled) - return; - - /* - * Do both SNP and SEV initialization at KVM module load. - */ - init_args.probe = true; - sev_platform_init(&init_args); } void sev_hardware_unsetup(void) -- 2.34.1

3 weeks, 3 days

5
4
0 0

[PATCH v2] selftests: cachestat: Refactor test to remove duplication

by Suresh K C

From: Suresh K C <suresh.k.chandrappa(a)gmail.com> Refactored the mmap and shmem test logic into a common function to reduce code duplication and improve maintainability Changes in v2: Refactored mmap and shmem tests into a common function Renamed test function to run_cachestat_test() Removed test for /proc/cpuinfo as a general /proc test case already exists Signed-off-by: Suresh K C <suresh.k.chandrappa(a)gmail.com> --- .../selftests/cachestat/test_cachestat.c | 97 ++++++------------- 1 file changed, 30 insertions(+), 67 deletions(-) diff --git a/tools/testing/selftests/cachestat/test_cachestat.c b/tools/testing/selftests/cachestat/test_cachestat.c index 81e7f6dd2279..7c2f64175943 100644 --- a/tools/testing/selftests/cachestat/test_cachestat.c +++ b/tools/testing/selftests/cachestat/test_cachestat.c @@ -22,7 +22,7 @@ static const char * const dev_files[] = { "/dev/zero", "/dev/null", "/dev/urandom", - "/proc/version","/proc/cpuinfo","/proc" + "/proc/version","/proc" }; void print_cachestat(struct cachestat *cs) @@ -33,6 +33,11 @@ void print_cachestat(struct cachestat *cs) cs->nr_evicted, cs->nr_recently_evicted); } +enum file_type { + FILE_MMAP, + FILE_SHMEM +}; + bool write_exactly(int fd, size_t filesize) { int random_fd = open("/dev/urandom", O_RDONLY); @@ -202,66 +207,8 @@ static int test_cachestat(const char *filename, bool write_random, bool create, return ret; } -bool test_cachestat_mmap(void){ - - size_t PS = sysconf(_SC_PAGESIZE); - size_t filesize = PS * 512 * 2;; - int syscall_ret; - size_t compute_len = PS * 512; - struct cachestat_range cs_range = { PS, compute_len }; - char *filename = "tmpshmcstat"; - unsigned long num_pages = compute_len / PS; - struct cachestat cs; - bool ret = true; - int fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0666); - if (fd < 0) { - ksft_print_msg("Unable to create mmap file.\n"); - ret = false; - goto out; - } - if (ftruncate(fd, filesize)) { - ksft_print_msg("Unable to truncate mmap file.\n"); - ret = false; - goto close_fd; - } - if (!write_exactly(fd, filesize)) { - ksft_print_msg("Unable to write to mmap file.\n"); - ret = false; - goto close_fd; - } - char *map = mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); - if (map == MAP_FAILED) { - ksft_print_msg("mmap failed.\n"); - ret = false; - goto close_fd; - } - - for (int i = 0; i < filesize; i++) { - map[i] = 'A'; - } - map[filesize - 1] = 'X'; - - syscall_ret = syscall(__NR_cachestat, fd, &cs_range, &cs, 0); - - if (syscall_ret) { - ksft_print_msg("Cachestat returned non-zero.\n"); - ret = false; - } else { - print_cachestat(&cs); - if (cs.nr_cache + cs.nr_evicted != num_pages) { - ksft_print_msg("Total number of cached and evicted pages is off.\n"); - ret = false; - } - } - -close_fd: - close(fd); - unlink(filename); -out: - return ret; -} -bool test_cachestat_shmem(void) +bool run_cachestat_test(enum file_type type) { size_t PS = sysconf(_SC_PAGESIZE); size_t filesize = PS * 512 * 2; /* 2 2MB huge pages */ @@ -271,27 +218,43 @@ bool test_cachestat_shmem(void) char *filename = "tmpshmcstat"; struct cachestat cs; bool ret = true; + int fd; unsigned long num_pages = compute_len / PS; - int fd = shm_open(filename, O_CREAT | O_RDWR, 0600); + if (type == FILE_SHMEM) + fd = shm_open(filename, O_CREAT | O_RDWR, 0600); + else + fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0666); if (fd < 0) { - ksft_print_msg("Unable to create shmem file.\n"); + ksft_print_msg("Unable to create file.\n"); ret = false; goto out; } if (ftruncate(fd, filesize)) { - ksft_print_msg("Unable to truncate shmem file.\n"); + ksft_print_msg("Unable to truncate file.\n"); ret = false; goto close_fd; } if (!write_exactly(fd, filesize)) { - ksft_print_msg("Unable to write to shmem file.\n"); + ksft_print_msg("Unable to write to file.\n"); ret = false; goto close_fd; } + if (type == FILE_MMAP){ + char *map = mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (map == MAP_FAILED) { + ksft_print_msg("mmap failed.\n"); + ret = false; + goto close_fd; + } + for (int i = 0; i < filesize; i++) { + map[i] = 'A'; + } + map[filesize - 1] = 'X'; + } syscall_ret = syscall(__NR_cachestat, fd, &cs_range, &cs, 0); if (syscall_ret) { @@ -333,7 +296,7 @@ int main(void) ret = 1; } - for (int i = 0; i < 6; i++) { + for (int i = 0; i < 5; i++) { const char *dev_filename = dev_files[i]; if (test_cachestat(dev_filename, false, false, false, @@ -367,14 +330,14 @@ int main(void) break; } - if (test_cachestat_shmem()) + if (run_cachestat_test(FILE_SHMEM)) ksft_test_result_pass("cachestat works with a shmem file\n"); else { ksft_test_result_fail("cachestat fails with a shmem file\n"); ret = 1; } - if (test_cachestat_mmap()) + if (run_cachestat_test(FILE_MMAP)) ksft_test_result_pass("cachestat works with a mmap file\n"); else { ksft_test_result_fail("cachestat fails with a mmap file\n"); -- 2.43.0

3 weeks, 3 days

1
0
0 0

[PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings

by Blake Jones

The BTF dumper code currently displays arrays of characters as just that - arrays, with each character formatted individually. Sometimes this is what makes sense, but it's nice to be able to treat that array as a string. This change adds a special case to the btf_dump functionality to allow 0-terminated arrays of single-byte integer values to be printed as character strings. Characters for which isprint() returns false are printed as hex-escaped values. This is enabled when the new ".emit_strings" is set to 1 in the btf_dump_type_data_opts structure. As an example, here's what it looks like to dump the string "hello" using a few different field values for btf_dump_type_data_opts (.compact = 1): - .emit_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',] - .emit_strings = 0, .skip_names = 1: ['h','e','l','l','o',] - .emit_strings = 1, .skip_names = 0: (char[6])"hello" - .emit_strings = 1, .skip_names = 1: "hello" Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1: - .emit_strings = 0: ['h',-1,] - .emit_strings = 1: "h\xff" Signed-off-by: Blake Jones <blakejones(a)google.com> --- tools/lib/bpf/btf.h | 3 ++- tools/lib/bpf/btf_dump.c | 55 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 56 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 4392451d634b..ccfd905f03df 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -326,9 +326,10 @@ struct btf_dump_type_data_opts { bool compact; /* no newlines/indentation */ bool skip_names; /* skip member/type names */ bool emit_zeroes; /* show 0-valued fields */ + bool emit_strings; /* print char arrays as strings */ size_t :0; }; -#define btf_dump_type_data_opts__last_field emit_zeroes +#define btf_dump_type_data_opts__last_field emit_strings LIBBPF_API int btf_dump__dump_type_data(struct btf_dump *d, __u32 id, diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c index 460c3e57fadb..7c2f1f13f958 100644 --- a/tools/lib/bpf/btf_dump.c +++ b/tools/lib/bpf/btf_dump.c @@ -68,6 +68,7 @@ struct btf_dump_data { bool compact; bool skip_names; bool emit_zeroes; + bool emit_strings; __u8 indent_lvl; /* base indent level */ char indent_str[BTF_DATA_INDENT_STR_LEN]; /* below are used during iteration */ @@ -2028,6 +2029,52 @@ static int btf_dump_var_data(struct btf_dump *d, return btf_dump_dump_type_data(d, NULL, t, type_id, data, 0, 0); } +static int btf_dump_string_data(struct btf_dump *d, + const struct btf_type *t, + __u32 id, + const void *data) +{ + const struct btf_array *array = btf_array(t); + const char *chars = data; + __u32 i; + + /* Make sure it is a NUL-terminated string. */ + for (i = 0; i < array->nelems; i++) { + if ((void *)(chars + i) >= d->typed_dump->data_end) + return -E2BIG; + if (chars[i] == '\0') + break; + } + if (i == array->nelems) { + /* The caller will print this as a regular array. */ + return -EINVAL; + } + + btf_dump_data_pfx(d); + btf_dump_printf(d, "\""); + + for (i = 0; i < array->nelems; i++) { + char c = chars[i]; + + if (c == '\0') { + /* + * When printing character arrays as strings, NUL bytes + * are always treated as string terminators; they are + * never printed. + */ + break; + } + if (isprint(c)) + btf_dump_printf(d, "%c", c); + else + btf_dump_printf(d, "\\x%02x", (__u8)c); + } + + btf_dump_printf(d, "\""); + + return 0; +} + static int btf_dump_array_data(struct btf_dump *d, const struct btf_type *t, __u32 id, @@ -2055,8 +2102,13 @@ static int btf_dump_array_data(struct btf_dump *d, * char arrays, so if size is 1 and element is * printable as a char, we'll do that. */ - if (elem_size == 1) + if (elem_size == 1) { + if (d->typed_dump->emit_strings && + btf_dump_string_data(d, t, id, data) == 0) { + return 0; + } d->typed_dump->is_array_char = true; + } } /* note that we increment depth before calling btf_dump_print() below; @@ -2544,6 +2596,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id, d->typed_dump->compact = OPTS_GET(opts, compact, false); d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false); d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false); + d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false); ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0); -- 2.49.0.1204.g71687c7c1d-goog

3 weeks, 3 days

4
5
0 0

[PATCH v8 RESEND iproute2-next 0/1] DUALPI2 iproute2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find DUALPI2 iproute2 patch v8. v8 (09-May-25) - Update pkt_sched.h with the one in nex-next - Correct a typo in the comment within pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) - Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>) v7 (05-May-25) - Align pkt_sched.h with the v14 version of net-next due to spec modification in tc.yaml - Reorganize dualpi2_print_opt() to match the order in tc.yaml - Remove credit-queue in PRINT_JSON v6 (26-Apr-25) - Update JSON file output due to spec modification in tc.yaml of net-next v5 (25-Mar-25) - Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>) - Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>) - Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>) v4 (16-Mar-25) - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step marking. v3 (21-Feb-25) - Add memlimit to the dualpi2 attribute, and add memory_used, max_memory_used, and memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update the manual to align with the latest implementation and clarify the queue naming and default unit - Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2 v2 (23-Oct-24) - Rename get_float in dualpi2 to get_float_min_max in utils.c - Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>) For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best Regards, Chia-Yu Chia-Yu Chang (1): tc: add dualpi2 scheduler module bash-completion/tc | 11 +- include/uapi/linux/pkt_sched.h | 68 +++++ include/utils.h | 2 + ip/iplink_can.c | 14 - lib/utils.c | 30 ++ man/man8/tc-dualpi2.8 | 249 +++++++++++++++ tc/Makefile | 1 + tc/q_dualpi2.c | 534 +++++++++++++++++++++++++++++++++ 8 files changed, 894 insertions(+), 15 deletions(-) create mode 100644 man/man8/tc-dualpi2.8 create mode 100644 tc/q_dualpi2.c -- 2.34.1

3 weeks, 3 days

2
2
0 0

[PATCH net v2 0/2] Fix ntuple rules targeting default RSS

by Gal Pressman

This series addresses a regression in ethtool flow steering where rules targeting the default RSS context (context 0) were incorrectly rejected. The default RSS context always exists but is not stored in the rss_ctx xarray like additional contexts. The current validation logic was checking for the existence of context 0 in this array, causing valid flow steering rules to be rejected. This prevented configurations such as: - High priority rules directing specific traffic to the default context - Low priority catch-all rules directing remaining traffic to additional contexts Patch 1 fixes the validation logic to skip the existence check for context 0. Patch 2 adds a selftest that verifies this behavior. Changelog - v1->v2: https://lore.kernel.org/all/20250225071348.509432-1-gal@nvidia.com/ * Reworded commit message. * Added a selftest. Gal Pressman (2): net: ethtool: Don't check if RSS context exists in case of context 0 selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context net/ethtool/ioctl.c | 3 +- .../selftests/drivers/net/hw/rss_ctx.py | 59 ++++++++++++++++++- 2 files changed, 60 insertions(+), 2 deletions(-) -- 2.40.1

3 weeks, 3 days

3
9
0 0

[PATCH bpf-next v2 0/2] bpf,ktls: Fix data corruption caused by using bpf_msg_pop_data() in ktls

by Jiayuan Chen

Cong reported an issue where running 'test_sockmap' in the current bpf-next tree results in an error [1]. The specific test case that triggered the error is a combined test involving ktls and bpf_msg_pop_data(). Root Cause: When sending plaintext data, we initially calculated the corresponding ciphertext length. However, if we later reduced the plaintext data length via socket policy, we failed to recalculate the ciphertext length. This results in transmitting buffers containing uninitialized data during ciphertext transmission. This causes uninitialized bytes to be appended after a complete "Application Data" packet, leading to errors on the receiving end when parsing TLS record. This issue has existed for a long time but was only exposed after the following test code was merged. commit 47eae080410b ("selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap") Although we already had tests for pop data before this commit, the pop data length was insufficient (less than 5 bytes). This meant that the corrupted TLS records with data length <5 bytes were cached without being parsed, resulting in no error being triggered. After this fix, all tests pass. 1/ 6 sockmap::txmsg test passthrough:OK 2/ 6 sockmap::txmsg test redirect:OK 3/ 2 sockmap::txmsg test redirect wait send mem:OK 4/ 6 sockmap::txmsg test drop:OK 5/ 6 sockmap::txmsg test ingress redirect:OK 6/ 7 sockmap::txmsg test skb:OK 7/12 sockmap::txmsg test apply:OK 8/12 sockmap::txmsg test cork:OK 9/ 3 sockmap::txmsg test hanging corks:OK 10/11 sockmap::txmsg test push_data:OK 11/17 sockmap::txmsg test pull-data:OK 12/ 9 sockmap::txmsg test pop-data:OK 13/ 6 sockmap::txmsg test push/pop data:OK 14/ 1 sockmap::txmsg test ingress parser:OK 15/ 1 sockmap::txmsg test ingress parser2:OK 16/ 6 sockhash::txmsg test passthrough:OK 17/ 6 sockhash::txmsg test redirect:OK 18/ 2 sockhash::txmsg test redirect wait send mem:OK 19/ 6 sockhash::txmsg test drop:OK 20/ 6 sockhash::txmsg test ingress redirect:OK 21/ 7 sockhash::txmsg test skb:OK 22/12 sockhash::txmsg test apply:OK 23/12 sockhash::txmsg test cork:OK 24/ 3 sockhash::txmsg test hanging corks:OK 25/11 sockhash::txmsg test push_data:OK 26/17 sockhash::txmsg test pull-data:OK 27/ 9 sockhash::txmsg test pop-data:OK 28/ 6 sockhash::txmsg test push/pop data:OK 29/ 1 sockhash::txmsg test ingress parser:OK 30/ 1 sockhash::txmsg test ingress parser2:OK 31/ 6 sockhash:ktls:txmsg test passthrough:OK 32/ 6 sockhash:ktls:txmsg test redirect:OK 33/ 2 sockhash:ktls:txmsg test redirect wait send mem:OK 34/ 6 sockhash:ktls:txmsg test drop:OK 35/ 6 sockhash:ktls:txmsg test ingress redirect:OK 36/ 7 sockhash:ktls:txmsg test skb:OK 37/12 sockhash:ktls:txmsg test apply:OK 38/12 sockhash:ktls:txmsg test cork:OK 39/ 3 sockhash:ktls:txmsg test hanging corks:OK 40/11 sockhash:ktls:txmsg test push_data:OK 41/17 sockhash:ktls:txmsg test pull-data:OK 42/ 9 sockhash:ktls:txmsg test pop-data:OK 43/ 6 sockhash:ktls:txmsg test push/pop data:OK 44/ 1 sockhash:ktls:txmsg test ingress parser:OK 45/ 0 sockhash:ktls:txmsg test ingress parser2:OK Pass: 45 Fail: 0 [1]: https://lore.kernel.org/bpf/CAM_iQpU7=4xjbefZoxndKoX9gFFMOe7FcWMq5tHBsymbrn… --- v2 -> v1: Removed WARN_ON() check and added Reviewed-by tag. https://lore.kernel.org/bpf/20250605145529.3q3aqr6iygpa6xg6@gmail.com/ Jiayuan Chen (2): bpf,ktls: Fix data corruption when using bpf_msg_pop_data() in ktls selftests/bpf: Add test to cover ktls with bpf_msg_pop_data net/tls/tls_sw.c | 13 +++ .../selftests/bpf/prog_tests/sockmap_ktls.c | 91 +++++++++++++++++++ .../selftests/bpf/progs/test_sockmap_ktls.c | 4 + 3 files changed, 108 insertions(+) -- 2.47.1

3 weeks, 3 days

4
5
0 0

[PATCH] selftests/mm: Skip uprobe vma merge test if uprobes are not enabled

by Pedro Falcato

If uprobes are not enabled, the test currently fails with: 7151 12:46:54.627936 # # # RUN merge.handle_uprobe_upon_merged_vma ... 7152 12:46:54.639014 # # f /sys/bus/event_source/devices/uprobe/type 7153 12:46:54.639306 # # fopen: No such file or directory 7154 12:46:54.650451 # # # merge.c:473:handle_uprobe_upon_merged_vma:Expected read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type) (1) == 0 (0) 7155 12:46:54.650730 # # # handle_uprobe_upon_merged_vma: Test terminated by assertion 7156 12:46:54.661750 # # # FAIL merge.handle_uprobe_upon_merged_vma 7157 12:46:54.662030 # # not ok 8 merge.handle_uprobe_upon_merged_vma Skipping is a more sane and friendly behavior here. Fixes: efe99fabeb11b ("selftests/mm: add test about uprobe pte be orphan during vma merge") Reported-by: Aishwarya <aishwarya.tcv(a)arm.com> Closes: https://lore.kernel.org/linux-mm/20250610103729.72440-1-aishwarya.tcv@arm.c… Signed-off-by: Pedro Falcato <pfalcato(a)suse.de> --- tools/testing/selftests/mm/merge.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/merge.c b/tools/testing/selftests/mm/merge.c index bbae66fc5038..cc26480098ae 100644 --- a/tools/testing/selftests/mm/merge.c +++ b/tools/testing/selftests/mm/merge.c @@ -470,7 +470,9 @@ TEST_F(merge, handle_uprobe_upon_merged_vma) ASSERT_GE(fd, 0); ASSERT_EQ(ftruncate(fd, page_size), 0); - ASSERT_EQ(read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type), 0); + if (read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type) != 0) { + SKIP(goto out, "Failed to read uprobe sysfs file, skipping"); + } memset(&attr, 0, attr_sz); attr.size = attr_sz; @@ -491,6 +493,7 @@ TEST_F(merge, handle_uprobe_upon_merged_vma) ASSERT_NE(mremap(ptr2, page_size, page_size, MREMAP_MAYMOVE | MREMAP_FIXED, ptr1), MAP_FAILED); +out: close(fd); remove(probe_file); } -- 2.49.0

3 weeks, 3 days

5
4
0 0

[PATCH] selftests/mm: Check for YAMA ptrace_scope configuraiton before modifying it

by Mark Brown

When running the memfd_secret test run_vmtests.sh unconditionally tries to confgiure the YAMA LSM's ptrace_scope configuration, leading to an error if YAMA is not in the running kernel: # ./run_vmtests.sh: line 432: /proc/sys/kernel/yama/ptrace_scope: No such file or directory # # ---------------------- # # running ./memfd_secret # # ---------------------- Check that this file is present before trying to write to it. The indentation here is a bit odd, and it doesn't seem great that we configure but don't restore ptrace_scope. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- tools/testing/selftests/mm/run_vmtests.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index dddd1dd8af14..33fc7fafa8f9 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -429,7 +429,9 @@ CATEGORY="vma_merge" run_test ./merge if [ -x ./memfd_secret ] then -(echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix +if [ -f /proc/sys/kernel/yama/ptrace_scope ]; then + (echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix +fi CATEGORY="memfd_secret" run_test ./memfd_secret fi --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250605-selftest-mm-enable-yama-1541c2d2ddcd Best regards, -- Mark Brown <broonie(a)kernel.org>

3 weeks, 3 days

3
2
0 0

[PATCH V3 2/2] KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t

by Anshuman Khandual

Change MDSCR_EL1 register holding local variables as uint64_t that reflects its true register width as well. Cc: Marc Zyngier <maz(a)kernel.org> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Joey Gouly <joey.gouly(a)arm.com> Cc: kvm(a)vger.kernel.org Cc: kvmarm(a)lists.linux.dev Cc: linux-kernel(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-arm-kernel(a)lists.infradead.org Reviewed-by: Ada Couprie Diaz <ada.coupriediaz(a)arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com> --- tools/testing/selftests/kvm/arm64/debug-exceptions.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/arm64/debug-exceptions.c b/tools/testing/selftests/kvm/arm64/debug-exceptions.c index c7fb55c9135b..e34963956fbc 100644 --- a/tools/testing/selftests/kvm/arm64/debug-exceptions.c +++ b/tools/testing/selftests/kvm/arm64/debug-exceptions.c @@ -140,7 +140,7 @@ static void enable_os_lock(void) static void enable_monitor_debug_exceptions(void) { - uint32_t mdscr; + uint64_t mdscr; asm volatile("msr daifclr, #8"); @@ -223,7 +223,7 @@ void install_hw_bp_ctx(uint8_t addr_bp, uint8_t ctx_bp, uint64_t addr, static void install_ss(void) { - uint32_t mdscr; + uint64_t mdscr; asm volatile("msr daifclr, #8"); -- 2.25.1

3 weeks, 3 days

3
4
0 0

[PATCH 0/4] selftests/mm: Tweaks to the cow test

by Mark Brown

A collection of non-functional updates from David Hildenbrand's review. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Mark Brown (4): kselftest/mm: Clarify errors for pipe() selftests/mm: Convert some cow error reports to ksft_perror() selftests/mm: Don't compare return values to in cow selftests/mm: Add messages about test errors to the cow tests tools/testing/selftests/mm/cow.c | 44 +++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 16 deletions(-) --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250603-selftest-mm-cow-tweaks-0199c5132b3e Best regards, -- Mark Brown <broonie(a)kernel.org>

3 weeks, 3 days

2
9
0 0

[PATCH v3 0/9] selftests: vDSO: Some cleanups and (warning) fixes

by Thomas Weißschuh

Fixes and cleanups for various issues in the vDSO selftests. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Changes in v3: - Rebase on v6.16-rc1 - Preserve vgetrandom_put_state() - Pick up vdso_standalone_test_x86 into this series - Link to v2: https://lore.kernel.org/r/20250505-selftests-vdso-fixes-v2-0-3bc86e42f242@l… Changes in v2: - Refer to -Wstrict-prototypes over -Wold-style-prototypes - Pick up Acks - Enable fixed warnings in Makefile - Link to v1: https://lore.kernel.org/r/20250502-selftests-vdso-fixes-v1-0-fb5d640a4f78@l… --- Thomas Weißschuh (9): selftests: vDSO: chacha: Correctly skip test if necessary selftests: vDSO: clock_getres: Drop unused include of err.h selftests: vDSO: vdso_test_getrandom: Drop unused include of linux/compiler.h selftests: vDSO: vdso_test_getrandom: Avoid -Wunused selftests: vDSO: vdso_config: Avoid -Wunused-variables selftests: vDSO: enable -Wall selftests: vDSO: vdso_test_correctness: Fix -Wstrict-prototypes selftests: vDSO: vdso_test_getrandom: Always print TAP header selftests: vDSO: vdso_standalone_test_x86: Replace source file with symlink tools/testing/selftests/vDSO/Makefile | 2 +- tools/testing/selftests/vDSO/vdso_config.h | 2 + .../selftests/vDSO/vdso_standalone_test_x86.c | 59 +--------------------- tools/testing/selftests/vDSO/vdso_test_chacha.c | 3 +- .../selftests/vDSO/vdso_test_clock_getres.c | 1 - .../testing/selftests/vDSO/vdso_test_correctness.c | 2 +- tools/testing/selftests/vDSO/vdso_test_getrandom.c | 10 ++-- 7 files changed, 13 insertions(+), 66 deletions(-) --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250423-selftests-vdso-fixes-d2ce74142359 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

3 weeks, 3 days

1
9
0 0

nolibc test target overwites kernel config without asking

by Mark Brown

Hi, While running the nolibc tests I discovered that they build a kernel in the current directory, including overwriting the existing .config. This is rather suprising for the selftests build system - it usually wouldn't do a kernel build at all - and might be annoying for users. KUnit deals with this by doing it's kernel build in a .kunit directory, it'd probably be good to do something similar for nolibc. Thanks, Mark

3 weeks, 3 days

2
4
0 0

[PATCH v2] selftests/mm: Use generic read_sysfs in thuge-gen test

by Pu Lehui

From: Pu Lehui <pulehui(a)huawei.com> As generic read_sysfs is available in vm_utils, let's use is in thuge-gen test. Signed-off-by: Pu Lehui <pulehui(a)huawei.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> --- v2: - Explicit warning when ps != getpagesize(). (Lorenzo) tools/testing/selftests/mm/thuge-gen.c | 38 +++++++------------------- 1 file changed, 10 insertions(+), 28 deletions(-) diff --git a/tools/testing/selftests/mm/thuge-gen.c b/tools/testing/selftests/mm/thuge-gen.c index 95b6f043a3cb..8e2b08dc5762 100644 --- a/tools/testing/selftests/mm/thuge-gen.c +++ b/tools/testing/selftests/mm/thuge-gen.c @@ -77,40 +77,20 @@ void show(unsigned long ps) system(buf); } -unsigned long thuge_read_sysfs(int warn, char *fmt, ...) +unsigned long read_free(unsigned long ps) { - char *line = NULL; - size_t linelen = 0; - char buf[100]; - FILE *f; - va_list ap; unsigned long val = 0; + char buf[100]; - va_start(ap, fmt); - vsnprintf(buf, sizeof buf, fmt, ap); - va_end(ap); + snprintf(buf, sizeof(buf), + "/sys/kernel/mm/hugepages/hugepages-%lukB/free_hugepages", + ps >> 10); + if (read_sysfs(buf, &val) && ps != getpagesize()) + ksft_print_msg("missing %s\n", buf); - f = fopen(buf, "r"); - if (!f) { - if (warn) - ksft_print_msg("missing %s\n", buf); - return 0; - } - if (getline(&line, &linelen, f) > 0) { - sscanf(line, "%lu", &val); - } - fclose(f); - free(line); return val; } -unsigned long read_free(unsigned long ps) -{ - return thuge_read_sysfs(ps != getpagesize(), - "/sys/kernel/mm/hugepages/hugepages-%lukB/free_hugepages", - ps >> 10); -} - void test_mmap(unsigned long size, unsigned flags) { char *map; @@ -173,6 +153,7 @@ void test_shmget(unsigned long size, unsigned flags) void find_pagesizes(void) { unsigned long largest = getpagesize(); + unsigned long shmmax_val = 0; int i; glob_t g; @@ -195,7 +176,8 @@ void find_pagesizes(void) } globfree(&g); - if (thuge_read_sysfs(0, "/proc/sys/kernel/shmmax") < NUM_PAGES * largest) + read_sysfs("/proc/sys/kernel/shmmax", &shmmax_val); + if (shmmax_val < NUM_PAGES * largest) ksft_exit_fail_msg("Please do echo %lu > /proc/sys/kernel/shmmax", largest * NUM_PAGES); -- 2.34.1

3 weeks, 3 days

1
0
0 0

[PATCH] selftests/mm: Use generic read_sysfs in thuge-gen test

by Pu Lehui

From: Pu Lehui <pulehui(a)huawei.com> As generic read_sysfs is available in vm_utils, let's use is in thuge-gen test. Signed-off-by: Pu Lehui <pulehui(a)huawei.com> --- tools/testing/selftests/mm/thuge-gen.c | 37 +++++++------------------- 1 file changed, 9 insertions(+), 28 deletions(-) diff --git a/tools/testing/selftests/mm/thuge-gen.c b/tools/testing/selftests/mm/thuge-gen.c index 95b6f043a3cb..e11dfbfa661b 100644 --- a/tools/testing/selftests/mm/thuge-gen.c +++ b/tools/testing/selftests/mm/thuge-gen.c @@ -77,40 +77,19 @@ void show(unsigned long ps) system(buf); } -unsigned long thuge_read_sysfs(int warn, char *fmt, ...) +unsigned long read_free(unsigned long ps) { - char *line = NULL; - size_t linelen = 0; - char buf[100]; - FILE *f; - va_list ap; unsigned long val = 0; + char buf[100]; - va_start(ap, fmt); - vsnprintf(buf, sizeof buf, fmt, ap); - va_end(ap); + snprintf(buf, sizeof(buf), + "/sys/kernel/mm/hugepages/hugepages-%lukB/free_hugepages", + ps >> 10); + read_sysfs(buf, &val); - f = fopen(buf, "r"); - if (!f) { - if (warn) - ksft_print_msg("missing %s\n", buf); - return 0; - } - if (getline(&line, &linelen, f) > 0) { - sscanf(line, "%lu", &val); - } - fclose(f); - free(line); return val; } -unsigned long read_free(unsigned long ps) -{ - return thuge_read_sysfs(ps != getpagesize(), - "/sys/kernel/mm/hugepages/hugepages-%lukB/free_hugepages", - ps >> 10); -} - void test_mmap(unsigned long size, unsigned flags) { char *map; @@ -173,6 +152,7 @@ void test_shmget(unsigned long size, unsigned flags) void find_pagesizes(void) { unsigned long largest = getpagesize(); + unsigned long shmmax_val = 0; int i; glob_t g; @@ -195,7 +175,8 @@ void find_pagesizes(void) } globfree(&g); - if (thuge_read_sysfs(0, "/proc/sys/kernel/shmmax") < NUM_PAGES * largest) + read_sysfs("/proc/sys/kernel/shmmax", &shmmax_val); + if (shmmax_val < NUM_PAGES * largest) ksft_exit_fail_msg("Please do echo %lu > /proc/sys/kernel/shmmax", largest * NUM_PAGES); -- 2.34.1

3 weeks, 3 days

2
4
0 0

[PATCH] Documentation: kunit: fix argument of MODULE_IMPORT_NS()

by Thomas Weißschuh

The argument to MODULE_IMPORT_NS() should be a string literal. See commit cdd30ebb1b9f ("module: Convert symbol namespace to string literal") Fixes: d208025db6d6 ("Documentation: kunit: improve example on testing static functions") Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Documentation/dev-tools/kunit/usage.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst index 038f480074fd7aaa1f8e1b344bc74bf3426cc173..066ecda1dd98e73a01d50545e79c38a99a3e05a2 100644 --- a/Documentation/dev-tools/kunit/usage.rst +++ b/Documentation/dev-tools/kunit/usage.rst @@ -699,7 +699,7 @@ the template below. #include <kunit/visibility.h> #include <my_file.h> ... - MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); + MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING"); ... // Use do_interesting_thing() in tests --- base-commit: f09079bd04a924c72d555cd97942d5f8d7eca98c change-id: 20250611-kunit-example-d195e41c0b34 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

3 weeks, 3 days

1
0
0 0

[PATCH net-next 0/7] netpoll: Untangle netconsole and netpoll

by Breno Leitao

Initially netpoll and netconsole were created together, and some functions are in the wrong file. Seperate netconsole-only functions in netconsole, avoiding exports. 1. Expose netpoll logging macros in the public header to enable consistent log formatting across netpoll consumers. 2. Relocate netconsole-specific functions from netpoll to the netconsole module where they are actually used, reducing unnecessary coupling. 3. Remove unnecessary function exports 4. Rename netpoll parsing functions in netconsole to better reflect their specific usage. 5. Create a test to check that cmdline works fine. This was in my todo list since [1], this was a good time to add it here to make sure this patchset doesn't regress. PS: The code was split in a way that it is easy to review. When copying the functions from netpoll to netconsole, I do not change than other than adding `static`. This will make checkpatch unhappy, but, further patches will address the issues. It is done this way to make it easy for reviewers. Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorag… [1] Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Breno Leitao (7): netpoll: remove __netpoll_cleanup from exported API netpoll: expose netpoll logging macros in public header netpoll: relocate netconsole-specific functions to netconsole module netpoll: move netpoll_print_options to netconsole netconsole: rename functions to better reflect their purpose netconsole: improve code style in parser function selftest: netconsole: add test for cmdline configuration drivers/net/netconsole.c | 137 ++++++++++++++++++++- include/linux/netpoll.h | 10 +- net/core/netpoll.c | 136 +------------------- tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/lib/sh/lib_netcons.sh | 39 +++++- .../selftests/drivers/net/netcons_cmdline.sh | 52 ++++++++ 6 files changed, 228 insertions(+), 147 deletions(-) --- base-commit: 2c7e4a2663a1ab5a740c59c31991579b6b865a26 change-id: 20250603-rework-c175cad8d22e Best regards, -- Breno Leitao <leitao(a)debian.org>

3 weeks, 3 days

1
8
0 0

[PATCH net-next] selftests/net: packetdrill: more xfail changes

by Jakub Kicinski

Most of the packetdrill tests have not flaked once last week. Add the few which did to the XFAIL list. Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: willemb(a)google.com CC: matttbe(a)kernel.org CC: linux-kselftest(a)vger.kernel.org Every time I sit down to add more I plan to just XFAIL all of packetdrill on slow machines, but then I convince myself otherwise. One last time? --- tools/testing/selftests/net/packetdrill/ksft_runner.sh | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh index ef8b25a606d8..c5b01e1bd4c7 100755 --- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh +++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh @@ -39,11 +39,15 @@ if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then # xfail tests that are known flaky with dbg config, not fixable. # still run them for coverage (and expect 100% pass without dbg). declare -ar xfail_list=( + "tcp_blocking_blocking-connect.pkt" + "tcp_blocking_blocking-read.pkt" "tcp_eor_no-coalesce-retrans.pkt" "tcp_fast_recovery_prr-ss.*.pkt" + "tcp_sack_sack-route-refresh-ip-tos.pkt" "tcp_slow_start_slow-start-after-win-update.pkt" "tcp_timestamping.*.pkt" "tcp_user_timeout_user-timeout-probe.pkt" + "tcp_zerocopy_cl.*.pkt" "tcp_zerocopy_epoll_.*.pkt" "tcp_tcp_info_tcp-info-.*-limited.pkt" ) -- 2.49.0

3 weeks, 3 days

4
3
0 0

[PATCH] selftests: Add version file to kselftest installation dir

by Tianyi Cui

As titled, adding version file to kselftest installation dir, so the user of the tarball can know which kernel version the tarball belongs to. Signed-off-by: Tianyi Cui <1997cui(a)gmail.com> --- tools/testing/selftests/Makefile | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index a0a6ba47d600..246e9863b45b 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -291,6 +291,12 @@ ifdef INSTALL_PATH $(MAKE) -s --no-print-directory OUTPUT=$$BUILD_TARGET COLLECTION=$$TARGET \ -C $$TARGET emit_tests >> $(TEST_LIST); \ done; + @if git describe HEAD > /dev/null 2>&1; then \ + git describe HEAD > $(INSTALL_PATH)/VERSION; \ + printf "Version saved to $(INSTALL_PATH)/VERSION\n"; \ + else \ + printf "Unable to get version from git describe\n"; \ + fi else $(error Error: set INSTALL_PATH to use install) endif -- 2.47.1

3 weeks, 3 days

2
5
0 0

[PATCH net-next v3 0/4] netconsole: Optimize console registration and improve testing

by Breno Leitao

During performance analysis of console subsystem latency, I discovered that netconsole registers console handlers even when no active targets exist. These orphaned console handlers are invoked on every printk() call, get the lock, iterate through empty target lists, and consume CPU cycles without performing any useful work. This patch series addresses the inefficiency by: 1. Implementing dynamic console registration/unregistration based on target availability, ensuring console handlers are only active when needed 2. Adding automatic cleanup of unused console registrations when targets are disabled or removed 3. Extending the selftest suite to cover non-extended console format, which was previously untested The optimization reduces printk() overhead by eliminating unnecessary function calls and list traversals when netconsole targets are not configured, improving overall system performance during heavy logging scenarios. --- Changes in v3: - Set CON_ENABLED before re-enabling the console, to fix a selftest that was failing, as reported by Jakub on v2. - Link to v2: https://lore.kernel.org/r/20250602-netcons_ext-v2-0-ef88d999326d@debian.org Changes in v2: - Added selftests to test the new mechanism - Unregister the console if the last target got disabled - Sending to net-next instead of net (Jakub) - Link to v1: https://lore.kernel.org/r/20250528-netcons_ext-v1-1-69f71e404e00@debian.org --- Breno Leitao (4): netconsole: Only register console drivers when targets are configured netconsole: Add automatic console unregistration on target removal selftests: netconsole: Do not exit from inside the validation function selftests: netconsole: Add support for basic netconsole target format drivers/net/netconsole.c | 67 +++++++++++++++++++--- .../selftests/drivers/net/lib/sh/lib_netcons.sh | 27 +++++++-- .../testing/selftests/drivers/net/netcons_basic.sh | 50 ++++++++++------ 3 files changed, 112 insertions(+), 32 deletions(-) --- base-commit: 2c7e4a2663a1ab5a740c59c31991579b6b865a26 change-id: 20250528-netcons_ext-572982619bea Best regards, -- Breno Leitao <leitao(a)debian.org>

3 weeks, 3 days

2
5
0 0

[PATCH RFC net-next v2] page_pool: import Jesper's page_pool benchmark

by Mina Almasry

From: Jesper Dangaard Brouer <hawk(a)kernel.org> We frequently consult with Jesper's out-of-tree page_pool benchmark to evaluate page_pool changes. Import the benchmark into the upstream linux kernel tree so that (a) we're all running the same version, (b) pave the way for shared improvements, and (c) maybe one day integrate it with nipa, if possible. Import bench_page_pool_simple from commit 35b1716d0c30 ("Add page_bench06_walk_all"), from this repository: https://github.com/netoptimizer/prototype-kernel.git Changes done during upstreaming: - Fix checkpatch issues. - Remove the tasklet logic not needed. - Move under tools/testing - Create ksft for the benchmark. - Changed slightly how the benchmark gets build. Out of tree, time_bench is built as an independent .ko. Here it is included in bench_page_pool.ko Steps to run: ``` mkdir -p /tmp/run-pp-bench make -C ./tools/testing/selftests/net/bench make -C ./tools/testing/selftests/net/bench install INSTALL_PATH=/tmp/run-pp-bench rsync --delete -avz --progress /tmp/run-pp-bench mina@$SERVER:~/ ssh mina@$SERVER << EOF cd ~/run-pp-bench && sudo ./test_bench_page_pool.sh EOF ``` Output: ``` (benchmrk dmesg logs) Fast path results: no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.368 ns ptr_ring results: no-softirq-page_pool02 Per elem: 527 cycles(tsc) 195.187 ns slow path results: no-softirq-page_pool03 Per elem: 549 cycles(tsc) 203.466 ns ``` Cc: Jesper Dangaard Brouer <hawk(a)kernel.org> Cc: Ilias Apalodimas <ilias.apalodimas(a)linaro.org> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: Toke Høiland-Jørgensen <toke(a)toke.dk> Signed-off-by: Mina Almasry <almasrymina(a)google.com> --- v2: - Move under tools/selftests (Jakub) - Create ksft for it. - Remove the tasklet logic no longer needed (Jesper + Toke) RFC discussion points: - Desirable to import it? - Can the benchmark be imported as-is for an initial version? Or needs lots of modifications? - Code location. I retained the location in Jesper's tree, but a path like net/core/bench/ may make more sense. --- tools/testing/selftests/net/bench/Makefile | 7 + .../selftests/net/bench/page_pool/Makefile | 17 + .../bench/page_pool/bench_page_pool_simple.c | 275 ++++++++++++ .../bench/page_pool/test_bench_page_pool.sh | 32 ++ .../net/bench/page_pool/time_bench.c | 406 ++++++++++++++++++ .../net/bench/page_pool/time_bench.h | 259 +++++++++++ 6 files changed, 996 insertions(+) create mode 100644 tools/testing/selftests/net/bench/Makefile create mode 100644 tools/testing/selftests/net/bench/page_pool/Makefile create mode 100644 tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c create mode 100755 tools/testing/selftests/net/bench/page_pool/test_bench_page_pool.sh create mode 100644 tools/testing/selftests/net/bench/page_pool/time_bench.c create mode 100644 tools/testing/selftests/net/bench/page_pool/time_bench.h diff --git a/tools/testing/selftests/net/bench/Makefile b/tools/testing/selftests/net/bench/Makefile new file mode 100644 index 000000000000..4ebce5d71b18 --- /dev/null +++ b/tools/testing/selftests/net/bench/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +TEST_GEN_MODS_DIR := page_pool + +TEST_PROGS += page_pool/test_bench_page_pool.sh + +include ../../lib.mk diff --git a/tools/testing/selftests/net/bench/page_pool/Makefile b/tools/testing/selftests/net/bench/page_pool/Makefile new file mode 100644 index 000000000000..0549a16ba275 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/Makefile @@ -0,0 +1,17 @@ +BENCH_PAGE_POOL_SIMPLE_TEST_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST))))) +KDIR ?= /lib/modules/$(shell uname -r)/build + +ifeq ($(V),1) +Q = +else +Q = @ +endif + +obj-m += bench_page_pool.o +bench_page_pool-y += bench_page_pool_simple.o time_bench.o + +all: + +$(Q)make -C $(KDIR) M=$(BENCH_PAGE_POOL_SIMPLE_TEST_DIR) modules + +clean: + +$(Q)make -C $(KDIR) M=$(BENCH_PAGE_POOL_SIMPLE_TEST_DIR) clean diff --git a/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c b/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c new file mode 100644 index 000000000000..53d168cce27d --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c @@ -0,0 +1,275 @@ +/* + * Benchmark module for page_pool. + * + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/module.h> +#include <linux/mutex.h> + +#include <linux/version.h> +#include <net/page_pool/helpers.h> + +#include <linux/interrupt.h> +#include <linux/limits.h> + +#include "time_bench.h" + +static int verbose = 1; +#define MY_POOL_SIZE 1024 + +static inline void _page_pool_put_page(struct page_pool *pool, + struct page *page, bool allow_direct) +{ + page_pool_put_page(pool, page, -1, allow_direct); +} + +/* Makes tests selectable. Useful for perf-record to analyze a single test. + * Hint: Bash shells support writing binary number like: $((2#101010) + * + * # modprobe bench_page_pool_simple run_flags=$((2#100)) + */ +static unsigned long run_flags = 0xFFFFFFFF; +module_param(run_flags, ulong, 0); +MODULE_PARM_DESC(run_flags, "Limit which bench test that runs"); +/* Count the bit number from the enum */ +enum benchmark_bit { + bit_run_bench_baseline, + bit_run_bench_no_softirq01, + bit_run_bench_no_softirq02, + bit_run_bench_no_softirq03, +}; +#define bit(b) (1 << (b)) +#define enabled(b) ((run_flags & (bit(b)))) + +/* notice time_bench is limited to U32_MAX nr loops */ +static unsigned long loops = 10000000; +module_param(loops, ulong, 0); +MODULE_PARM_DESC(loops, "Specify loops bench will run"); + +/* Timing at the nanosec level, we need to know the overhead + * introduced by the for loop itself */ +static int time_bench_for_loop(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + int i; + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + } + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +static int time_bench_atomic_inc(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + atomic_t cnt; + int i; + + atomic_set(&cnt, 0); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + atomic_inc(&cnt); + barrier(); /* avoid compiler to optimize this loop */ + } + loops_cnt = atomic_read(&cnt); + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +/* The ptr_ping in page_pool uses a spinlock. We need to know the minimum + * overhead of taking+releasing a spinlock, to know the cycles that can be saved + * by e.g. amortizing this via bulking. + */ +static int time_bench_lock(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + spinlock_t lock; + int i; + + spin_lock_init(&lock); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + spin_lock(&lock); + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + spin_unlock(&lock); + } + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +/* Helper for filling some page's into ptr_ring */ +static void pp_fill_ptr_ring(struct page_pool *pp, int elems) +{ + gfp_t gfp_mask = GFP_ATOMIC; /* GFP_ATOMIC needed when under run softirq */ + struct page **array; + int i; + + array = kzalloc(sizeof(struct page *) * elems, gfp_mask); + + for (i = 0; i < elems; i++) { + array[i] = page_pool_alloc_pages(pp, gfp_mask); + } + for (i = 0; i < elems; i++) { + _page_pool_put_page(pp, array[i], false); + } + + kfree(array); +} + +enum test_type { type_fast_path, type_ptr_ring, type_page_allocator }; + +/* Depends on compile optimizing this function */ +static __always_inline int time_bench_page_pool(struct time_bench_record *rec, + void *data, enum test_type type, + const char *func) +{ + uint64_t loops_cnt = 0; + gfp_t gfp_mask = GFP_ATOMIC; /* GFP_ATOMIC is not really needed */ + int i, err; + + struct page_pool *pp; + struct page *page; + + struct page_pool_params pp_params = { + .order = 0, + .flags = 0, + .pool_size = MY_POOL_SIZE, + .nid = NUMA_NO_NODE, + .dev = NULL, /* Only use for DMA mapping */ + .dma_dir = DMA_BIDIRECTIONAL, + }; + + pp = page_pool_create(&pp_params); + if (IS_ERR(pp)) { + err = PTR_ERR(pp); + pr_warn("%s: Error(%d) creating page_pool\n", func, err); + goto out; + } + pp_fill_ptr_ring(pp, 64); + + if (in_serving_softirq()) + pr_warn("%s(): in_serving_softirq fast-path\n", func); + else + pr_warn("%s(): Cannot use page_pool fast-path\n", func); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + /* Common fast-path alloc, that depend on in_serving_softirq() */ + page = page_pool_alloc_pages(pp, gfp_mask); + if (!page) + break; + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + + /* The benchmarks purpose it to test different return paths. + * Compiler should inline optimize other function calls out + */ + if (type == type_fast_path) { + /* Fast-path recycling e.g. XDP_DROP use-case */ + page_pool_recycle_direct(pp, page); + + } else if (type == type_ptr_ring) { + /* Normal return path */ + _page_pool_put_page(pp, page, false); + + } else if (type == type_page_allocator) { + /* Test if not pages are recycled, but instead + * returned back into systems page allocator + */ + get_page(page); /* cause no-recycling */ + _page_pool_put_page(pp, page, false); + put_page(page); + } else { + BUILD_BUG(); + } + } + time_bench_stop(rec, loops_cnt); +out: + page_pool_destroy(pp); + return loops_cnt; +} + +static int time_bench_page_pool01_fast_path(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_fast_path, __func__); +} + +static int time_bench_page_pool02_ptr_ring(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_ptr_ring, __func__); +} + +static int time_bench_page_pool03_slow(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_page_allocator, __func__); +} + +static int run_benchmark_tests(void) +{ + uint32_t nr_loops = loops; + int passed_count = 0; + + /* Baseline tests */ + if (enabled(bit_run_bench_baseline)) { + time_bench_loop(nr_loops * 10, 0, "for_loop", NULL, + time_bench_for_loop); + time_bench_loop(nr_loops * 10, 0, "atomic_inc", NULL, + time_bench_atomic_inc); + time_bench_loop(nr_loops, 0, "lock", NULL, time_bench_lock); + } + + /* This test cannot activate correct code path, due to no-softirq ctx */ + if (enabled(bit_run_bench_no_softirq01)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool01", NULL, + time_bench_page_pool01_fast_path); + if (enabled(bit_run_bench_no_softirq02)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool02", NULL, + time_bench_page_pool02_ptr_ring); + if (enabled(bit_run_bench_no_softirq03)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool03", NULL, + time_bench_page_pool03_slow); + + return passed_count; +} + +static int __init bench_page_pool_simple_module_init(void) +{ + if (verbose) + pr_info("Loaded\n"); + + if (loops > U32_MAX) { + pr_err("Module param loops(%lu) exceeded U32_MAX(%u)\n", loops, + U32_MAX); + return -ECHRNG; + } + + run_benchmark_tests(); + + return 0; +} +module_init(bench_page_pool_simple_module_init); + +static void __exit bench_page_pool_simple_module_exit(void) +{ + if (verbose) + pr_info("Unloaded\n"); +} +module_exit(bench_page_pool_simple_module_exit); + +MODULE_DESCRIPTION("Benchmark of page_pool simple cases"); +MODULE_AUTHOR("Jesper Dangaard Brouer <netoptimizer(a)brouer.com>"); +MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/net/bench/page_pool/test_bench_page_pool.sh b/tools/testing/selftests/net/bench/page_pool/test_bench_page_pool.sh new file mode 100755 index 000000000000..5eb48f28b659 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/test_bench_page_pool.sh @@ -0,0 +1,32 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# + +set -e + +DRIVER="./page_pool/bench_page_pool.ko" +result="" + +function run_test() +{ + rmmod "bench_page_pool.ko" || true + insmod $DRIVER > /dev/null 2>&1 + result=$(dmesg | tail -10) + echo "$result" + + echo + echo "Fast path results:" + echo ${result} | grep -o -E "no-softirq-page_pool01 Per elem: ([0-9]+) cycles$tsc$ ([0-9]+\.[0-9]+) ns" + + echo + echo "ptr_ring results:" + echo ${result} | grep -o -E "no-softirq-page_pool02 Per elem: ([0-9]+) cycles$tsc$ ([0-9]+\.[0-9]+) ns" + + echo + echo "slow path results:" + echo ${result} | grep -o -E "no-softirq-page_pool03 Per elem: ([0-9]+) cycles$tsc$ ([0-9]+\.[0-9]+) ns" +} + +run_test + +exit 0 diff --git a/tools/testing/selftests/net/bench/page_pool/time_bench.c b/tools/testing/selftests/net/bench/page_pool/time_bench.c new file mode 100644 index 000000000000..257b1515c64e --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/time_bench.c @@ -0,0 +1,406 @@ +/* + * Benchmarking code execution time inside the kernel + * + * Copyright (C) 2014, Red Hat, Inc., Jesper Dangaard Brouer + * for licensing details see kernel-base/COPYING + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/module.h> +#include <linux/time.h> + +#include <linux/perf_event.h> /* perf_event_create_kernel_counter() */ + +/* For concurrency testing */ +#include <linux/completion.h> +#include <linux/sched.h> +#include <linux/workqueue.h> +#include <linux/kthread.h> + +#include "time_bench.h" + +static int verbose = 1; + +/** TSC (Time-Stamp Counter) based ** + * See: linux/time_bench.h + * tsc_start_clock() and tsc_stop_clock() + */ + +/** Wall-clock based ** + */ + +/** PMU (Performance Monitor Unit) based ** + */ +#define PERF_FORMAT \ + (PERF_FORMAT_GROUP | PERF_FORMAT_ID | PERF_FORMAT_TOTAL_TIME_ENABLED | \ + PERF_FORMAT_TOTAL_TIME_RUNNING) + +struct raw_perf_event { + uint64_t config; /* event */ + uint64_t config1; /* umask */ + struct perf_event *save; + char *desc; +}; + +/* if HT is enable a maximum of 4 events (5 if one is instructions + * retired can be specified, if HT is disabled a maximum of 8 (9 if + * one is instructions retired) can be specified. + * + * From Table 19-1. Architectural Performance Events + * Architectures Software Developer’s Manual Volume 3: System Programming Guide + */ +struct raw_perf_event perf_events[] = { + { 0x3c, 0x00, NULL, "Unhalted CPU Cycles" }, + { 0xc0, 0x00, NULL, "Instruction Retired" } +}; + +#define NUM_EVTS (sizeof(perf_events) / sizeof(struct raw_perf_event)) + +/* WARNING: PMU config is currently broken! + */ +bool time_bench_PMU_config(bool enable) +{ + int i; + struct perf_event_attr perf_conf; + struct perf_event *perf_event; + int cpu; + + preempt_disable(); + cpu = smp_processor_id(); + pr_info("DEBUG: cpu:%d\n", cpu); + preempt_enable(); + + memset(&perf_conf, 0, sizeof(struct perf_event_attr)); + perf_conf.type = PERF_TYPE_RAW; + perf_conf.size = sizeof(struct perf_event_attr); + perf_conf.read_format = PERF_FORMAT; + perf_conf.pinned = 1; + perf_conf.exclude_user = 1; /* No userspace events */ + perf_conf.exclude_kernel = 0; /* Only kernel events */ + + for (i = 0; i < NUM_EVTS; i++) { + perf_conf.disabled = enable; + //perf_conf.disabled = (i == 0) ? 1 : 0; + perf_conf.config = perf_events[i].config; + perf_conf.config1 = perf_events[i].config1; + if (verbose) + pr_info("%s() enable PMU counter: %s\n", + __func__, perf_events[i].desc); + perf_event = perf_event_create_kernel_counter(&perf_conf, cpu, + NULL /* task */, + NULL /* overflow_handler*/, + NULL /* context */); + if (perf_event) { + perf_events[i].save = perf_event; + pr_info("%s():DEBUG perf_event success\n", __func__); + + perf_event_enable(perf_event); + } else { + pr_info("%s():DEBUG perf_event is NULL\n", __func__); + } + } + + return true; +} + +/** Generic functions ** + */ + +/* Calculate stats, store results in record */ +bool time_bench_calc_stats(struct time_bench_record *rec) +{ +#define NANOSEC_PER_SEC 1000000000 /* 10^9 */ + uint64_t ns_per_call_tmp_rem = 0; + uint32_t ns_per_call_remainder = 0; + uint64_t pmc_ipc_tmp_rem = 0; + uint32_t pmc_ipc_remainder = 0; + uint32_t pmc_ipc_div = 0; + uint32_t invoked_cnt_precision = 0; + uint32_t invoked_cnt = 0; /* 32-bit due to div_u64_rem() */ + + if (rec->flags & TIME_BENCH_LOOP) { + if (rec->invoked_cnt < 1000) { + pr_err("ERR: need more(>1000) loops(%llu) for timing\n", + rec->invoked_cnt); + return false; + } + if (rec->invoked_cnt > ((1ULL << 32) - 1)) { + /* div_u64_rem() can only support div with 32bit*/ + pr_err("ERR: Invoke cnt(%llu) too big overflow 32bit\n", + rec->invoked_cnt); + return false; + } + invoked_cnt = (uint32_t)rec->invoked_cnt; + } + + /* TSC (Time-Stamp Counter) records */ + if (rec->flags & TIME_BENCH_TSC) { + rec->tsc_interval = rec->tsc_stop - rec->tsc_start; + if (rec->tsc_interval == 0) { + pr_err("ABORT: timing took ZERO TSC time\n"); + return false; + } + /* Calculate stats */ + if (rec->flags & TIME_BENCH_LOOP) + rec->tsc_cycles = rec->tsc_interval / invoked_cnt; + else + rec->tsc_cycles = rec->tsc_interval; + } + + /* Wall-clock time calc */ + if (rec->flags & TIME_BENCH_WALLCLOCK) { + rec->time_start = rec->ts_start.tv_nsec + + (NANOSEC_PER_SEC * rec->ts_start.tv_sec); + rec->time_stop = rec->ts_stop.tv_nsec + + (NANOSEC_PER_SEC * rec->ts_stop.tv_sec); + rec->time_interval = rec->time_stop - rec->time_start; + if (rec->time_interval == 0) { + pr_err("ABORT: timing took ZERO wallclock time\n"); + return false; + } + /* Calculate stats */ + /*** Division in kernel it tricky ***/ + /* Orig: time_sec = (time_interval / NANOSEC_PER_SEC); */ + /* remainder only correct because NANOSEC_PER_SEC is 10^9 */ + rec->time_sec = div_u64_rem(rec->time_interval, NANOSEC_PER_SEC, + &rec->time_sec_remainder); + //TODO: use existing struct timespec records instead of div? + + if (rec->flags & TIME_BENCH_LOOP) { + /*** Division in kernel it tricky ***/ + /* Orig: ns = ((double)time_interval / invoked_cnt); */ + /* First get quotient */ + rec->ns_per_call_quotient = + div_u64_rem(rec->time_interval, invoked_cnt, + &ns_per_call_remainder); + /* Now get decimals .xxx precision (incorrect roundup)*/ + ns_per_call_tmp_rem = ns_per_call_remainder; + invoked_cnt_precision = invoked_cnt / 1000; + if (invoked_cnt_precision > 0) { + rec->ns_per_call_decimal = + div_u64_rem(ns_per_call_tmp_rem, + invoked_cnt_precision, + &ns_per_call_remainder); + } + } + } + + /* Performance Monitor Unit (PMU) counters */ + if (rec->flags & TIME_BENCH_PMU) { + //FIXME: Overflow handling??? + rec->pmc_inst = rec->pmc_inst_stop - rec->pmc_inst_start; + rec->pmc_clk = rec->pmc_clk_stop - rec->pmc_clk_start; + + /* Calc Instruction Per Cycle (IPC) */ + /* First get quotient */ + rec->pmc_ipc_quotient = div_u64_rem(rec->pmc_inst, rec->pmc_clk, + &pmc_ipc_remainder); + /* Now get decimals .xxx precision (incorrect roundup)*/ + pmc_ipc_tmp_rem = pmc_ipc_remainder; + pmc_ipc_div = rec->pmc_clk / 1000; + if (pmc_ipc_div > 0) { + rec->pmc_ipc_decimal = div_u64_rem(pmc_ipc_tmp_rem, + pmc_ipc_div, + &pmc_ipc_remainder); + } + } + + return true; +} + +/* Generic function for invoking a loop function and calculating + * execution time stats. The function being called/timed is assumed + * to perform a tight loop, and update the timing record struct. + */ +bool time_bench_loop(uint32_t loops, int step, char *txt, void *data, + int (*func)(struct time_bench_record *record, void *data)) +{ + struct time_bench_record rec; + + /* Setup record */ + memset(&rec, 0, sizeof(rec)); /* zero func might not update all */ + rec.version_abi = 1; + rec.loops = loops; + rec.step = step; + rec.flags = (TIME_BENCH_LOOP|TIME_BENCH_TSC|TIME_BENCH_WALLCLOCK); +// rec.flags = (TIME_BENCH_LOOP|TIME_BENCH_TSC| +// TIME_BENCH_WALLCLOCK|TIME_BENCH_PMU); + //TODO: Add/copy txt to rec + + /*** Loop function being timed ***/ + if (!func(&rec, data)) { + pr_err("ABORT: function being timed failed\n"); + return false; + } + + if (rec.invoked_cnt < loops) + pr_warn("WARNING: Invoke count(%llu) smaller than loops(%d)\n", + rec.invoked_cnt, loops); + + /* Calculate stats */ + time_bench_calc_stats(&rec); + + pr_info("Type:%s Per elem: %llu cycles(tsc) %llu.%03llu ns (step:%d)" + " - (measurement period time:%llu.%09u sec time_interval:%llu)" + " - (invoke count:%llu tsc_interval:%llu)\n", + txt, rec.tsc_cycles, + rec.ns_per_call_quotient, rec.ns_per_call_decimal, rec.step, + rec.time_sec, rec.time_sec_remainder, rec.time_interval, + rec.invoked_cnt, rec.tsc_interval); +/* pr_info("DEBUG check is %llu/%llu == %llu.%03llu ?\n", + rec.time_interval, rec.invoked_cnt, + rec.ns_per_call_quotient, rec.ns_per_call_decimal); +*/ + if (rec.flags & TIME_BENCH_PMU) { + pr_info("Type:%s PMU inst/clock" + "%llu/%llu = %llu.%03llu IPC (inst per cycle)\n", + txt, rec.pmc_inst, rec.pmc_clk, + rec.pmc_ipc_quotient, rec.pmc_ipc_decimal); + } + return true; +} + +/* Function getting invoked by kthread */ +static int invoke_test_on_cpu_func(void *private) +{ + struct time_bench_cpu *cpu = private; + struct time_bench_sync *sync = cpu->sync; + cpumask_t newmask = CPU_MASK_NONE; + void *data = cpu->data; + + /* Restrict CPU */ + cpumask_set_cpu(cpu->rec.cpu, &newmask); + set_cpus_allowed_ptr(current, &newmask); + + /* Synchronize start of concurrency test */ + atomic_inc(&sync->nr_tests_running); + wait_for_completion(&sync->start_event); + + /* Start benchmark function */ + if (!cpu->bench_func(&cpu->rec, data)) { + pr_err("ERROR: function being timed failed on CPU:%d(%d)\n", + cpu->rec.cpu, smp_processor_id()); + } else { + if (verbose) + pr_info("SUCCESS: ran on CPU:%d(%d)\n", cpu->rec.cpu, + smp_processor_id()); + } + cpu->did_bench_run = true; + + /* End test */ + atomic_dec(&sync->nr_tests_running); + /* Wait for kthread_stop() telling us to stop */ + while (!kthread_should_stop()) { + set_current_state(TASK_INTERRUPTIBLE); + schedule(); + } + __set_current_state(TASK_RUNNING); + return 0; +} + +void time_bench_print_stats_cpumask(const char *desc, + struct time_bench_cpu *cpu_tasks, + const struct cpumask *mask) +{ + uint64_t average = 0; + int cpu; + int step = 0; + struct sum { + uint64_t tsc_cycles; + int records; + } sum = { 0 }; + + /* Get stats */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + struct time_bench_record *rec = &c->rec; + + /* Calculate stats */ + time_bench_calc_stats(rec); + + pr_info("Type:%s CPU(%d) %llu cycles(tsc) %llu.%03llu ns" + " (step:%d)" + " - (measurement period time:%llu.%09u sec time_interval:%llu)" + " - (invoke count:%llu tsc_interval:%llu)\n", + desc, cpu, rec->tsc_cycles, rec->ns_per_call_quotient, + rec->ns_per_call_decimal, rec->step, rec->time_sec, + rec->time_sec_remainder, rec->time_interval, + rec->invoked_cnt, rec->tsc_interval); + + /* Collect average */ + sum.records++; + sum.tsc_cycles += rec->tsc_cycles; + step = rec->step; + } + + if (sum.records) /* avoid div-by-zero */ + average = sum.tsc_cycles / sum.records; + pr_info("Sum Type:%s Average: %llu cycles(tsc) CPUs:%d step:%d\n", desc, + average, sum.records, step); +} + +void time_bench_run_concurrent( + uint32_t loops, int step, void *data, + const struct cpumask *mask, /* Support masking outsome CPUs*/ + struct time_bench_sync *sync, struct time_bench_cpu *cpu_tasks, + int (*func)(struct time_bench_record *record, void *data)) +{ + int cpu, running = 0; + + if (verbose) // DEBUG + pr_warn("%s() Started on CPU:%d\n", __func__, + smp_processor_id()); + + /* Reset sync conditions */ + atomic_set(&sync->nr_tests_running, 0); + init_completion(&sync->start_event); + + /* Spawn off jobs on all CPUs */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + + running++; + c->sync = sync; /* Send sync variable along */ + c->data = data; /* Send opaque along */ + + /* Init benchmark record */ + memset(&c->rec, 0, sizeof(struct time_bench_record)); + c->rec.version_abi = 1; + c->rec.loops = loops; + c->rec.step = step; + c->rec.flags = (TIME_BENCH_LOOP|TIME_BENCH_TSC| + TIME_BENCH_WALLCLOCK); + c->rec.cpu = cpu; + c->bench_func = func; + c->task = kthread_run(invoke_test_on_cpu_func, c, + "time_bench%d", cpu); + if (IS_ERR(c->task)) { + pr_err("%s(): Failed to start test func\n", __func__); + return; /* Argh, what about cleanup?! */ + } + } + + /* Wait until all processes are running */ + while (atomic_read(&sync->nr_tests_running) < running) { + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(10); + } + /* Kick off all CPU concurrently on completion event */ + complete_all(&sync->start_event); + + /* Wait for CPUs to finish */ + while (atomic_read(&sync->nr_tests_running)) { + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(10); + } + + /* Stop the kthreads */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + kthread_stop(c->task); + } + + if (verbose) // DEBUG - happens often, finish on another CPU + pr_warn("%s() Finished on CPU:%d\n", __func__, + smp_processor_id()); +} diff --git a/tools/testing/selftests/net/bench/page_pool/time_bench.h b/tools/testing/selftests/net/bench/page_pool/time_bench.h new file mode 100644 index 000000000000..7331b5789490 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/time_bench.h @@ -0,0 +1,259 @@ +/* + * Benchmarking code execution time inside the kernel + * + * Copyright (C) 2014, Red Hat, Inc., Jesper Dangaard Brouer + * for licensing details see kernel-base/COPYING + */ +#ifndef _LINUX_TIME_BENCH_H +#define _LINUX_TIME_BENCH_H + +/* Main structure used for recording a benchmark run */ +struct time_bench_record { + uint32_t version_abi; + uint32_t loops; /* Requested loop invocations */ + uint32_t step; /* option for e.g. bulk invocations */ + + uint32_t flags; /* Measurements types enabled */ +#define TIME_BENCH_LOOP (1<<0) +#define TIME_BENCH_TSC (1<<1) +#define TIME_BENCH_WALLCLOCK (1<<2) +#define TIME_BENCH_PMU (1<<3) + + uint32_t cpu; /* Used when embedded in time_bench_cpu */ + + /* Records */ + uint64_t invoked_cnt; /* Returned actual invocations */ + uint64_t tsc_start; + uint64_t tsc_stop; + struct timespec64 ts_start; + struct timespec64 ts_stop; + /** PMU counters for instruction and cycles + * instructions counter including pipelined instructions */ + uint64_t pmc_inst_start; + uint64_t pmc_inst_stop; + /* CPU unhalted clock counter */ + uint64_t pmc_clk_start; + uint64_t pmc_clk_stop; + + /* Result records */ + uint64_t tsc_interval; + uint64_t time_start, time_stop, time_interval; /* in nanosec */ + uint64_t pmc_inst, pmc_clk; + + /* Derived result records */ + uint64_t tsc_cycles; // +decimal? + uint64_t ns_per_call_quotient, ns_per_call_decimal; + uint64_t time_sec; + uint32_t time_sec_remainder; + uint64_t pmc_ipc_quotient, pmc_ipc_decimal; /* inst per cycle */ +}; + +/* For synchronizing parallel CPUs to run concurrently */ +struct time_bench_sync { + atomic_t nr_tests_running; + struct completion start_event; +}; + +/* Keep track of CPUs executing our bench function. + * + * Embed a time_bench_record for storing info per cpu + */ +struct time_bench_cpu { + struct time_bench_record rec; + struct time_bench_sync *sync; /* back ptr */ + struct task_struct *task; + /* "data" opaque could have been placed in time_bench_sync, + * but to avoid any false sharing, place it per CPU + */ + void *data; + /* Support masking outsome CPUs, mark if it ran */ + bool did_bench_run; + /* int cpu; // note CPU stored in time_bench_record */ + int (*bench_func)(struct time_bench_record *record, void *data); +}; + +/* + * Below TSC assembler code is not compatible with other archs, and + * can also fail on guests if cpu-flags are not correct. + * + * The way TSC reading is used, many iterations, does not require as + * high accuracy as described below (in Intel Doc #324264). + * + * Considering changing to use get_cycles() (#include <asm/timex.h>). + */ + +/** TSC (Time-Stamp Counter) based ** + * Recommend reading, to understand details of reading TSC accurately: + * Intel Doc #324264, "How to Benchmark Code Execution Times on Intel" + * + * Consider getting exclusive ownership of CPU by using: + * unsigned long flags; + * preempt_disable(); + * raw_local_irq_save(flags); + * _your_code_ + * raw_local_irq_restore(flags); + * preempt_enable(); + * + * Clobbered registers: "%rax", "%rbx", "%rcx", "%rdx" + * RDTSC only change "%rax" and "%rdx" but + * CPUID clears the high 32-bits of all (rax/rbx/rcx/rdx) + */ +static __always_inline uint64_t tsc_start_clock(void) +{ + /* See: Intel Doc #324264 */ + unsigned hi, lo; + asm volatile("CPUID\n\t" + "RDTSC\n\t" + "mov %%edx, %0\n\t" + "mov %%eax, %1\n\t" + : "=r"(hi), "=r"(lo)::"%rax", "%rbx", "%rcx", "%rdx"); + //FIXME: on 32bit use clobbered %eax + %edx + return ((uint64_t)lo) | (((uint64_t)hi) << 32); +} + +static __always_inline uint64_t tsc_stop_clock(void) +{ + /* See: Intel Doc #324264 */ + unsigned hi, lo; + asm volatile("RDTSCP\n\t" + "mov %%edx, %0\n\t" + "mov %%eax, %1\n\t" + "CPUID\n\t" + : "=r"(hi), "=r"(lo)::"%rax", "%rbx", "%rcx", "%rdx"); + return ((uint64_t)lo) | (((uint64_t)hi) << 32); +} + +/* Notes for RDTSC and RDTSCP + * + * Hannes found out that __builtin_ia32_rdtsc and + * __builtin_ia32_rdtscp are undocumented available in gcc, so there + * is no need to write inline assembler functions for them any more. + * + * unsigned long long __builtin_ia32_rdtscp(unsigned int *foo); + * (where foo is set to: numa_node << 12 | cpu) + * and + * unsigned long long __builtin_ia32_rdtsc(void); + * + * Above we combine the calls with CPUID, thus I don't see how this is + * directly appreciable. + */ + +/* +inline uint64_t rdtsc(void) +{ + uint32_t low, high; + asm volatile("rdtsc" : "=a" (low), "=d" (high)); + return low | (((uint64_t )high ) << 32); +} +*/ + +/** Wall-clock based ** + * + * use: getnstimeofday() + * getnstimeofday(&rec->ts_start); + * getnstimeofday(&rec->ts_stop); + * + * API changed see: Documentation/core-api/timekeeping.rst + * https://www.kernel.org/doc/html/latest/core-api/timekeeping.html#c.getnstim… + * + * We should instead use: ktime_get_real_ts64() is a direct + * replacement, but consider using monotonic time (ktime_get_ts64()) + * and/or a ktime_t based interface (ktime_get()/ktime_get_real()). + */ + +/** PMU (Performance Monitor Unit) based ** + * + * Needed for calculating: Instructions Per Cycle (IPC) + * - The IPC number tell how efficient the CPU pipelining were + */ +//lookup: perf_event_create_kernel_counter() + +bool time_bench_PMU_config(bool enable); + +/* Raw reading via rdpmc() using fixed counters + * + * From: https://github.com/andikleen/simple-pmu + */ +enum { + FIXED_SELECT = (1U << 30), /* == 0x40000000 */ + FIXED_INST_RETIRED_ANY = 0, + FIXED_CPU_CLK_UNHALTED_CORE = 1, + FIXED_CPU_CLK_UNHALTED_REF = 2, +}; + +static __always_inline unsigned long long p_rdpmc(unsigned in) +{ + unsigned d, a; + + asm volatile("rdpmc" : "=d"(d), "=a"(a) : "c"(in) : "memory"); + return ((unsigned long long)d << 32) | a; +} + +/* These PMU counter needs to be enabled, but I don't have the + * configure code implemented. My current hack is running: + * sudo perf stat -e cycles:k -e instructions:k insmod lib/ring_queue_test.ko + */ +/* Reading all pipelined instruction */ +static __always_inline unsigned long long pmc_inst(void) +{ + return p_rdpmc(FIXED_SELECT | FIXED_INST_RETIRED_ANY); +} + +/* Reading CPU clock cycles */ +static __always_inline unsigned long long pmc_clk(void) +{ + return p_rdpmc(FIXED_SELECT | FIXED_CPU_CLK_UNHALTED_CORE); +} + +/* Raw reading via MSR rdmsr() is likely wrong + * FIXME: How can I know which raw MSR registers are conf for what? + */ +#define MSR_IA32_PCM0 0x400000C1 /* PERFCTR0 */ +#define MSR_IA32_PCM1 0x400000C2 /* PERFCTR1 */ +#define MSR_IA32_PCM2 0x400000C3 +static inline uint64_t msr_inst(unsigned long long *msr_result) +{ + return rdmsrl_safe(MSR_IA32_PCM0, msr_result); +} + +/** Generic functions ** + */ +bool time_bench_loop(uint32_t loops, int step, char *txt, void *data, + int (*func)(struct time_bench_record *rec, void *data)); +bool time_bench_calc_stats(struct time_bench_record *rec); + +void time_bench_run_concurrent( + uint32_t loops, int step, void *data, + const struct cpumask *mask, /* Support masking outsome CPUs*/ + struct time_bench_sync *sync, struct time_bench_cpu *cpu_tasks, + int (*func)(struct time_bench_record *record, void *data)); +void time_bench_print_stats_cpumask(const char *desc, + struct time_bench_cpu *cpu_tasks, + const struct cpumask *mask); + +//FIXME: use rec->flags to select measurement, should be MACRO +static __always_inline void time_bench_start(struct time_bench_record *rec) +{ + //getnstimeofday(&rec->ts_start); + ktime_get_real_ts64(&rec->ts_start); + if (rec->flags & TIME_BENCH_PMU) { + rec->pmc_inst_start = pmc_inst(); + rec->pmc_clk_start = pmc_clk(); + } + rec->tsc_start = tsc_start_clock(); +} + +static __always_inline void time_bench_stop(struct time_bench_record *rec, + uint64_t invoked_cnt) +{ + rec->tsc_stop = tsc_stop_clock(); + if (rec->flags & TIME_BENCH_PMU) { + rec->pmc_inst_stop = pmc_inst(); + rec->pmc_clk_stop = pmc_clk(); + } + //getnstimeofday(&rec->ts_stop); + ktime_get_real_ts64(&rec->ts_stop); + rec->invoked_cnt = invoked_cnt; +} + +#endif /* _LINUX_TIME_BENCH_H */ base-commit: ea15e046263b19e91ffd827645ae5dfa44ebd044 -- 2.49.0.1151.ga128411c76-goog

3 weeks, 4 days

5
10
0 0

[PATCH] kunit: qemu_configs: Add riscv32 config

by Thomas Weißschuh

Add a basic config to run kunit tests on riscv32. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- tools/testing/kunit/qemu_configs/riscv32.py | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/tools/testing/kunit/qemu_configs/riscv32.py b/tools/testing/kunit/qemu_configs/riscv32.py new file mode 100644 index 0000000000000000000000000000000000000000..b79ba0ae30f8573035b3401be337b379eba97e26 --- /dev/null +++ b/tools/testing/kunit/qemu_configs/riscv32.py @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 + +from ..qemu_config import QemuArchParams + +QEMU_ARCH = QemuArchParams(linux_arch='riscv', + kconfig=''' +CONFIG_NONPORTABLE=y +CONFIG_ARCH_RV32I=y +CONFIG_ARCH_VIRT=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_SERIAL_OF_PLATFORM=y +''', + qemu_arch='riscv32', + kernel_path='arch/riscv/boot/Image', + kernel_command_line='console=ttyS0', + extra_qemu_params=['-machine', 'virt']) --- base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8 change-id: 20250214-kunit-qemu-riscv32-fb38d659c373 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

3 weeks, 4 days

3
2
0 0

[PATCH net-next 14/14] selftests: forwarding: Add a test for verifying VXLAN MC underlay

by Petr Machata

Add tests for MC-routing underlay VXLAN traffic. Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- Notes: CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org .../testing/selftests/net/forwarding/Makefile | 1 + .../net/forwarding/vxlan_bridge_1q_mc_ul.sh | 757 ++++++++++++++++++ 2 files changed, 758 insertions(+) create mode 100755 tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile index 00bde7b6f39e..d7bb2e80e88c 100644 --- a/tools/testing/selftests/net/forwarding/Makefile +++ b/tools/testing/selftests/net/forwarding/Makefile @@ -102,6 +102,7 @@ TEST_PROGS = bridge_fdb_learning_limit.sh \ vxlan_bridge_1d_port_8472.sh \ vxlan_bridge_1d.sh \ vxlan_bridge_1q_ipv6.sh \ + vxlan_bridge_1q_mc_ul.sh \ vxlan_bridge_1q_port_8472_ipv6.sh \ vxlan_bridge_1q_port_8472.sh \ vxlan_bridge_1q.sh \ diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh new file mode 100755 index 000000000000..e01e7ccf2c8d --- /dev/null +++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh @@ -0,0 +1,757 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# +-----------------------------------------+ +# | + $h1.10 + $h1.20 | +# | | 192.0.2.1/28 | 2001:db8:1::1/64 | +# | \________ ________/ | +# | \ / | +# | + $h1 H1 (vrf) | +# +-----------|-----------------------------+ +# | +# +-----------|----------------------------------------------------------------+ +# | +---------|--------------------------------------+ SWITCH (main vrf) | +# | | + $swp1 BR1 (802.1q) | | +# | | vid 10 20 | | +# | | | | +# | | + vx10 (vxlan) + vx20 (vxlan) | + lo10 (dummy) | +# | | local 192.0.2.100 local 2001:db8:4::1 | 192.0.2.100/28 | +# | | group 233.252.0.1 group ff0e::1:2:3 | 2001:db8:4::1/64 | +# | | id 1000 id 2000 | | +# | | vid 10 pvid untagged vid 20 pvid untagged | | +# | +------------------------------------------------+ | +# | | +# | + $swp2 $swp3 + | +# | | 192.0.2.33/28 192.0.2.65/28 | | +# | | 2001:db8:2::1/64 2001:db8:3::1/64 | | +# | | | | +# +---|--------------------------------------------------------------------|---+ +# | | +# +---|--------------------------------+ +--------------------------------|---+ +# | | H2 (vrf) | | H3 (vrf) | | +# | +-|----------------------------+ | | +-----------------------------|-+ | +# | | + $h2 BR2 (802.1d) | | | | BR3 (802.1d) $h3 + | | +# | | | | | | | | +# | | + v1$h2 (veth) | | | | v1$h3 (veth) + | | +# | +-|----------------------------+ | | +-----------------------------|-+ | +# | | | | | | +# +---|--------------------------------+ +--------------------------------|---+ +# | | +# +---|--------------------------------+ +--------------------------------|---+ +# | + v2$h2 (veth) NS2 (netns) | | NS3 (netns) v2$h3 (veth) + | +# | 192.0.2.34/28 | | 192.0.2.66/28 | +# | 2001:db8:2::2/64 | | 2001:db8:3::2/64 | +# | | | | +# | +--------------------------------+ | | +--------------------------------+ | +# | | BR1 (802.1q) | | | | BR1 (802.1q) | | +# | | + vx10 (vxlan) | | | | + vx10 (vxlan) | | +# | | local 192.0.2.34 | | | | local 192.0.2.50 | | +# | | group 233.252.0.1 dev v2$h2 | | | | group 233.252.0.1 dev v2$h3 | | +# | | id 1000 dstport $VXPORT | | | | id 1000 dstport $VXPORT | | +# | | vid 10 pvid untagged | | | | vid 10 pvid untagged | | +# | | | | | | | | +# | | + vx20 (vxlan) | | | | + vx20 (vxlan) | | +# | | local 2001:db8:2::2 | | | | local 2001:db8:3::2 | | +# | | group ff0e::1:2:3 dev v2$h2 | | | | group ff0e::1:2:3 dev v2$h3 | | +# | | id 2000 dstport $VXPORT | | | | id 2000 dstport $VXPORT | | +# | | vid 20 pvid untagged | | | | vid 20 pvid untagged | | +# | | | | | | | | +# | | + w1 (veth) | | | | + w1 (veth) | | +# | | | vid 10 20 | | | | | vid 10 20 | | +# | +--|-----------------------------+ | | +--|-----------------------------+ | +# | | | | | | +# | +--|-----------------------------+ | | +--|-----------------------------+ | +# | | + w2 (veth) VW2 (vrf) | | | | + w2 (veth) VW2 (vrf) | | +# | | |\ | | | | |\ | | +# | | | + w2.10 | | | | | + w2.10 | | +# | | | 192.0.2.3/28 | | | | | 192.0.2.4/28 | | +# | | | | | | | | | | +# | | + w2.20 | | | | + w2.20 | | +# | | 2001:db8:1::3/64 | | | | 2001:db8:1::4/64 | | +# | +--------------------------------+ | | +--------------------------------+ | +# +------------------------------------+ +------------------------------------+ + +: "${VXPORT:=4789}" +export VXPORT + +: "${GROUP4:=233.252.0.1}" +export GROUP4 + +: "${GROUP6:=ff0e::1:2:3}" +export GROUP6 + +: "${IPMR:=lo10}" + +ALL_TESTS=" + ipv4_nomcroute + ipv4_mcroute + ipv4_mcroute_changelink + ipv4_mcroute_starg + ipv4_mcroute_noroute + ipv4_mcroute_fdb + ipv4_mcroute_fdb_oif0 + ipv4_mcroute_fdb_oif0_sep + + ipv6_nomcroute + ipv6_mcroute + ipv6_mcroute_changelink + ipv6_mcroute_starg + ipv6_mcroute_noroute + ipv6_mcroute_fdb + ipv6_mcroute_fdb_oif0 + + ipv4_nomcroute_rx + ipv4_mcroute_rx + ipv4_mcroute_starg_rx + ipv4_mcroute_fdb_oif0_sep_rx + ipv4_mcroute_fdb_sep_rx + + ipv6_nomcroute_rx + ipv6_mcroute_rx + ipv6_mcroute_starg_rx + ipv6_mcroute_fdb_sep_rx +" + +NUM_NETIFS=6 +source lib.sh + +h1_create() +{ + simple_if_init $h1 + defer simple_if_fini $h1 + + ip_link_add $h1.10 master v$h1 link $h1 type vlan id 10 + ip_link_set_up $h1.10 + ip_addr_add $h1.10 192.0.2.1/28 + + ip_link_add $h1.20 master v$h1 link $h1 type vlan id 20 + ip_link_set_up $h1.20 + ip_addr_add $h1.20 2001:db8:1::1/64 +} + +install_capture() +{ + local dev=$1; shift + + tc qdisc add dev $dev clsact + defer tc qdisc del dev $dev clsact + + tc filter add dev $dev ingress proto ip pref 104 \ + flower skip_hw ip_proto udp dst_port $VXPORT \ + action pass + defer tc filter del dev $dev ingress proto ip pref 104 + + tc filter add dev $dev ingress proto ipv6 pref 106 \ + flower skip_hw ip_proto udp dst_port $VXPORT \ + action pass + defer tc filter del dev $dev ingress proto ipv6 pref 106 +} + +h2_create() +{ + # $h2 + ip_link_set_up $h2 + + # H2 + vrf_create v$h2 + defer vrf_destroy v$h2 + + ip_link_set_up v$h2 + + # br2 + ip_link_add br2 type bridge vlan_filtering 0 mcast_snooping 0 + ip_link_set_master br2 v$h2 + ip_link_set_up br2 + + # $h2 + ip_link_set_master $h2 br2 + install_capture $h2 + + # v1$h2 + ip_link_set_up v1$h2 + ip_link_set_master v1$h2 br2 +} + +h3_create() +{ + # $h3 + ip_link_set_up $h3 + + # H3 + vrf_create v$h3 + defer vrf_destroy v$h3 + + ip_link_set_up v$h3 + + # br3 + ip_link_add br3 type bridge vlan_filtering 0 mcast_snooping 0 + ip_link_set_master br3 v$h3 + ip_link_set_up br3 + + # $h3 + ip_link_set_master $h3 br3 + install_capture $h3 + + # v1$h3 + ip_link_set_up v1$h3 + ip_link_set_master v1$h3 br3 +} + +switch_create() +{ + # br1 + ip_link_add br1 type bridge vlan_filtering 1 \ + vlan_default_pvid 0 mcast_snooping 0 + ip_link_set_addr br1 $(mac_get $swp1) + ip_link_set_up br1 + + # A dummy to force the IPv6 OIF=0 test to install a suitable MC route on + # $IPMR to be deterministic. Also used for the IPv6 RX!=TX ping test. + ip_link_add "X$IPMR" up type dummy + + # IPMR + ip_link_add "$IPMR" up type dummy + ip_addr_add "$IPMR" 192.0.2.100/28 + ip_addr_add "$IPMR" 2001:db8:4::1/64 + + # $swp1 + ip_link_set_up $swp1 + ip_link_set_master $swp1 br1 + bridge_vlan_add vid 10 dev $swp1 + bridge_vlan_add vid 20 dev $swp1 + + # $swp2 + ip_link_set_up $swp2 + ip_addr_add $swp2 192.0.2.33/28 + ip_addr_add $swp2 2001:db8:2::1/64 + + # $swp3 + ip_link_set_up $swp3 + ip_addr_add $swp3 192.0.2.65/28 + ip_addr_add $swp3 2001:db8:3::1/64 +} + +vx_create() +{ + local name=$1; shift + local vid=$1; shift + + ip_link_add "$name" up type vxlan dstport "$VXPORT" \ + nolearning noudpcsum tos inherit ttl 16 \ + "$@" + ip_link_set_master "$name" br1 + bridge_vlan_add vid $vid dev "$name" pvid untagged +} +export -f vx_create + +vx_wait() +{ + # Wait for all the ARP, IGMP etc. noise to settle down so that the + # tunnel is clear for measurements. + sleep 10 +} + +vx10_create() +{ + vx_create vx10 10 id 1000 "$@" +} +export -f vx10_create + +vx20_create() +{ + vx_create vx20 20 id 2000 "$@" +} +export -f vx20_create + +vx10_create_wait() +{ + vx10_create "$@" + vx_wait +} + +vx20_create_wait() +{ + vx20_create "$@" + vx_wait +} + +ns_init_common() +{ + local ns=$1; shift + local if_in=$1; shift + local ipv4_in=$1; shift + local ipv6_in=$1; shift + local ipv4_host=$1; shift + local ipv6_host=$1; shift + + # v2$h2 / v2$h3 + ip_link_set_up $if_in + ip_addr_add $if_in $ipv4_in + ip_addr_add $if_in $ipv6_in + + # br1 + ip_link_add br1 type bridge vlan_filtering 1 \ + vlan_default_pvid 0 mcast_snooping 0 + ip_link_set_up br1 + + # vx10, vx20 + vx10_create local ${ipv4_in%/*} group $GROUP4 dev $if_in + vx20_create local ${ipv6_in%/*} group $GROUP6 dev $if_in + + # w1 + ip_link_add w1 type veth peer name w2 + ip_link_set_master w1 br1 + ip_link_set_up w1 + bridge_vlan_add vid 10 dev w1 + bridge_vlan_add vid 20 dev w1 + + # w2 + simple_if_init w2 + defer simple_if_fini w2 + + # w2.10 + ip_link_add w2.10 master vw2 link w2 type vlan id 10 + ip_link_set_up w2.10 + ip_addr_add w2.10 $ipv4_host + + # w2.20 + ip_link_add w2.20 master vw2 link w2 type vlan id 20 + ip_link_set_up w2.20 + ip_addr_add w2.20 $ipv6_host +} +export -f ns_init_common + +ns2_create() +{ + # NS2 + ip netns add ns2 + defer ip netns del ns2 + + # v2$h2 + ip link set dev v2$h2 netns ns2 + defer ip -n ns2 link set dev v2$h2 netns 1 + + in_ns ns2 \ + ns_init_common ns2 v2$h2 \ + 192.0.2.34/28 2001:db8:2::2/64 \ + 192.0.2.3/28 2001:db8:1::3/64 +} + +ns3_create() +{ + # NS3 + ip netns add ns3 + defer ip netns del ns3 + + # v2$h3 + ip link set dev v2$h3 netns ns3 + defer ip -n ns3 link set dev v2$h3 netns 1 + + ip -n ns3 link set dev v2$h3 up + + in_ns ns3 \ + ns_init_common ns3 v2$h3 \ + 192.0.2.66/28 2001:db8:3::2/64 \ + 192.0.2.4/28 2001:db8:1::4/64 +} + +setup_prepare() +{ + h1=${NETIFS[p1]} + swp1=${NETIFS[p2]} + + swp2=${NETIFS[p3]} + h2=${NETIFS[p4]} + + swp3=${NETIFS[p5]} + h3=${NETIFS[p6]} + + vrf_prepare + defer vrf_cleanup + + forwarding_enable + defer forwarding_restore + + ip_link_add v1$h2 type veth peer name v2$h2 + ip_link_add v1$h3 type veth peer name v2$h3 + + h1_create + h2_create + h3_create + switch_create + ns2_create + ns3_create +} + +adf_install_broken_sg() +{ + adf_mcd_start "$IPMR" || exit $EXIT_STATUS + + mc_cli add $swp2 192.0.2.100 $GROUP4 $swp1 $swp3 + defer mc_cli remove $swp2 192.0.2.100 $GROUP4 $swp1 $swp3 + + mc_cli add $swp2 2001:db8:4::1 $GROUP6 $swp1 $swp3 + defer mc_cli remove $swp2 2001:db8:4::1 $GROUP6 $swp1 $swp3 +} + +adf_install_rx() +{ + mc_cli add $swp2 0.0.0.0 $GROUP4 "$IPMR" + defer mc_cli remove $swp2 0.0.0.0 $GROUP4 lo10 + + mc_cli add $swp3 0.0.0.0 $GROUP4 "$IPMR" + defer mc_cli remove $swp3 0.0.0.0 $GROUP4 lo10 + + mc_cli add $swp2 :: $GROUP6 "$IPMR" + defer mc_cli remove $swp2 :: $GROUP6 lo10 + + mc_cli add $swp3 :: $GROUP6 "$IPMR" + defer mc_cli remove $swp3 :: $GROUP6 lo10 +} + +adf_install_sg() +{ + adf_mcd_start "$IPMR" || exit $EXIT_STATUS + + mc_cli add "$IPMR" 192.0.2.100 $GROUP4 $swp2 $swp3 + defer mc_cli remove "$IPMR" 192.0.2.33 $GROUP4 $swp2 $swp3 + + mc_cli add "$IPMR" 2001:db8:4::1 $GROUP6 $swp2 $swp3 + defer mc_cli remove "$IPMR" 2001:db8:4::1 $GROUP6 $swp2 $swp3 + + adf_install_rx +} + +adf_install_sg_sep() +{ + adf_mcd_start lo || exit $EXIT_STATUS + + mc_cli add lo 192.0.2.120 $GROUP4 $swp2 $swp3 + defer mc_cli remove lo 192.0.2.120 $GROUP4 $swp2 $swp3 + + mc_cli add lo 2001:db8:5::1 $GROUP6 $swp2 $swp3 + defer mc_cli remove lo 2001:db8:5::1 $GROUP6 $swp2 $swp3 +} + +adf_install_sg_sep_rx() +{ + local lo=$1; shift + + adf_mcd_start "$IPMR" "$lo" || exit $EXIT_STATUS + + mc_cli add "$lo" 192.0.2.120 $GROUP4 $swp2 $swp3 + defer mc_cli remove "$lo" 192.0.2.120 $GROUP4 $swp2 $swp3 + + mc_cli add "$lo" 2001:db8:5::1 $GROUP6 $swp2 $swp3 + defer mc_cli remove "$lo" 2001:db8:5::1 $GROUP6 $swp2 $swp3 + + adf_install_rx +} + +adf_install_starg() +{ + adf_mcd_start "$IPMR" || exit $EXIT_STATUS + + mc_cli add "$IPMR" 0.0.0.0 $GROUP4 $swp2 $swp3 + defer mc_cli remove "$IPMR" 0.0.0.0 $GROUP4 $swp2 $swp3 + + mc_cli add "$IPMR" :: $GROUP6 $swp2 $swp3 + defer mc_cli remove "$IPMR" :: $GROUP6 $swp2 $swp3 + + adf_install_rx +} + +do_packets_v4() +{ + local mac=$(mac_get $h2) + + $MZ $h1 -Q 10 -c 10 -d 100msec -p 64 -a own -b $mac \ + -A 192.0.2.1 -B 192.0.2.2 -t udp sp=1234,dp=2345 -q +} + +do_packets_v6() +{ + local mac=$(mac_get $h2) + + $MZ -6 $h1 -Q 20 -c 10 -d 100msec -p 64 -a own -b $mac \ + -A 2001:db8:1::1 -B 2001:db8:1::2 -t udp sp=1234,dp=2345 -q +} + +do_test() +{ + local ipv=$1; shift + local expect_h2=$1; shift + local expect_h3=$1; shift + local what=$1; shift + + local pref=$((100 + ipv)) + + RET=0 + + local t0_h2=$(tc_rule_stats_get $h2 $pref ingress) + local t0_h3=$(tc_rule_stats_get $h3 $pref ingress) + + do_packets_v$ipv + sleep 1 + + local t1_h2=$(tc_rule_stats_get $h2 $pref ingress) + local t1_h3=$(tc_rule_stats_get $h3 $pref ingress) + + local d_h2=$((t1_h2 - t0_h2)) + local d_h3=$((t1_h3 - t0_h3)) + + ((d_h2 == expect_h2)) + check_err $? "Expected $expect_h2 packets on H2, got $d_h2" + + ((d_h3 == expect_h3)) + check_err $? "Expected $expect_h3 packets on H3, got $d_h3" + + log_test "VXLAN MC flood $what" +} + +ipv4_do_test_rx() +{ + local h3_should_fail=$1; shift + local what=$1; shift + + RET=0 + + ping_do $h1.10 192.0.2.3 + check_err $? "H2 should respond" + + ping_do $h1.10 192.0.2.4 + check_err_fail $h3_should_fail $? "H3 responds" + + log_test "VXLAN MC flood $what" +} + +ipv6_do_test_rx() +{ + local h3_should_fail=$1; shift + local what=$1; shift + + RET=0 + + ping6_do $h1.20 2001:db8:1::3 + check_err $? "H2 should respond" + + ping6_do $h1.20 2001:db8:1::4 + check_err_fail $h3_should_fail $? "H3 responds" + + log_test "VXLAN MC flood $what" +} + +ipv4_nomcroute() +{ + # Install a misleading (S,G) rule to attempt to trick the system into + # pushing the packets elsewhere. + adf_install_broken_sg + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$swp2" + do_test 4 10 0 "IPv4 nomcroute" +} + +ipv6_nomcroute() +{ + # Like for IPv4, install a misleading (S,G). + adf_install_broken_sg + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$swp2" + do_test 6 10 0 "IPv6 nomcroute" +} + +ipv4_nomcroute_rx() +{ + vx10_create local 192.0.2.100 group $GROUP4 dev "$swp2" + ipv4_do_test_rx 1 "IPv4 nomcroute ping" +} + +ipv6_nomcroute_rx() +{ + vx20_create local 2001:db8:4::1 group $GROUP6 dev "$swp2" + ipv6_do_test_rx 1 "IPv6 nomcroute ping" +} + +ipv4_mcroute() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$IPMR" mcroute + do_test 4 10 10 "IPv4 mcroute" +} + +ipv6_mcroute() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$IPMR" mcroute + do_test 6 10 10 "IPv6 mcroute" +} + +ipv4_mcroute_rx() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$IPMR" mcroute + ipv4_do_test_rx 0 "IPv4 mcroute ping" +} + +ipv6_mcroute_rx() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$IPMR" mcroute + ipv6_do_test_rx 0 "IPv6 mcroute ping" +} + +ipv4_mcroute_changelink() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$IPMR" + ip link set dev vx10 type vxlan mcroute + sleep 1 + do_test 4 10 10 "IPv4 mcroute changelink" +} + +ipv6_mcroute_changelink() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$IPMR" mcroute + ip link set dev vx20 type vxlan mcroute + sleep 1 + do_test 6 10 10 "IPv6 mcroute changelink" +} + +ipv4_mcroute_starg() +{ + adf_install_starg + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$IPMR" mcroute + do_test 4 10 10 "IPv4 mcroute (*,G)" +} + +ipv6_mcroute_starg() +{ + adf_install_starg + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$IPMR" mcroute + do_test 6 10 10 "IPv6 mcroute (*,G)" +} + +ipv4_mcroute_starg_rx() +{ + adf_install_starg + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$IPMR" mcroute + ipv4_do_test_rx 0 "IPv4 mcroute (*,G) ping" +} + +ipv6_mcroute_starg_rx() +{ + adf_install_starg + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$IPMR" mcroute + ipv6_do_test_rx 0 "IPv6 mcroute (*,G) ping" +} + +ipv4_mcroute_noroute() +{ + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$IPMR" mcroute + do_test 4 0 0 "IPv4 mcroute, no route" +} + +ipv6_mcroute_noroute() +{ + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$IPMR" mcroute + do_test 6 0 0 "IPv6 mcroute, no route" +} + +ipv4_mcroute_fdb() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 dev "$IPMR" mcroute + bridge fdb add dev vx10 \ + 00:00:00:00:00:00 self static dst $GROUP4 via "$IPMR" + do_test 4 10 10 "IPv4 mcroute FDB" +} + +ipv6_mcroute_fdb() +{ + adf_install_sg + vx20_create_wait local 2001:db8:4::1 dev "$IPMR" mcroute + bridge -6 fdb add dev vx20 \ + 00:00:00:00:00:00 self static dst $GROUP6 via "$IPMR" + do_test 6 10 10 "IPv6 mcroute FDB" +} + +# Use FDB to configure VXLAN in a way where oif=0 for purposes of FIB lookup. +ipv4_mcroute_fdb_oif0() +{ + adf_install_sg + vx10_create_wait local 192.0.2.100 group $GROUP4 dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst $GROUP4 + do_test 4 10 10 "IPv4 mcroute oif=0" +} + +ipv6_mcroute_fdb_oif0() +{ + # The IPv6 tunnel lookup does not fall back to selection by source + # address. Instead it just does a FIB match, and that would find one of + # the several ff00::/8 multicast routes -- each device has one. In order + # to reliably force the $IPMR device, add a /128 route for the + # destination group address. + ip -6 route add table local multicast $GROUP6/128 dev "$IPMR" + defer ip -6 route del table local multicast $GROUP6/128 dev "$IPMR" + + adf_install_sg + vx20_create_wait local 2001:db8:4::1 group $GROUP6 dev "$IPMR" mcroute + bridge -6 fdb del dev vx20 00:00:00:00:00:00 + bridge -6 fdb add dev vx20 00:00:00:00:00:00 self static dst $GROUP6 + do_test 6 10 10 "IPv6 mcroute oif=0" +} + +# In oif=0 test as above, have FIB lookup resolve to loopback instead of IPMR. +# This doesn't work with IPv6 -- a MC route on lo would be marked as RTF_REJECT. +ipv4_mcroute_fdb_oif0_sep() +{ + adf_install_sg_sep + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group $GROUP4 dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst $GROUP4 + do_test 4 10 10 "IPv4 mcroute TX!=RX oif=0" +} + +ipv4_mcroute_fdb_oif0_sep_rx() +{ + adf_install_sg_sep_rx lo + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group $GROUP4 dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst $GROUP4 + ipv4_do_test_rx 0 "IPv4 mcroute TX!=RX oif=0 ping" +} + +ipv4_mcroute_fdb_sep_rx() +{ + adf_install_sg_sep_rx lo + + ip_addr_add lo 192.0.2.120/28 + vx10_create_wait local 192.0.2.120 group $GROUP4 dev "$IPMR" mcroute + bridge fdb del dev vx10 00:00:00:00:00:00 + bridge fdb add dev vx10 00:00:00:00:00:00 self static dst $GROUP4 via lo + ipv4_do_test_rx 0 "IPv4 mcroute TX!=RX ping" +} + +ipv6_mcroute_fdb_sep_rx() +{ + adf_install_sg_sep_rx "X$IPMR" + + ip_addr_add "X$IPMR" 2001:db8:5::1/64 + vx20_create_wait local 2001:db8:5::1 group $GROUP6 dev "$IPMR" mcroute + bridge -6 fdb del dev vx20 00:00:00:00:00:00 + bridge -6 fdb add dev vx20 00:00:00:00:00:00 \ + self static dst $GROUP6 via "X$IPMR" + ipv6_do_test_rx 0 "IPv6 mcroute TX!=RX ping" +} + +trap cleanup EXIT + +setup_prepare +setup_wait +tests_run + +exit $EXIT_STATUS -- 2.49.0

3 weeks, 4 days

2
2
0 0

[PATCH v10 0/6] rust: reduce `as` casts, enable related lints

by Tamir Duberstein

This started with a patch that enabled `clippy::ptr_as_ptr`. Benno Lossin suggested I also look into `clippy::ptr_cast_constness` and I discovered `clippy::as_ptr_cast_mut`. This series now enables all 3 lints. It also enables `clippy::as_underscore` which ensures other pointer casts weren't missed. As a later addition, `clippy::cast_lossless` and `clippy::ref_as_ptr` are also enabled. This series depends on "rust: retain pointer mut-ness in `container_of!`"[1]. Link: https://lore.kernel.org/all/20250409-container-of-mutness-v1-1-64f472b94534… [1] Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v10: - Move fragment from "rust: enable `clippy::ptr_cast_constness` lint" to "rust: enable `clippy::ptr_as_ptr` lint". (Boqun Feng) - Replace `(...).into()` with `T::from(...)` where the destination type isn't obvious in "rust: enable `clippy::cast_lossless` lint". (Boqun Feng) - Link to v9: https://lore.kernel.org/r/20250416-ptr-as-ptr-v9-0-18ec29b1b1f3@gmail.com Changes in v9: - Replace ref-to-ptr coercion using `let` bindings with `core::ptr::from_{ref,mut}`. (Boqun Feng). - Link to v8: https://lore.kernel.org/r/20250409-ptr-as-ptr-v8-0-3738061534ef@gmail.com Changes in v8: - Use coercion to go ref -> ptr. - rustfmt. - Rebase on v6.15-rc1. - Extract first commit to its own series as it is shared with other series. - Link to v7: https://lore.kernel.org/r/20250325-ptr-as-ptr-v7-0-87ab452147b9@gmail.com Changes in v7: - Add patch to enable `clippy::ref_as_ptr`. - Link to v6: https://lore.kernel.org/r/20250324-ptr-as-ptr-v6-0-49d1b7fd4290@gmail.com Changes in v6: - Drop strict provenance patch. - Fix URLs in doc comments. - Add patch to enable `clippy::cast_lossless`. - Rebase on rust-next. - Link to v5: https://lore.kernel.org/r/20250317-ptr-as-ptr-v5-0-5b5f21fa230a@gmail.com Changes in v5: - Use `pointer::addr` in OF. (Boqun Feng) - Add documentation on stubs. (Benno Lossin) - Mark stubs `#[inline]`. - Pick up Alice's RB on a shared commit from https://lore.kernel.org/all/Z9f-3Aj3_FWBZRrm@google.com/. - Link to v4: https://lore.kernel.org/r/20250315-ptr-as-ptr-v4-0-b2d72c14dc26@gmail.com Changes in v4: - Add missing SoB. (Benno Lossin) - Use `without_provenance_mut` in alloc. (Boqun Feng) - Limit strict provenance lints to the `kernel` crate to avoid complex logic in the build system. This can be revisited on MSRV >= 1.84.0. - Rebase on rust-next. - Link to v3: https://lore.kernel.org/r/20250314-ptr-as-ptr-v3-0-e7ba61048f4a@gmail.com Changes in v3: - Fixed clippy warning in rust/kernel/firmware.rs. (kernel test robot) Link: https://lore.kernel.org/all/202503120332.YTCpFEvv-lkp@intel.com/ - s/as u64/as bindings::phys_addr_t/g. (Benno Lossin) - Use strict provenance APIs and enable lints. (Benno Lossin) - Link to v2: https://lore.kernel.org/r/20250309-ptr-as-ptr-v2-0-25d60ad922b7@gmail.com Changes in v2: - Fixed typo in first commit message. - Added additional patches, converted to series. - Link to v1: https://lore.kernel.org/r/20250307-ptr-as-ptr-v1-1-582d06514c98@gmail.com --- Tamir Duberstein (6): rust: enable `clippy::ptr_as_ptr` lint rust: enable `clippy::ptr_cast_constness` lint rust: enable `clippy::as_ptr_cast_mut` lint rust: enable `clippy::as_underscore` lint rust: enable `clippy::cast_lossless` lint rust: enable `clippy::ref_as_ptr` lint Makefile | 6 ++++++ drivers/gpu/drm/drm_panic_qr.rs | 2 +- rust/bindings/lib.rs | 3 +++ rust/kernel/alloc/allocator_test.rs | 2 +- rust/kernel/alloc/kvec.rs | 4 ++-- rust/kernel/block/mq/operations.rs | 2 +- rust/kernel/block/mq/request.rs | 6 +++--- rust/kernel/device.rs | 4 ++-- rust/kernel/device_id.rs | 4 ++-- rust/kernel/devres.rs | 19 ++++++++++--------- rust/kernel/dma.rs | 6 +++--- rust/kernel/error.rs | 2 +- rust/kernel/firmware.rs | 3 ++- rust/kernel/fs/file.rs | 2 +- rust/kernel/io.rs | 18 +++++++++--------- rust/kernel/kunit.rs | 11 +++++++---- rust/kernel/list/impl_list_item_mod.rs | 2 +- rust/kernel/miscdevice.rs | 2 +- rust/kernel/net/phy.rs | 4 ++-- rust/kernel/of.rs | 6 +++--- rust/kernel/pci.rs | 11 +++++++---- rust/kernel/platform.rs | 4 +++- rust/kernel/print.rs | 6 +++--- rust/kernel/seq_file.rs | 2 +- rust/kernel/str.rs | 14 +++++++------- rust/kernel/sync/poll.rs | 2 +- rust/kernel/time/hrtimer/pin.rs | 2 +- rust/kernel/time/hrtimer/pin_mut.rs | 2 +- rust/kernel/uaccess.rs | 4 ++-- rust/kernel/workqueue.rs | 12 ++++++------ rust/uapi/lib.rs | 3 +++ 31 files changed, 96 insertions(+), 74 deletions(-) --- base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8 change-id: 20250307-ptr-as-ptr-21b1867fc4d4 prerequisite-change-id: 20250409-container-of-mutness-b153dab4388d:v1 prerequisite-patch-id: 53d5889db599267f87642bb0ae3063c29bc24863 Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

3 weeks, 4 days

3
13
0 0

[PATCH net-next v2 0/4] udp_tunnel: remove rtnl_lock dependency

by Stanislav Fomichev

Recently bnxt had to grow back a bunch of rtnl dependencies because of udp_tunnel's infra. Add separate (global) mutext to protect udp_tunnel state. v2: - move the lock into udp_tunnel_nic (Jakub) - reorder the lock ordering (Jakub) - move udp_ports_sleep removal into separate patch and update the test (Jakub) Cc: Michael Chan <michael.chan(a)broadcom.com> Stanislav Fomichev (4): udp_tunnel: remove rtnl_lock dependency net: remove redundant ASSERT_RTNL() in queue setup functions netdevsim: remove udp_ports_sleep Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path" .../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 42 ++++--------------- drivers/net/ethernet/emulex/benet/be_main.c | 3 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 1 - drivers/net/ethernet/intel/ice/ice_main.c | 1 - .../net/ethernet/mellanox/mlx4/en_netdev.c | 3 +- .../net/ethernet/mellanox/mlx5/core/en_main.c | 3 +- .../ethernet/netronome/nfp/nfp_net_common.c | 3 +- .../net/ethernet/qlogic/qede/qede_filter.c | 3 -- .../net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 - drivers/net/ethernet/sfc/ef10.c | 1 - drivers/net/netdevsim/netdevsim.h | 2 - drivers/net/netdevsim/udp_tunnels.c | 12 ------ include/net/udp_tunnel.h | 8 ++-- net/core/dev.c | 2 - net/ipv4/udp_tunnel_nic.c | 30 +++++++------ .../drivers/net/netdevsim/udp_tunnel_nic.sh | 10 ----- 17 files changed, 31 insertions(+), 97 deletions(-) -- 2.49.0

3 weeks, 4 days

4
9
0 0

[PATCH v3 0/4] KVM: arm64: selftests: arch_timer_edge_cases fixes

by Sebastian Ott

Some small fixes for arch_timer_edge_cases that I stumbled upon while debugging failures for this selftest on ampere-one. Changes since v1: * determine effective counter width based on suggestions from Marc Changes since v2: * new patch to fix xval initialization I've done tests with this on various machines - no issues during several hundreds of test runs. v1: https://lore.kernel.org/kvmarm/20250509143312.34224-1-sebott@redhat.com/ v2: https://lore.kernel.org/kvmarm/20250527142434.25209-1-sebott@redhat.com/ Sebastian Ott (4): KVM: arm64: selftests: fix help text for arch_timer_edge_cases KVM: arm64: selftests: fix thread migration in arch_timer_edge_cases KVM: arm64: selftests: arch_timer_edge_cases - fix xval init KVM: arm64: selftests: arch_timer_edge_cases - determine effective counter width .../kvm/arm64/arch_timer_edge_cases.c | 39 ++++++++++++------- 1 file changed, 25 insertions(+), 14 deletions(-) base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca -- 2.49.0

3 weeks, 4 days

3
7
0 0

[PATCH v17 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v17. This patch serise adds DualPI Improved with a Square (DualPI2) with following features: * Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague) * Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332 * Configurable overload strategies * Use of sojourn time to reliably estimate queue delay * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu --- v17 (25-May-2025, Resent at 10-Jun-2025) - Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>) - Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>) - Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>) - Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>) - Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>) - Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>) - Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>) v16 (16-MAy-2025) - Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>) - Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>) - Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>) - Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>) v15 (09-May-2025) - Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>) - Fix one typo in comment of #2 - Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h v14 (05-May-2025) - Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun - Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>) - Update dualpi2.json to align with the updated variable order in pkt_sched.h - Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>) v13 (26-Apr-2025) - Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>) - Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>) v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning --- Chia-Yu Chang (4): sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc selftests/tc-testing: Add selftests for qdisc DualPI2 Documentation: netlink: specs: tc: Add DualPI2 specification Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 156 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 68 + net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1146 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1645 insertions(+) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

3 weeks, 4 days

1
5
0 0

[PATCH bpf-next v4 0/9] bpf: Mitigate Spectre v1 using barriers

by Luis Gerhorst

This improves the expressiveness of unprivileged BPF by inserting speculation barriers instead of rejecting the programs. The approach was previously presented at LPC'24 [1] and RAID'24 [2]. To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects potentially-dangerous unprivileged BPF programs as of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"). In [2], we have analyzed 364 object files from open source projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf Examples, Parca, and Prevail) and found that this affects 31% to 54% of programs. To resolve this in the majority of cases this patchset adds a fall-back for mitigating Spectre v1 using speculation barriers. The kernel still optimistically attempts to verify all speculative paths but uses speculation barriers against v1 when unsafe behavior is detected. This allows for more programs to be accepted without disabling the BPF Spectre mitigations (e.g., by setting cpu_mitigations_off()). For this, it relies on the fact that speculation barriers generally prevent all later instructions from executing if the speculation was not correct (not only loads). See patch 7 ("bpf: Fall back to nospec for Spectre v1") for a detailed description and references to the relevant vendor documentation (AMD and Intel x86-64, ARM64, and PowerPC). In [1] we have measured the overhead of this approach relative to having mitigations off and including the upstream Spectre v4 mitigations. For event tracing and stack-sampling profilers, we found that mitigations increase BPF program execution time by 0% to 62%. For the Loxilb network load balancer, we have measured a 14% slowdown in SCTP performance but no significant slowdown for TCP. This overhead only applies to programs that were previously rejected. I reran the expressiveness-evaluation with v6.14 and made sure the main results still match those from [1] and [2] (which used v6.5). Main design decisions are: * Do not use separate bytecode insns for v1 and v4 barriers (inspired by Daniel Borkmann's question at LPC). This simplifies the verifier significantly and has the only downside that performance on PowerPC is not as high as it could be. * Allow archs to still disable v1/v4 mitigations separately by setting bpf_jit_bypass_spec_v1/v4(). This has the benefit that archs can benefit from improved BPF expressiveness / performance if they are not vulnerable (e.g., ARM64 for v4 in the kernel). * Do not remove the empty BPF_NOSPEC implementation for backends for which it is unknown whether they are vulnerable to Spectre v1. [1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF") [2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions") Changes: * v3 -> v4: - Remove insn parameter from do_check_insn() and extract process_bpf_exit_full as a function as requested by Eduard - Investigate apparent sanitize_check_bounds() bug reported by Kartikeya (does appear to not be a bug but only confusing code), sent separate patch to document it and add an assert - Remove already-merged commit 1 ("selftests/bpf: Fix caps for __xlated/jited_unpriv") - Drop former commit 10 ("bpf: Allow nospec-protected var-offset stack access") as it did not include a test and there are other places where var-off is rejected. Also, none of the tested real-world programs used var-off in the paper. Therefore keep the old behavior for now and potentially prepare a patch that converts all cases later if required. - Add link to AMD lfence and PowerPC speculation barrier (ori 31,31,0) documentation - Move detailed barrier documentation to commit 7 ("bpf: Fall back to nospec for Spectre v1") - Link to v3: https://lore.kernel.org/all/20250501073603.1402960-1-luis.gerhorst@fau.de/ * v2 -> v3: - Fix https://lore.kernel.org/oe-kbuild-all/202504212030.IF1SLhz6-lkp@intel.com/ and similar by moving the bpf_jit_bypass_spec_v1/v4() prototypes out of the #ifdef CONFIG_BPF_SYSCALL. Decided not to move them to filter.h (where similar bpf_jit_*() prototypes live) as they would still have to be duplicated in bpf.h to be usable to bpf_bypass_spec_v1/v4() (unless including filter.h in bpf.h is an option). - Fix https://lore.kernel.org/oe-kbuild-all/202504220035.SoGveGpj-lkp@intel.com/ by moving the variable declarations out of the switch-case. - Build touched C files with W=2 and bpf config on x86 to check that there are no other warnings introduced. - Found 3 more checkpatch warnings that can be fixed without degrading readability. - Rebase to bpf-next 2025-05-01 - Link to v2: https://lore.kernel.org/bpf/20250421091802.3234859-1-luis.gerhorst@fau.de/ * v1 -> v2: - Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and 11 ("bpf: Fall back to nospec for spec path verification") as suggested by Alexei. This series therefore no longer changes push_stack() to return PTR_ERR. - Add detailed explanation of how lfence works internally and how it affects the algorithm. - Add tests checking that nospec instructions are inserted in expected locations using __xlated_unpriv as suggested by Eduard (also, include a fix for __xlated_unpriv) - Add a test for the mitigations from the description of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches") - Remove unused variables from do_check[_insn]() as suggested by Eduard. - Remove INSN_IDX_MODIFIED to improve readability as suggested by Eduard. This also causes the nospec_result-check to run (and fail) for jumping-ops. Add a warning to assert that this check must never succeed in that case. - Add details on the safety of patch 10 ("bpf: Allow nospec-protected var-offset stack access") based on the feedback on v1. - Rebase to bpf-next-250420 - Link to v1: https://lore.kernel.org/all/20250313172127.1098195-1-luis.gerhorst@fau.de/ * RFC -> v1: - rebase to bpf-next-250313 - tests: mark expected successes/new errors - add bpt_jit_bypass_spec_v1/v4() to avoid #ifdef in bpf_bypass_spec_v1/v4() - ensure that nospec with v1-support is implemented for archs for which GCC supports speculation barriers, except for MIPS - arm64: emit speculation barrier - powerpc: change nospec to include v1 barrier - discuss potential security (archs that do not impl. BPF nospec) and performance (only PowerPC) regressions - Link to RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/ Luis Gerhorst (9): bpf: Move insn if/else into do_check_insn() bpf: Return -EFAULT on misconfigurations bpf: Return -EFAULT on internal errors bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4() bpf, arm64, powerpc: Change nospec to include v1 barrier bpf: Rename sanitize_stack_spill to nospec_result bpf: Fall back to nospec for Spectre v1 selftests/bpf: Add test for Spectre v1 mitigation bpf: Fall back to nospec for sanitization-failures arch/arm64/net/bpf_jit.h | 5 + arch/arm64/net/bpf_jit_comp.c | 28 +- arch/powerpc/net/bpf_jit_comp64.c | 80 ++- include/linux/bpf.h | 11 +- include/linux/bpf_verifier.h | 3 +- include/linux/filter.h | 2 +- kernel/bpf/core.c | 32 +- kernel/bpf/verifier.c | 633 ++++++++++-------- tools/testing/selftests/bpf/progs/bpf_misc.h | 4 + .../selftests/bpf/progs/verifier_and.c | 8 +- .../selftests/bpf/progs/verifier_bounds.c | 66 +- .../bpf/progs/verifier_bounds_deduction.c | 45 +- .../selftests/bpf/progs/verifier_map_ptr.c | 20 +- .../selftests/bpf/progs/verifier_movsx.c | 16 +- .../selftests/bpf/progs/verifier_unpriv.c | 65 +- .../bpf/progs/verifier_value_ptr_arith.c | 101 ++- .../selftests/bpf/verifier/dead_code.c | 3 +- tools/testing/selftests/bpf/verifier/jmp32.c | 33 +- tools/testing/selftests/bpf/verifier/jset.c | 10 +- 19 files changed, 755 insertions(+), 410 deletions(-) base-commit: cd2e103d57e5615f9bb027d772f93b9efd567224 -- 2.49.0

3 weeks, 4 days

4
12
0 0

[PATCH] selftests/mm: Increase timeout from 180 to 900 seconds

by Shivank Garg

The mm selftests are timing out with the current 180-second limit. Testing shows that run_vmtests.sh takes approximately 11 minutes (664 seconds) to complete. Increase the timeout to 900 seconds (15 minutes) to provide sufficient buffer for the tests to complete successfully. Signed-off-by: Shivank Garg <shivankg(a)amd.com> --- tools/testing/selftests/mm/settings | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/settings b/tools/testing/selftests/mm/settings index a953c96aa16e..e2206265f67c 100644 --- a/tools/testing/selftests/mm/settings +++ b/tools/testing/selftests/mm/settings @@ -1 +1 @@ -timeout=180 +timeout=900 -- 2.43.0

3 weeks, 4 days

2
2
0 0

[PATCH net v2] net: clear the dst when changing skb protocol

by Jakub Kicinski

A not-so-careful NAT46 BPF program can crash the kernel if it indiscriminately flips ingress packets from v4 to v6: BUG: kernel NULL pointer dereference, address: 0000000000000000 ip6_rcv_core (net/ipv6/ip6_input.c:190:20) ipv6_rcv (net/ipv6/ip6_input.c:306:8) process_backlog (net/core/dev.c:6186:4) napi_poll (net/core/dev.c:6906:9) net_rx_action (net/core/dev.c:7028:13) do_softirq (kernel/softirq.c:462:3) netif_rx (net/core/dev.c:5326:3) dev_loopback_xmit (net/core/dev.c:4015:2) ip_mc_finish_output (net/ipv4/ip_output.c:363:8) NF_HOOK (./include/linux/netfilter.h:314:9) ip_mc_output (net/ipv4/ip_output.c:400:5) dst_output (./include/net/dst.h:459:9) ip_local_out (net/ipv4/ip_output.c:130:9) ip_send_skb (net/ipv4/ip_output.c:1496:8) udp_send_skb (net/ipv4/udp.c:1040:8) udp_sendmsg (net/ipv4/udp.c:1328:10) The output interface has a 4->6 program attached at ingress. We try to loop the multicast skb back to the sending socket. Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr and changes skb->protocol to v6. We enter ip6_rcv_core which tries to use skb_dst(). But the dst is still an IPv4 one left after IPv4 mcast output. Clear the dst in all BPF helpers which change the protocol. Also clear the dst if we did an encap or decap as those will most likely make the dst stale. Try to preserve metadata dsts, those may carry non-routing metadata. Reviewed-by: Maciej Żenczykowski <maze(a)google.com> Acked-by: Daniel Borkmann <daniel(a)iogearbox.net> Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()") Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow") Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- v2: - drop on encap/decap - fix typo (protcol) - add the test to the Makefile v1: https://lore.kernel.org/20250604210604.257036-1-kuba@kernel.org I wonder if we should not skip ingress (tc_skip_classify?) for looped back packets in the first place. But that doesn't seem robust enough vs multiple redirections to solve the crash. Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't work either, since BPF can change pkt_type arbitrarily. CC: martin.lau(a)linux.dev CC: daniel(a)iogearbox.net CC: john.fastabend(a)gmail.com CC: eddyz87(a)gmail.com CC: sdf(a)fomichev.me CC: haoluo(a)google.com CC: willemb(a)google.com CC: william.xuanziyang(a)huawei.com CC: alan.maguire(a)oracle.com CC: bpf(a)vger.kernel.org CC: edumazet(a)google.com CC: maze(a)google.com CC: shuah(a)kernel.org CC: linux-kselftest(a)vger.kernel.org CC: yonghong.song(a)linux.dev --- tools/testing/selftests/net/Makefile | 1 + net/core/filter.c | 31 +++++++++++++++++++------- tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++ 3 files changed, 39 insertions(+), 8 deletions(-) create mode 100755 tools/testing/selftests/net/nat6to4.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index ea84b88bcb30..ab996bd22a5f 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -27,6 +27,7 @@ TEST_PROGS += amt.sh TEST_PROGS += unicast_extensions.sh TEST_PROGS += udpgro_fwd.sh TEST_PROGS += udpgro_frglist.sh +TEST_PROGS += nat6to4.sh TEST_PROGS += veth.sh TEST_PROGS += ioam6.sh TEST_PROGS += gro.sh diff --git a/net/core/filter.c b/net/core/filter.c index 327ca73f9cd7..d5917d6446f2 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3406,8 +3406,14 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto, * need to be verified first. */ ret = bpf_skb_proto_xlat(skb, proto); + if (ret) + return ret; + bpf_compute_data_pointers(skb); - return ret; + if (skb_valid_dst(skb)) + skb_dst_drop(skb); + + return 0; } static const struct bpf_func_proto bpf_skb_change_proto_proto = { @@ -3554,6 +3560,9 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, else if (skb->protocol == htons(ETH_P_IPV6) && flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4) skb->protocol = htons(ETH_P_IP); + + if (skb_valid_dst(skb)) + skb_dst_drop(skb); } if (skb_is_gso(skb)) { @@ -3581,6 +3590,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, u64 flags) { + bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK; int ret; if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO | @@ -3603,13 +3613,18 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, if (unlikely(ret < 0)) return ret; - /* Match skb->protocol to new outer l3 protocol */ - if (skb->protocol == htons(ETH_P_IP) && - flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6) - skb->protocol = htons(ETH_P_IPV6); - else if (skb->protocol == htons(ETH_P_IPV6) && - flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4) - skb->protocol = htons(ETH_P_IP); + if (decap) { + /* Match skb->protocol to new outer l3 protocol */ + if (skb->protocol == htons(ETH_P_IP) && + flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6) + skb->protocol = htons(ETH_P_IPV6); + else if (skb->protocol == htons(ETH_P_IPV6) && + flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4) + skb->protocol = htons(ETH_P_IP); + + if (skb_valid_dst(skb)) + skb_dst_drop(skb); + } if (skb_is_gso(skb)) { struct skb_shared_info *shinfo = skb_shinfo(skb); diff --git a/tools/testing/selftests/net/nat6to4.sh b/tools/testing/selftests/net/nat6to4.sh new file mode 100755 index 000000000000..0ee859b622a4 --- /dev/null +++ b/tools/testing/selftests/net/nat6to4.sh @@ -0,0 +1,15 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +NS="ns-peer-$(mktemp -u XXXXXX)" + +ip netns add "${NS}" +ip -netns "${NS}" link set lo up +ip -netns "${NS}" route add default via 127.0.0.2 dev lo + +tc -n "${NS}" qdisc add dev lo ingress +tc -n "${NS}" filter add dev lo ingress prio 4 protocol ip \ + bpf object-file nat6to4.bpf.o section schedcls/egress4/snat4 direct-action + +ip netns exec "${NS}" \ + bash -c 'echo 012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abc | socat - UDP4-DATAGRAM:224.1.0.1:6666,ip-multicast-loop=1' -- 2.49.0

3 weeks, 4 days

3
5
0 0

ATS 2025 Schedule Now Live – Register Today!

by Gustavo Padovan

Hello everyone, The schedule for the Automated Testing Summit (ATS) 2025 is now live! You can now explore the full program and speaker list at: 🔗 https://ats25.sched.com/ This year’s ATS will be packed with talks and discussions focused on scaling test infrastructure, improving collaboration across projects, and pushing the boundaries of automation in the Linux ecosystem. 📍 ATS 2025 will take place as a co-located event at the Open Source Summit North America, on June 26th in Denver, CO. If you haven’t yet registered, you can do so here: 🔗 https://events.linuxfoundation.org/open-source-summit-north-america/feature… You can attend in person or virtually. We look forward to seeing you there! Best regards, The KernelCI Team -- Gustavo Padovan Collabora Ltd.

3 weeks, 5 days

1
0
0 0

[RFC net-next 6/6] selftests: drv-net: add test for RSS on flow label

by Jakub Kicinski

Add a simple test for checking that RSS on flow label works, and that its rejected for IPv4 flows. # ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py TAP version 13 1..2 ok 1 rss_flow_label.test_rss_flow_label ok 2 rss_flow_label.test_rss_flow_label_6only # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: sdf(a)fomichev.me CC: linux-kselftest(a)vger.kernel.org --- .../testing/selftests/drivers/net/hw/Makefile | 1 + .../drivers/net/hw/rss_flow_label.py | 151 ++++++++++++++++++ 2 files changed, 152 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/hw/rss_flow_label.py diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile index df2c047ffa90..56bf1f1b8377 100644 --- a/tools/testing/selftests/drivers/net/hw/Makefile +++ b/tools/testing/selftests/drivers/net/hw/Makefile @@ -17,6 +17,7 @@ TEST_PROGS = \ loopback.sh \ pp_alloc_fail.py \ rss_ctx.py \ + rss_flow_label.py \ rss_input_xfrm.py \ tso.py \ xsk_reconfig.py \ diff --git a/tools/testing/selftests/drivers/net/hw/rss_flow_label.py b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py new file mode 100755 index 000000000000..e471e13160ae --- /dev/null +++ b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py @@ -0,0 +1,151 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +""" +Tests for RSS hashing on IPv6 Flow Label. +""" + +import glob +import socket +from lib.py import CmdExitFailure +from lib.py import ksft_run, ksft_exit, ksft_eq, ksft_ge, ksft_in, \ + ksft_not_in, ksft_raises, KsftSkipEx +from lib.py import bkg, cmd, defer, fd_read_timeout, rand_port +from lib.py import NetDrvEpEnv + + +def _ethtool_get_cfg(cfg, fl_type): + descr = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout + + converter = { + "IP SA": "s", + "IP DA": "d", + "L3 proto": "t", + "L4 bytes 0 & 1 [TCP/UDP src port]": "f", + "L4 bytes 2 & 3 [TCP/UDP dst port]": "n", + "IPv6 Flow Label": "l", + } + + ret = "" + for line in descr.split("\n")[1:-2]: + # if this raises we probably need to add more keys to converter above + ret += converter[line] + return ret + + +def _traffic(cfg, one_sock, one_cpu): + local_port = rand_port(socket.SOCK_DGRAM) + remote_port = rand_port(socket.SOCK_DGRAM) + + sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) + sock.bind(("", local_port)) + sock.connect((cfg.remote_addr_v["6"], 0)) + if one_sock: + send = f"exec 5<>/dev/udp/{cfg.addr_v['6']}/{local_port}; " \ + "for i in `seq 20`; do echo a >&5; sleep 0.02; done; exec 5>&-" + else: + send = "for i in `seq 20`; do echo a | socat -t0.02 - UDP6:" \ + f"[{cfg.addr_v['6']}]:{local_port},sourceport={remote_port}; done" + + cpus = set() + with bkg(send, shell=True, host=cfg.remote, exit_wait=True): + for _ in range(20): + fd_read_timeout(sock.fileno(), 1) + cpu = sock.getsockopt(socket.SOL_SOCKET, socket.SO_INCOMING_CPU) + cpus.add(cpu) + + if one_cpu: + ksft_eq(len(cpus), 1, + f"{one_sock=} - expected one CPU, got traffic on: {cpus=}") + else: + ksft_ge(len(cpus), 2, + f"{one_sock=} - expected many CPUs, got traffic on: {cpus=}") + + +def test_rss_flow_label(cfg): + """ + Test hashing on IPv6 flow label. Send traffic over a single socket + and over multiple sockets. Depend on the remote having auto-label + enabled so that it randomizes the label per socket. + """ + + cfg.require_ipver("6") + cfg.require_cmd("socat", remote=True) + if not hasattr(socket, "SO_INCOMING_CPU"): + raise KsftSkipEx("socket.SO_INCOMING_CPU was added in Python 3.11") + + # 1 is the default, if someone changed it we probably shouldn"t mess with it + af = cmd("cat /proc/sys/net/ipv6/auto_flowlabels", host=cfg.remote).stdout + if af.strip() != "1": + raise KsftSkipEx("Remote does not have auto_flowlabels enabled") + + qcnt = len(glob.glob(f"/sys/class/net/{cfg.ifname}/queues/rx-*")) + if qcnt < 2: + raise KsftSkipEx(f"Local has only {qcnt} queues") + + # Enable flow label hashing for UDP6 + initial = _ethtool_get_cfg(cfg, "udp6") + no_lbl = initial.replace("l", "") + if "l" not in initial: + try: + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 l{no_lbl}") + except CmdExitFailure as exc: + raise KsftSkipEx("Device doesn't support Flow Label for UDP6") from exc + + defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}") + + _traffic(cfg, one_sock=True, one_cpu=True) + _traffic(cfg, one_sock=False, one_cpu=False) + + # Disable it, we should see no hashing (reset was already defer()ed) + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {no_lbl}") + + _traffic(cfg, one_sock=False, one_cpu=True) + + +def _check_v4_flow_types(cfg): + for fl_type in ["tcp4", "udp4", "ah4", "esp4", "sctp4"]: + try: + cur = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout + ksft_not_in("Flow Label", cur, + comment=f"{fl_type=} has Flow Label:" + cur) + except CmdExitFailure: + # Probably does not support this flow type + pass + + +def test_rss_flow_label_6only(cfg): + """ + Test interactions with IPv4 flow types. It should not be possible to set + IPv6 Flow Label hashing for an IPv4 flow type. The Flow Label should also + not appear in the IPv4 "current config". + """ + + with ksft_raises(CmdExitFailure) as cm: + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash tcp4 sdfnl") + ksft_in("Invalid argument", cm.exception.cmd.stderr) + + _check_v4_flow_types(cfg) + + # Try to enable Flow Labels and check again, in case it leaks thru + initial = _ethtool_get_cfg(cfg, "udp6") + changed = initial.replace("l", "") if "l" in initial else initial + "l" + + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {changed}") + restore = defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}") + + _check_v4_flow_types(cfg) + restore.exec() + _check_v4_flow_types(cfg) + + +def main() -> None: + with NetDrvEpEnv(__file__, nsim_test=False) as cfg: + ksft_run([test_rss_flow_label, + test_rss_flow_label_6only], + args=(cfg, )) + ksft_exit() + + +if __name__ == "__main__": + main() -- 2.49.0

3 weeks, 5 days

1
0
0 0

[RFC net-next 5/6] selftests: drv-net: import things in lib one by one

by Jakub Kicinski

pylint doesn't understand our path hacks, and it generates a lot of warnings for driver tests. Import what we use one by one, this is hopefully not too tedious and it makes pylint happy. Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: mohan.prasad(a)microchip.com CC: sdf(a)fomichev.me CC: dw(a)davidwei.uk CC: linux-kselftest(a)vger.kernel.org --- .../selftests/drivers/net/hw/lib/py/__init__.py | 17 +++++++++++++++++ .../selftests/drivers/net/lib/py/__init__.py | 14 ++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py index b582885786f5..56ff11074b55 100644 --- a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py +++ b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py @@ -7,8 +7,25 @@ KSFT_DIR = (Path(__file__).parent / "../../../../..").resolve() try: sys.path.append(KSFT_DIR.as_posix()) + from net.lib.py import * from drivers.net.lib.py import * + + # Import one by one to avoid pylint false positives + from net.lib.py import EthtoolFamily, NetdevFamily, NetshaperFamily, \ + NlError, RtnlFamily + from net.lib.py import CmdExitFailure + from net.lib.py import bkg, cmd, defer, ethtool, fd_read_timeout, ip, \ + rand_port, tool, wait_port_listen + from net.lib.py import fd_read_timeout + from net.lib.py import KsftSkipEx, KsftFailEx, KsftXfailEx + from net.lib.py import ksft_disruptive, ksft_exit, ksft_pr, ksft_run, \ + ksft_setup + from net.lib.py import ksft_eq, ksft_ge, ksft_in, ksft_is, ksft_lt, \ + ksft_ne, ksft_not_in, ksft_raises, ksft_true + from net.lib.py import NetNSEnter + from drivers.net.lib.py import GenerateTraffic + from drivers.net.lib.py import NetDrvEnv, NetDrvEpEnv except ModuleNotFoundError as e: ksft_pr("Failed importing `net` library from kernel sources") ksft_pr(str(e)) diff --git a/tools/testing/selftests/drivers/net/lib/py/__init__.py b/tools/testing/selftests/drivers/net/lib/py/__init__.py index 401e70f7f136..9ed1d8f70524 100644 --- a/tools/testing/selftests/drivers/net/lib/py/__init__.py +++ b/tools/testing/selftests/drivers/net/lib/py/__init__.py @@ -7,7 +7,21 @@ KSFT_DIR = (Path(__file__).parent / "../../../..").resolve() try: sys.path.append(KSFT_DIR.as_posix()) + from net.lib.py import * + + # Import one by one to avoid pylint false positives + from net.lib.py import EthtoolFamily, NetdevFamily, NetshaperFamily, \ + NlError, RtnlFamily + from net.lib.py import CmdExitFailure + from net.lib.py import bkg, cmd, defer, ethtool, fd_read_timeout, ip, \ + rand_port, tool, wait_port_listen + from net.lib.py import fd_read_timeout + from net.lib.py import KsftSkipEx, KsftFailEx, KsftXfailEx + from net.lib.py import ksft_disruptive, ksft_exit, ksft_pr, ksft_run, \ + ksft_setup + from net.lib.py import ksft_eq, ksft_ge, ksft_in, ksft_is, ksft_lt, \ + ksft_ne, ksft_not_in, ksft_raises, ksft_true except ModuleNotFoundError as e: ksft_pr("Failed importing `net` library from kernel sources") ksft_pr(str(e)) -- 2.49.0

3 weeks, 5 days

1
0
0 0

[PATCH] selftests: ir_decoder: Convert header comment to proper multi-line block

by Abdelrahman Fekry

The test file for the IR decoder used single-line comments at the top to document its purpose and licensing, which is inconsistent with the style used throughout the Linux kernel. in this patch i converted the file header to a proper multi-line comment block (/*) that aligns with standard kernel practices. This improves readability, consistency across selftests, and ensures the license and documentation are clearly visible in a familiar format. No functional changes have been made. Signed-off-by: Abdelrahman Fekry <Abdelrahmanfekry375(a)gmail.com> --- tools/testing/selftests/ir/ir_loopback.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/ir/ir_loopback.c b/tools/testing/selftests/ir/ir_loopback.c index f4a15cbdd5ea..2de4a6296f35 100644 --- a/tools/testing/selftests/ir/ir_loopback.c +++ b/tools/testing/selftests/ir/ir_loopback.c @@ -1,14 +1,15 @@ // SPDX-License-Identifier: GPL-2.0 -// test ir decoder -// -// Copyright (C) 2018 Sean Young <sean(a)mess.org> - -// When sending LIRC_MODE_SCANCODE, the IR will be encoded. rc-loopback -// will send this IR to the receiver side, where we try to read the decoded -// IR. Decoding happens in a separate kernel thread, so we will need to -// wait until that is scheduled, hence we use poll to check for read -// readiness. - +/* Copyright (C) 2018 Sean Young <sean(a)mess.org> + * + * Selftest for IR decoder + * + * + * When sending LIRC_MODE_SCANCODE, the IR will be encoded. rc-loopback + * will send this IR to the receiver side, where we try to read the decoded + * IR. Decoding happens in a separate kernel thread, so we will need to + * wait until that is scheduled, hence we use poll to check for read + * readiness. +*/ #include <linux/lirc.h> #include <errno.h> #include <stdio.h> -- 2.25.1

3 weeks, 5 days

3
3
0 0

[PATCH v6 0/2] x86/fred: Prevent immediate repeat of single step trap on return from SIGTRAP handler

by Xin Li (Intel)

IDT event delivery has a debug hole in which it does not generate #DB upon returning to userspace before the first userspace instruction is executed if the Trap Flag (TF) is set. FRED closes this hole by introducing a software event flag, i.e., bit 17 of the augmented SS: if the bit is set and ERETU would result in RFLAGS.TF = 1, a single-step trap will be pending upon completion of ERETU. However I overlooked properly setting and clearing the bit in different situations. Thus when FRED is enabled, if the Trap Flag (TF) is set without an external debugger attached, it can lead to an infinite loop in the SIGTRAP handler. To avoid this, the software event flag in the augmented SS must be cleared, ensuring that no single-step trap remains pending when ERETU completes. This patch set combines the fix [1] and its corresponding selftest [2] (requested by Dave Hansen) into one patch set. [1] https://lore.kernel.org/lkml/20250523050153.3308237-1-xin@zytor.com/ [2] https://lore.kernel.org/lkml/20250530230707.2528916-1-xin@zytor.com/ This patch set is based on tip/x86/urgent branch. Link to v5 of this patch set: https://lore.kernel.org/lkml/20250606174528.1004756-1-xin@zytor.com/ Changes in v6: *) Replace a "sub $128, %rsp" with "add $-128, %rsp" (hpa). *) Declared loop_count_on_same_ip inside sigtrap() (Sohil). *) s/sigtrap/SIGTRAP (Sohil). *) Add TB from Sohil to the first patch. Xin Li (Intel) (2): x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler selftests/x86: Add a test to detect infinite SIGTRAP handler loop arch/x86/include/asm/sighandling.h | 22 +++++ arch/x86/kernel/signal_32.c | 4 + arch/x86/kernel/signal_64.c | 4 + tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/sigtrap_loop.c | 101 +++++++++++++++++++++ 5 files changed, 132 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c base-commit: dd2922dcfaa3296846265e113309e5f7f138839f -- 2.49.0

3 weeks, 5 days

2
3
0 0

[PATCH 0/2] kselftest/arm64: Add coverage for the interaction of vfork() and GCS

by Mark Brown

I had cause to look at the vfork() support for GCS and realised that we don't have any direct test coverage, this series does so by adding vfork() to nolibc and then using that in basic-gcs to provide some simple vfork() coverage. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Mark Brown (2): tools/nolibc: Provide vfork() kselftest/arm64: Add a test for vfork() with GCS tools/include/nolibc/sys.h | 29 ++++++++++++ tools/testing/selftests/arm64/gcs/basic-gcs.c | 63 +++++++++++++++++++++++++++ 2 files changed, 92 insertions(+) --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250528-arm64-gcs-vfork-exit-4a7daf7652ee Best regards, -- Mark Brown <broonie(a)kernel.org>

3 weeks, 5 days

2
7
0 0

[PATCH v2] selftests/bpf: Validate UDP length in cls_redirect test

by Suchit Karunakaran

From: Suchit <suchitkarunakaran(a)gmail.com> Add validation step to ensure that the UDP payload is long enough to contain the expected GUE and UNIGUE encapsulation headers Signed-off-by: Suchit <suchitkarunakaran(a)gmail.com> --- Changes since v2: - Rebase tools/testing/selftests/bpf/progs/test_cls_redirect.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/progs/test_cls_redirect.c b/tools/testing/selftests/bpf/progs/test_cls_redirect.c index f344c6835e84..c1d2eaee2e77 100644 --- a/tools/testing/selftests/bpf/progs/test_cls_redirect.c +++ b/tools/testing/selftests/bpf/progs/test_cls_redirect.c @@ -978,7 +978,14 @@ int cls_redirect(struct __sk_buff *skb) return TC_ACT_OK; } - /* TODO Check UDP length? */ + uint16_t udp_len = bpf_ntohs(encap->udp.len); + uint16_t min_encap_len = sizeof(encap->udp) + sizeof(encap->gue) + sizeof(encap->unigue); + + if (udp_len < min_encap_len) { + metrics->errors_total_malformed_encapsulation++; + return TC_ACT_SHOT; + } + if (encap->udp.dest != ENCAPSULATION_PORT) { return TC_ACT_OK; } -- 2.49.0

4 weeks

3
2
1 0

[PATCH v5 0/2] x86/fred: Prevent immediate repeat of single step trap on return from SIGTRAP handler

by Xin Li (Intel)

IDT event delivery has a debug hole in which it does not generate #DB upon returning to userspace before the first userspace instruction is executed if the Trap Flag (TF) is set. FRED closes this hole by introducing a software event flag, i.e., bit 17 of the augmented SS: if the bit is set and ERETU would result in RFLAGS.TF = 1, a single-step trap will be pending upon completion of ERETU. However I overlooked properly setting and clearing the bit in different situations. Thus when FRED is enabled, if the Trap Flag (TF) is set without an external debugger attached, it can lead to an infinite loop in the SIGTRAP handler. To avoid this, the software event flag in the augmented SS must be cleared, ensuring that no single-step trap remains pending when ERETU completes. This patch set combines the fix [1] and its corresponding selftest [2] (requested by Dave Hansen) into one patch set. [1] https://lore.kernel.org/lkml/20250523050153.3308237-1-xin@zytor.com/ [2] https://lore.kernel.org/lkml/20250530230707.2528916-1-xin@zytor.com/ This patch set is based on tip/x86/urgent branch as of today. Link to v4 of this patch set: https://lore.kernel.org/lkml/20250605181020.590459-1-xin@zytor.com/ Changes in v5: *) Accurately rephrase the shortlog (hpa). *) Do "sub $-128, %rsp" rather than "add $128, %rsp", which is more efficient in code size (hpa). *) Add TB from Sohil. *) Add Cc: stable(a)vger.kernel.org to all patches. Xin Li (Intel) (2): x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler selftests/x86: Add a test to detect infinite sigtrap handler loop arch/x86/include/asm/sighandling.h | 22 +++++ arch/x86/kernel/signal_32.c | 4 + arch/x86/kernel/signal_64.c | 4 + tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/sigtrap_loop.c | 98 ++++++++++++++++++++++ 5 files changed, 129 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c base-commit: dd2922dcfaa3296846265e113309e5f7f138839f -- 2.49.0

4 weeks

3
4
0 0

[PATCH RFC bpf-next 0/4] bpf, arm64: support up to 12 arguments

by Alexis Lothoré (eBPF Foundation)

Hello, this series is a revival of Xu Kuhoai's work to enable larger arguments count for BPF programs on ARM64 ([1]). His initial series received some positive feedback, but lacked some specific case handling around arguments alignment (see AAPCS64 C.14 rule in section 6.8.2, [2]). There as been another attempt from Puranjay Mohan, which was unfortunately missing the same thing ([3]). Since there has been some time between those series and this new one, I chose to send it as a new series rather than a new revision of the existing series. To support the increased argument counts and arguments larger than registers size (eg: structures), the trampoline does the following: - for bpf programs: arguments are retrieved from both registers and the function stack, and pushed in the trampoline stack as an array of u64 to generate the programs context. It is then passed by pointer to the bpf programs - when the trampoline is in charge of calling the original function: it restores the registers content, and generates a new stack layout for the additional arguments that do not fit in registers. This new attempt is based on Xu's series and aims to handle the missing alignment concern raised in the reviews discussions. The main novelties are then around arguments alignments: - the first commit is exposing some new info in the BTF function model passed to the JIT compiler to allow it to deduce the needed alignment when configuring the trampoline stack - the second commit is taken from Xu's series, and received the following modifications: - the calc_aux_args computes an expected alignment for each argument - the calc_aux_args computes two different stack space sizes: the one needed to store the bpf programs context, and the original function stacked arguments (which needs alignment). Those stack sizes are in bytes instead of "slots" - when saving/restoring arguments for bpf program or for the original function, make sure to align the load/store accordingly, when relevant - a few typos fixes and some rewording, raised by the review on the original series - the last commit introduces some explicit tests that ensure that the needed alignment is enforced by the trampoline I marked the series as RFC because it appears that the new tests trigger some failures in CI on x86 and s390, despite the series not touching any code related to those architectures. Some very early investigation/gdb debugging on the x86 side seems to hint that it could be related to the same missing alignment too (based on section 3.2.3 in [4], and so the x86 trampoline would need the same alignment handling ?). For s390 it looks less clear, as all values captured from the bpf test program are set to 0 in the CI output, and I don't have the proper setup yet to check the low level details. I am tempted to isolate those new tests (which were actually useful to spot real issues while tuning the ARM64 trampoline) and add them to the relevant DENYLIST files for x86/s390, but I guess this is not the right direction, so I would gladly take a second opinion on this. [1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com… [2] https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#id82 [3] https://lore.kernel.org/bpf/20240705125336.46820-1-puranjay@kernel.org/ [4] https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com> --- Alexis Lothoré (eBPF Foundation) (3): bpf: add struct largest member size in func model bpf/selftests: add tests to validate proper arguments alignment on ARM64 bpf/selftests: enable tracing tests for ARM64 Xu Kuohai (1): bpf, arm64: Support up to 12 function arguments arch/arm64/net/bpf_jit_comp.c | 235 ++++++++++++++++----- include/linux/bpf.h | 1 + kernel/bpf/btf.c | 25 +++ tools/testing/selftests/bpf/DENYLIST.aarch64 | 3 - .../selftests/bpf/prog_tests/tracing_struct.c | 23 ++ tools/testing/selftests/bpf/progs/tracing_struct.c | 10 +- .../selftests/bpf/progs/tracing_struct_many_args.c | 67 ++++++ .../testing/selftests/bpf/test_kmods/bpf_testmod.c | 50 +++++ 8 files changed, 357 insertions(+), 57 deletions(-) --- base-commit: 91e7eb701b4bc389e7ddfd80ef6e82d1a6d2d368 change-id: 20250220-many_args_arm64-8bd3747e6948 Best regards, -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com

4 weeks, 1 day

8
31
0 0

[PATCH v2 00/15] Consolidate iommu page table implementations (AMD)

by Jason Gunthorpe

Currently each of the iommu page table formats duplicates all of the logic to maintain the page table and perform map/unmap/etc operations. There are several different versions of the algorithms between all the different formats. The io-pgtable system provides an interface to help isolate the page table code from the iommu driver, but doesn't provide tools to implement the common algorithms. This makes it very hard to improve the state of the pagetable code under the iommu domains as any proposed improvement needs to alter a large number of different driver code paths. Combined with a lack of software based testing this makes improvement in this area very hard. iommufd wants several new page table operations: - More efficient map/unmap operations, using iommufd's batching logic - unmap that returns the physical addresses into a batch as it progresses - cut that allows splitting areas so large pages can have holes poked in them dynamically (ie guestmemfd hitless shared/private transitions) - More agressive freeing of table memory to avoid waste - Fragmenting large pages so that dirty tracking can be more granular - Reassembling large pages so that VMs can run at full IO performance in migration/dirty tracking error flows - KHO integration for kernel live upgrade Together these are algorithmically complex enough to be a very significant task to go and implement in all the page table formats we support. Just the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86 PAE / AMDv1 / VT-D SS / RISCV) Instead of doing the duplicated work, this series takes the first step to consolidate the algorithms into one places. In spirit it is similar to the work Christoph did a few years back to pull the redundant get_user_pages() implementations out of the arch code into core MM. This unlocked a great deal of improvement in that space in the following years. I would like to see the same benefit in iommu as well. My first RFC showed a bigger picture with all most all formats and more algorithms. This series reorganizes that to be narrowly focused on just enough to convert the AMD driver to use the new mechanism. kunit tests are provided that allow good testing of the algorithms and all formats on x86, nothing is arch specific. AMD is one of the simpler options as the HW is quite uniform with few different options/bugs while still requiring the complicated contiguous pages support. The HW also has a very simple range based invalidation approach that is easy to implement. The AMD v1 and AMD v2 page table formats are implemented bit for bit identical to the current code, tested using a compare kunit test that checks against the io-pgtable version (on github, see below). Updating the AMD driver to replace the io-pgtable layer with the new stuff is fairly straightforward now. The layering is fixed up in the new version so that all the invalidation goes through function pointers. Several small fixing patches have come out of this as I've been fixing the problems that the test suite uncovers in the current code, and implementing the fixed version in iommupt. On performance, there is a quite wide variety of implementation designs across all the drivers. Looking at some key performance across the main formats: iommu_map(): pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 53,66 , 51,63 , 19.19 (AMDV1) 256*2^12, 386,1909 , 367,1795 , 79.79 256*2^21, 362,1633 , 355,1556 , 77.77 2^12, 56,62 , 52,59 , 11.11 (AMDv2) 256*2^12, 405,1355 , 357,1292 , 72.72 256*2^21, 393,1160 , 358,1114 , 67.67 2^12, 55,65 , 53,62 , 14.14 (VTD second stage) 256*2^12, 391,518 , 332,512 , 35.35 256*2^21, 383,635 , 336,624 , 46.46 2^12, 57,65 , 55,63 , 12.12 (ARM 64 bit) 256*2^12, 380,389 , 361,369 , 2.02 256*2^21, 358,419 , 345,400 , 13.13 iommu_unmap(): pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 69,88 , 65,85 , 23.23 (AMDv1) 256*2^12, 353,6498 , 331,6029 , 94.94 256*2^21, 373,6014 , 360,5706 , 93.93 2^12, 71,72 , 66,69 , 4.04 (AMDv2) 256*2^12, 228,891 , 206,871 , 76.76 256*2^21, 254,721 , 245,711 , 65.65 2^12, 69,87 , 65,82 , 20.20 (VTD second stage) 256*2^12, 210,321 , 200,315 , 36.36 256*2^21, 255,349 , 238,342 , 30.30 2^12, 72,77 , 68,74 , 8.08 (ARM 64 bit) 256*2^12, 521,357 , 447,346 , -29.29 256*2^21, 489,358 , 433,345 , -25.25 * Above numbers include additional patches to remove the iommu_pgsize() overheads. gcc 13.3.0, i7-12700 This version provides fairly consistent performance across formats. ARM unmap performance is quite different because this version supports contiguous pages and uses a very different algorithm for unmapping. Though why it is so worse compared to AMDv1 I haven't figured out yet. The per-format commits include a more detailed chart. There is a second branch: https://github.com/jgunthorpe/linux/commits/iommu_pt_all Containing supporting work and future steps: - ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats - VT-D second stage format - DART v1 & v2 format - Draft of a iommufd 'cut' operation to break down huge pages - Draft of support for a DMA incoherent HW page table walker - A compare test that checks the iommupt formats against the iopgtable interface, including updating AMD to have a working iopgtable and patches to make VT-D have an iopgtable for testing. - A performance test to micro-benchmark map and unmap against iogptable My strategy is to go one by one for the drivers: - AMD driver conversion - RISCV page table and driver - Intel VT-D driver and VTDSS page table - ARM SMMUv3 And concurrently work on the algorithm side: - debugfs content dump, like VT-D has - Cut support - Increase/Decrease page size support - map/unmap batching - KHO As we make more algorithm improvements the value to convert the drivers increases. This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt v1: - AMD driver only, many code changes RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/ Alejandro Jimenez (1): iommu/amd: Use the generic iommu page table Jason Gunthorpe (14): genpt: Generic Page Table base API genpt: Add Documentation/ files iommupt: Add the basic structure of the iommu implementation iommupt: Add the AMD IOMMU v1 page table format iommupt: Add iova_to_phys op iommupt: Add unmap_pages op iommupt: Add map_pages op iommupt: Add read_and_clear_dirty op iommupt: Add a kunit test for Generic Page Table iommupt: Add a mock pagetable format for iommufd selftest to use iommufd: Change the selftest to use iommupt instead of xarray iommupt: Add the x86 64 bit page table format iommu/amd: Remove AMD io_pgtable support iommupt: Add a kunit test for the IOMMU implementation .clang-format | 1 + Documentation/driver-api/generic_pt.rst | 105 ++ Documentation/driver-api/index.rst | 1 + drivers/iommu/Kconfig | 2 + drivers/iommu/Makefile | 1 + drivers/iommu/amd/Kconfig | 5 +- drivers/iommu/amd/Makefile | 2 +- drivers/iommu/amd/amd_iommu.h | 1 - drivers/iommu/amd/amd_iommu_types.h | 109 +- drivers/iommu/amd/io_pgtable.c | 560 -------- drivers/iommu/amd/io_pgtable_v2.c | 370 ------ drivers/iommu/amd/iommu.c | 493 ++++--- drivers/iommu/generic_pt/.kunitconfig | 13 + drivers/iommu/generic_pt/Kconfig | 72 ++ drivers/iommu/generic_pt/fmt/Makefile | 26 + drivers/iommu/generic_pt/fmt/amdv1.h | 407 ++++++ drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 + drivers/iommu/generic_pt/fmt/defs_x86_64.h | 21 + drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 15 + drivers/iommu/generic_pt/fmt/iommu_mock.c | 10 + drivers/iommu/generic_pt/fmt/iommu_template.h | 48 + drivers/iommu/generic_pt/fmt/iommu_x86_64.c | 12 + drivers/iommu/generic_pt/fmt/x86_64.h | 241 ++++ drivers/iommu/generic_pt/iommu_pt.h | 1146 +++++++++++++++++ drivers/iommu/generic_pt/kunit_generic_pt.h | 721 +++++++++++ drivers/iommu/generic_pt/kunit_iommu.h | 183 +++ drivers/iommu/generic_pt/kunit_iommu_pt.h | 451 +++++++ drivers/iommu/generic_pt/pt_common.h | 351 +++++ drivers/iommu/generic_pt/pt_defs.h | 312 +++++ drivers/iommu/generic_pt/pt_fmt_defaults.h | 193 +++ drivers/iommu/generic_pt/pt_iter.h | 638 +++++++++ drivers/iommu/generic_pt/pt_log2.h | 130 ++ drivers/iommu/io-pgtable.c | 4 - drivers/iommu/iommufd/Kconfig | 1 + drivers/iommu/iommufd/iommufd_test.h | 11 +- drivers/iommu/iommufd/selftest.c | 439 +++---- include/linux/generic_pt/common.h | 166 +++ include/linux/generic_pt/iommu.h | 264 ++++ include/linux/io-pgtable.h | 2 - tools/testing/selftests/iommu/iommufd.c | 60 +- tools/testing/selftests/iommu/iommufd_utils.h | 12 + 41 files changed, 6046 insertions(+), 1574 deletions(-) create mode 100644 Documentation/driver-api/generic_pt.rst delete mode 100644 drivers/iommu/amd/io_pgtable.c delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c create mode 100644 drivers/iommu/generic_pt/.kunitconfig create mode 100644 drivers/iommu/generic_pt/Kconfig create mode 100644 drivers/iommu/generic_pt/fmt/Makefile create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86_64.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h create mode 100644 drivers/iommu/generic_pt/iommu_pt.h create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h create mode 100644 drivers/iommu/generic_pt/pt_common.h create mode 100644 drivers/iommu/generic_pt/pt_defs.h create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h create mode 100644 drivers/iommu/generic_pt/pt_iter.h create mode 100644 drivers/iommu/generic_pt/pt_log2.h create mode 100644 include/linux/generic_pt/common.h create mode 100644 include/linux/generic_pt/iommu.h base-commit: db37090502f67e46541e53b91f00bbd565c96bd0 -- 2.43.0

4 weeks, 1 day

8
37
0 0

[PATCH] selftests/mm: Skip failed memfd setups in gup_longterm

by Mark Brown

Unlike the other cases gup_longterm's memfd tests previously skipped the test when failing to set up the file descriptor to test, restore this behaviour. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- tools/testing/selftests/mm/gup_longterm.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c index 8a97ac5176a4..29047d2e0c49 100644 --- a/tools/testing/selftests/mm/gup_longterm.c +++ b/tools/testing/selftests/mm/gup_longterm.c @@ -298,8 +298,11 @@ static void run_with_memfd(test_fn fn, const char *desc) log_test_start("%s ... with memfd", desc); fd = memfd_create("test", 0); - if (fd < 0) + if (fd < 0) { ksft_print_msg("memfd_create() failed (%s)\n", strerror(errno)); + log_test_result(KSFT_SKIP); + return; + } fn(fd, pagesize); close(fd); @@ -366,6 +369,8 @@ static void run_with_memfd_hugetlb(test_fn fn, const char *desc, fd = memfd_create("test", flags); if (fd < 0) { ksft_print_msg("memfd_create() failed (%s)\n", strerror(errno)); + log_test_result(KSFT_SKIP); + return; } fn(fd, hugetlbsize); --- base-commit: ec7714e4947909190ffb3041a03311a975350fe0 change-id: 20250603-selftest-mm-gup-longterm-tweaks-e685a8ae9751 Best regards, -- Mark Brown <broonie(a)kernel.org>

4 weeks, 1 day

4
3
0 0

[PATCH v5 0/5] kunit: Add support for suppressing warning backtraces

by Alessandro Carminati

Some unit tests intentionally trigger warning backtraces by passing bad parameters to kernel API functions. Such unit tests typically check the return value from such calls, not the existence of the warning backtrace. Such intentionally generated warning backtraces are neither desirable nor useful for a number of reasons: - They can result in overlooked real problems. - A warning that suddenly starts to show up in unit tests needs to be investigated and has to be marked to be ignored, for example by adjusting filter scripts. Such filters are ad hoc because there is no real standard format for warnings. On top of that, such filter scripts would require constant maintenance. One option to address the problem would be to add messages such as "expected warning backtraces start/end here" to the kernel log. However, that would again require filter scripts, might result in missing real problematic warning backtraces triggered while the test is running, and the irrelevant backtrace(s) would still clog the kernel log. Solve the problem by providing a means to identify and suppress specific warning backtraces while executing test code. Support suppressing multiple backtraces while at the same time limiting changes to generic code to the absolute minimum. Overview: Patch#1 Introduces the suppression infrastructure. Patch#2 Mitigate the impact at WARN*() sites. Patch#3 Adds selftests to validate the functionality. Patch#4 Demonstrates real-world usage in the DRM subsystem. Patch#5 Documents the new API and usage guidelines. Design Notes: The objective is to suppress unwanted WARN*() generated messages. Although most major architectures share common bug handling via `lib/bug.c` and `report_bug()`, some minor or legacy architectures still rely on their own platform-specific handling. This divergence must be considered in any such feature. Additionally, a key challenge in implementing this feature is the fragmentation of `WARN*()` messages emission: specific part in the macro, common with BUG*() part in the exception handler. As a result, any intervention to suppress the message must occur before the illegal instruction. Lessons from the Previous Attempt In earlier iterations, suppression logic was added inside the `__report_bug()` function to intercept WARN*() messages not producing messages in the macro. To implement the check in the check in the bug handler code, two strategies were considered: * Strategy #1: Use `kallsyms` to infer the originating functionid, namely a pointer to the function. Since in any case, the user interface relies on function names, they must be translated in addresses at suppression- time or at check-time. Assuming to translate at suppression-time, the `kallsyms` subsystem needs to be used to determine the symbol address from the name, and again to produce the functionid from `bugaddr`. This approach proved unreliable due to compiler-induced transformations such as inlining, cloning, and code fragmentation. Attempts to preventing them is also unconvenient because several `WARN()` sites are in functions intentionally declared as `__always_inline`. * Strategy #2: Store function name `__func__` in `struct bug_entry` in the `__bug_table`. This implementation was used in the previous version. However, `__func__` is a compiler-generated symbol, which complicates relocation and linking in position-independent code. Workarounds such as storing offsets from `.rodata` or embedding string literals directly into the table would have significantly either increased complexity or increase the __bug_table size. Additionally, architectures not using the unified `BUG()` path would still require ad-hoc handling. Because current WARN*() message production strategy, a few WARN*() macros still need a check to suppress the part of the message produced in the macro itself. Current Proposal: Check Directly in the `WARN()` Macros. This avoids the need for function symbol resolution or ELF section modification. Suppression is implemented directly in the `WARN*()` macros. A helper function, `__kunit_is_suppressed_warning()`, is used to determine whether suppression applies. It is marked as `noinstr`, since some `WARN*()` sites reside in non-instrumentable sections. As it uses `strcmp`, a `noinstr` version of `strcmp` was introduced. The implementation is deliberately simple and avoids architecture-specific optimizations to preserve portability. Since this mechanism compares function names and is intended for test usage only, performance is not a primary concern. This series is based on the RFC patch and subsequent discussion at https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b… and offers a more comprehensive solution of the problem discussed there. Changes since RFC: - Introduced CONFIG_KUNIT_SUPPRESS_BACKTRACE - Minor cleanups and bug fixes - Added support for all affected architectures - Added support for counting suppressed warnings - Added unit tests using those counters - Added patch to suppress warning backtraces in dev_addr_lists tests Changes since v1: - Rebased to v6.9-rc1 - Added Tested-by:, Acked-by:, and Reviewed-by: tags [I retained those tags since there have been no functional changes] - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option, enabled by default. Changes since v2: - Rebased to v6.9-rc2 - Added comments to drm warning suppression explaining why it is needed. - Added patch to move conditional code in arch/sh/include/asm/bug.h to avoid kerneldoc warning - Added architecture maintainers to Cc: for architecture specific patches - No functional changes Changes since v3: - Rebased to v6.14-rc6 - Dropped net: "kunit: Suppress lock warning noise at end of dev_addr_lists tests" since 3db3b62955cd6d73afde05a17d7e8e106695c3b9 - Added __kunit_ and KUNIT_ prefixes. - Tested on interessed architectures. Changes since v4: - Rebased to v6.15-rc7 - Dropped all code in __report_bug() - Moved all checks in WARN*() macros. - Dropped all architecture specific code. - Made __kunit_is_suppressed_warning nice to noinstr functions. Alessandro Carminati (2): bug/kunit: Core support for suppressing warning backtraces bug/kunit: Suppressing warning backtraces reduced impact on WARN*() sites Guenter Roeck (3): Add unit tests to verify that warning backtrace suppression works. drm: Suppress intentional warning backtraces in scaling unit tests kunit: Add documentation for warning backtrace suppression API Documentation/dev-tools/kunit/usage.rst | 30 ++++++- drivers/gpu/drm/tests/drm_rect_test.c | 16 ++++ include/asm-generic/bug.h | 48 +++++++---- include/kunit/bug.h | 62 ++++++++++++++ include/kunit/test.h | 1 + lib/kunit/Kconfig | 9 ++ lib/kunit/Makefile | 9 +- lib/kunit/backtrace-suppression-test.c | 105 ++++++++++++++++++++++++ lib/kunit/bug.c | 54 ++++++++++++ 9 files changed, 316 insertions(+), 18 deletions(-) create mode 100644 include/kunit/bug.h create mode 100644 lib/kunit/backtrace-suppression-test.c create mode 100644 lib/kunit/bug.c -- 2.34.1

4 weeks, 1 day

7
29
0 0

[PATCH v5 00/29] iommufd: Add vIOMMU infrastructure (Part-4 HW QUEUE)

by Nicolin Chen

The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature. HW QUEUE introduced by this series as a part of the vIOMMU infrastructure represents a HW accelerated queue/buffer for VM to use exclusively, e.g. - NVIDIA's Virtual Command Queue - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer each of which allows its IOMMU HW to directly access a queue memory owned by a guest VM and allows a guest OS to control the HW queue direclty, to avoid VM Exit overheads to improve the performance. Introduce IOMMUFD_OBJ_HW_QUEUE and its pairing IOMMUFD_CMD_HW_QUEUE_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, and etc. Meanwhile, a guest-owned queue needs the guest kernel to control the queue by reading/writing its consumer and producer indexes, via MMIO acceses to the hardware MMIO registers. Introduce an mmap infrastructure for iommufd to support passing through a piece of MMIO region from the host physical address space to the guest physical address space. The mmap info (offset/ length) used by an mmap syscall must be pre-allocated and returned to the user space via an output driver-data during an IOMMUFD_CMD_HW_QUEUE_ALLOC call. Thus, it requires a driver-specific user data support in the vIOMMU allocation flow. As a real-world use case, this series implements a HW QUEUE support in the tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it is also the Tegra CMDQV series Part-2 (user-space support), reworked from Previous RFCv1: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/ This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS. // Unmap latencies from "dma_map_benchmark -g @granule -t @threads", // by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq" @granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0 4KB 1 35.7 us 5.3 us 16KB 1 41.8 us 6.8 us 64KB 1 68.9 us 9.9 us 128KB 1 109.0 us 12.6 us 256KB 1 187.1 us 18.0 us 4KB 2 96.9 us 6.8 us 16KB 2 97.8 us 7.5 us 64KB 2 151.5 us 10.7 us 128KB 2 257.8 us 12.7 us 256KB 2 443.0 us 17.9 us This is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_hw_queue-v5 Paring QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_hw_queue-v5 Changelog v5 * Rebase on v6.15-rc6 * Add Reviewed-by from Jason and Kevin * Correct typos in kdoc and update commit logs * [iommufd] Add a cosmetic fix * [iommufd] Drop unused num_pfns * [iommufd] Drop unnecessary check * [iommufd] Reorder patch sequence * [iommufd] Use io_remap_pfn_range() * [iommufd] Use success oriented flow * [iommufd] Fix max_npages calculation * [iommufd] Add more selftest coverage * [iommufd] Drop redundant static_assert * [iommufd] Fix mmap pfn range validation * [iommufd] Reject unmap on pinned iovas * [iommufd] Drop redundant vm_flags_set() * [iommufd] Drop iommufd_struct_destroy() * [iommufd] Drop redundant queue iova test * [iommufd] Use "mmio_addr" and "mmio_pfn" * [iommufd] Rename to "nesting_parent_iova" * [iommufd] Make iopt_pin_pages call option * [iommufd] Add ictx comparison in depend() * [iommufd] Add iommufd_object_alloc_ucmd() * [iommufd] Move kcalloc() after validations * [iommufd] Replace ictx setting with WARN_ON * [iommufd] Make hw_info's type bidirectional * [smmu] Add supported_vsmmu_type in impl_ops * [smmu] Drop impl report in smmu vendor struct * [tegra] Add IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV * [tegra] Replace "number of VINTFs" with a note * [tegra] Drop the redundant lvcmdq pointer setting * [tegra] Flag IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA * [tegra] Use "vintf_alloc_vsid" for vdevice_alloc op v4 https://lore.kernel.org/all/cover.1746757630.git.nicolinc@nvidia.com/ * Rebase on v6.15-rc5 * Add Reviewed-by from Vasant * Rename "vQUEUE" to "HW QUEUE" * Use "offset" and "length" for all mmap-related variables * [iommufd] Use u64 for guest PA * [iommufd] Fix typo in uAPI doc * [iommufd] Rename immap_id to offset * [iommufd] Drop the partial-size mmap support * [iommufd] Do not replace WARN_ON with WARN_ON_ONCE * [iommufd] Use "u64 base_addr" for queue base address * [iommufd] Use u64 base_pfn/num_pfns for immap structure * [iommufd] Correct the size passed in to mtree_alloc_range() * [iommufd] Add IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to viommu_ops v3 https://lore.kernel.org/all/cover.1746139811.git.nicolinc@nvidia.com/ * Add Reviewed-by from Baolu, Pranjal, and Alok * Revise kdocs, uAPI docs, and commit logs * Rename "vCMDQ" back to "vQUEUE" for AMD cases * [tegra] Add tegra241_vcmdq_hw_flush_timeout() * [tegra] Rename vsmmu_alloc to alloc_vintf_user * [tegra] Use writel for SID replacement registers * [tegra] Move mmap removal call to vsmmu_destroy op * [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user() * [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED() * [iommufd] Add an object-type "owner" to immap structure * [iommufd] Drop the ictx input in the new for-driver APIs * [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle * [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers * [iommufd] Rename iommufd_ctx_alloc/free_mmap to _iommufd_alloc/destroy_mmap v2 https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/ * Add Reviewed-by from Jason * [smmu] Fix vsmmu initial value * [smmu] Support impl for hw_info * [tegra] Rename "slot" to "vsid" * [tegra] Update kdocs and commit logs * [tegra] Map/unmap LVCMDQ dynamically * [tegra] Refcount the previous LVCMDQ * [tegra] Return -EEXIST if LVCMDQ exists * [tegra] Simplify VINTF cleanup routine * [tegra] Use vmid and s2_domain in vsmmu * [tegra] Rename "mmap_pgoff" to "immap_id" * [tegra] Add more addr and length validation * [iommufd] Add more narrative to mmap's kdoc * [iommufd] Add iommufd_struct_depend/undepend() * [iommufd] Rename vcmdq_free op to vcmdq_destroy * [iommufd] Fix bug in iommu_copy_struct_to_user() * [iommufd] Drop is_io from iommufd_ctx_alloc_mmap() * [iommufd] Test the queue memory for its contiguity * [iommufd] Return -ENXIO if address or length fails * [iommufd] Do not change @min_last in mock_viommu_alloc() * [iommufd] Generalize TEGRA241_VCMDQ data in core structure * [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC * [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping v1 https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/ Thanks Nicolin Nicolin Chen (29): iommufd: Apply obvious cosmetic fixes iommufd: Introduce iommufd_object_alloc_ucmd helper iommu: Apply the new iommufd_object_alloc_ucmd helper iommu: Add iommu_copy_struct_to_user helper iommu: Pass in a driver-level user data structure to viommu_alloc op iommufd/viommu: Allow driver-specific user data for a vIOMMU object iommufd/selftest: Support user_data in mock_viommu_alloc iommufd/selftest: Add coverage for viommu data iommufd: Do not unmap an owned iopt_area iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers iommufd/driver: Let iommufd_viommu_alloc helper save ictx to viommu->ictx iommufd/viommu: Add driver-allocated vDEVICE support iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC iommufd: Add mmap interface iommufd/selftest: Add coverage for the new mmap interface Documentation: userspace-api: iommufd: Update HW QUEUE iommu: Allow an input type in hw_info op iommufd: Allow an input data_type via iommu_hw_info iommufd/selftest: Update hw_info coverage for an input data_type iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops iommu/tegra241-cmdqv: Use request_threaded_irq iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 28 +- drivers/iommu/iommufd/io_pagetable.h | 15 +- drivers/iommu/iommufd/iommufd_private.h | 41 +- drivers/iommu/iommufd/iommufd_test.h | 20 + include/linux/iommu.h | 53 +- include/linux/iommufd.h | 221 +++++++- include/uapi/linux/iommufd.h | 150 +++++- tools/testing/selftests/iommu/iommufd_utils.h | 91 +++- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 33 +- .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 496 +++++++++++++++++- drivers/iommu/intel/iommu.c | 4 + drivers/iommu/iommufd/device.c | 137 +---- drivers/iommu/iommufd/driver.c | 97 ++++ drivers/iommu/iommufd/eventq.c | 14 +- drivers/iommu/iommufd/hw_pagetable.c | 6 +- drivers/iommu/iommufd/io_pagetable.c | 106 +++- drivers/iommu/iommufd/iova_bitmap.c | 1 - drivers/iommu/iommufd/main.c | 80 ++- drivers/iommu/iommufd/pages.c | 19 +- drivers/iommu/iommufd/selftest.c | 158 +++++- drivers/iommu/iommufd/viommu.c | 146 +++++- tools/testing/selftests/iommu/iommufd.c | 146 +++++- .../selftests/iommu/iommufd_fail_nth.c | 15 +- Documentation/userspace-api/iommufd.rst | 12 + 24 files changed, 1794 insertions(+), 295 deletions(-) -- 2.43.0

4 weeks, 1 day

3
78
0 0

[PATCH bpf-next] selftests/bpf: rbtree: Fix incorrect global variable usage

by Rong Tao

From: Rong Tao <rongtao(a)cestc.cn> Within __add_three() function, should use function parameters instead of global variables. So that the variables groot_nested.inner.root and groot_nested.inner.glock in rbtree_add_nodes_nested() are tested correctly. Signed-off-by: Rong Tao <rongtao(a)cestc.cn> --- tools/testing/selftests/bpf/progs/rbtree.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/rbtree.c b/tools/testing/selftests/bpf/progs/rbtree.c index a3620c15c136..49fe93d7e059 100644 --- a/tools/testing/selftests/bpf/progs/rbtree.c +++ b/tools/testing/selftests/bpf/progs/rbtree.c @@ -61,19 +61,19 @@ static long __add_three(struct bpf_rb_root *root, struct bpf_spin_lock *lock) } m->key = 1; - bpf_spin_lock(&glock); - bpf_rbtree_add(&groot, &n->node, less); - bpf_rbtree_add(&groot, &m->node, less); - bpf_spin_unlock(&glock); + bpf_spin_lock(lock); + bpf_rbtree_add(root, &n->node, less); + bpf_rbtree_add(root, &m->node, less); + bpf_spin_unlock(lock); n = bpf_obj_new(typeof(*n)); if (!n) return 3; n->key = 3; - bpf_spin_lock(&glock); - bpf_rbtree_add(&groot, &n->node, less); - bpf_spin_unlock(&glock); + bpf_spin_lock(lock); + bpf_rbtree_add(root, &n->node, less); + bpf_spin_unlock(lock); return 0; } -- 2.49.0

1 month

2
1
0 0

[PATCH v2 0/4] selftests/mm: cow and gup_longterm cleanups

by Mark Brown

The bulk of these changes modify the cow and gup_longterm tests to report unique and stable names for each test, bringing them into line with the expectations of tooling that works with kselftest. The string reported as a test result is used by tooling to both deduplicate tests and track tests between test runs, using the same string for multiple tests or changing the string depending on test result causes problems for user interfaces and automation such as bisection. It was suggested that converting to use kselftest_harness.h would be a good way of addressing this, however that really wants the set of tests to run to be known at compile time but both test programs dynamically enumarate the set of huge page sizes the system supports and test each. Refactoring to handle this would be even more invasive than these changes which are large but straightforward and repetitive. A version of the main gup_longterm cleanup was previously sent separately, this version factors out the helpers for logging the start of the test since the cow test looks very similar. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Changes in v2: - Typo fixes. - Link to v1: https://lore.kernel.org/r/20250522-selftests-mm-cow-dedupe-v1-0-713cee2fdd6… --- Mark Brown (4): selftests/mm: Use standard ksft_finished() in cow and gup_longterm selftests/mm: Add helper for logging test start and results selftests/mm: Report unique test names for each cow test selftests/mm: Fix test result reporting in gup_longterm tools/testing/selftests/mm/cow.c | 340 +++++++++++++++++++----------- tools/testing/selftests/mm/gup_longterm.c | 158 ++++++++------ tools/testing/selftests/mm/vm_util.h | 20 ++ 3 files changed, 334 insertions(+), 184 deletions(-) --- base-commit: a5806cd506af5a7c19bcd596e4708b5c464bfd21 change-id: 20250521-selftests-mm-cow-dedupe-33dcab034558 Best regards, -- Mark Brown <broonie(a)kernel.org>

1 month

4
34
0 0

[PATCH bpf-next] selftests/bpf: Fix compile error of bin_attribute::read/write()

by Rong Tao

From: Rong Tao <rongtao(a)cestc.cn> Since commit 97d06802d10a ("sysfs: constify bin_attribute argument of bin_attribute::read/write()"), make bin_attribute parameter of bin_attribute::read/write() const. Signed-off-by: Rong Tao <rongtao(a)cestc.cn> --- tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c index e6c248e3ae54..e9e918cdf31f 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c @@ -385,7 +385,7 @@ int bpf_testmod_fentry_ok; noinline ssize_t bpf_testmod_test_read(struct file *file, struct kobject *kobj, - struct bin_attribute *bin_attr, + const struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len) { struct bpf_testmod_test_read_ctx ctx = { @@ -465,7 +465,7 @@ ALLOW_ERROR_INJECTION(bpf_testmod_test_read, ERRNO); noinline ssize_t bpf_testmod_test_write(struct file *file, struct kobject *kobj, - struct bin_attribute *bin_attr, + const struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len) { struct bpf_testmod_test_write_ctx ctx = { @@ -567,7 +567,7 @@ static void testmod_unregister_uprobe(void) static ssize_t bpf_testmod_uprobe_write(struct file *file, struct kobject *kobj, - struct bin_attribute *bin_attr, + const struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len) { unsigned long offset = 0; -- 2.49.0

1 month

3
3
0 0

[PATCH net] selftests: drv-net: tso: make bkg() wait for socat to quit

by Jakub Kicinski

Commit 846742f7e32f ("selftests: drv-net: add a warning for bkg + shell + terminate") added a warning for bkg() used with terminate=True. The tso test was missed as we didn't have it running anywhere in NIPA. Add exit_wait=True, to avoid: # Warning: combining shell and terminate is risky! # SIGTERM may not reach the child on zsh/ksh! getting printed twice for every variant. Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test") Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: willemb(a)google.com CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/drivers/net/hw/tso.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/hw/tso.py b/tools/testing/selftests/drivers/net/hw/tso.py index e1ecb92f79d9..150d6db241a0 100755 --- a/tools/testing/selftests/drivers/net/hw/tso.py +++ b/tools/testing/selftests/drivers/net/hw/tso.py @@ -39,7 +39,7 @@ from lib.py import bkg, cmd, defer, ethtool, ip, rand_port, wait_port_listen port = rand_port() listen_cmd = f"socat -{ipver} -t 2 -u TCP-LISTEN:{port},reuseport /dev/null,ignoreeof" - with bkg(listen_cmd, host=cfg.remote) as nc: + with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as nc: wait_port_listen(port, host=cfg.remote) if ipver == "4": -- 2.49.0

1 month

3
2
0 0

[PATCH net v2] selftests: drv-net: add configs for the TSO test

by Jakub Kicinski

Add missing config options for the tso.py test, specifically to make sure the kernel is built with vxlan and gre tunnels. I noticed this while adding a TSO-capable device QEMU to the CI. Previously we only run virtio tests and it doesn't report LSO stats on the QEMU we have. Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test") Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- v2: - drop NET_IP_TUNNEL v1: https://lore.kernel.org/20250602231640.314556-1-kuba@kernel.org CC: shuah(a)kernel.org CC: willemb(a)google.com CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/drivers/net/hw/config | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 tools/testing/selftests/drivers/net/hw/config diff --git a/tools/testing/selftests/drivers/net/hw/config b/tools/testing/selftests/drivers/net/hw/config new file mode 100644 index 000000000000..88ae719e6f8f --- /dev/null +++ b/tools/testing/selftests/drivers/net/hw/config @@ -0,0 +1,5 @@ +CONFIG_IPV6=y +CONFIG_IPV6_GRE=y +CONFIG_NET_IPGRE=y +CONFIG_NET_IPGRE_DEMUX=y +CONFIG_VXLAN=y -- 2.49.0

1 month

3
3
0 0

[PATCH net] selftests: drv-net: tso: fix the GRE device name

by Jakub Kicinski

The device type for IPv4 GRE is "gre" not "ipgre", unlike for IPv6 which uses "ip6gre". Not sure how I missed this when writing the test, perhaps because all HW I have access to is on an IPv6-only network. Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test") Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: willemb(a)google.com CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/drivers/net/hw/tso.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/hw/tso.py b/tools/testing/selftests/drivers/net/hw/tso.py index 150d6db241a0..3370827409aa 100755 --- a/tools/testing/selftests/drivers/net/hw/tso.py +++ b/tools/testing/selftests/drivers/net/hw/tso.py @@ -216,7 +216,7 @@ from lib.py import bkg, cmd, defer, ethtool, ip, rand_port, wait_port_listen ("", "6", "tx-tcp6-segmentation", None), ("vxlan", "", "tx-udp_tnl-segmentation", ("vxlan", True, "id 100 dstport 4789 noudpcsum")), ("vxlan_csum", "", "tx-udp_tnl-csum-segmentation", ("vxlan", False, "id 100 dstport 4789 udpcsum")), - ("gre", "4", "tx-gre-segmentation", ("ipgre", False, "")), + ("gre", "4", "tx-gre-segmentation", ("gre", False, "")), ("gre", "6", "tx-gre-segmentation", ("ip6gre", False, "")), ) -- 2.49.0

1 month

4
3
0 0

[PATCH bpf-next v1 0/2] bpf,ktls: Fix data corruption caused by using bpf_msg_pop_data() in ktls

by Jiayuan Chen

Cong reported an issue where running 'test_sockmap' in the current bpf-next tree results in an error [1]. The specific test case that triggered the error is a combined test involving ktls and bpf_msg_pop_data(). Root Cause: When sending plaintext data, we initially calculated the corresponding ciphertext length. However, if we later reduced the plaintext data length via socket policy, we failed to recalculate the ciphertext length. This results in transmitting buffers containing uninitialized data during ciphertext transmission. This causes uninitialized bytes to be appended after a complete "Application Data" packet, leading to errors on the receiving end when parsing TLS record. This issue has existed for a long time but was only exposed after the following test code was merged. commit 47eae080410b ("selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap") Although we already had tests for pop data before this commit, the pop data length was insufficient (less than 5 bytes). This meant that the corrupted TLS records with data length <5 bytes were cached without being parsed, resulting in no error being triggered. After this fix, all tests pass. 1/ 6 sockmap::txmsg test passthrough:OK 2/ 6 sockmap::txmsg test redirect:OK 3/ 2 sockmap::txmsg test redirect wait send mem:OK 4/ 6 sockmap::txmsg test drop:OK 5/ 6 sockmap::txmsg test ingress redirect:OK 6/ 7 sockmap::txmsg test skb:OK 7/12 sockmap::txmsg test apply:OK 8/12 sockmap::txmsg test cork:OK 9/ 3 sockmap::txmsg test hanging corks:OK 10/11 sockmap::txmsg test push_data:OK 11/17 sockmap::txmsg test pull-data:OK 12/ 9 sockmap::txmsg test pop-data:OK 13/ 6 sockmap::txmsg test push/pop data:OK 14/ 1 sockmap::txmsg test ingress parser:OK 15/ 1 sockmap::txmsg test ingress parser2:OK 16/ 6 sockhash::txmsg test passthrough:OK 17/ 6 sockhash::txmsg test redirect:OK 18/ 2 sockhash::txmsg test redirect wait send mem:OK 19/ 6 sockhash::txmsg test drop:OK 20/ 6 sockhash::txmsg test ingress redirect:OK 21/ 7 sockhash::txmsg test skb:OK 22/12 sockhash::txmsg test apply:OK 23/12 sockhash::txmsg test cork:OK 24/ 3 sockhash::txmsg test hanging corks:OK 25/11 sockhash::txmsg test push_data:OK 26/17 sockhash::txmsg test pull-data:OK 27/ 9 sockhash::txmsg test pop-data:OK 28/ 6 sockhash::txmsg test push/pop data:OK 29/ 1 sockhash::txmsg test ingress parser:OK 30/ 1 sockhash::txmsg test ingress parser2:OK 31/ 6 sockhash:ktls:txmsg test passthrough:OK 32/ 6 sockhash:ktls:txmsg test redirect:OK 33/ 2 sockhash:ktls:txmsg test redirect wait send mem:OK 34/ 6 sockhash:ktls:txmsg test drop:OK 35/ 6 sockhash:ktls:txmsg test ingress redirect:OK 36/ 7 sockhash:ktls:txmsg test skb:OK 37/12 sockhash:ktls:txmsg test apply:OK 38/12 sockhash:ktls:txmsg test cork:OK 39/ 3 sockhash:ktls:txmsg test hanging corks:OK 40/11 sockhash:ktls:txmsg test push_data:OK 41/17 sockhash:ktls:txmsg test pull-data:OK 42/ 9 sockhash:ktls:txmsg test pop-data:OK 43/ 6 sockhash:ktls:txmsg test push/pop data:OK 44/ 1 sockhash:ktls:txmsg test ingress parser:OK 45/ 0 sockhash:ktls:txmsg test ingress parser2:OK Pass: 45 Fail: 0 [1]: https://lore.kernel.org/bpf/CAM_iQpU7=4xjbefZoxndKoX9gFFMOe7FcWMq5tHBsymbrn… Jiayuan Chen (2): bpf,ktls: Fix data corruption when using bpf_msg_pop_data() in ktls selftests/bpf: Add test to cover ktls with bpf_msg_pop_data net/tls/tls_sw.c | 15 +++ .../selftests/bpf/prog_tests/sockmap_ktls.c | 91 +++++++++++++++++++ .../selftests/bpf/progs/test_sockmap_ktls.c | 4 + 3 files changed, 110 insertions(+) -- 2.47.1

1 month

3
7
0 0

[PATCH v2 0/3] KVM: arm64: selftests: arch_timer_edge_cases fixes

by Sebastian Ott

Some small fixes for arch_timer_edge_cases that I stumbled upon while debugging failures for this selftest on ampere-one. Changes since v1: modified patch 3 based on suggestions from Marc. I've done some tests with this on various machines - seems to be all good, however on ampere-one I now hit this in 10% of the runs: ==== Test Assertion Failure ==== arm64/arch_timer_edge_cases.c:481: timer_get_cntct(timer) >= DEF_CNT + (timer_get_cntfrq() * (uint64_t)(delta_2_ms) / 1000) pid=166657 tid=166657 errno=4 - Interrupted system call 1 0x0000000000404db3: test_run at arch_timer_edge_cases.c:933 2 0x0000000000401f9f: main at arch_timer_edge_cases.c:1062 3 0x0000ffffaedd625b: ?? ??:0 4 0x0000ffffaedd633b: ?? ??:0 5 0x00000000004020af: _start at ??:? timer_get_cntct(timer) >= DEF_CNT + msec_to_cycles(delta_2_ms) This is not new, it was just hidden behind the other failure. I'll try to figure out what this is about (seems to be independent of the wait time).. Sebastian Ott (3): KVM: arm64: selftests: fix help text for arch_timer_edge_cases KVM: arm64: selftests: fix thread migration in arch_timer_edge_cases KVM: arm64: selftests: arch_timer_edge_cases - determine effective counter width .../kvm/arm64/arch_timer_edge_cases.c | 37 ++++++++++++------- 1 file changed, 24 insertions(+), 13 deletions(-) base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca -- 2.49.0

1 month

3
7
0 0

[RFC] selftests/mm: Skip tests dependent on a binary not built

by Khaled Elnaggar

Hello. Running the mm selftests from the kernel's root directory on an x86_64 debian machine using: make defconfig sudo make kselftest TARGETS=mm the tests run normally till we reach one which stalls for 180 seconds and times out according to the following logs: ``` ----------------------------------------------- running ./charge_reserved_hugetlb.sh -cgroup-v2 ----------------------------------------------- CLEANUP DONE CLEANUP DONE Test normal case. private=, populate=, method=0, reserve= nr hugepages = 10 writing cgroup limit: 20971520 writing reseravation limit: 20971520 Starting: hugetlb_usage=0 reserved_usage=0 expect_failure is 0 Putting task in cgroup 'hugetlb_cgroup_test' Method is 0 >>> write_hugetlb_memory.sh: line 22: ./write_to_hugetlbfs: No such file or directory <<< Waiting for hugetlb memory reservation to reach size 10485760. 0 Waiting for hugetlb memory reservation to reach size 10485760. 0 ... Waiting for hugetlb memory reservation to reach size 10485760. 0 Waiting for hugetlb memory reservation to reach size 10485760. 0 not ok 1 selftests: mm: run_vmtests.sh # TIMEOUT 180 seconds make[3]: Leaving directory '/linux/tools/testing/selftests/mm' ``` Logs show that the executable "write_to_hugetlbfs" is missing, causing the test to hang waiting for hugepage reservations. The executable not found means it was not built by the Make system. It is mentioned in Makefile:136-142, and only built if ARCH is 64-bit ``` ifneq (,$(filter $(ARCH),arm64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64 s390)) TEST_GEN_FILES += va_high_addr_switch ifneq ($(ARCH),riscv64) TEST_GEN_FILES += virtual_address_range endif TEST_GEN_FILES += write_to_hugetlbfs endif ``` So, for some reason, the top-level Makefile provides ARCH as x86. My proposed solution is similar to existing virtual_address_range check that is to check for the binary, and if it is not found, skip these 2 test cases: charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh since they directly and indirectly depend on write_to_hugetlbfs binary. This is just a workaround, the root issue of different ARCH detection when running tests from the kernel root directory should still be addressed. I am not sure how to approach it and open for your suggestions. Note that this issue does not happen when ran from selftests/mm using something like sudo make -C tools/testing/selftests/mm because then mm/Makefile's ARCH detection runs correctly (x86_64) Kindly review and share your thoughts. Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux(a)gmail.com> --- tools/testing/selftests/mm/run_vmtests.sh | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index dddd1dd8af14..cdbcfdb62f8a 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -375,8 +375,13 @@ CATEGORY="process_mrelease" run_test ./mrelease_test CATEGORY="mremap" run_test ./mremap_test CATEGORY="hugetlb" run_test ./thuge-gen + +# the following depend on write_to_hugetlbfs binary +if [ -x ./write_to_hugetlbfs ]; then CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh -cgroup-v2 CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh -cgroup-v2 +fi + if $RUN_DESTRUCTIVE; then nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages) enable_soft_offline=$(cat /proc/sys/vm/enable_soft_offline) -- 2.47.2

1 month

3
5
0 0

[PATCH 00/17] ARM64 PMU Partitioning

by Colton Lewis

Overview: This series implements a new PMU scheme on ARM, a partitioned PMU that exists alongside the existing emulated PMU and may be enabled by the kernel command line kvm.reserved_host_counters or by the vcpu ioctl KVM_ARM_PARTITION_PMU. This is a continuation of the RFC posted earlier this year. [1] The high level overview and reason for the name is that this implementation takes advantage of recent CPU features to partition the PMU counters into a host-reserved set and a guest-reserved set. Guests are allowed untrapped hardware access to the most frequently used PMU registers and features for the guest-reserved counters only. This untrapped hardware access significantly reduces the overhead of using performance monitoring capabilities such as the `perf` tool inside a guest VM. Register accesses that aren't trapping to KVM mean less time spent in the host kernel and more time on the workloads guests care about. This optimization especially shines during high `perf` sample rates or large numbers of events that require multiplexing hardware counters. Performance: For example, the following tests were carried out on identical ARM machines with 10 general purpose counters with identical guest images run on QEMU, the only difference being my PMU implementation or the existing one. Some arguments have been simplified here to clarify the purpose of the test: 1) time perf record -e ${FIFTEEN_HW_EVENTS} -F 1000 -- \ gzip -c tmpfs/random.64M.img >/dev/null On emulated PMU this command took 4.143s real time with 0.159s system time. On partitioned PMU this command took 3.139s real time with 0.110s system time, runtime reductions of 24.23% and 30.82%. 2) time perf stat -dd -- \ automated_specint2017.sh On emulated PMU this benchmark completed in 3789.16s real time with 224.45s system time and a final benchmark score of 4.28. On partitioned PMU this benchmark completed in 3525.67s real time with 15.98s system time and a final benchmark score of 4.56. That is a 6.95% reduction in runtime, 92.88% reduction in system time, and 6.54% improvement in overall benchmark score. Seeing these improvements on something as lightweight as perf stat is remarkable and implies there would have been a much greater improvement with perf record. I did not test that because I was not confident it would even finish in a reasonable time on the emulated PMU Test 3 was slightly different, I ran the workload in a VM with a single VCPU pinned to a physical CPU and analyzed from the host where the physical CPU spent its time using mpstat. 3) perf record -e ${FIFTEEN_HW_EVENTS} -F 4000 -- \ stress-ng --cpu 0 --timeout 30 Over a period of 30s the cpu running with the emulated PMU spent 34.96% of the time in the host kernel and 55.85% of the time in the guest. The cpu running the partitioned PMU spent 0.97% of its time in the host kernel and 91.06% of its time in the guest. Taken together, these tests represent a remarkable performance improvement for anything perf related using this new PMU implementation. Caveats: Because the most consistent and performant thing to do was untrap PMCR_EL0, the number of counters visible to the guest via PMCR_EL0.N is always equal to the value KVM sets for MDCR_EL2.HPMN. Previously allowed writes to PMCR_EL0.N via {GET,SET}_ONE_REG no longer affect the guest. These improvements come at a cost to 7-35 new registers that must be swapped at every vcpu_load and vcpu_put if the feature is enabled. I have been informed KVM would like to avoid paying this cost when possible. One solution is to make the trapping changes and context swapping lazy such that the trapping changes and context swapping only take place after the guest has actually accessed the PMU so guests that never access the PMU never pay the cost. This is not done here because it is not crucial to the primary functionality and I thought review would be more productive as soon as I had something complete enough for reviewers to easily play with. However, this or any better ideas are on the table for inclusion in future re-rolls. [1] https://lore.kernel.org/kvmarm/20250213180317.3205285-1-coltonlewis@google.… Colton Lewis (16): arm64: cpufeature: Add cpucap for HPMN0 arm64: Generate sign macro for sysreg Enums arm64: cpufeature: Add cpucap for PMICNTR KVM: arm64: Reorganize PMU functions KVM: arm64: Introduce method to partition the PMU perf: arm_pmuv3: Generalize counter bitmasks perf: arm_pmuv3: Keep out of guest counter partition KVM: arm64: Set up FGT for Partitioned PMU KVM: arm64: Writethrough trapped PMEVTYPER register KVM: arm64: Use physical PMSELR for PMXEVTYPER if partitioned KVM: arm64: Writethrough trapped PMOVS register KVM: arm64: Context switch Partitioned PMU guest registers perf: pmuv3: Handle IRQs for Partitioned PMU guest counters KVM: arm64: Inject recorded guest interrupts KVM: arm64: Add ioctl to partition the PMU when supported KVM: arm64: selftests: Add test case for partitioned PMU Marc Zyngier (1): KVM: arm64: Cleanup PMU includes Documentation/virt/kvm/api.rst | 16 + arch/arm/include/asm/arm_pmuv3.h | 24 + arch/arm64/include/asm/arm_pmuv3.h | 36 +- arch/arm64/include/asm/kvm_host.h | 208 +++++- arch/arm64/include/asm/kvm_pmu.h | 82 +++ arch/arm64/kernel/cpufeature.c | 15 + arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/arm.c | 24 +- arch/arm64/kvm/debug.c | 13 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 65 +- arch/arm64/kvm/pmu-emul.c | 629 +---------------- arch/arm64/kvm/pmu-part.c | 358 ++++++++++ arch/arm64/kvm/pmu.c | 630 ++++++++++++++++++ arch/arm64/kvm/sys_regs.c | 54 +- arch/arm64/tools/cpucaps | 2 + arch/arm64/tools/gen-sysreg.awk | 1 + arch/arm64/tools/sysreg | 6 +- drivers/perf/arm_pmuv3.c | 55 +- include/kvm/arm_pmu.h | 199 ------ include/linux/perf/arm_pmu.h | 15 +- include/linux/perf/arm_pmuv3.h | 14 +- include/uapi/linux/kvm.h | 4 + tools/include/uapi/linux/kvm.h | 2 + .../selftests/kvm/arm64/vpmu_counter_access.c | 40 +- virt/kvm/kvm_main.c | 1 + 25 files changed, 1616 insertions(+), 879 deletions(-) create mode 100644 arch/arm64/include/asm/kvm_pmu.h create mode 100644 arch/arm64/kvm/pmu-part.c delete mode 100644 include/kvm/arm_pmu.h base-commit: 1b85d923ba8c9e6afaf19e26708411adde94fba8 -- 2.49.0.1204.g71687c7c1d-goog

1 month

3
33
0 0

[PATCH] selftests: cachestat: add tests for mmap and /proc/cpuinfo

by Suresh K C

From: Suresh K C <suresh.k.chandrappa(a)gmail.com> Add a test case to verify cachestat behavior with memory-mapped files using mmap(). This ensures that pages accessed via mmap are correctly accounted for in the page cache. Also add a test for /proc/cpuinfo to validate cachestat's handling of virtual files in pseudo-filesystems. This improves test coverage for edge cases involving non-regular files. Tested on x86_64 with default kernel config. Signed-off-by: Suresh K C <suresh.k.chandrappa(a)gmail.com> --- .../selftests/cachestat/test_cachestat.c | 69 ++++++++++++++++++- 1 file changed, 67 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/cachestat/test_cachestat.c b/tools/testing/selftests/cachestat/test_cachestat.c index 632ab44737ec..81e7f6dd2279 100644 --- a/tools/testing/selftests/cachestat/test_cachestat.c +++ b/tools/testing/selftests/cachestat/test_cachestat.c @@ -22,7 +22,7 @@ static const char * const dev_files[] = { "/dev/zero", "/dev/null", "/dev/urandom", - "/proc/version", "/proc" + "/proc/version","/proc/cpuinfo","/proc" }; void print_cachestat(struct cachestat *cs) @@ -202,6 +202,65 @@ static int test_cachestat(const char *filename, bool write_random, bool create, return ret; } +bool test_cachestat_mmap(void){ + + size_t PS = sysconf(_SC_PAGESIZE); + size_t filesize = PS * 512 * 2;; + int syscall_ret; + size_t compute_len = PS * 512; + struct cachestat_range cs_range = { PS, compute_len }; + char *filename = "tmpshmcstat"; + unsigned long num_pages = compute_len / PS; + struct cachestat cs; + bool ret = true; + int fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0666); + if (fd < 0) { + ksft_print_msg("Unable to create mmap file.\n"); + ret = false; + goto out; + } + if (ftruncate(fd, filesize)) { + ksft_print_msg("Unable to truncate mmap file.\n"); + ret = false; + goto close_fd; + } + if (!write_exactly(fd, filesize)) { + ksft_print_msg("Unable to write to mmap file.\n"); + ret = false; + goto close_fd; + } + char *map = mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (map == MAP_FAILED) { + ksft_print_msg("mmap failed.\n"); + ret = false; + goto close_fd; + } + + for (int i = 0; i < filesize; i++) { + map[i] = 'A'; + } + map[filesize - 1] = 'X'; + + syscall_ret = syscall(__NR_cachestat, fd, &cs_range, &cs, 0); + + if (syscall_ret) { + ksft_print_msg("Cachestat returned non-zero.\n"); + ret = false; + } else { + print_cachestat(&cs); + if (cs.nr_cache + cs.nr_evicted != num_pages) { + ksft_print_msg("Total number of cached and evicted pages is off.\n"); + ret = false; + } + } + +close_fd: + close(fd); + unlink(filename); +out: + return ret; +} + bool test_cachestat_shmem(void) { size_t PS = sysconf(_SC_PAGESIZE); @@ -274,7 +333,7 @@ int main(void) ret = 1; } - for (int i = 0; i < 5; i++) { + for (int i = 0; i < 6; i++) { const char *dev_filename = dev_files[i]; if (test_cachestat(dev_filename, false, false, false, @@ -315,5 +374,11 @@ int main(void) ret = 1; } + if (test_cachestat_mmap()) + ksft_test_result_pass("cachestat works with a mmap file\n"); + else { + ksft_test_result_fail("cachestat fails with a mmap file\n"); + ret = 1; + } return ret; } -- 2.43.0

1 month

2
1
0 0

[PATCH v2] riscv: sbi: Add SBI Debug Triggers Extension tests

by Jesse Taube

Add tests for the DBTR SBI extension. Signed-off-by: Jesse Taube <jesse(a)rivosinc.com> --- V1 -> V2: - Call report_prefix_pop before returning - Disable compressed instructions in exec_call, update related comment - Remove extra "| 1" in dbtr_test_load - Remove extra newlines - Remove extra tabs in check_exec - Remove typedefs from enums - Return when dbtr_install_trigger fails - s/avalible/available/g - s/unistall/uninstall/g --- lib/riscv/asm/sbi.h | 28 ++ lib/riscv/sbi.c | 58 ++++ riscv/Makefile | 1 + riscv/sbi-dbtr.c | 751 ++++++++++++++++++++++++++++++++++++++++++++ riscv/sbi-tests.h | 1 + riscv/sbi.c | 1 + 6 files changed, 840 insertions(+) create mode 100644 riscv/sbi-dbtr.c diff --git a/lib/riscv/asm/sbi.h b/lib/riscv/asm/sbi.h index a5738a5c..ce19ab89 100644 --- a/lib/riscv/asm/sbi.h +++ b/lib/riscv/asm/sbi.h @@ -51,6 +51,7 @@ enum sbi_ext_id { SBI_EXT_SUSP = 0x53555350, SBI_EXT_FWFT = 0x46574654, SBI_EXT_SSE = 0x535345, + SBI_EXT_DBTR = 0x44425452, }; enum sbi_ext_base_fid { @@ -125,6 +126,17 @@ enum sbi_ext_fwft_fid { #define SBI_FWFT_SET_FLAG_LOCK BIT(0) +enum sbi_ext_dbtr_fid { + SBI_EXT_DBTR_NUM_TRIGGERS = 0, + SBI_EXT_DBTR_SETUP_SHMEM, + SBI_EXT_DBTR_TRIGGER_READ, + SBI_EXT_DBTR_TRIGGER_INSTALL, + SBI_EXT_DBTR_TRIGGER_UPDATE, + SBI_EXT_DBTR_TRIGGER_UNINSTALL, + SBI_EXT_DBTR_TRIGGER_ENABLE, + SBI_EXT_DBTR_TRIGGER_DISABLE, +}; + enum sbi_ext_sse_fid { SBI_EXT_SSE_READ_ATTRS = 0, SBI_EXT_SSE_WRITE_ATTRS, @@ -282,6 +294,22 @@ static inline bool sbi_sse_event_is_global(uint32_t event_id) return !!(event_id & SBI_SSE_EVENT_GLOBAL_BIT); } +struct sbiret sbi_debug_num_triggers(unsigned long trig_tdata1); +struct sbiret sbi_debug_set_shmem(void *shmem); +struct sbiret sbi_debug_set_shmem_raw(unsigned long shmem_phys_lo, + unsigned long shmem_phys_hi, + unsigned long flags); +struct sbiret sbi_debug_read_triggers(unsigned long trig_idx_base, + unsigned long trig_count); +struct sbiret sbi_debug_install_triggers(unsigned long trig_count); +struct sbiret sbi_debug_update_triggers(unsigned long trig_count); +struct sbiret sbi_debug_uninstall_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask); +struct sbiret sbi_debug_enable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask); +struct sbiret sbi_debug_disable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask); + struct sbiret sbi_sse_read_attrs_raw(unsigned long event_id, unsigned long base_attr_id, unsigned long attr_count, unsigned long phys_lo, unsigned long phys_hi); diff --git a/lib/riscv/sbi.c b/lib/riscv/sbi.c index 2959378f..39c0d3bd 100644 --- a/lib/riscv/sbi.c +++ b/lib/riscv/sbi.c @@ -32,6 +32,64 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0, return ret; } +struct sbiret sbi_debug_num_triggers(unsigned long trig_tdata1) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_NUM_TRIGGERS, trig_tdata1, 0, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_set_shmem(void *shmem) +{ + phys_addr_t p = virt_to_phys(shmem); + + return sbi_debug_set_shmem_raw(lower_32_bits(p), upper_32_bits(p), 0); +} + +struct sbiret sbi_debug_set_shmem_raw(unsigned long shmem_phys_lo, + unsigned long shmem_phys_hi, + unsigned long flags) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_SETUP_SHMEM, shmem_phys_lo, + shmem_phys_hi, flags, 0, 0, 0); +} + +struct sbiret sbi_debug_read_triggers(unsigned long trig_idx_base, + unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_READ, trig_idx_base, + trig_count, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_install_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_INSTALL, trig_count, 0, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_update_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UPDATE, trig_count, 0, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_uninstall_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UNINSTALL, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_enable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_ENABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_disable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_DISABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + struct sbiret sbi_sse_read_attrs_raw(unsigned long event_id, unsigned long base_attr_id, unsigned long attr_count, unsigned long phys_lo, unsigned long phys_hi) diff --git a/riscv/Makefile b/riscv/Makefile index 11e68eae..55c7ac93 100644 --- a/riscv/Makefile +++ b/riscv/Makefile @@ -20,6 +20,7 @@ all: $(tests) $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-asm.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-fwft.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-sse.o +$(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-dbtr.o all_deps += $($(TEST_DIR)/sbi-deps) diff --git a/riscv/sbi-dbtr.c b/riscv/sbi-dbtr.c new file mode 100644 index 00000000..fe323f0f --- /dev/null +++ b/riscv/sbi-dbtr.c @@ -0,0 +1,751 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SBI DBTR testsuite + * + * Copyright (C) 2025, Rivos Inc., Jesse Taube <jesse(a)rivosinc.com> + */ + +#include <asm/io.h> + +#include "sbi-tests.h" + +#define INSN_LEN(insn) ((((insn) & 0x3) < 0x3) ? 2 : 4) + +#if __riscv_xlen == 64 +#define SBI_DBTR_SHMEM_INVALID_ADDR 0xFFFFFFFFFFFFFFFFUL +#elif __riscv_xlen == 32 +#define SBI_DBTR_SHMEM_INVALID_ADDR 0xFFFFFFFFUL +#else +#error "Unexpected __riscv_xlen" +#endif + +#define RV_MAX_TRIGGERS 32 + +#define SBI_DBTR_TRIG_STATE_MAPPED BIT(0) +#define SBI_DBTR_TRIG_STATE_U BIT(1) +#define SBI_DBTR_TRIG_STATE_S BIT(2) +#define SBI_DBTR_TRIG_STATE_VU BIT(3) +#define SBI_DBTR_TRIG_STATE_VS BIT(4) +#define SBI_DBTR_TRIG_STATE_HAVE_HW_TRIG BIT(5) + +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT 8 +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX(trig_state) (trig_state >> SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT) + +#define SBI_DBTR_TDATA1_TYPE_SHIFT (__riscv_xlen - 4) + +#define SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL6_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL6_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL6_SELECT_BIT BIT(21) +#define SBI_DBTR_TDATA1_MCONTROL6_VS_BIT BIT(23) +#define SBI_DBTR_TDATA1_MCONTROL6_VU_BIT BIT(24) + +#define SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL_SELECT_BIT BIT(19) + +enum McontrolType { + SBI_DBTR_TDATA1_TYPE_NONE = (0UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_LEGACY = (1UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL = (2UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ICOUNT = (3UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ITRIGGER = (4UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ETRIGGER = (5UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL6 = (6UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_TMEXTTRIGGER = (7UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED0 = (8UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED1 = (9UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED2 = (10UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED3 = (11UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM0 = (12UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM1 = (13UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM2 = (14UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_DISABLED = (15UL << SBI_DBTR_TDATA1_TYPE_SHIFT), +}; + +enum Tdata1Value { + VALUE_NONE = 0, + VALUE_LOAD = BIT(0), + VALUE_STORE = BIT(1), + VALUE_EXECUTE = BIT(2), +}; + +enum Tdata1Mode { + MODE_NONE = 0, + MODE_M = BIT(0), + MODE_U = BIT(1), + MODE_S = BIT(2), + MODE_VU = BIT(3), + MODE_VS = BIT(4), +}; + +struct sbi_dbtr_data_msg { + unsigned long tstate; + unsigned long tdata1; + unsigned long tdata2; + unsigned long tdata3; +}; + +struct sbi_dbtr_id_msg { + unsigned long idx; +}; + +/* SBI shared mem messages layout */ +struct sbi_dbtr_shmem_entry { + union { + struct sbi_dbtr_data_msg data; + struct sbi_dbtr_id_msg id; + }; +}; + +static bool dbtr_handled; + +// Expected to be leaf function as not to disrupt frame-pointer +static __attribute__((naked)) void exec_call(void) +{ + // skip over nop when triggered instead of ret. + asm volatile (".option push\n" + ".option arch, -c\n" + "nop\n" + "ret\n" + ".option pop\n"); +} + +static void dbtr_exception_handler(struct pt_regs *regs) +{ + dbtr_handled = true; + + /* Reading *epc may cause a fault, skip over nop */ + if ((void *)regs->epc == exec_call) { + regs->epc += 4; + return; + } + + /* WARNING: Skips over the trapped intruction */ + regs->epc += INSN_LEN(readw((void *)regs->epc)); +} + +static bool do_save(void *tdata2) +{ + bool ret; + + writel(0, tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_load(void *tdata2) +{ + bool ret; + + readl(tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_exec(void) +{ + bool ret; + + exec_call(); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static unsigned long gen_tdata1_mcontrol(enum Tdata1Mode mode, enum Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_S_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1_mcontrol6(enum Tdata1Mode mode, enum Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_S_BIT; + + if (mode & MODE_VU) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VU_BIT; + + if (mode & MODE_VS) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VS_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1(enum McontrolType type, enum Tdata1Value value, enum Tdata1Mode mode) +{ + switch (type) { + case SBI_DBTR_TDATA1_TYPE_MCONTROL: + return gen_tdata1_mcontrol(mode, value); + case SBI_DBTR_TDATA1_TYPE_MCONTROL6: + return gen_tdata1_mcontrol6(mode, value); + default: + return 0; + } +} + +static bool dbtr_install_trigger(struct sbi_dbtr_shmem_entry *shmem, void *tdata2, + unsigned long tdata1) +{ + struct sbiret sbi_ret; + bool ret; + + shmem->data.tdata1 = tdata1; + shmem->data.tdata2 = (unsigned long)tdata2; + + sbi_ret = sbi_debug_install_triggers(1); + ret = sbiret_report_error(&sbi_ret, SBI_SUCCESS, "sbi_debug_install_triggers"); + if (ret) + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + return ret; +} + +static bool dbtr_uninstall_trigger(void) +{ + struct sbiret ret; + + install_exception_handler(EXC_BREAKPOINT, NULL); + + ret = sbi_debug_uninstall_triggers(0, 1); + return sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static unsigned long dbtr_test_num_triggers(void) +{ + struct sbiret ret; + //sbi_debug_num_triggers will return trig_max in sbiret.value when trig_tdata1 == 0 + unsigned long tdata1 = 0; + + // should be atleast one trigger. + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + + if (ret.value == 0) + report_fail("sbi_debug_num_triggers: Returned 0 triggers available"); + else + report_pass("sbi_debug_num_triggers: Returned %lu triggers available", ret.value); + + return ret.value; +} + +static enum McontrolType dbtr_test_type(unsigned long *num_trig) +{ + struct sbiret ret; + + //sbi_debug_num_triggers will return trig_max in sbiret.value when trig_tdata1 == 0 + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol6 triggers available", + ret.value); + *num_trig = ret.value; + return tdata1; + } + + tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + *num_trig = ret.value; + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol triggers available", + ret.value); + return tdata1; + } + + report_fail("sbi_debug_num_triggers: Returned 0 mcontrol(6) triggers available"); + + return SBI_DBTR_TDATA1_TYPE_NONE; +} + +static struct sbiret dbtr_test_save_install_uninstall(struct sbi_dbtr_shmem_entry *shmem, + enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("save_trigger"); + + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S | MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_install_triggers(1); + if (!sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_install_triggers")) { + report_prefix_pop(); + return ret; + } + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(do_save(&test), "triggered"); + + if (do_load(&test)) + report_fail("triggered by load"); + + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_save(&test)) + report_fail("triggered after uninstall"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); + + return ret; +} + +static void dbtr_test_update(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("update_trigger"); + + if (!dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE))) { + report_prefix_pop(); + return; + } + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_update_triggers(1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_update_triggers"); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_load(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + + report_prefix_push("load_trigger"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_LOAD, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_load(&test), "triggered"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_enable(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_disable_triggers"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + if (do_save(&test)) { + report_fail("should not trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); + report_skip("sbi_debug_enable_triggers: no disable"); + + return; + } + + report_pass("should not trigger"); + + report_prefix_pop(); + report_prefix_push("sbi_debug_enable_triggers"); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_exec(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + + report_prefix_push("exec_trigger"); + /* check if loads and saves trigger exec */ + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + report_prefix_pop(); + return; + } + + if (do_load(&test)) + report_fail("triggered by load"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + if (!dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + report_prefix_pop(); + return; + } + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_read(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_read_triggers"); + if (!dbtr_install_trigger(shmem, &test, tdata1)) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", ((unsigned long)&test), + shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void check_exec(unsigned long base) +{ + struct sbiret ret; + + report(do_exec(), "exec triggered"); + + ret = sbi_debug_uninstall_triggers(base, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static void dbtr_test_multiple(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type, + unsigned long num_trigs) +{ + static unsigned long test[2]; + struct sbiret ret; + bool have_three = num_trigs > 2; + + if (num_trigs < 2) + return; + + report_prefix_push("test_multiple"); + + if (!dbtr_install_trigger(shmem, &test[0], gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + if (!dbtr_install_trigger(shmem, &test[1], gen_tdata1(type, VALUE_LOAD, MODE_S))) + goto error; + if (have_three && + !dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) { + ret = sbi_debug_uninstall_triggers(1, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + goto error; + } + + report(do_save(&test[0]), "save triggered"); + + if (do_load(&test[0])) + report_fail("save triggered by load"); + + report(do_load(&test[1]), "load triggered"); + + if (do_save(&test[1])) + report_fail("load triggered by save"); + + if (have_three) + check_exec(2); + + ret = sbi_debug_uninstall_triggers(1, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_load(&test[1])) + report_fail("load triggered after uninstall"); + + report(do_save(&test[0]), "save triggered"); + + if (!have_three && + dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S))) + check_exec(1); + +error: + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_multiple_types(struct sbi_dbtr_shmem_entry *shmem, unsigned long type) +{ + static unsigned long test; + + report_prefix_push("dbtr_test_multiple_types"); + + /* check if loads and saves trigger exec */ + if (!dbtr_install_trigger(shmem, &test, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_load(&test), "load trigger"); + + report(do_save(&test), "save trigger"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + if (!dbtr_install_trigger(shmem, exec_call, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_uninstall(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable uninstall"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + dbtr_uninstall_trigger(); + + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_uninstall_enable(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("uninstall enable"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + dbtr_uninstall_trigger(); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_uninstall_update(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("uninstall update"); + if (!dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE))) { + report_prefix_pop(); + return; + } + + dbtr_uninstall_trigger(); + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_update_triggers(1); + sbiret_report_error(&ret, SBI_ERR_FAILURE, "sbi_debug_update_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_disable_read(struct sbi_dbtr_shmem_entry *shmem, enum McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_NONE); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable_read"); + if (!dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S))) { + report_prefix_pop(); + return; + } + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", + ((unsigned long)&test), shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +void check_dbtr(void) +{ + static struct sbi_dbtr_shmem_entry shmem[RV_MAX_TRIGGERS] = {}; + unsigned long num_trigs; + enum McontrolType trig_type; + struct sbiret ret; + + report_prefix_push("dbtr"); + + if (!sbi_probe(SBI_EXT_DBTR)) { + report_skip("extension not available"); + report_prefix_pop(); + return; + } + + if (__sbi_get_imp_id() == SBI_IMPL_OPENSBI && + __sbi_get_imp_version() < sbi_impl_opensbi_mk_version(1, 6)) { + report_skip("OpenSBI < v1.7 detected, skipping tests"); + report_prefix_pop(); + return; + } + + num_trigs = dbtr_test_num_triggers(); + if (!num_trigs) + goto error; + + trig_type = dbtr_test_type(&num_trigs); + if (trig_type == SBI_DBTR_TDATA1_TYPE_NONE) + goto error; + + ret = sbi_debug_set_shmem(shmem); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_set_shmem"); + + ret = dbtr_test_save_install_uninstall(&shmem[0], trig_type); + /* install or uninstall failed */ + if (ret.error != SBI_SUCCESS) + goto error; + + dbtr_test_load(&shmem[0], trig_type); + dbtr_test_exec(&shmem[0], trig_type); + dbtr_test_read(&shmem[0], trig_type); + dbtr_test_disable_enable(&shmem[0], trig_type); + dbtr_test_update(&shmem[0], trig_type); + dbtr_test_multiple_types(&shmem[0], trig_type); + dbtr_test_multiple(shmem, trig_type, num_trigs); + dbtr_test_disable_uninstall(&shmem[0], trig_type); + dbtr_test_uninstall_enable(&shmem[0], trig_type); + dbtr_test_uninstall_update(&shmem[0], trig_type); + dbtr_test_disable_read(&shmem[0], trig_type); + +error: + report_prefix_pop(); +} diff --git a/riscv/sbi-tests.h b/riscv/sbi-tests.h index d5c4ae70..6a227745 100644 --- a/riscv/sbi-tests.h +++ b/riscv/sbi-tests.h @@ -99,6 +99,7 @@ static inline bool env_enabled(const char *env) void sbi_bad_fid(int ext); void check_sse(void); +void check_dbtr(void); #endif /* __ASSEMBLER__ */ #endif /* _RISCV_SBI_TESTS_H_ */ diff --git a/riscv/sbi.c b/riscv/sbi.c index edb1a6be..5bd496d0 100644 --- a/riscv/sbi.c +++ b/riscv/sbi.c @@ -1561,6 +1561,7 @@ int main(int argc, char **argv) check_susp(); check_sse(); check_fwft(); + check_dbtr(); return report_summary(); } -- 2.43.0

1 month

2
1
0 0

[RFC Patch 0/2] selftests/mm: assert rmap behave as expected

by Wei Yang

As David suggested, currently we don't have a high level test case to verify the behavior of rmap. This patch set introduce the verification on rmap by migration. Patch 1 is a preparation to move ksm related operation into vm_util. Patch 2 is the new test case. Currently it covers following four scenarios: * anonymous page * shmem page * pagecache page * ksm page Wei Yang (2): selftests/mm: put general ksm operation into vm_util selftests/mm: assert rmap behave as expected MAINTAINERS | 1 + tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 3 + .../selftests/mm/ksm_functional_tests.c | 76 +-- tools/testing/selftests/mm/rmap.c | 466 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 4 + tools/testing/selftests/mm/vm_util.c | 71 +++ tools/testing/selftests/mm/vm_util.h | 7 + 8 files changed, 563 insertions(+), 66 deletions(-) create mode 100644 tools/testing/selftests/mm/rmap.c -- 2.34.1

1 month

1
3
0 0

[PATCH] selftests/timers: Fix integer overlow errors on 32 bit systems

by Terry Tritton

The use of NSEC_PER_SEC (1000000000L) as defined in include/vdso/time64.h causes several integer overflow warnings and test errors on 32 bit architectures. Use a long long instead of long to prevent integer overflow when converting seconds to nanoseconds. Signed-off-by: Terry Tritton <terry.tritton(a)linaro.org> --- tools/testing/selftests/timers/adjtick.c | 5 ++++- tools/testing/selftests/timers/alarmtimer-suspend.c | 4 +++- tools/testing/selftests/timers/inconsistency-check.c | 4 +++- tools/testing/selftests/timers/leap-a-day.c | 4 +++- tools/testing/selftests/timers/mqueue-lat.c | 3 ++- tools/testing/selftests/timers/nanosleep.c | 4 +++- tools/testing/selftests/timers/nsleep-lat.c | 4 +++- tools/testing/selftests/timers/posix_timers.c | 5 ++++- tools/testing/selftests/timers/raw_skew.c | 4 +++- tools/testing/selftests/timers/set-2038.c | 4 +++- tools/testing/selftests/timers/set-timer-lat.c | 4 +++- tools/testing/selftests/timers/valid-adjtimex.c | 5 ++++- 12 files changed, 38 insertions(+), 12 deletions(-) diff --git a/tools/testing/selftests/timers/adjtick.c b/tools/testing/selftests/timers/adjtick.c index 777d9494b683..b5929c33b632 100644 --- a/tools/testing/selftests/timers/adjtick.c +++ b/tools/testing/selftests/timers/adjtick.c @@ -22,10 +22,13 @@ #include <sys/time.h> #include <sys/timex.h> #include <time.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL +#define USEC_PER_SEC 1000000LL + #define MILLION 1000000 long systick; diff --git a/tools/testing/selftests/timers/alarmtimer-suspend.c b/tools/testing/selftests/timers/alarmtimer-suspend.c index a9ef76ea6051..b5799df271ae 100644 --- a/tools/testing/selftests/timers/alarmtimer-suspend.c +++ b/tools/testing/selftests/timers/alarmtimer-suspend.c @@ -28,10 +28,12 @@ #include <signal.h> #include <stdlib.h> #include <pthread.h> -#include <include/vdso/time64.h> #include <errno.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + #define UNREASONABLE_LAT (NSEC_PER_SEC * 5) /* hopefully we resume in 5 secs */ #define SUSPEND_SECS 15 diff --git a/tools/testing/selftests/timers/inconsistency-check.c b/tools/testing/selftests/timers/inconsistency-check.c index 9d1573769d55..2b2d7293b313 100644 --- a/tools/testing/selftests/timers/inconsistency-check.c +++ b/tools/testing/selftests/timers/inconsistency-check.c @@ -28,9 +28,11 @@ #include <sys/timex.h> #include <string.h> #include <signal.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + /* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ #define CLOCK_HWSPECIFIC 10 diff --git a/tools/testing/selftests/timers/leap-a-day.c b/tools/testing/selftests/timers/leap-a-day.c index 04004a7c0934..008c38ce4b2f 100644 --- a/tools/testing/selftests/timers/leap-a-day.c +++ b/tools/testing/selftests/timers/leap-a-day.c @@ -48,9 +48,11 @@ #include <string.h> #include <signal.h> #include <unistd.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + #define CLOCK_TAI 11 time_t next_leap; diff --git a/tools/testing/selftests/timers/mqueue-lat.c b/tools/testing/selftests/timers/mqueue-lat.c index 63de2334a291..1a6d26f86137 100644 --- a/tools/testing/selftests/timers/mqueue-lat.c +++ b/tools/testing/selftests/timers/mqueue-lat.c @@ -29,9 +29,10 @@ #include <signal.h> #include <errno.h> #include <mqueue.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL #define TARGET_TIMEOUT 100000000 /* 100ms in nanoseconds */ #define UNRESONABLE_LATENCY 40000000 /* 40ms in nanosecs */ diff --git a/tools/testing/selftests/timers/nanosleep.c b/tools/testing/selftests/timers/nanosleep.c index 252c6308c569..55ea67478fdd 100644 --- a/tools/testing/selftests/timers/nanosleep.c +++ b/tools/testing/selftests/timers/nanosleep.c @@ -27,9 +27,11 @@ #include <sys/timex.h> #include <string.h> #include <signal.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + /* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ #define CLOCK_HWSPECIFIC 10 diff --git a/tools/testing/selftests/timers/nsleep-lat.c b/tools/testing/selftests/timers/nsleep-lat.c index de23dc0c9f97..347d622987c8 100644 --- a/tools/testing/selftests/timers/nsleep-lat.c +++ b/tools/testing/selftests/timers/nsleep-lat.c @@ -24,9 +24,11 @@ #include <sys/timex.h> #include <string.h> #include <signal.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + #define UNRESONABLE_LATENCY 40000000 /* 40ms in nanosecs */ /* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c index f0eceb0faf34..555bf161f420 100644 --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -16,11 +16,14 @@ #include <string.h> #include <unistd.h> #include <time.h> -#include <include/vdso/time64.h> #include <pthread.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL +#define USEC_PER_SEC 1000000LL + #define DELAY 2 static void __fatal_error(const char *test, const char *name, const char *what) diff --git a/tools/testing/selftests/timers/raw_skew.c b/tools/testing/selftests/timers/raw_skew.c index 957f7cd29cb1..ff7675d98560 100644 --- a/tools/testing/selftests/timers/raw_skew.c +++ b/tools/testing/selftests/timers/raw_skew.c @@ -25,9 +25,11 @@ #include <sys/time.h> #include <sys/timex.h> #include <time.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + #define shift_right(x, s) ({ \ __typeof__(x) __x = (x); \ __typeof__(s) __s = (s); \ diff --git a/tools/testing/selftests/timers/set-2038.c b/tools/testing/selftests/timers/set-2038.c index ed244315e11c..8130d551a11c 100644 --- a/tools/testing/selftests/timers/set-2038.c +++ b/tools/testing/selftests/timers/set-2038.c @@ -27,9 +27,11 @@ #include <unistd.h> #include <time.h> #include <sys/time.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + #define KTIME_MAX ((long long)~((unsigned long long)1 << 63)) #define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC) diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c index 9d8437c13929..79a6a6cba186 100644 --- a/tools/testing/selftests/timers/set-timer-lat.c +++ b/tools/testing/selftests/timers/set-timer-lat.c @@ -28,9 +28,11 @@ #include <signal.h> #include <stdlib.h> #include <pthread.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL + /* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ #define CLOCK_HWSPECIFIC 10 diff --git a/tools/testing/selftests/timers/valid-adjtimex.c b/tools/testing/selftests/timers/valid-adjtimex.c index 6b7801055ad1..e4f31e678630 100644 --- a/tools/testing/selftests/timers/valid-adjtimex.c +++ b/tools/testing/selftests/timers/valid-adjtimex.c @@ -29,9 +29,12 @@ #include <string.h> #include <signal.h> #include <unistd.h> -#include <include/vdso/time64.h> #include "../kselftest.h" +/* define NSEC_PER_SEC as long long to avoid overflow on 32 bit architectures*/ +#define NSEC_PER_SEC 1000000000LL +#define USEC_PER_SEC 1000000LL + #define ADJ_SETOFFSET 0x0100 #include <sys/syscall.h> -- 2.39.5

1 month

2
2
0 0

[PATCH] selftests: ublk: kublk: improve behavior on init failure

by Uday Shankar

Some failure modes are handled poorly by kublk. For example, if ublk_drv is built as a module but not currently loaded into the kernel, ./kublk add ... just hangs forever. This happens because in this case (and a few others), the worker process does not notify its parent (via a write to the shared eventfd) that it has tried and failed to initialize, so the parent hangs forever. Fix this by ensuring that we always notify the parent process of any initialization failure, and have the parent print a (not very descriptive) log line when this happens. Signed-off-by: Uday Shankar <ushankar(a)purestorage.com> --- tools/testing/selftests/ublk/kublk.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c index a98e14e4c245965d817b93843ff9a4011291223b..e2d2042810d4bb472e48a0ed91317d2bdf6e2f2a 100644 --- a/tools/testing/selftests/ublk/kublk.c +++ b/tools/testing/selftests/ublk/kublk.c @@ -1112,7 +1112,7 @@ static int __cmd_dev_add(const struct dev_ctx *ctx) __u64 features; const struct ublk_tgt_ops *ops; struct ublksrv_ctrl_dev_info *info; - struct ublk_dev *dev; + struct ublk_dev *dev = NULL; int dev_id = ctx->dev_id; int ret, i; @@ -1120,13 +1120,15 @@ static int __cmd_dev_add(const struct dev_ctx *ctx) if (!ops) { ublk_err("%s: no such tgt type, type %s\n", __func__, tgt_type); - return -ENODEV; + ret = -ENODEV; + goto fail; } if (nr_queues > UBLK_MAX_QUEUES || depth > UBLK_QUEUE_DEPTH) { ublk_err("%s: invalid nr_queues or depth queues %u depth %u\n", __func__, nr_queues, depth); - return -EINVAL; + ret = -EINVAL; + goto fail; } /* default to 1:1 threads:queues if nthreads is unspecified */ @@ -1136,30 +1138,37 @@ static int __cmd_dev_add(const struct dev_ctx *ctx) if (nthreads > UBLK_MAX_THREADS) { ublk_err("%s: %u is too many threads (max %u)\n", __func__, nthreads, UBLK_MAX_THREADS); - return -EINVAL; + ret = -EINVAL; + goto fail; } if (nthreads != nr_queues && !ctx->per_io_tasks) { ublk_err("%s: threads %u must be same as queues %u if " "not using per_io_tasks\n", __func__, nthreads, nr_queues); - return -EINVAL; + ret = -EINVAL; + goto fail; } dev = ublk_ctrl_init(); if (!dev) { ublk_err("%s: can't alloc dev id %d, type %s\n", __func__, dev_id, tgt_type); - return -ENOMEM; + ret = -ENOMEM; + goto fail; } /* kernel doesn't support get_features */ ret = ublk_ctrl_get_features(dev, &features); - if (ret < 0) - return -EINVAL; + if (ret < 0) { + ret = -EINVAL; + goto fail; + } - if (!(features & UBLK_F_CMD_IOCTL_ENCODE)) - return -ENOTSUP; + if (!(features & UBLK_F_CMD_IOCTL_ENCODE)) { + ret = -ENOTSUP; + goto fail; + } info = &dev->dev_info; info->dev_id = ctx->dev_id; @@ -1200,7 +1209,8 @@ static int __cmd_dev_add(const struct dev_ctx *ctx) fail: if (ret < 0) ublk_send_dev_event(ctx, dev, -1); - ublk_ctrl_deinit(dev); + if (dev) + ublk_ctrl_deinit(dev); return ret; } @@ -1262,6 +1272,8 @@ static int cmd_dev_add(struct dev_ctx *ctx) shmctl(ctx->_shmid, IPC_RMID, NULL); /* wait for child and detach from it */ wait(NULL); + if (exit_code == EXIT_FAILURE) + ublk_err("%s: command failed\n", __func__); exit(exit_code); } else { exit(EXIT_FAILURE); --- base-commit: c09a8b00f850d3ca0af998bff1fac4a3f6d11768 change-id: 20250603-ublk_init_fail-b498905159eb Best regards, -- Uday Shankar <ushankar(a)purestorage.com>

1 month

3
2
0 0

[PATCH v2] selftests: ir_decoder: Convert header comment to proper multi-line block

by Abdelrahman Fekry

well, i checked the script using checkpatch.pl and it shows that the patch has no warnings or errors and its ready to be sent v2: - fixed multiple trailing whitespace errors and - the Signed-off-by mismatch The test file for the IR decoder used single-line comments at the top to document its purpose and licensing, which is inconsistent with the style used throughout the Linux kernel. In this patch i converted the file header to a proper multi-line comment block (/*) that aligns with standard kernel practices. This improves readability, consistency across selftests, and ensures the license and documentation are clearly visible in a familiar format. No functional changes have been made. Signed-off-by: Abdelrahman Fekry <abdelrahmanfekry375(a)gmail.com> --- tools/testing/selftests/ir/ir_loopback.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/ir/ir_loopback.c b/tools/testing/selftests/ir/ir_loopback.c index f4a15cbdd5ea..c94faa975630 100644 --- a/tools/testing/selftests/ir/ir_loopback.c +++ b/tools/testing/selftests/ir/ir_loopback.c @@ -1,14 +1,17 @@ // SPDX-License-Identifier: GPL-2.0 -// test ir decoder -// -// Copyright (C) 2018 Sean Young <sean(a)mess.org> - -// When sending LIRC_MODE_SCANCODE, the IR will be encoded. rc-loopback -// will send this IR to the receiver side, where we try to read the decoded -// IR. Decoding happens in a separate kernel thread, so we will need to -// wait until that is scheduled, hence we use poll to check for read -// readiness. - +/* Copyright (C) 2018 Sean Young <sean(a)mess.org> + * + * Selftest for IR decoder + * + * + * When sending LIRC_MODE_SCANCODE, the IR will be encoded. + * rc-loopback will send this IR to the receiver side, + * where we try to read the decoded IR. + * Decoding happens in a separate kernel thread, + * so we will need to wait until that is scheduled, + * hence we use poll to check for read + * readiness. + */ #include <linux/lirc.h> #include <errno.h> #include <stdio.h> -- 2.25.1

1 month

1
0
0 0

[PATCH bpf-next v3 00/11] bpf: Mitigate Spectre v1 using barriers

by Luis Gerhorst

This improves the expressiveness of unprivileged BPF by inserting speculation barriers instead of rejecting the programs. The approach was previously presented at LPC'24 [1] and RAID'24 [2]. To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects potentially-dangerous unprivileged BPF programs as of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"). In [2], we have analyzed 364 object files from open source projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf Examples, Parca, and Prevail) and found that this affects 31% to 54% of programs. To resolve this in the majority of cases this patchset adds a fall-back for mitigating Spectre v1 using speculation barriers. The kernel still optimistically attempts to verify all speculative paths but uses speculation barriers against v1 when unsafe behavior is detected. This allows for more programs to be accepted without disabling the BPF Spectre mitigations (e.g., by setting cpu_mitigations_off()). For this, it relies on the fact that speculation barriers prevent all later instructions if the speculation was not correct: * On x86_64, lfence acts as full speculation barrier, not only as a load fence [3]: An LFENCE instruction or a serializing instruction will ensure that no later instructions execute, even speculatively, until all prior instructions complete locally. [...] Inserting an LFENCE instruction after a bounds check prevents later operations from executing before the bound check completes. This was experimentally confirmed in [4]. * ARM's SB speculation barrier instruction also affects "any instruction that appears later in the program order than the barrier" [5]. In [1] we have measured the overhead of this approach relative to having mitigations off and including the upstream Spectre v4 mitigations. For event tracing and stack-sampling profilers, we found that mitigations increase BPF program execution time by 0% to 62%. For the Loxilb network load balancer, we have measured a 14% slowdown in SCTP performance but no significant slowdown for TCP. This overhead only applies to programs that were previously rejected. I reran the expressiveness-evaluation with v6.14 and made sure the main results still match those from [1] and [2] (which used v6.5). Main design decisions are: * Do not use separate bytecode insns for v1 and v4 barriers (inspired by Daniel Borkmann's question at LPC). This simplifies the verifier significantly and has the only downside that performance on PowerPC is not as high as it could be. * Allow archs to still disable v1/v4 mitigations separately by setting bpf_jit_bypass_spec_v1/v4(). This has the benefit that archs can benefit from improved BPF expressiveness / performance if they are not vulnerable (e.g., ARM64 for v4 in the kernel). * Do not remove the empty BPF_NOSPEC implementation for backends for which it is unknown whether they are vulnerable to Spectre v1. [1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF") [2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions") [3] https://www.intel.com/content/www/us/en/developer/articles/technical/softwa… ("Managed Runtime Speculative Execution Side Channel Mitigations") [4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a tool to analyze speculative execution attacks and mitigations" - Section 4.6 "Stopping Speculative Execution") [5] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/S… ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set Architecture (2020-12)") Changes: * v2 -> v3: - Fix https://lore.kernel.org/oe-kbuild-all/202504212030.IF1SLhz6-lkp@intel.com/ and similar by moving the bpf_jit_bypass_spec_v1/v4() prototypes out of the #ifdef CONFIG_BPF_SYSCALL. Decided not to move them to filter.h (where similar bpf_jit_*() prototypes live) as they would still have to be duplicated in bpf.h to be usable to bpf_bypass_spec_v1/v4() (unless including filter.h in bpf.h is an option). - Fix https://lore.kernel.org/oe-kbuild-all/202504220035.SoGveGpj-lkp@intel.com/ by moving the variable declarations out of the switch-case. - Build touched C files with W=2 and bpf config on x86 to check that there are no other warnings introduced. - Found 3 more checkpatch warnings that can be fixed without degrading readability. - Rebase to bpf-next 2025-05-01 - Link to v2: https://lore.kernel.org/bpf/20250421091802.3234859-1-luis.gerhorst@fau.de/ * v1 -> v2: - Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and 11 ("bpf: Fall back to nospec for spec path verification") as suggested by Alexei. This series therefore no longer changes push_stack() to return PTR_ERR. - Add detailed explanation of how lfence works internally and how it affects the algorithm. - Add tests checking that nospec instructions are inserted in expected locations using __xlated_unpriv as suggested by Eduard (also, include a fix for __xlated_unpriv) - Add a test for the mitigations from the description of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches") - Remove unused variables from do_check[_insn]() as suggested by Eduard. - Remove INSN_IDX_MODIFIED to improve readability as suggested by Eduard. This also causes the nospec_result-check to run (and fail) for jumping-ops. Add a warning to assert that this check must never succeed in that case. - Add details on the safety of patch 10 ("bpf: Allow nospec-protected var-offset stack access") based on the feedback on v1. - Rebase to bpf-next-250420 - Link to v1: https://lore.kernel.org/all/20250313172127.1098195-1-luis.gerhorst@fau.de/ * RFC -> v1: - rebase to bpf-next-250313 - tests: mark expected successes/new errors - add bpt_jit_bypass_spec_v1/v4() to avoid #ifdef in bpf_bypass_spec_v1/v4() - ensure that nospec with v1-support is implemented for archs for which GCC supports speculation barriers, except for MIPS - arm64: emit speculation barrier - powerpc: change nospec to include v1 barrier - discuss potential security (archs that do not impl. BPF nospec) and performance (only PowerPC) regressions - Link to RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/ Luis Gerhorst (11): selftests/bpf: Fix caps for __xlated/jited_unpriv bpf: Move insn if/else into do_check_insn() bpf: Return -EFAULT on misconfigurations bpf: Return -EFAULT on internal errors bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4() bpf, arm64, powerpc: Change nospec to include v1 barrier bpf: Rename sanitize_stack_spill to nospec_result bpf: Fall back to nospec for Spectre v1 selftests/bpf: Add test for Spectre v1 mitigation bpf: Allow nospec-protected var-offset stack access bpf: Fall back to nospec for sanitization-failures arch/arm64/net/bpf_jit.h | 5 + arch/arm64/net/bpf_jit_comp.c | 28 +- arch/powerpc/net/bpf_jit_comp64.c | 80 ++- include/linux/bpf.h | 11 +- include/linux/bpf_verifier.h | 3 +- include/linux/filter.h | 2 +- kernel/bpf/core.c | 32 +- kernel/bpf/verifier.c | 653 ++++++++++-------- tools/testing/selftests/bpf/progs/bpf_misc.h | 4 + .../selftests/bpf/progs/verifier_and.c | 8 +- .../selftests/bpf/progs/verifier_bounds.c | 66 +- .../bpf/progs/verifier_bounds_deduction.c | 45 +- .../selftests/bpf/progs/verifier_map_ptr.c | 20 +- .../selftests/bpf/progs/verifier_movsx.c | 16 +- .../selftests/bpf/progs/verifier_unpriv.c | 65 +- .../bpf/progs/verifier_value_ptr_arith.c | 101 ++- tools/testing/selftests/bpf/test_loader.c | 14 +- .../selftests/bpf/verifier/dead_code.c | 3 +- tools/testing/selftests/bpf/verifier/jmp32.c | 33 +- tools/testing/selftests/bpf/verifier/jset.c | 10 +- 20 files changed, 771 insertions(+), 428 deletions(-) base-commit: 358b1c0f56ebb6996fcec7dcdcf6bae5dcbc8b6c -- 2.49.0

1 month

6
34
0 0

[PATCH v2 1/2] libbpf: add support for printing BTF character arrays as strings

by Blake Jones

The BTF dumper code currently displays arrays of characters as just that - arrays, with each character formatted individually. Sometimes this is what makes sense, but it's nice to be able to treat that array as a string. This change adds a special case to the btf_dump functionality to allow arrays of single-byte integer values to be printed as character strings. Characters for which isprint() returns false are printed as hex-escaped values. This is enabled when the new ".emit_strings" is set to 1 in the btf_dump_type_data_opts structure. As an example, here's what it looks like to dump the string "hello" using a few different field values for btf_dump_type_data_opts (.compact = 1): - .emit_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',] - .emit_strings = 0, .skip_names = 1: ['h','e','l','l','o',] - .emit_strings = 1, .skip_names = 0: (char[6])"hello" - .emit_strings = 1, .skip_names = 1: "hello" Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1: - .emit_strings = 0: ['h',-1,] - .emit_strings = 1: "h\xff" Signed-off-by: Blake Jones <blakejones(a)google.com> --- tools/lib/bpf/btf.h | 3 ++- tools/lib/bpf/btf_dump.c | 44 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 45 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 4392451d634b..ccfd905f03df 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -326,9 +326,10 @@ struct btf_dump_type_data_opts { bool compact; /* no newlines/indentation */ bool skip_names; /* skip member/type names */ bool emit_zeroes; /* show 0-valued fields */ + bool emit_strings; /* print char arrays as strings */ size_t :0; }; -#define btf_dump_type_data_opts__last_field emit_zeroes +#define btf_dump_type_data_opts__last_field emit_strings LIBBPF_API int btf_dump__dump_type_data(struct btf_dump *d, __u32 id, diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c index 460c3e57fadb..336a6646e0fa 100644 --- a/tools/lib/bpf/btf_dump.c +++ b/tools/lib/bpf/btf_dump.c @@ -68,6 +68,7 @@ struct btf_dump_data { bool compact; bool skip_names; bool emit_zeroes; + bool emit_strings; __u8 indent_lvl; /* base indent level */ char indent_str[BTF_DATA_INDENT_STR_LEN]; /* below are used during iteration */ @@ -2028,6 +2029,43 @@ static int btf_dump_var_data(struct btf_dump *d, return btf_dump_dump_type_data(d, NULL, t, type_id, data, 0, 0); } +static int btf_dump_string_data(struct btf_dump *d, + const struct btf_type *t, + __u32 id, + const void *data) +{ + const struct btf_array *array = btf_array(t); + __u32 i; + + btf_dump_data_pfx(d); + btf_dump_printf(d, "\""); + + for (i = 0; i < array->nelems; i++, data++) { + char c; + + if (data >= d->typed_dump->data_end) + return -E2BIG; + + c = *(char *)data; + if (c == '\0') { + /* + * When printing character arrays as strings, NUL bytes + * are always treated as string terminators; they are + * never printed. + */ + break; + } + if (isprint(c)) + btf_dump_printf(d, "%c", c); + else + btf_dump_printf(d, "\\x%02x", *(__u8 *)data); + } + + btf_dump_printf(d, "\""); + + return 0; +} + static int btf_dump_array_data(struct btf_dump *d, const struct btf_type *t, __u32 id, @@ -2055,8 +2093,11 @@ static int btf_dump_array_data(struct btf_dump *d, * char arrays, so if size is 1 and element is * printable as a char, we'll do that. */ - if (elem_size == 1) + if (elem_size == 1) { + if (d->typed_dump->emit_strings) + return btf_dump_string_data(d, t, id, data); d->typed_dump->is_array_char = true; + } } /* note that we increment depth before calling btf_dump_print() below; @@ -2544,6 +2585,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id, d->typed_dump->compact = OPTS_GET(opts, compact, false); d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false); d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false); + d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false); ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0); -- 2.49.0.1204.g71687c7c1d-goog

1 month

3
5
0 0

[RESEND PATCH] selftests/bpf: Fix bpf selftest build error

by Saket Kumar Bhaskar

On linux-next, build for bpf selftest displays an error due to mismatch in the expected function signature of bpf_testmod_test_read and bpf_testmod_test_write. Commit 97d06802d10a ("sysfs: constify bin_attribute argument of bin_attribute::read/write()") changed the required type for struct bin_attribute to const struct bin_attribute. To resolve the error, update corresponding signature for the callback. Fixes: 97d06802d10a ("sysfs: constify bin_attribute argument of bin_attribute::read/write()") Reported-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Closes: https://lore.kernel.org/all/e915da49-2b9a-4c4c-a34f-877f378129f6@linux.ibm.… Tested-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com> --- [RESEND]: - Added Fixes and Tested-by tag. - Added Greg as receipent for driver-core tree. Original patch: https://lore.kernel.org/all/20250509122348.649064-1-skb99@linux.ibm.com/ tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c index 2e54b95ad898..194c442580ee 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c @@ -385,7 +385,7 @@ int bpf_testmod_fentry_ok; noinline ssize_t bpf_testmod_test_read(struct file *file, struct kobject *kobj, - struct bin_attribute *bin_attr, + const struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len) { struct bpf_testmod_test_read_ctx ctx = { @@ -465,7 +465,7 @@ ALLOW_ERROR_INJECTION(bpf_testmod_test_read, ERRNO); noinline ssize_t bpf_testmod_test_write(struct file *file, struct kobject *kobj, - struct bin_attribute *bin_attr, + const struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len) { struct bpf_testmod_test_write_ctx ctx = { -- 2.43.5

1 month

4
4
0 0

[PATCH net] selftests: drv-net: add configs for the TSO test

by Jakub Kicinski

Add missing config options for the tso.py test, specifically to make sure the kernel is built with vxlan and gre tunnels. I noticed this while adding a TSO-capable device QEMU to the CI. Previously we only run virtio tests and it doesn't report LSO stats on the QEMU we have. Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test") Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: willemb(a)google.com CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/drivers/net/hw/config | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 tools/testing/selftests/drivers/net/hw/config diff --git a/tools/testing/selftests/drivers/net/hw/config b/tools/testing/selftests/drivers/net/hw/config new file mode 100644 index 000000000000..ea4b70d71563 --- /dev/null +++ b/tools/testing/selftests/drivers/net/hw/config @@ -0,0 +1,6 @@ +CONFIG_IPV6=y +CONFIG_IPV6_GRE=y +CONFIG_NET_IP_TUNNEL=y +CONFIG_NET_IPGRE=y +CONFIG_NET_IPGRE_DEMUX=y +CONFIG_VXLAN=y -- 2.49.0

1 month

2
2
0 0

[PATCH net v3] selftests: net: build net/lib dependency in all target

by Bui Quang Minh

We have the logic to include net/lib automatically for net related selftests. However, currently, this logic is only in install target which means only `make install` will have net/lib included. This commit adds the logic to all target so that all `make`, `make run_tests` and `make install` will have net/lib included in net related selftests. Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com> --- Changes in v3: - Don't remove INSTALL_DEP_TARGETS in install target so that net/lib is copied to INSTALL_PATH Changes in v2: - Make the commit message clearer. tools/testing/selftests/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 6aa11cd3db42..339b31e6a6b5 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -205,7 +205,7 @@ export KHDR_INCLUDES all: @ret=1; \ - for TARGET in $(TARGETS); do \ + for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ mkdir $$BUILD_TARGET -p; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \ -- 2.43.0

1 month

2
1
0 0

[PATCH] selftests: ipc: Replace fail print statements with ksft_test_result_fail

by Nick Huang

Use the standard kselftest failure report function to ensure consistent test output formatting. This improves readability and integration with automated test frameworks. Signed-off-by: Nick Huang <sef1548(a)gmail.com> --- tools/testing/selftests/ipc/msgque.c | 47 ++++++++++++++-------------- 1 file changed, 23 insertions(+), 24 deletions(-) diff --git a/tools/testing/selftests/ipc/msgque.c b/tools/testing/selftests/ipc/msgque.c index e9dbb84c100a..5e36aeeb9901 100644 --- a/tools/testing/selftests/ipc/msgque.c +++ b/tools/testing/selftests/ipc/msgque.c @@ -39,26 +39,26 @@ int restore_queue(struct msgque_data *msgque) fd = open("/proc/sys/kernel/msg_next_id", O_WRONLY); if (fd == -1) { - printf("Failed to open /proc/sys/kernel/msg_next_id\n"); + ksft_test_result_fail("Failed to open /proc/sys/kernel/msg_next_id\n"); return -errno; } sprintf(buf, "%d", msgque->msq_id); ret = write(fd, buf, strlen(buf)); if (ret != strlen(buf)) { - printf("Failed to write to /proc/sys/kernel/msg_next_id\n"); + ksft_test_result_fail("Failed to write to /proc/sys/kernel/msg_next_id\n"); return -errno; } id = msgget(msgque->key, msgque->mode | IPC_CREAT | IPC_EXCL); if (id == -1) { - printf("Failed to create queue\n"); + ksft_test_result_fail("Failed to create queue\n"); return -errno; } if (id != msgque->msq_id) { - printf("Restored queue has wrong id (%d instead of %d)\n", - id, msgque->msq_id); + ksft_test_result_fail("Restored queue has wrong id (%d instead of %d)\n" + , id, msgque->msq_id); ret = -EFAULT; goto destroy; } @@ -66,7 +66,7 @@ int restore_queue(struct msgque_data *msgque) for (i = 0; i < msgque->qnum; i++) { if (msgsnd(msgque->msq_id, &msgque->messages[i].mtype, msgque->messages[i].msize, IPC_NOWAIT) != 0) { - printf("msgsnd failed (%m)\n"); + ksft_test_result_fail("msgsnd failed (%m)\n"); ret = -errno; goto destroy; } @@ -90,23 +90,22 @@ int check_and_destroy_queue(struct msgque_data *msgque) if (ret < 0) { if (errno == ENOMSG) break; - printf("Failed to read IPC message: %m\n"); + ksft_test_result_fail("Failed to read IPC message: %m\n"); ret = -errno; goto err; } if (ret != msgque->messages[cnt].msize) { - printf("Wrong message size: %d (expected %d)\n", ret, - msgque->messages[cnt].msize); + ksft_test_result_fail("Wrong message size: %d (expected %d)\n", ret, msgque->messages[cnt].msize); ret = -EINVAL; goto err; } if (message.mtype != msgque->messages[cnt].mtype) { - printf("Wrong message type\n"); + ksft_test_result_fail("Wrong message type\n"); ret = -EINVAL; goto err; } if (memcmp(message.mtext, msgque->messages[cnt].mtext, ret)) { - printf("Wrong message content\n"); + ksft_test_result_fail("Wrong message content\n"); ret = -EINVAL; goto err; } @@ -114,7 +113,7 @@ int check_and_destroy_queue(struct msgque_data *msgque) } if (cnt != msgque->qnum) { - printf("Wrong message number\n"); + ksft_test_result_fail("Wrong message number\n"); ret = -EINVAL; goto err; } @@ -139,7 +138,7 @@ int dump_queue(struct msgque_data *msgque) if (ret < 0) { if (errno == EINVAL) continue; - printf("Failed to get stats for IPC queue with id %d\n", + ksft_test_result_fail("Failed to get stats for IPC queue with id %d\n", kern_id); return -errno; } @@ -150,7 +149,7 @@ int dump_queue(struct msgque_data *msgque) msgque->messages = malloc(sizeof(struct msg1) * ds.msg_qnum); if (msgque->messages == NULL) { - printf("Failed to get stats for IPC queue\n"); + ksft_test_result_fail("Failed to get stats for IPC queue\n"); return -ENOMEM; } @@ -162,7 +161,7 @@ int dump_queue(struct msgque_data *msgque) ret = msgrcv(msgque->msq_id, &msgque->messages[i].mtype, MAX_MSG_SIZE, i, IPC_NOWAIT | MSG_COPY); if (ret < 0) { - printf("Failed to copy IPC message: %m (%d)\n", errno); + ksft_test_result_fail("Failed to copy IPC message: %m (%d)\n", errno); return -errno; } msgque->messages[i].msize = ret; @@ -178,7 +177,7 @@ int fill_msgque(struct msgque_data *msgque) memcpy(msgbuf.mtext, TEST_STRING, sizeof(TEST_STRING)); if (msgsnd(msgque->msq_id, &msgbuf.mtype, sizeof(TEST_STRING), IPC_NOWAIT) != 0) { - printf("First message send failed (%m)\n"); + ksft_test_result_fail("First message send failed (%m)\n"); return -errno; } @@ -186,7 +185,7 @@ int fill_msgque(struct msgque_data *msgque) memcpy(msgbuf.mtext, ANOTHER_TEST_STRING, sizeof(ANOTHER_TEST_STRING)); if (msgsnd(msgque->msq_id, &msgbuf.mtype, sizeof(ANOTHER_TEST_STRING), IPC_NOWAIT) != 0) { - printf("Second message send failed (%m)\n"); + ksft_test_result_fail("Second message send failed (%m)\n"); return -errno; } return 0; @@ -202,44 +201,44 @@ int main(int argc, char **argv) msgque.key = ftok(argv[0], 822155650); if (msgque.key == -1) { - printf("Can't make key: %d\n", -errno); + ksft_test_result_fail("Can't make key: %d\n", -errno); ksft_exit_fail(); } msgque.msq_id = msgget(msgque.key, IPC_CREAT | IPC_EXCL | 0666); if (msgque.msq_id == -1) { err = -errno; - printf("Can't create queue: %d\n", err); + ksft_test_result_fail("Can't create queue: %d\n", err); goto err_out; } err = fill_msgque(&msgque); if (err) { - printf("Failed to fill queue: %d\n", err); + ksft_test_result_fail("Failed to fill queue: %d\n", err); goto err_destroy; } err = dump_queue(&msgque); if (err) { - printf("Failed to dump queue: %d\n", err); + ksft_test_result_fail("Failed to dump queue: %d\n", err); goto err_destroy; } err = check_and_destroy_queue(&msgque); if (err) { - printf("Failed to check and destroy queue: %d\n", err); + ksft_test_result_fail("Failed to check and destroy queue: %d\n", err); goto err_out; } err = restore_queue(&msgque); if (err) { - printf("Failed to restore queue: %d\n", err); + ksft_test_result_fail("Failed to restore queue: %d\n", err); goto err_destroy; } err = check_and_destroy_queue(&msgque); if (err) { - printf("Failed to test queue: %d\n", err); + ksft_test_result_fail("Failed to test queue: %d\n", err); goto err_out; } ksft_exit_pass(); -- 2.48.1

1 month

3
2
0 0

[PATCH net] Revert "kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests"

by Jakub Kicinski

This reverts commit a571a9a1b120264e24b41eddf1ac5140131bfa84. The commit in question breaks kunit for older compilers: $ gcc --version gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) $ ./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64 Configuring KUnit Kernel ... Regenerating .config ... Populating config with: $ make ARCH=x86_64 O=.kunit olddefconfig ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config. This is probably due to unsatisfied dependencies. Missing: CONFIG_INIT_STACK_ALL_PATTERN=y Link: https://lore.kernel.org/20250529083811.778bc31b@kernel.org Fixes: a571a9a1b120 ("kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests") Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- I'd like to take this in via netdev since it fixes our CI. We'll send it to Linus next week. CC: brendan.higgins(a)linux.dev CC: davidgow(a)google.com CC: rmoar(a)google.com CC: broonie(a)kernel.org CC: rf(a)opensource.cirrus.com CC: mic(a)digikod.net CC: skhan(a)linuxfoundation.org CC: linux-kselftest(a)vger.kernel.org CC: kunit-dev(a)googlegroups.com --- tools/testing/kunit/configs/all_tests.config | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config index 48b132cd9d2a..2f093048d985 100644 --- a/tools/testing/kunit/configs/all_tests.config +++ b/tools/testing/kunit/configs/all_tests.config @@ -10,7 +10,6 @@ CONFIG_KUNIT_EXAMPLE_TEST=y CONFIG_KUNIT_ALL_TESTS=y CONFIG_FORTIFY_SOURCE=y -CONFIG_INIT_STACK_ALL_PATTERN=y CONFIG_IIO=y -- 2.49.0

1 month

5
8
0 0

[kvm-unit-tests PATCH] riscv: sbi: Add SBI Debug Triggers Extension tests

by Jesse Taube

Add tests for the DBTR SBI extension. Signed-off-by: Jesse Taube <jesse(a)rivosinc.com> --- lib/riscv/asm/sbi.h | 28 ++ lib/riscv/sbi.c | 58 ++++ riscv/Makefile | 1 + riscv/sbi-dbtr.c | 703 ++++++++++++++++++++++++++++++++++++++++++++ riscv/sbi-tests.h | 1 + riscv/sbi.c | 1 + 6 files changed, 792 insertions(+) create mode 100644 riscv/sbi-dbtr.c diff --git a/lib/riscv/asm/sbi.h b/lib/riscv/asm/sbi.h index a5738a5c..ce19ab89 100644 --- a/lib/riscv/asm/sbi.h +++ b/lib/riscv/asm/sbi.h @@ -51,6 +51,7 @@ enum sbi_ext_id { SBI_EXT_SUSP = 0x53555350, SBI_EXT_FWFT = 0x46574654, SBI_EXT_SSE = 0x535345, + SBI_EXT_DBTR = 0x44425452, }; enum sbi_ext_base_fid { @@ -125,6 +126,17 @@ enum sbi_ext_fwft_fid { #define SBI_FWFT_SET_FLAG_LOCK BIT(0) +enum sbi_ext_dbtr_fid { + SBI_EXT_DBTR_NUM_TRIGGERS = 0, + SBI_EXT_DBTR_SETUP_SHMEM, + SBI_EXT_DBTR_TRIGGER_READ, + SBI_EXT_DBTR_TRIGGER_INSTALL, + SBI_EXT_DBTR_TRIGGER_UPDATE, + SBI_EXT_DBTR_TRIGGER_UNINSTALL, + SBI_EXT_DBTR_TRIGGER_ENABLE, + SBI_EXT_DBTR_TRIGGER_DISABLE, +}; + enum sbi_ext_sse_fid { SBI_EXT_SSE_READ_ATTRS = 0, SBI_EXT_SSE_WRITE_ATTRS, @@ -282,6 +294,22 @@ static inline bool sbi_sse_event_is_global(uint32_t event_id) return !!(event_id & SBI_SSE_EVENT_GLOBAL_BIT); } +struct sbiret sbi_debug_num_triggers(unsigned long trig_tdata1); +struct sbiret sbi_debug_set_shmem(void *shmem); +struct sbiret sbi_debug_set_shmem_raw(unsigned long shmem_phys_lo, + unsigned long shmem_phys_hi, + unsigned long flags); +struct sbiret sbi_debug_read_triggers(unsigned long trig_idx_base, + unsigned long trig_count); +struct sbiret sbi_debug_install_triggers(unsigned long trig_count); +struct sbiret sbi_debug_update_triggers(unsigned long trig_count); +struct sbiret sbi_debug_uninstall_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask); +struct sbiret sbi_debug_enable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask); +struct sbiret sbi_debug_disable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask); + struct sbiret sbi_sse_read_attrs_raw(unsigned long event_id, unsigned long base_attr_id, unsigned long attr_count, unsigned long phys_lo, unsigned long phys_hi); diff --git a/lib/riscv/sbi.c b/lib/riscv/sbi.c index 2959378f..39c0d3bd 100644 --- a/lib/riscv/sbi.c +++ b/lib/riscv/sbi.c @@ -32,6 +32,64 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0, return ret; } +struct sbiret sbi_debug_num_triggers(unsigned long trig_tdata1) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_NUM_TRIGGERS, trig_tdata1, 0, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_set_shmem(void *shmem) +{ + phys_addr_t p = virt_to_phys(shmem); + + return sbi_debug_set_shmem_raw(lower_32_bits(p), upper_32_bits(p), 0); +} + +struct sbiret sbi_debug_set_shmem_raw(unsigned long shmem_phys_lo, + unsigned long shmem_phys_hi, + unsigned long flags) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_SETUP_SHMEM, shmem_phys_lo, + shmem_phys_hi, flags, 0, 0, 0); +} + +struct sbiret sbi_debug_read_triggers(unsigned long trig_idx_base, + unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_READ, trig_idx_base, + trig_count, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_install_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_INSTALL, trig_count, 0, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_update_triggers(unsigned long trig_count) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UPDATE, trig_count, 0, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_uninstall_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_UNINSTALL, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_enable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_ENABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + +struct sbiret sbi_debug_disable_triggers(unsigned long trig_idx_base, + unsigned long trig_idx_mask) +{ + return sbi_ecall(SBI_EXT_DBTR, SBI_EXT_DBTR_TRIGGER_DISABLE, trig_idx_base, + trig_idx_mask, 0, 0, 0, 0); +} + struct sbiret sbi_sse_read_attrs_raw(unsigned long event_id, unsigned long base_attr_id, unsigned long attr_count, unsigned long phys_lo, unsigned long phys_hi) diff --git a/riscv/Makefile b/riscv/Makefile index 11e68eae..55c7ac93 100644 --- a/riscv/Makefile +++ b/riscv/Makefile @@ -20,6 +20,7 @@ all: $(tests) $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-asm.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-fwft.o $(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-sse.o +$(TEST_DIR)/sbi-deps += $(TEST_DIR)/sbi-dbtr.o all_deps += $($(TEST_DIR)/sbi-deps) diff --git a/riscv/sbi-dbtr.c b/riscv/sbi-dbtr.c new file mode 100644 index 00000000..e557aae1 --- /dev/null +++ b/riscv/sbi-dbtr.c @@ -0,0 +1,703 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SBI DBTR testsuite + * + * Copyright (C) 2025, Rivos Inc., Jesse Taube <jesse(a)rivosinc.com> + */ + +#include <asm/io.h> + +#include "sbi-tests.h" + +#define INSN_LEN(insn) ((((insn) & 0x3) < 0x3) ? 2 : 4) + +#if __riscv_xlen == 64 +#define SBI_DBTR_SHMEM_INVALID_ADDR 0xFFFFFFFFFFFFFFFFUL +#elif __riscv_xlen == 32 +#define SBI_DBTR_SHMEM_INVALID_ADDR 0xFFFFFFFFUL +#else +#error "Unexpected __riscv_xlen" +#endif + +#define RV_MAX_TRIGGERS 32 + +#define SBI_DBTR_TRIG_STATE_MAPPED BIT(0) +#define SBI_DBTR_TRIG_STATE_U BIT(1) +#define SBI_DBTR_TRIG_STATE_S BIT(2) +#define SBI_DBTR_TRIG_STATE_VU BIT(3) +#define SBI_DBTR_TRIG_STATE_VS BIT(4) +#define SBI_DBTR_TRIG_STATE_HAVE_HW_TRIG BIT(5) + +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT 8 +#define SBI_DBTR_TRIG_STATE_HW_TRIG_IDX(trig_state) (trig_state >> SBI_DBTR_TRIG_STATE_HW_TRIG_IDX_SHIFT) + +#define SBI_DBTR_TDATA1_TYPE_SHIFT (__riscv_xlen - 4) + +#define SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL6_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL6_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL6_SELECT_BIT BIT(21) +#define SBI_DBTR_TDATA1_MCONTROL6_VS_BIT BIT(23) +#define SBI_DBTR_TDATA1_MCONTROL6_VU_BIT BIT(24) + +#define SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT BIT(0) +#define SBI_DBTR_TDATA1_MCONTROL_STORE_BIT BIT(1) +#define SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT BIT(2) +#define SBI_DBTR_TDATA1_MCONTROL_U_BIT BIT(3) +#define SBI_DBTR_TDATA1_MCONTROL_S_BIT BIT(4) +#define SBI_DBTR_TDATA1_MCONTROL_SELECT_BIT BIT(19) + +typedef enum { + SBI_DBTR_TDATA1_TYPE_NONE = (0UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_LEGACY = (1UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL = (2UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ICOUNT = (3UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ITRIGGER = (4UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_ETRIGGER = (5UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_MCONTROL6 = (6UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_TMEXTTRIGGER = (7UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED0 = (8UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED1 = (9UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED2 = (10UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_RESERVED3 = (11UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM0 = (12UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM1 = (13UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_CUSTOM2 = (14UL << SBI_DBTR_TDATA1_TYPE_SHIFT), + SBI_DBTR_TDATA1_TYPE_DISABLED = (15UL << SBI_DBTR_TDATA1_TYPE_SHIFT), +} McontrolType; + +typedef enum { + VALUE_NONE = 0, + VALUE_LOAD = BIT(0), + VALUE_STORE = BIT(1), + VALUE_EXECUTE = BIT(2), +} Tdata1Value; + +typedef enum { + MODE_NONE = 0, + MODE_M = BIT(0), + MODE_U = BIT(1), + MODE_S = BIT(2), + MODE_VU = BIT(3), + MODE_VS = BIT(4), +} Tdata1Mode; + +struct sbi_dbtr_data_msg { + unsigned long tstate; + unsigned long tdata1; + unsigned long tdata2; + unsigned long tdata3; +}; + +struct sbi_dbtr_id_msg { + unsigned long idx; +}; + +/* SBI shared mem messages layout */ +struct sbi_dbtr_shmem_entry { + union { + struct sbi_dbtr_data_msg data; + struct sbi_dbtr_id_msg id; + }; +}; + +static bool dbtr_handled; + +// Expected to be leaf function as not to disrupt frame-pointer +static __attribute__((naked)) void exec_call(void) +{ + // skip over nop when triggered instead of ret. + asm volatile ("nop\n" + "nop\n" + "ret\n"); +} + +static void dbtr_exception_handler(struct pt_regs *regs) +{ + dbtr_handled = true; + + /* Reading may cause a fault, skip over two nops if compressed, one if not */ + if ((void *)regs->epc == exec_call) { + regs->epc += 4; + return; + } + + /* WARNING: Skips over the trapped intruction */ + regs->epc += INSN_LEN(readw((void *)regs->epc)); +} + +static bool do_save(void *tdata2) +{ + bool ret; + + writel(0, tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_load(void *tdata2) +{ + bool ret; + + readl(tdata2); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static bool do_exec(void) +{ + bool ret; + + exec_call(); + + ret = dbtr_handled; + dbtr_handled = false; + + return ret; +} + +static unsigned long gen_tdata1_mcontrol(Tdata1Mode mode, Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL_S_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1_mcontrol6(Tdata1Mode mode, Tdata1Value value) +{ + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + if (value & VALUE_LOAD) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_LOAD_BIT; + + if (value & VALUE_STORE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_STORE_BIT; + + if (value & VALUE_EXECUTE) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_EXECUTE_BIT; + + if (mode & MODE_M) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_U) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_U_BIT; + + if (mode & MODE_S) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_S_BIT; + + if (mode & MODE_VU) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VU_BIT; + + if (mode & MODE_VS) + tdata1 |= SBI_DBTR_TDATA1_MCONTROL6_VS_BIT; + + return tdata1; +} + +static unsigned long gen_tdata1(McontrolType type, Tdata1Value value, Tdata1Mode mode) +{ + switch (type) { + case SBI_DBTR_TDATA1_TYPE_MCONTROL: + return gen_tdata1_mcontrol(mode, value); + case SBI_DBTR_TDATA1_TYPE_MCONTROL6: + return gen_tdata1_mcontrol6(mode, value); + default: + return 0; + } +} + +static bool dbtr_install_trigger(struct sbi_dbtr_shmem_entry *shmem, void *tdata2, + unsigned long tdata1) +{ + struct sbiret sbi_ret; + bool ret; + + shmem->data.tdata1 = tdata1; + shmem->data.tdata2 = (unsigned long)tdata2; + + sbi_ret = sbi_debug_install_triggers(1); + ret = sbiret_report_error(&sbi_ret, SBI_SUCCESS, "sbi_debug_install_triggers"); + + if (ret) + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + return ret; +} + +static bool dbtr_uninstall_trigger(void) +{ + struct sbiret ret; + + install_exception_handler(EXC_BREAKPOINT, NULL); + + ret = sbi_debug_uninstall_triggers(0, 1); + return sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static unsigned long dbtr_test_num_triggers(void) +{ + struct sbiret ret; + + //sbi_debug_num_triggers will return trig_max in sbiret.value when trig_tdata1 == 0 + unsigned long tdata1 = 0; + + // should be atleast one trigger. + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + + if (ret.value == 0) + report_fail("sbi_debug_num_triggers: Returned 0 triggers avalible"); + else + report_pass("sbi_debug_num_triggers: Returned %lu triggers avalible", ret.value); + + return ret.value; +} + +static McontrolType dbtr_test_type(unsigned long *num_trig) +{ + struct sbiret ret; + + //sbi_debug_num_triggers will return trig_max in sbiret.value when trig_tdata1 == 0 + unsigned long tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL6; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol6 triggers avalible", + ret.value); + *num_trig = ret.value; + return tdata1; + } + + tdata1 = SBI_DBTR_TDATA1_TYPE_MCONTROL; + + ret = sbi_debug_num_triggers(tdata1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_num_triggers"); + *num_trig = ret.value; + if (ret.value > 0) { + report_pass("sbi_debug_num_triggers: Returned %lu mcontrol triggers avalible", + ret.value); + return tdata1; + } + + report_fail("sbi_debug_num_triggers: Returned 0 mcontrol(6) triggers avalible"); + + return SBI_DBTR_TDATA1_TYPE_NONE; +} + +static struct sbiret dbtr_test_save_install_uninstall(struct sbi_dbtr_shmem_entry *shmem, + McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("save_trigger"); + + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S | MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_install_triggers(1); + if (!sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_install_triggers")) + return ret; + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(do_save(&test), "triggered"); + + if (do_load(&test)) + report_fail("triggered by load"); + + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_save(&test)) + report_fail("triggered after uninstall"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); + + return ret; +} + +static void dbtr_test_update(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("update_trigger"); + + dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE)); + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_update_triggers(1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_update_triggers"); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_load(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + static unsigned long test; + + report_prefix_push("load_trigger"); + dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_LOAD, MODE_S) | 1); + + report(do_load(&test), "triggered"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_enable(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_disable_triggers"); + dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S)); + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + if (do_save(&test)) { + report_fail("should not trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); + report_skip("sbi_debug_enable_triggers: no disable"); + + return; + } + + report_pass("should not trigger"); + + report_prefix_pop(); + report_prefix_push("sbi_debug_enable_triggers"); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_exec(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + static unsigned long test; + + report_prefix_push("exec_trigger"); + /* check if loads and saves trigger exec */ + dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_EXECUTE, MODE_S)); + + if (do_load(&test)) + report_fail("triggered by load"); + + if (do_save(&test)) + report_fail("triggered by save"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S)); + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_read(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("sbi_debug_read_triggers"); + dbtr_install_trigger(shmem, &test, tdata1); + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", ((unsigned long)&test), + shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void check_exec(unsigned long base) +{ + struct sbiret ret; + + report(do_exec(), "exec triggered"); + + ret = sbi_debug_uninstall_triggers(base, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); +} + +static void dbtr_test_multiple(struct sbi_dbtr_shmem_entry *shmem, McontrolType type, + unsigned long num_trigs) +{ + static unsigned long test[2]; + struct sbiret ret; + bool have_three = num_trigs > 2; + + if (num_trigs < 2) + return; + + report_prefix_push("test_multiple"); + + dbtr_install_trigger(shmem, &test[0], gen_tdata1(type, VALUE_STORE, MODE_S)); + dbtr_install_trigger(shmem, &test[1], gen_tdata1(type, VALUE_LOAD, MODE_S)); + if (have_three) + dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S)); + + report(do_save(&test[0]), "save triggered"); + + if (do_load(&test[0])) + report_fail("save triggered by load"); + + report(do_load(&test[1]), "load triggered"); + + if (do_save(&test[1])) + report_fail("load triggered by save"); + + if (have_three) + check_exec(2); + + ret = sbi_debug_uninstall_triggers(1, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + if (do_load(&test[1])) + report_fail("load triggered after uninstall"); + + report(do_save(&test[0]), "save triggered"); + + if (!have_three) { + dbtr_install_trigger(shmem, exec_call, gen_tdata1(type, VALUE_EXECUTE, MODE_S)); + check_exec(1); + } + + ret = sbi_debug_uninstall_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_uninstall_triggers"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_multiple_types(struct sbi_dbtr_shmem_entry *shmem, unsigned long type) +{ + static unsigned long test; + + report_prefix_push("dbtr_test_multiple_types"); + + /* check if loads and saves trigger exec */ + dbtr_install_trigger(shmem, &test, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S)); + + report(do_load(&test), "load trigger"); + + report(do_save(&test), "save trigger"); + + dbtr_uninstall_trigger(); + + /* Check if exec works */ + dbtr_install_trigger(shmem, exec_call, + gen_tdata1(type, VALUE_EXECUTE | VALUE_LOAD | VALUE_STORE, MODE_S)); + + report(do_exec(), "exec trigger"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_disable_unistall(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable uninstall"); + dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S)); + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + dbtr_uninstall_trigger(); + + dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S)); + + report(do_save(&test), "triggered"); + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +static void dbtr_test_unistall_enable(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("uninstall enable"); + dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S)); + + dbtr_uninstall_trigger(); + + ret = sbi_debug_enable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_enable_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_unistall_update(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + static unsigned long test; + struct sbiret ret; + + report_prefix_push("uninstall update"); + dbtr_install_trigger(shmem, NULL, gen_tdata1(type, VALUE_NONE, MODE_NONE)); + + dbtr_uninstall_trigger(); + + shmem->id.idx = 0; + shmem->data.tdata1 = gen_tdata1(type, VALUE_STORE, MODE_S); + shmem->data.tdata2 = (unsigned long)&test; + + ret = sbi_debug_update_triggers(1); + sbiret_report_error(&ret, SBI_ERR_FAILURE, "sbi_debug_update_triggers"); + + install_exception_handler(EXC_BREAKPOINT, dbtr_exception_handler); + + report(!do_save(&test), "should not trigger"); + + install_exception_handler(EXC_BREAKPOINT, NULL); + report_prefix_pop(); +} + +static void dbtr_test_disable_read(struct sbi_dbtr_shmem_entry *shmem, McontrolType type) +{ + const unsigned long tstatus_expected = SBI_DBTR_TRIG_STATE_S | SBI_DBTR_TRIG_STATE_MAPPED; + const unsigned long tdata1 = gen_tdata1(type, VALUE_STORE, MODE_NONE); + static unsigned long test; + struct sbiret ret; + + report_prefix_push("disable_read"); + dbtr_install_trigger(shmem, &test, gen_tdata1(type, VALUE_STORE, MODE_S)); + + ret = sbi_debug_disable_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_disable_triggers"); + + ret = sbi_debug_read_triggers(0, 1); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_read_triggers"); + + report(shmem->data.tdata1 == tdata1, "tdata1 expected: 0x%016lx, found: 0x%016lx", + tdata1, shmem->data.tdata1); + report(shmem->data.tdata2 == ((unsigned long)&test), + "tdata2 expected: 0x%016lx, found: 0x%016lx", + ((unsigned long)&test), shmem->data.tdata2); + report(shmem->data.tstate == tstatus_expected, "tstate expected: 0x%016lx, found: 0x%016lx", + tstatus_expected, shmem->data.tstate); + + + dbtr_uninstall_trigger(); + report_prefix_pop(); +} + +void check_dbtr(void) +{ + static struct sbi_dbtr_shmem_entry shmem[RV_MAX_TRIGGERS] = {}; + unsigned long num_trigs; + McontrolType trig_type; + struct sbiret ret; + + report_prefix_push("dbtr"); + + if (!sbi_probe(SBI_EXT_DBTR)) { + report_skip("extension not available"); + report_prefix_pop(); + return; + } + + if (__sbi_get_imp_id() == SBI_IMPL_OPENSBI && + __sbi_get_imp_version() < sbi_impl_opensbi_mk_version(1, 6)) { + report_skip("OpenSBI < v1.7 detected, skipping tests"); + report_prefix_pop(); + return; + } + + num_trigs = dbtr_test_num_triggers(); + if (!num_trigs) + return; + + trig_type = dbtr_test_type(&num_trigs); + if (trig_type == SBI_DBTR_TDATA1_TYPE_NONE) + return; + + ret = sbi_debug_set_shmem(shmem); + sbiret_report_error(&ret, SBI_SUCCESS, "sbi_debug_set_shmem"); + + ret = dbtr_test_save_install_uninstall(&shmem[0], trig_type); + /* install or unistall failed */ + if (ret.error != SBI_SUCCESS) + return; + + dbtr_test_load(&shmem[0], trig_type); + dbtr_test_exec(&shmem[0], trig_type); + dbtr_test_read(&shmem[0], trig_type); + dbtr_test_disable_enable(&shmem[0], trig_type); + dbtr_test_update(&shmem[0], trig_type); + dbtr_test_multiple_types(&shmem[0], trig_type); + dbtr_test_multiple(shmem, trig_type, num_trigs); + dbtr_test_disable_unistall(&shmem[0], trig_type); + dbtr_test_unistall_enable(&shmem[0], trig_type); + dbtr_test_unistall_update(&shmem[0], trig_type); + dbtr_test_disable_read(&shmem[0], trig_type); + + report_prefix_pop(); +} diff --git a/riscv/sbi-tests.h b/riscv/sbi-tests.h index d5c4ae70..6a227745 100644 --- a/riscv/sbi-tests.h +++ b/riscv/sbi-tests.h @@ -99,6 +99,7 @@ static inline bool env_enabled(const char *env) void sbi_bad_fid(int ext); void check_sse(void); +void check_dbtr(void); #endif /* __ASSEMBLER__ */ #endif /* _RISCV_SBI_TESTS_H_ */ diff --git a/riscv/sbi.c b/riscv/sbi.c index edb1a6be..5bd496d0 100644 --- a/riscv/sbi.c +++ b/riscv/sbi.c @@ -1561,6 +1561,7 @@ int main(int argc, char **argv) check_susp(); check_sse(); check_fwft(); + check_dbtr(); return report_summary(); } -- 2.43.0

1 month

3
2
0 0

[PATCH v2] selftests: Add functional test for the abort file in fusectl

by Chen Linxuan

This patch add a simple functional test for the "abort" file in fusectlfs (/sys/fs/fuse/connections/ID/about). A simple fuse daemon is added for testing. Related discussion can be found in the link below. Link: https://lore.kernel.org/all/CAOQ4uxjKFXOKQxPpxtS6G_nR0tpw95w0GiO68UcWg_OBhm… Cc: Amir Goldstein <amir73il(a)gmail.com> Signed-off-by: Chen Linxuan <chenlinxuan(a)uniontech.com> --- Changes in v2: - Apply changes suggested by Amir Goldstein - Check errno - Link to v1: https://lore.kernel.org/all/20250515073449.346774-2-chenlinxuan@uniontech.c… --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + .../selftests/filesystems/fusectl/.gitignore | 3 + .../selftests/filesystems/fusectl/Makefile | 21 +++ .../selftests/filesystems/fusectl/fuse_mnt.c | 146 ++++++++++++++++++ .../filesystems/fusectl/fusectl_test.c | 116 ++++++++++++++ 6 files changed, 288 insertions(+) create mode 100644 tools/testing/selftests/filesystems/fusectl/.gitignore create mode 100644 tools/testing/selftests/filesystems/fusectl/Makefile create mode 100644 tools/testing/selftests/filesystems/fusectl/fuse_mnt.c create mode 100644 tools/testing/selftests/filesystems/fusectl/fusectl_test.c diff --git a/MAINTAINERS b/MAINTAINERS index 3563492e4eba4..bee6f7d787a33 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9740,6 +9740,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git F: Documentation/filesystems/fuse.rst F: fs/fuse/ F: include/uapi/linux/fuse.h +F: tools/testing/selftests/filesystems/fusectl FUTEX SUBSYSTEM M: Thomas Gleixner <tglx(a)linutronix.de> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 80fb84fa3cfcb..a9bfefa961889 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -36,6 +36,7 @@ TARGETS += filesystems/fat TARGETS += filesystems/overlayfs TARGETS += filesystems/statmount TARGETS += filesystems/mount-notify +TARGETS += filesystems/fusectl TARGETS += firmware TARGETS += fpu TARGETS += ftrace diff --git a/tools/testing/selftests/filesystems/fusectl/.gitignore b/tools/testing/selftests/filesystems/fusectl/.gitignore new file mode 100644 index 0000000000000..3e72e742d08e8 --- /dev/null +++ b/tools/testing/selftests/filesystems/fusectl/.gitignore @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only +fuse_mnt +fusectl_test diff --git a/tools/testing/selftests/filesystems/fusectl/Makefile b/tools/testing/selftests/filesystems/fusectl/Makefile new file mode 100644 index 0000000000000..612aad69a93aa --- /dev/null +++ b/tools/testing/selftests/filesystems/fusectl/Makefile @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-2.0-or-later + +CFLAGS += -Wall -O2 -g $(KHDR_INCLUDES) + +TEST_GEN_PROGS := fusectl_test +TEST_GEN_FILES := fuse_mnt + +include ../../lib.mk + +VAR_CFLAGS := $(shell pkg-config fuse --cflags 2>/dev/null) +ifeq ($(VAR_CFLAGS),) +VAR_CFLAGS := -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse +endif + +VAR_LDLIBS := $(shell pkg-config fuse --libs 2>/dev/null) +ifeq ($(VAR_LDLIBS),) +VAR_LDLIBS := -lfuse -pthread +endif + +$(OUTPUT)/fuse_mnt: CFLAGS += $(VAR_CFLAGS) +$(OUTPUT)/fuse_mnt: LDLIBS += $(VAR_LDLIBS) diff --git a/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c b/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c new file mode 100644 index 0000000000000..d12b17f30fadc --- /dev/null +++ b/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * fusectl test file-system + * Creates a simple FUSE filesystem with a single read-write file (/test) + */ + +#define FUSE_USE_VERSION 26 + +#include <fuse.h> +#include <stdio.h> +#include <string.h> +#include <errno.h> +#include <fcntl.h> +#include <stdlib.h> +#include <unistd.h> + +#define MAX(a, b) ((a) > (b) ? (a) : (b)) + +static char *content; +static size_t content_size = 0; +static const char test_path[] = "/test"; + +static int test_getattr(const char *path, struct stat *st) +{ + memset(st, 0, sizeof(*st)); + + if (!strcmp(path, "/")) { + st->st_mode = S_IFDIR | 0755; + st->st_nlink = 2; + return 0; + } + + if (!strcmp(path, test_path)) { + st->st_mode = S_IFREG | 0664; + st->st_nlink = 1; + st->st_size = content_size; + return 0; + } + + return -ENOENT; +} + +static int test_readdir(const char *path, void *buf, fuse_fill_dir_t filler, + off_t offset, struct fuse_file_info *fi) +{ + if (strcmp(path, "/")) + return -ENOENT; + + filler(buf, ".", NULL, 0); + filler(buf, "..", NULL, 0); + filler(buf, test_path + 1, NULL, 0); + + return 0; +} + +static int test_open(const char *path, struct fuse_file_info *fi) +{ + if (strcmp(path, test_path)) + return -ENOENT; + + return 0; +} + +static int test_read(const char *path, char *buf, size_t size, off_t offset, + struct fuse_file_info *fi) +{ + if (strcmp(path, test_path) != 0) + return -ENOENT; + + if (!content || content_size == 0) + return 0; + + if (offset >= content_size) + return 0; + + if (offset + size > content_size) + size = content_size - offset; + + memcpy(buf, content + offset, size); + + return size; +} + +static int test_write(const char *path, const char *buf, size_t size, + off_t offset, struct fuse_file_info *fi) +{ + size_t new_size; + + if (strcmp(path, test_path) != 0) + return -ENOENT; + + if(offset > content_size) + return -EINVAL; + + new_size = MAX(offset + size, content_size); + + if (new_size > content_size) + content = realloc(content, new_size); + + content_size = new_size; + + if (!content) + return -ENOMEM; + + memcpy(content + offset, buf, size); + + return size; +} + +static int test_truncate(const char *path, off_t size) +{ + if (strcmp(path, test_path) != 0) + return -ENOENT; + + if (size == 0) { + free(content); + content = NULL; + content_size = 0; + return 0; + } + + content = realloc(content, size); + + if (!content) + return -ENOMEM; + + if (size > content_size) + memset(content + content_size, 0, size - content_size); + + content_size = size; + return 0; +} + +static struct fuse_operations memfd_ops = { + .getattr = test_getattr, + .readdir = test_readdir, + .open = test_open, + .read = test_read, + .write = test_write, + .truncate = test_truncate, +}; + +int main(int argc, char *argv[]) +{ + return fuse_main(argc, argv, &memfd_ops, NULL); +} diff --git a/tools/testing/selftests/filesystems/fusectl/fusectl_test.c b/tools/testing/selftests/filesystems/fusectl/fusectl_test.c new file mode 100644 index 0000000000000..7050fbe0970e7 --- /dev/null +++ b/tools/testing/selftests/filesystems/fusectl/fusectl_test.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +// Copyright (c) 2025 Chen Linxuan <chenlinxuan(a)uniontech.com> + +#define _GNU_SOURCE + +#include <errno.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mount.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> +#include <dirent.h> +#include <linux/limits.h> + +#include "../../kselftest_harness.h" + +#define FUSECTL_MOUNTPOINT "/sys/fs/fuse/connections" +#define FUSE_MOUNTPOINT "/tmp/fuse_mnt_XXXXXX" +#define FUSE_DEVICE "/dev/fuse" +#define FUSECTL_TEST_VALUE "1" + +FIXTURE(fusectl){ + char fuse_mountpoint[sizeof(FUSE_MOUNTPOINT)]; + int connection; +}; + +FIXTURE_SETUP(fusectl) +{ + const char *fuse_mnt_prog = "./fuse_mnt"; + int status, pid; + struct stat statbuf; + + strcpy(self->fuse_mountpoint, FUSE_MOUNTPOINT); + + if (!mkdtemp(self->fuse_mountpoint)) + SKIP(return, + "Failed to create FUSE mountpoint %s", + strerror(errno)); + + if (access(FUSECTL_MOUNTPOINT, F_OK)) + SKIP(return, + "FUSE control filesystem not mounted"); + + pid = fork(); + if (pid < 0) + SKIP(return, + "Failed to fork FUSE daemon process: %s", + strerror(errno)); + + if (pid == 0) { + execlp(fuse_mnt_prog, fuse_mnt_prog, self->fuse_mountpoint, NULL); + exit(errno); + } + + waitpid(pid, &status, 0); + if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) { + SKIP(return, + "Failed to start FUSE daemon %s", + strerror(WEXITSTATUS(status))); + } + + if (stat(self->fuse_mountpoint, &statbuf)) + SKIP(return, + "Failed to stat FUSE mountpoint %s", + strerror(errno)); + + self->connection = statbuf.st_dev; +} + +FIXTURE_TEARDOWN(fusectl) +{ + umount(self->fuse_mountpoint); + rmdir(self->fuse_mountpoint); +} + +TEST_F(fusectl, abort) +{ + char path_buf[PATH_MAX]; + int abort_fd, test_fd, ret; + + sprintf(path_buf, "/sys/fs/fuse/connections/%d/abort", self->connection); + + ASSERT_EQ(0, access(path_buf, F_OK)); + + abort_fd = open(path_buf, O_WRONLY); + ASSERT_GE(abort_fd, 0); + + sprintf(path_buf, "%s/test", self->fuse_mountpoint); + + test_fd = open(path_buf, O_RDWR); + ASSERT_GE(test_fd, 0); + + ret = read(test_fd, path_buf, sizeof(path_buf)); + ASSERT_EQ(ret, 0); + + ret = write(test_fd, "test", sizeof("test")); + ASSERT_EQ(ret, sizeof("test")); + + ret = lseek(test_fd, 0, SEEK_SET); + ASSERT_GE(ret, 0); + + ret = write(abort_fd, FUSECTL_TEST_VALUE, sizeof(FUSECTL_TEST_VALUE)); + ASSERT_GT(ret, 0); + + close(abort_fd); + + ret = read(test_fd, path_buf, sizeof(path_buf)); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, ENOTCONN); +} + +TEST_HARNESS_MAIN -- 2.43.0

1 month

2
4
0 0

[PATCH] libbpf: add support for printing BTF character arrays as strings

by Blake Jones

The BTF dumper code currently displays arrays of characters as just that - arrays, with each character formatted individually. Sometimes this is what makes sense, but it's nice to be able to treat that array as a string. This change adds a special case to the btf_dump functionality to allow arrays of single-byte integer values to be printed as character strings. Characters for which isprint() returns false are printed as hex-escaped values. This is enabled when the new ".print_strings" is set to 1 in the btf_dump_type_data_opts structure. As an example, here's what it looks like to dump the string "hello" using a few different field values for btf_dump_type_data_opts (.compact = 1): - .print_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',] - .print_strings = 0, .skip_names = 1: ['h','e','l','l','o',] - .print_strings = 1, .skip_names = 0: (char[6])"hello" - .print_strings = 1, .skip_names = 1: "hello" Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1: - .print_strings = 0: ['h',-1,] - .print_strings = 1: "h\xff" Signed-off-by: Blake Jones <blakejones(a)google.com> --- tools/lib/bpf/btf.h | 3 +- tools/lib/bpf/btf_dump.c | 51 ++++++++- .../selftests/bpf/prog_tests/btf_dump.c | 102 ++++++++++++++++++ 3 files changed, 154 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 4392451d634b..be8e8e26d245 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -326,9 +326,10 @@ struct btf_dump_type_data_opts { bool compact; /* no newlines/indentation */ bool skip_names; /* skip member/type names */ bool emit_zeroes; /* show 0-valued fields */ + bool print_strings; /* print char arrays as strings */ size_t :0; }; -#define btf_dump_type_data_opts__last_field emit_zeroes +#define btf_dump_type_data_opts__last_field print_strings LIBBPF_API int btf_dump__dump_type_data(struct btf_dump *d, __u32 id, diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c index 460c3e57fadb..a07dd5accdd8 100644 --- a/tools/lib/bpf/btf_dump.c +++ b/tools/lib/bpf/btf_dump.c @@ -75,6 +75,7 @@ struct btf_dump_data { bool is_array_member; bool is_array_terminated; bool is_array_char; + bool print_strings; }; struct btf_dump { @@ -2028,6 +2029,50 @@ static int btf_dump_var_data(struct btf_dump *d, return btf_dump_dump_type_data(d, NULL, t, type_id, data, 0, 0); } +static int btf_dump_string_data(struct btf_dump *d, + const struct btf_type *t, + __u32 id, + const void *data) +{ + const struct btf_array *array = btf_array(t); + __u32 i; + + if (!btf_is_int(skip_mods_and_typedefs(d->btf, array->type, NULL)) || + btf__resolve_size(d->btf, array->type) != 1 || + !d->typed_dump->print_strings) { + pr_warn("unexpected %s() call for array type %u\n", + __func__, array->type); + return -EINVAL; + } + + btf_dump_data_pfx(d); + btf_dump_printf(d, "\""); + + for (i = 0; i < array->nelems; i++, data++) { + char c; + + if (data >= d->typed_dump->data_end) + return -E2BIG; + + c = *(char *)data; + if (c == '\0') { + /* When printing character arrays as strings, NUL bytes + * are always treated as string terminators; they are + * never printed. + */ + break; + } + if (isprint(c)) + btf_dump_printf(d, "%c", c); + else + btf_dump_printf(d, "\\x%02x", *(__u8 *)data); + } + + btf_dump_printf(d, "\""); + + return 0; +} + static int btf_dump_array_data(struct btf_dump *d, const struct btf_type *t, __u32 id, @@ -2055,8 +2100,11 @@ static int btf_dump_array_data(struct btf_dump *d, * char arrays, so if size is 1 and element is * printable as a char, we'll do that. */ - if (elem_size == 1) + if (elem_size == 1) { + if (d->typed_dump->print_strings) + return btf_dump_string_data(d, t, id, data); d->typed_dump->is_array_char = true; + } } /* note that we increment depth before calling btf_dump_print() below; @@ -2544,6 +2592,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id, d->typed_dump->compact = OPTS_GET(opts, compact, false); d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false); d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false); + d->typed_dump->print_strings = OPTS_GET(opts, print_strings, false); ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0); diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c index c0a776feec23..70e51943f148 100644 --- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c +++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c @@ -879,6 +879,106 @@ static void test_btf_dump_var_data(struct btf *btf, struct btf_dump *d, "static int bpf_cgrp_storage_busy = (int)2", 2); } +/* + * String-like types are generally not named, so they need to be + * found this way rather than via btf__find_by_name(). + */ +static int find_char_array_type(struct btf *btf, int nelems) +{ + const int nr_types = btf__type_cnt(btf); + const int char_type = btf__find_by_name(btf, "char"); + + for (int i = 1; i < nr_types; i++) { + const struct btf_type *t; + const struct btf_array *at; + + t = btf__type_by_id(btf, i); + if (btf_kind(t) != BTF_KIND_ARRAY) + continue; + + at = btf_array(t); + if (at->nelems == nelems && at->type == char_type) + return i; + } + + return -ENOENT; +} + +static int btf_dump_string_data(struct btf *btf, struct btf_dump *d, + char *str, struct btf_dump_type_data_opts *opts, + char *ptr, size_t ptr_sz, + const char *expected_val) +{ + char name[64]; + size_t type_sz; + int type_id; + int ret = 0; + + snprintf(name, sizeof(name), "char[%zu]", ptr_sz); + type_id = find_char_array_type(btf, ptr_sz); + if (!ASSERT_GE(type_id, 0, "find type id")) + return -ENOENT; + type_sz = btf__resolve_size(btf, type_id); + str[0] = '\0'; + ret = btf_dump__dump_type_data(d, type_id, ptr, ptr_sz, opts); + if (type_sz <= ptr_sz) { + if (!ASSERT_EQ(ret, type_sz, "failed/unexpected type_sz")) + return -EINVAL; + } else { + if (!ASSERT_EQ(ret, -E2BIG, "failed to return -E2BIG")) + return -EINVAL; + } + if (!ASSERT_STREQ(str, expected_val, "ensure expected/actual match")) + return -EFAULT; + return 0; +} + +static void test_btf_dump_string_data(struct btf *btf, struct btf_dump *d, + char *str) +{ + DECLARE_LIBBPF_OPTS(btf_dump_type_data_opts, opts); + + opts.compact = true; + opts.emit_zeroes = false; + opts.print_strings = true; + + opts.skip_names = false; + btf_dump_string_data(btf, d, str, &opts, "foo", 4, + "(char[4])\"foo\""); + + opts.skip_names = true; + btf_dump_string_data(btf, d, str, &opts, "foo", 4, + "\"foo\""); + + /* This should have no effect. */ + opts.emit_zeroes = false; + btf_dump_string_data(btf, d, str, &opts, "foo", 4, + "\"foo\""); + + /* This should have no effect. */ + opts.compact = false; + btf_dump_string_data(btf, d, str, &opts, "foo", 4, + "\"foo\""); + + /* Non-printable characters come out as hex. */ + btf_dump_string_data(btf, d, str, &opts, "fo\xff", 4, + "\"fo\\xff\""); + btf_dump_string_data(btf, d, str, &opts, "fo\x7", 4, + "\"fo\\x07\""); + + /* Should get printed properly even though there's no NUL. */ + char food[4] = { 'f', 'o', 'o', 'd' }; + + btf_dump_string_data(btf, d, str, &opts, food, 4, + "\"food\""); + + /* The embedded NUL should terminate the string. */ + char embed[4] = { 'f', 'o', '\0', 'd' }; + + btf_dump_string_data(btf, d, str, &opts, embed, 4, + "\"fo\""); +} + static void test_btf_datasec(struct btf *btf, struct btf_dump *d, char *str, const char *name, const char *expected_val, void *data, size_t data_sz) @@ -970,6 +1070,8 @@ void test_btf_dump() { test_btf_dump_struct_data(btf, d, str); if (test__start_subtest("btf_dump: var_data")) test_btf_dump_var_data(btf, d, str); + if (test__start_subtest("btf_dump: string_data")) + test_btf_dump_string_data(btf, d, str); btf_dump__free(d); btf__free(btf); -- 2.49.0.1204.g71687c7c1d-goog

1 month

4
7
0 0

[PATCH] selftests: firmware: Add details in error logging

by Harshal

Specify details in logs of failed cases Use die() instead of exit() when write to sys_path fails Signed-off-by: Harshal <embedkari167(a)gmail.com> --- tools/testing/selftests/firmware/fw_namespace.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/firmware/fw_namespace.c b/tools/testing/selftests/firmware/fw_namespace.c index 04757dc7e546..deff7f57b694 100644 --- a/tools/testing/selftests/firmware/fw_namespace.c +++ b/tools/testing/selftests/firmware/fw_namespace.c @@ -38,10 +38,11 @@ static void trigger_fw(const char *fw_name, const char *sys_path) fd = open(sys_path, O_WRONLY); if (fd < 0) - die("open failed: %s\n", + die("open of sys_path failed: %s\n", strerror(errno)); if (write(fd, fw_name, strlen(fw_name)) != strlen(fw_name)) - exit(EXIT_FAILURE); + die("write to sys_path failed: %s\n", + strerror(errno)); close(fd); } @@ -52,10 +53,10 @@ static void setup_fw(const char *fw_path) fd = open(fw_path, O_WRONLY | O_CREAT, 0600); if (fd < 0) - die("open failed: %s\n", + die("open of firmware file failed: %s\n", strerror(errno)); if (write(fd, fw, sizeof(fw) -1) != sizeof(fw) -1) - die("write failed: %s\n", + die("write to firmware file failed: %s\n", strerror(errno)); close(fd); } @@ -66,7 +67,7 @@ static bool test_fw_in_ns(const char *fw_name, const char *sys_path, bool block_ if (block_fw_in_parent_ns) if (mount("test", "/lib/firmware", "tmpfs", MS_RDONLY, NULL) == -1) - die("blocking firmware in parent ns failed\n"); + die("blocking firmware in parent namespace failed\n"); child = fork(); if (child == -1) { @@ -99,11 +100,11 @@ static bool test_fw_in_ns(const char *fw_name, const char *sys_path, bool block_ strerror(errno)); } if (mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL) == -1) - die("remount root in child ns failed\n"); + die("remount root in child namespace failed\n"); if (!block_fw_in_parent_ns) { if (mount("test", "/lib/firmware", "tmpfs", MS_RDONLY, NULL) == -1) - die("blocking firmware in child ns failed\n"); + die("blocking firmware in child namespace failed\n"); } else umount("/lib/firmware"); @@ -129,8 +130,8 @@ int main(int argc, char **argv) die("error: failed to build full fw_path\n"); setup_fw(fw_path); - setvbuf(stdout, NULL, _IONBF, 0); + /* Positive case: firmware in PID1 mount namespace */ printf("Testing with firmware in parent namespace (assumed to be same file system as PID1)\n"); if (!test_fw_in_ns(fw_name, sys_path, false)) -- 2.43.0

1 month

3
3
0 0

[PATCH v3] selftests/ptrace/get_syscall_info: fix for MIPS n32

by Dmitry V. Levin

MIPS n32 is one of two ILP32 architectures supported by the kernel that have 64-bit syscall arguments (another one is x32). When this test passed 32-bit arguments to syscall(), they were sign-extended in libc, PTRACE_GET_SYSCALL_INFO reported these sign-extended 64-bit values, and the test complained about the mismatch. Fix this by passing arguments of the appropriate type to syscall(), which is "unsigned long long" on MIPS n32, and __kernel_ulong_t on other architectures, the same way as selftests/ptrace/set_syscall_info already does. As a side effect, this also extends the test on all 64-bit architectures by choosing constants that don't fit into 32-bit integers. Signed-off-by: Dmitry V. Levin <ldv(a)strace.io> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> --- v3: Added Acked-by: https://lore.kernel.org/all/b2e62143-fa68-4cd1-bf6c-67f0ad49c670@linuxfound… v2: Fixed MIPS #ifdef: https://lore.kernel.org/all/20250115233747.GA28541@strace.io/ .../selftests/ptrace/get_syscall_info.c | 53 +++++++++++-------- 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/ptrace/get_syscall_info.c b/tools/testing/selftests/ptrace/get_syscall_info.c index 5bcd1c7b5be6..2970f72d66d3 100644 --- a/tools/testing/selftests/ptrace/get_syscall_info.c +++ b/tools/testing/selftests/ptrace/get_syscall_info.c @@ -11,8 +11,19 @@ #include <err.h> #include <signal.h> #include <asm/unistd.h> +#include <linux/types.h> #include "linux/ptrace.h" +#if defined(_MIPS_SIM) && _MIPS_SIM == _MIPS_SIM_NABI32 +/* + * MIPS N32 is the only architecture where __kernel_ulong_t + * does not match the bitness of syscall arguments. + */ +typedef unsigned long long kernel_ulong_t; +#else +typedef __kernel_ulong_t kernel_ulong_t; +#endif + static int kill_tracee(pid_t pid) { @@ -42,37 +53,37 @@ sys_ptrace(int request, pid_t pid, unsigned long addr, unsigned long data) TEST(get_syscall_info) { - static const unsigned long args[][7] = { + const kernel_ulong_t args[][7] = { /* a sequence of architecture-agnostic syscalls */ { __NR_chdir, - (unsigned long) "", - 0xbad1fed1, - 0xbad2fed2, - 0xbad3fed3, - 0xbad4fed4, - 0xbad5fed5 + (uintptr_t) "", + (kernel_ulong_t) 0xdad1bef1bad1fed1ULL, + (kernel_ulong_t) 0xdad2bef2bad2fed2ULL, + (kernel_ulong_t) 0xdad3bef3bad3fed3ULL, + (kernel_ulong_t) 0xdad4bef4bad4fed4ULL, + (kernel_ulong_t) 0xdad5bef5bad5fed5ULL }, { __NR_gettid, - 0xcaf0bea0, - 0xcaf1bea1, - 0xcaf2bea2, - 0xcaf3bea3, - 0xcaf4bea4, - 0xcaf5bea5 + (kernel_ulong_t) 0xdad0bef0caf0bea0ULL, + (kernel_ulong_t) 0xdad1bef1caf1bea1ULL, + (kernel_ulong_t) 0xdad2bef2caf2bea2ULL, + (kernel_ulong_t) 0xdad3bef3caf3bea3ULL, + (kernel_ulong_t) 0xdad4bef4caf4bea4ULL, + (kernel_ulong_t) 0xdad5bef5caf5bea5ULL }, { __NR_exit_group, 0, - 0xfac1c0d1, - 0xfac2c0d2, - 0xfac3c0d3, - 0xfac4c0d4, - 0xfac5c0d5 + (kernel_ulong_t) 0xdad1bef1fac1c0d1ULL, + (kernel_ulong_t) 0xdad2bef2fac2c0d2ULL, + (kernel_ulong_t) 0xdad3bef3fac3c0d3ULL, + (kernel_ulong_t) 0xdad4bef4fac4c0d4ULL, + (kernel_ulong_t) 0xdad5bef5fac5c0d5ULL } }; - const unsigned long *exp_args; + const kernel_ulong_t *exp_args; pid_t pid = fork(); @@ -154,7 +165,7 @@ TEST(get_syscall_info) } ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } ASSERT_EQ(expected_none_size, rc) { @@ -177,7 +188,7 @@ TEST(get_syscall_info) case SIGTRAP | 0x80: ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } switch (ptrace_stop) { -- ldv

1 month

1
0
0 0

[PATCH v2] selftests/filesystems: Fix build of anon_inode_test

by Mark Brown

The newly added anon_inode_test test fails to build due to attempting to include a nonexisting overlayfs/wrapper.h: anon_inode_test.c:10:10: fatal error: overlayfs/wrappers.h: No such file or directory 10 | #include "overlayfs/wrappers.h" | ^~~~~~~~~~~~~~~~~~~~~~ This is due to 0bd92b9fe538 ("selftests/filesystems: move wrapper.h out of overlayfs subdir") which was added in the vfs-6.16.selftests branch which was based on -rc5 and did not contain the newly added test so once things were merged into mainline the build started failing - both parent commits are fine. Fixes: 3e406741b1989 ("Merge tag 'vfs-6.16-rc1.selftests' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs") Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Changes in v2: - Rebase onto mainline and adjust fixes commit now the two branches got merged there. - Link to v1: https://lore.kernel.org/r/20250518-selftests-anon-inode-build-v1-1-71eff818… --- tools/testing/selftests/filesystems/anon_inode_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/filesystems/anon_inode_test.c b/tools/testing/selftests/filesystems/anon_inode_test.c index e8e0ef1460d2..73e0a4d4fb2f 100644 --- a/tools/testing/selftests/filesystems/anon_inode_test.c +++ b/tools/testing/selftests/filesystems/anon_inode_test.c @@ -7,7 +7,7 @@ #include <sys/stat.h> #include "../kselftest_harness.h" -#include "overlayfs/wrappers.h" +#include "wrappers.h" TEST(anon_inode_no_chown) { --- base-commit: f66bc387efbee59978e076ce9bf123ac353b389c change-id: 20250516-selftests-anon-inode-build-007e206e8422 Best regards, -- Mark Brown <broonie(a)kernel.org>

1 month

2
1
0 0

[PATCH v2] selftests/ptrace/get_syscall_info: fix for MIPS n32

by Dmitry V. Levin

MIPS n32 is one of two ILP32 architectures supported by the kernel that have 64-bit syscall arguments (another one is x32). When this test passed 32-bit arguments to syscall(), they were sign-extended in libc, PTRACE_GET_SYSCALL_INFO reported these sign-extended 64-bit values, and the test complained about the mismatch. Fix this by passing arguments of the appropriate type to syscall(), which is "unsigned long long" on MIPS n32, and __kernel_ulong_t on other architectures. As a side effect, this also extends the test on all 64-bit architectures by choosing constants that don't fit into 32-bit integers. Signed-off-by: Dmitry V. Levin <ldv(a)strace.io> --- v2: Fixed MIPS #ifdef. .../selftests/ptrace/get_syscall_info.c | 53 +++++++++++-------- 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/ptrace/get_syscall_info.c b/tools/testing/selftests/ptrace/get_syscall_info.c index 5bcd1c7b5be6..2970f72d66d3 100644 --- a/tools/testing/selftests/ptrace/get_syscall_info.c +++ b/tools/testing/selftests/ptrace/get_syscall_info.c @@ -11,8 +11,19 @@ #include <err.h> #include <signal.h> #include <asm/unistd.h> +#include <linux/types.h> #include "linux/ptrace.h" +#if defined(_MIPS_SIM) && _MIPS_SIM == _MIPS_SIM_NABI32 +/* + * MIPS N32 is the only architecture where __kernel_ulong_t + * does not match the bitness of syscall arguments. + */ +typedef unsigned long long kernel_ulong_t; +#else +typedef __kernel_ulong_t kernel_ulong_t; +#endif + static int kill_tracee(pid_t pid) { @@ -42,37 +53,37 @@ sys_ptrace(int request, pid_t pid, unsigned long addr, unsigned long data) TEST(get_syscall_info) { - static const unsigned long args[][7] = { + const kernel_ulong_t args[][7] = { /* a sequence of architecture-agnostic syscalls */ { __NR_chdir, - (unsigned long) "", - 0xbad1fed1, - 0xbad2fed2, - 0xbad3fed3, - 0xbad4fed4, - 0xbad5fed5 + (uintptr_t) "", + (kernel_ulong_t) 0xdad1bef1bad1fed1ULL, + (kernel_ulong_t) 0xdad2bef2bad2fed2ULL, + (kernel_ulong_t) 0xdad3bef3bad3fed3ULL, + (kernel_ulong_t) 0xdad4bef4bad4fed4ULL, + (kernel_ulong_t) 0xdad5bef5bad5fed5ULL }, { __NR_gettid, - 0xcaf0bea0, - 0xcaf1bea1, - 0xcaf2bea2, - 0xcaf3bea3, - 0xcaf4bea4, - 0xcaf5bea5 + (kernel_ulong_t) 0xdad0bef0caf0bea0ULL, + (kernel_ulong_t) 0xdad1bef1caf1bea1ULL, + (kernel_ulong_t) 0xdad2bef2caf2bea2ULL, + (kernel_ulong_t) 0xdad3bef3caf3bea3ULL, + (kernel_ulong_t) 0xdad4bef4caf4bea4ULL, + (kernel_ulong_t) 0xdad5bef5caf5bea5ULL }, { __NR_exit_group, 0, - 0xfac1c0d1, - 0xfac2c0d2, - 0xfac3c0d3, - 0xfac4c0d4, - 0xfac5c0d5 + (kernel_ulong_t) 0xdad1bef1fac1c0d1ULL, + (kernel_ulong_t) 0xdad2bef2fac2c0d2ULL, + (kernel_ulong_t) 0xdad3bef3fac3c0d3ULL, + (kernel_ulong_t) 0xdad4bef4fac4c0d4ULL, + (kernel_ulong_t) 0xdad5bef5fac5c0d5ULL } }; - const unsigned long *exp_args; + const kernel_ulong_t *exp_args; pid_t pid = fork(); @@ -154,7 +165,7 @@ TEST(get_syscall_info) } ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } ASSERT_EQ(expected_none_size, rc) { @@ -177,7 +188,7 @@ TEST(get_syscall_info) case SIGTRAP | 0x80: ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } switch (ptrace_stop) { -- ldv

1 month

3
7
0 0

[PATCH] lib/tests: Make RANDSTRUCT_KUNIT_TEST depend on RANDSTRUCT

by Geert Uytterhoeven

When CONFIG_RANDSTRUCT is not enabled, all randstruct tests are skipped. Move this logic from run-time to config-time, to avoid people building and running tests that do not do anything. Fixes: b370f7eacdcfe1dd ("lib/tests: Add randstruct KUnit test") Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org> --- FTR, after adding "select HAVE_GCC_PLUGINS" to arch/m68k/Kconfig and installing gcc-13-plugin-dev-m68k-linux-gnu, I could enable CONFIG_RANDSTRUCT_FULL and run the tests (all passed!), so probably I should send a patch to select HAVE_GCC_PLUGINS? --- lib/Kconfig.debug | 4 ++-- lib/tests/randstruct_kunit.c | 9 --------- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index a8b4febad716a4be..407f2ed7fcb3e94c 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2895,10 +2895,10 @@ config OVERFLOW_KUNIT_TEST config RANDSTRUCT_KUNIT_TEST tristate "Test randstruct structure layout randomization at runtime" if !KUNIT_ALL_TESTS depends on KUNIT + depends on RANDSTRUCT default KUNIT_ALL_TESTS help - Builds unit tests for the checking CONFIG_RANDSTRUCT=y, which - randomizes structure layouts. + Builds unit tests for checking structure layout randomization. config STACKINIT_KUNIT_TEST tristate "Test level of stack variable initialization" if !KUNIT_ALL_TESTS diff --git a/lib/tests/randstruct_kunit.c b/lib/tests/randstruct_kunit.c index f3a2d63c4cfbe7dc..2c95eca76d2411bc 100644 --- a/lib/tests/randstruct_kunit.c +++ b/lib/tests/randstruct_kunit.c @@ -305,14 +305,6 @@ static void randstruct_initializers(struct kunit *test) #undef init_members } -static int randstruct_test_init(struct kunit *test) -{ - if (!IS_ENABLED(CONFIG_RANDSTRUCT)) - kunit_skip(test, "Not built with CONFIG_RANDSTRUCT=y"); - - return 0; -} - static struct kunit_case randstruct_test_cases[] = { KUNIT_CASE(randstruct_layout_same), KUNIT_CASE(randstruct_layout_mixed), @@ -324,7 +316,6 @@ static struct kunit_case randstruct_test_cases[] = { static struct kunit_suite randstruct_test_suite = { .name = "randstruct", - .init = randstruct_test_init, .test_cases = randstruct_test_cases, }; -- 2.43.0

1 month

2
3
0 0

[PATCH] lib/tests: Make FORTIFY_KUNIT_TEST depend on FORTIFY_SOURCE

by Geert Uytterhoeven

When CONFIG_FORTIFY_SOURCE is not enabled, all fortify tests are skipped. Move this logic from run-time to config-time, to avoid people building and running tests that do not do anything. This basically reverts commit 1a78f8cb5daac774 ("fortify: Allow KUnit test to build without FORTIFY") in v6.9, which was v3 of commit a9dc8d0442294b42 ("fortify: Allow KUnit test to build without FORTIFY") in v6.5, which was quickly reverted in commit 5e2956ee46244ffb ("Revert "fortify: Allow KUnit test to build without FORTIFY""). Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org> --- Let's keep on playing whack-a-mole ;-) --- lib/Kconfig.debug | 1 + lib/tests/fortify_kunit.c | 8 -------- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 407f2ed7fcb3e94c..ca5afd192c9fbf51 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2912,6 +2912,7 @@ config STACKINIT_KUNIT_TEST config FORTIFY_KUNIT_TEST tristate "Test fortified str*() and mem*() function internals at runtime" if !KUNIT_ALL_TESTS depends on KUNIT + depends on FORTIFY_SOURCE default KUNIT_ALL_TESTS help Builds unit tests for checking internals of FORTIFY_SOURCE as used diff --git a/lib/tests/fortify_kunit.c b/lib/tests/fortify_kunit.c index 29ffc62a71e3f968..10b0e1b12cdc3ae2 100644 --- a/lib/tests/fortify_kunit.c +++ b/lib/tests/fortify_kunit.c @@ -48,11 +48,6 @@ void fortify_add_kunit_error(int write); #include <linux/string.h> #include <linux/vmalloc.h> -/* Handle being built without CONFIG_FORTIFY_SOURCE */ -#ifndef __compiletime_strlen -# define __compiletime_strlen __builtin_strlen -#endif - static struct kunit_resource read_resource; static struct kunit_resource write_resource; static int fortify_read_overflows; @@ -1071,9 +1066,6 @@ static void fortify_test_kmemdup(struct kunit *test) static int fortify_test_init(struct kunit *test) { - if (!IS_ENABLED(CONFIG_FORTIFY_SOURCE)) - kunit_skip(test, "Not built with CONFIG_FORTIFY_SOURCE=y"); - fortify_read_overflows = 0; kunit_add_named_resource(test, NULL, NULL, &read_resource, "fortify_read_overflows", -- 2.43.0

1 month

2
1
0 0

[PATCH v2 10/10] selftests/sched_ext: Add test for sched_ext dl_server

by Joel Fernandes

From: Andrea Righi <arighi(a)nvidia.com> Add a selftest to validate the correct behavior of the deadline server for the ext_sched_class. [ Joel: Replaced occurences of CFS in the test with EXT. ] Signed-off-by: Andrea Righi <arighi(a)nvidia.com> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- tools/testing/selftests/sched_ext/Makefile | 1 + .../selftests/sched_ext/rt_stall.bpf.c | 23 ++ tools/testing/selftests/sched_ext/rt_stall.c | 213 ++++++++++++++++++ 3 files changed, 237 insertions(+) create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile index f4531327b8e7..dcc803eeab39 100644 --- a/tools/testing/selftests/sched_ext/Makefile +++ b/tools/testing/selftests/sched_ext/Makefile @@ -181,6 +181,7 @@ auto-test-targets := \ select_cpu_dispatch_bad_dsq \ select_cpu_dispatch_dbl_dsp \ select_cpu_vtime \ + rt_stall \ test_example \ testcase-targets := $(addsuffix .o,$(addprefix $(SCXOBJ_DIR)/,$(auto-test-targets))) diff --git a/tools/testing/selftests/sched_ext/rt_stall.bpf.c b/tools/testing/selftests/sched_ext/rt_stall.bpf.c new file mode 100644 index 000000000000..80086779dd1e --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.bpf.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * A scheduler that verified if RT tasks can stall SCHED_EXT tasks. + * + * Copyright (c) 2025 NVIDIA Corporation. + */ + +#include <scx/common.bpf.h> + +char _license[] SEC("license") = "GPL"; + +UEI_DEFINE(uei); + +void BPF_STRUCT_OPS(rt_stall_exit, struct scx_exit_info *ei) +{ + UEI_RECORD(uei, ei); +} + +SEC(".struct_ops.link") +struct sched_ext_ops rt_stall_ops = { + .exit = (void *)rt_stall_exit, + .name = "rt_stall", +}; diff --git a/tools/testing/selftests/sched_ext/rt_stall.c b/tools/testing/selftests/sched_ext/rt_stall.c new file mode 100644 index 000000000000..d4cb545ebfd8 --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.c @@ -0,0 +1,213 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2025 NVIDIA Corporation. + */ +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <sched.h> +#include <sys/prctl.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <time.h> +#include <linux/sched.h> +#include <signal.h> +#include <bpf/bpf.h> +#include <scx/common.h> +#include <sys/wait.h> +#include <unistd.h> +#include "rt_stall.bpf.skel.h" +#include "scx_test.h" +#include "../kselftest.h" + +#define CORE_ID 0 /* CPU to pin tasks to */ +#define RUN_TIME 5 /* How long to run the test in seconds */ + +/* Simple busy-wait function for test tasks */ +static void process_func(void) +{ + while (1) { + /* Busy wait */ + for (volatile unsigned long i = 0; i < 10000000UL; i++); + } +} + +/* Set CPU affinity to a specific core */ +static void set_affinity(int cpu) +{ + cpu_set_t mask; + + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); + if (sched_setaffinity(0, sizeof(mask), &mask) != 0) { + perror("sched_setaffinity"); + exit(EXIT_FAILURE); + } +} + +/* Set task scheduling policy and priority */ +static void set_sched(int policy, int priority) +{ + struct sched_param param; + + param.sched_priority = priority; + if (sched_setscheduler(0, policy, &param) != 0) { + perror("sched_setscheduler"); + exit(EXIT_FAILURE); + } +} + +/* Get process runtime from /proc/<pid>/stat */ +static float get_process_runtime(int pid) +{ + char path[256]; + FILE *file; + long utime, stime; + int fields; + + snprintf(path, sizeof(path), "/proc/%d/stat", pid); + file = fopen(path, "r"); + if (file == NULL) { + perror("Failed to open stat file"); + return -1; + } + + /* Skip the first 13 fields and read the 14th and 15th */ + fields = fscanf(file, + "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu", + &utime, &stime); + fclose(file); + + if (fields != 2) { + fprintf(stderr, "Failed to read stat file\n"); + return -1; + } + + /* Calculate the total time spent in the process */ + long total_time = utime + stime; + long ticks_per_second = sysconf(_SC_CLK_TCK); + float runtime_seconds = total_time * 1.0 / ticks_per_second; + + return runtime_seconds; +} + +static enum scx_test_status setup(void **ctx) +{ + struct rt_stall *skel; + + skel = rt_stall__open(); + SCX_FAIL_IF(!skel, "Failed to open"); + SCX_ENUM_INIT(skel); + SCX_FAIL_IF(rt_stall__load(skel), "Failed to load skel"); + + *ctx = skel; + + return SCX_TEST_PASS; +} + +static bool sched_stress_test(void) +{ + float cfs_runtime, rt_runtime; + int cfs_pid, rt_pid; + float expected_min_ratio = 0.04; /* 4% */ + + ksft_print_header(); + ksft_set_plan(1); + + /* Create and set up a EXT task */ + cfs_pid = fork(); + if (cfs_pid == 0) { + set_affinity(CORE_ID); + process_func(); + exit(0); + } else if (cfs_pid < 0) { + perror("fork for EXT task"); + ksft_exit_fail(); + } + + /* Create an RT task */ + rt_pid = fork(); + if (rt_pid == 0) { + set_affinity(CORE_ID); + set_sched(SCHED_FIFO, 50); + process_func(); + exit(0); + } else if (rt_pid < 0) { + perror("fork for RT task"); + ksft_exit_fail(); + } + + /* Let the processes run for the specified time */ + sleep(RUN_TIME); + + /* Get runtime for the EXT task */ + cfs_runtime = get_process_runtime(cfs_pid); + if (cfs_runtime != -1) + ksft_print_msg("Runtime of EXT task (PID %d) is %f seconds\n", cfs_pid, cfs_runtime); + else + ksft_exit_fail_msg("Error getting runtime for EXT task (PID %d)\n", cfs_pid); + + /* Get runtime for the RT task */ + rt_runtime = get_process_runtime(rt_pid); + if (rt_runtime != -1) + ksft_print_msg("Runtime of RT task (PID %d) is %f seconds\n", rt_pid, rt_runtime); + else + ksft_exit_fail_msg("Error getting runtime for RT task (PID %d)\n", rt_pid); + + /* Kill the processes */ + kill(cfs_pid, SIGKILL); + kill(rt_pid, SIGKILL); + waitpid(cfs_pid, NULL, 0); + waitpid(rt_pid, NULL, 0); + + /* Verify that the scx task got enough runtime */ + float actual_ratio = cfs_runtime / (cfs_runtime + rt_runtime); + ksft_print_msg("EXT task got %.2f%% of total runtime\n", actual_ratio * 100); + + if (actual_ratio >= expected_min_ratio) { + ksft_test_result_pass("PASS: EXT task got more than %.2f%% of runtime\n", + expected_min_ratio * 100); + return true; + } else { + ksft_test_result_fail("FAIL: EXT task got less than %.2f%% of runtime\n", + expected_min_ratio * 100); + return false; + } +} + +static enum scx_test_status run(void *ctx) +{ + struct rt_stall *skel = ctx; + struct bpf_link *link; + bool res; + + link = bpf_map__attach_struct_ops(skel->maps.rt_stall_ops); + SCX_FAIL_IF(!link, "Failed to attach scheduler"); + + res = sched_stress_test(); + + SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_NONE)); + bpf_link__destroy(link); + + if (!res) + ksft_exit_fail(); + + return SCX_TEST_PASS; +} + +static void cleanup(void *ctx) +{ + struct rt_stall *skel = ctx; + + rt_stall__destroy(skel); +} + +struct scx_test rt_stall = { + .name = "rt_stall", + .description = "Verify that RT tasks cannot stall SCHED_EXT tasks", + .setup = setup, + .run = run, + .cleanup = cleanup, +}; +REGISTER_SCX_TEST(&rt_stall) -- 2.43.0

1 month

1
0
0 0

[PATCH] selftests/bpf: Validate UDP length in cls_redirect test

by Suchit

Add validation step to ensure that the UDP payload is long enough to contain the expected GUE and UNIGUE encapsulation headers Signed-off-by: Suchit <suchitkarunakaran(a)gmail.com> --- tools/testing/selftests/bpf/progs/test_cls_redirect.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/progs/test_cls_redirect.c b/tools/testing/selftests/bpf/progs/test_cls_redirect.c index f344c6835e84..c1d2eaee2e77 100644 --- a/tools/testing/selftests/bpf/progs/test_cls_redirect.c +++ b/tools/testing/selftests/bpf/progs/test_cls_redirect.c @@ -978,7 +978,14 @@ int cls_redirect(struct __sk_buff *skb) return TC_ACT_OK; } - /* TODO Check UDP length? */ + uint16_t udp_len = bpf_ntohs(encap->udp.len); + uint16_t min_encap_len = sizeof(encap->udp) + sizeof(encap->gue) + sizeof(encap->unigue); + + if (udp_len < min_encap_len) { + metrics->errors_total_malformed_encapsulation++; + return TC_ACT_SHOT; + } if (encap->udp.dest != ENCAPSULATION_PORT) { return TC_ACT_OK; } -- 2.49.0

1 month

1
0
0 0

[PATCH net-next v2 0/4] netconsole: Optimize console registration and improve testing

by Breno Leitao

During performance analysis of console subsystem latency, I discovered that netconsole registers console handlers even when no active targets exist. These orphaned console handlers are invoked on every printk() call, get the lock, iterate through empty target lists, and consume CPU cycles without performing any useful work. This patch series addresses the inefficiency by: 1. Implementing dynamic console registration/unregistration based on target availability, ensuring console handlers are only active when needed 2. Adding automatic cleanup of unused console registrations when targets are disabled or removed 3. Extending the selftest suite to cover non-extended console format, which was previously untested The optimization reduces printk() overhead by eliminating unnecessary function calls and list traversals when netconsole targets are not configured, improving overall system performance during heavy logging scenarios. --- Changes in v2: - Added selftests to test the new mechanism - Unregister the console if the last target got disabled - Sending to net-next instead of net (Jakub) - Link to v1: https://lore.kernel.org/r/20250528-netcons_ext-v1-1-69f71e404e00@debian.org --- Breno Leitao (4): netconsole: Only register console drivers when targets are configured netconsole: Add automatic console unregistration on target removal selftests: netconsole: Do not exit from inside the validation function selftests: netconsole: Add support for basic netconsole target format drivers/net/netconsole.c | 61 +++++++++++++++++++--- .../selftests/drivers/net/lib/sh/lib_netcons.sh | 27 ++++++++-- .../testing/selftests/drivers/net/netcons_basic.sh | 50 +++++++++++------- 3 files changed, 107 insertions(+), 31 deletions(-) --- base-commit: 914873bc7df913db988284876c16257e6ab772c6 change-id: 20250528-netcons_ext-572982619bea Best regards, -- Breno Leitao <leitao(a)debian.org>

1 month

2
5
0 0

[PATCH v11 0/5] rust: replace kernel::str::CStr w/ core::ffi::CStr

by Tamir Duberstein

This picks up from Michal Rostecki's work[0]. Per Michal's guidance I have omitted Co-authored tags, as the end result is quite different. Link: https://lore.kernel.org/rust-for-linux/20240819153656.28807-2-vadorovsky@pr… [0] Closes: https://github.com/Rust-for-Linux/linux/issues/1075 Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v11: - Use `quote_spanned!` to avoid `use<'a, T>` and generally reduce manual token construction. - Add a commit to simplify `quote_spanned!`. - Drop first commit in favor of https://lore.kernel.org/rust-for-linux/20240906164448.2268368-1-paddymills@…. (Miguel Ojeda) - Correctly handle expressions such as `pr_info!("{a}", a = a = a)`. (Benno Lossin) - Avoid dealing with `}}` escapes, which is not needed. (Benno Lossin) - Revert some unnecessary changes. (Benno Lossin) - Rename `c_str_avoid_literals!` to `str_to_cstr!`. (Benno Lossin & Alice Ryhl). - Link to v10: https://lore.kernel.org/r/20250524-cstr-core-v10-0-6412a94d9d75@gmail.com Changes in v10: - Rebase on cbeaa41dfe26b72639141e87183cb23e00d4b0dd. - Implement Alice's suggestion to use a proc macro to work around orphan rules otherwise preventing `core::ffi::CStr` to be directly printed with `{}`. - Link to v9: https://lore.kernel.org/r/20250317-cstr-core-v9-0-51d6cc522f62@gmail.com Changes in v9: - Rebase on rust-next. - Restore `impl Display for BStr` which exists upstream[1]. - Link: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#impl-Display… [1] - Link to v8: https://lore.kernel.org/r/20250203-cstr-core-v8-0-cb3f26e78686@gmail.com Changes in v8: - Move `{from,as}_char_ptr` back to `CStrExt`. This reduces the diff some. - Restore `from_bytes_with_nul_unchecked_mut`, `to_cstring`. - Link to v7: https://lore.kernel.org/r/20250202-cstr-core-v7-0-da1802520438@gmail.com Changes in v7: - Rebased on mainline. - Restore functionality added in commit a321f3ad0a5d ("rust: str: add {make,to}_{upper,lower}case() to CString"). - Used `diff.algorithm patience` to improve diff readability. - Link to v6: https://lore.kernel.org/r/20250202-cstr-core-v6-0-8469cd6d29fd@gmail.com Changes in v6: - Split the work into several commits for ease of review. - Restore `{from,as}_char_ptr` to allow building on ARM (see commit message). - Add `CStrExt` to `kernel::prelude`. (Alice Ryhl) - Remove `CStrExt::from_bytes_with_nul_unchecked_mut` and restore `DerefMut for CString`. (Alice Ryhl) - Rename and hide `kernel::c_str!` to encourage use of C-String literals. - Drop implementation and invocation changes in kunit.rs. (Trevor Gross) - Drop docs on `Display` impl. (Trevor Gross) - Rewrite docs in the style of the standard library. - Restore the `test_cstr_debug` unit tests to demonstrate that the implementation has changed. Changes in v5: - Keep the `test_cstr_display*` unit tests. Changes in v4: - Provide the `CStrExt` trait with `display()` method, which returns a `CStrDisplay` wrapper with `Display` implementation. This addresses the lack of `Display` implementation for `core::ffi::CStr`. - Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`, which might be useful and is going to prevent manual, unsafe casts. - Fix a typo (s/preffered/prefered/). Changes in v3: - Fix the commit message. - Remove redundant braces in `use`, when only one item is imported. Changes in v2: - Do not remove `c_str` macro. While it's preferred to use C-string literals, there are two cases where `c_str` is helpful: - When working with macros, which already return a Rust string literal (e.g. `stringify!`). - When building macros, where we want to take a Rust string literal as an argument (for caller's convenience), but still use it as a C-string internally. - Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`, `new_mutex`). Use the `c_str` macro to convert these literals to C-string literals. - Use `c_str` in kunit.rs for converting the output of `stringify!` to a `CStr`. - Remove `DerefMut` implementation for `CString`. --- Tamir Duberstein (5): rust: macros: reduce collections in `quote!` macro rust: support formatting of foreign types rust: replace `CStr` with `core::ffi::CStr` rust: replace `kernel::c_str!` with C-Strings rust: remove core::ffi::CStr reexport drivers/block/rnull.rs | 4 +- drivers/gpu/drm/drm_panic_qr.rs | 5 +- drivers/gpu/nova-core/driver.rs | 2 +- drivers/gpu/nova-core/firmware.rs | 2 +- drivers/net/phy/ax88796b_rust.rs | 8 +- drivers/net/phy/qt2025.rs | 6 +- rust/kernel/block/mq.rs | 2 +- rust/kernel/device.rs | 9 +- rust/kernel/devres.rs | 2 +- rust/kernel/driver.rs | 4 +- rust/kernel/error.rs | 10 +- rust/kernel/faux.rs | 5 +- rust/kernel/firmware.rs | 16 +- rust/kernel/fmt.rs | 77 ++++++ rust/kernel/kunit.rs | 21 +- rust/kernel/lib.rs | 3 +- rust/kernel/miscdevice.rs | 5 +- rust/kernel/net/phy.rs | 12 +- rust/kernel/of.rs | 5 +- rust/kernel/pci.rs | 2 +- rust/kernel/platform.rs | 6 +- rust/kernel/prelude.rs | 5 +- rust/kernel/print.rs | 4 +- rust/kernel/seq_file.rs | 6 +- rust/kernel/str.rs | 443 ++++++++++------------------------- rust/kernel/sync.rs | 7 +- rust/kernel/sync/condvar.rs | 4 +- rust/kernel/sync/lock.rs | 4 +- rust/kernel/sync/lock/global.rs | 6 +- rust/kernel/sync/poll.rs | 1 + rust/kernel/workqueue.rs | 9 +- rust/macros/fmt.rs | 99 ++++++++ rust/macros/kunit.rs | 10 +- rust/macros/lib.rs | 19 ++ rust/macros/module.rs | 2 +- rust/macros/quote.rs | 111 ++++----- samples/rust/rust_driver_faux.rs | 4 +- samples/rust/rust_driver_pci.rs | 4 +- samples/rust/rust_driver_platform.rs | 4 +- samples/rust/rust_misc_device.rs | 3 +- scripts/rustdoc_test_gen.rs | 6 +- 41 files changed, 485 insertions(+), 472 deletions(-) --- base-commit: 7a17bbc1d952057898cb0739e60665908fbb8c72 change-id: 20250201-cstr-core-d4b9b69120cf Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

1 month

3
11
0 0

[PATCH bpf-next 1/2] bpf: Add bpf_task_cwd_from_pid() kfunc

by Rong Tao

From: Rong Tao <rongtao(a)cestc.cn> It is a bit troublesome to get cwd based on pid in bpf program, such as bpftrace example [1]. This patch therefore adds a new bpf_task_cwd_from_pid() kfunc which allows BPF programs to get cwd from a pid. [1] https://github.com/bpftrace/bpftrace/issues/3314 Signed-off-by: Rong Tao <rongtao(a)cestc.cn> --- kernel/bpf/helpers.c | 45 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index b71e428ad936..0f32fbc997bb 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -24,6 +24,10 @@ #include <linux/bpf_mem_alloc.h> #include <linux/kasan.h> #include <linux/bpf_verifier.h> +#include <linux/fs.h> +#include <linux/fs_struct.h> +#include <linux/path.h> +#include <linux/string.h> #include "../../lib/kstrtox.h" @@ -2643,6 +2647,46 @@ __bpf_kfunc struct task_struct *bpf_task_from_vpid(s32 vpid) return p; } +/** + * bpf_task_cwd_from_pid - Get a task's absolute pathname of the current + * working directory from its pid. + * @pid: The pid of the task being looked up. + * @buf: The array pointed to by buf. + * @buf_len: buf length. + */ +__bpf_kfunc int bpf_task_cwd_from_pid(s32 pid, char *buf, u32 buf_len) +{ + struct path pwd; + char kpath[256], *path; + struct task_struct *task; + + if (!buf || buf_len == 0) + return -EINVAL; + + rcu_read_lock(); + task = pid_task(find_vpid(pid), PIDTYPE_PID); + if (!task) { + rcu_read_unlock(); + return -ESRCH; + } + task_lock(task); + if (!task->fs) { + task_unlock(task); + return -ENOENT; + } + get_fs_pwd(task->fs, &pwd); + task_unlock(task); + rcu_read_unlock(); + + path = d_path(&pwd, kpath, sizeof(kpath)); + path_put(&pwd); + if (IS_ERR(path)) + return PTR_ERR(path); + + strncpy(buf, path, buf_len); + return 0; +} + /** * bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data. * @p: The dynptr whose data slice to retrieve @@ -3314,6 +3358,7 @@ BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL) #endif BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_task_from_vpid, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_task_cwd_from_pid, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_throw) #ifdef CONFIG_BPF_EVENTS BTF_ID_FLAGS(func, bpf_send_signal_task, KF_TRUSTED_ARGS) -- 2.49.0

1 month

4
5
0 0

[PATCH AUTOSEL 6.12 69/93] selftests: harness: Mark functions without prototypes static

by Sasha Levin

From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> [ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ] With -Wmissing-prototypes the compiler will warn about non-static functions which don't have a prototype defined. As they are not used from a different compilation unit they don't need to be defined globally. Avoid the issue by marking the functions static. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571… Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- **YES** This commit should be backported to stable kernel trees. **Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real compiler warning issue (`-Wmissing-prototypes`) that affects build cleanliness and code quality. Modern build systems increasingly use stricter warning flags, making this fix valuable for stable trees. 2. **Zero Functional Risk**: The changes are purely cosmetic from a runtime perspective. Adding `static` to functions that were already internal has no impact on functionality, memory layout, or behavior - it only affects compiler symbol visibility and warnings. 3. **Minimal and Contained**: The diff is extremely small (4 function signatures with `static` added) and isolated to the kselftest harness framework. There are no complex logic changes or cross-subsystem impacts. 4. **Testing Infrastructure Improvement**: While the kselftest framework isn't critical runtime code, it's important for kernel testing and validation. Improving build compliance in testing infrastructure benefits stable kernel maintenance. 5. **Standard Practice**: Compiler warning fixes of this nature (adding missing `static` keywords) are routinely backported to stable trees as they represent good coding practices without functional risk. 6. **Different from Similar Commits**: Unlike the referenced similar commits (all marked "NO") which involved feature additions, API changes, or structural modifications, this commit is purely a build compliance fix with no behavioral changes. The commit meets all stable tree criteria: it fixes an issue (compiler warnings), has minimal risk (no functional changes), and improves code quality without introducing new features or architectural changes. Tools like `kselftest_harness.h:241`, `kselftest_harness.h:290`, `kselftest_harness.h:970`, and `kselftest_harness.h:1188` are the specific locations where these low- risk improvements are made. tools/testing/selftests/kselftest_harness.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 666c9fde76da9..7c337b4fa054d 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -258,7 +258,7 @@ * A bare "return;" statement may be used to return early. */ #define FIXTURE_SETUP(fixture_name) \ - void fixture_name##_setup( \ + static void fixture_name##_setup( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -307,7 +307,7 @@ __FIXTURE_TEARDOWN(fixture_name) #define __FIXTURE_TEARDOWN(fixture_name) \ - void fixture_name##_teardown( \ + static void fixture_name##_teardown( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext) kill(-(t->pid), SIGKILL); } -void __wait_for_test(struct __test_metadata *t) +static void __wait_for_test(struct __test_metadata *t) { struct sigaction action = { .sa_sigaction = __timeout_handler, @@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } -void __run_test(struct __fixture_metadata *f, - struct __fixture_variant_metadata *variant, - struct __test_metadata *t) +static void __run_test(struct __fixture_metadata *f, + struct __fixture_variant_metadata *variant, + struct __test_metadata *t) { struct __test_xfail *xfail; char test_name[1024]; -- 2.39.5

1 month

1
0
0 0

[PATCH AUTOSEL 6.14 076/102] selftests: harness: Mark functions without prototypes static

by Sasha Levin

From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> [ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ] With -Wmissing-prototypes the compiler will warn about non-static functions which don't have a prototype defined. As they are not used from a different compilation unit they don't need to be defined globally. Avoid the issue by marking the functions static. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571… Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- **YES** This commit should be backported to stable kernel trees. **Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real compiler warning issue (`-Wmissing-prototypes`) that affects build cleanliness and code quality. Modern build systems increasingly use stricter warning flags, making this fix valuable for stable trees. 2. **Zero Functional Risk**: The changes are purely cosmetic from a runtime perspective. Adding `static` to functions that were already internal has no impact on functionality, memory layout, or behavior - it only affects compiler symbol visibility and warnings. 3. **Minimal and Contained**: The diff is extremely small (4 function signatures with `static` added) and isolated to the kselftest harness framework. There are no complex logic changes or cross-subsystem impacts. 4. **Testing Infrastructure Improvement**: While the kselftest framework isn't critical runtime code, it's important for kernel testing and validation. Improving build compliance in testing infrastructure benefits stable kernel maintenance. 5. **Standard Practice**: Compiler warning fixes of this nature (adding missing `static` keywords) are routinely backported to stable trees as they represent good coding practices without functional risk. 6. **Different from Similar Commits**: Unlike the referenced similar commits (all marked "NO") which involved feature additions, API changes, or structural modifications, this commit is purely a build compliance fix with no behavioral changes. The commit meets all stable tree criteria: it fixes an issue (compiler warnings), has minimal risk (no functional changes), and improves code quality without introducing new features or architectural changes. Tools like `kselftest_harness.h:241`, `kselftest_harness.h:290`, `kselftest_harness.h:970`, and `kselftest_harness.h:1188` are the specific locations where these low- risk improvements are made. tools/testing/selftests/kselftest_harness.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 666c9fde76da9..7c337b4fa054d 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -258,7 +258,7 @@ * A bare "return;" statement may be used to return early. */ #define FIXTURE_SETUP(fixture_name) \ - void fixture_name##_setup( \ + static void fixture_name##_setup( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -307,7 +307,7 @@ __FIXTURE_TEARDOWN(fixture_name) #define __FIXTURE_TEARDOWN(fixture_name) \ - void fixture_name##_teardown( \ + static void fixture_name##_teardown( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext) kill(-(t->pid), SIGKILL); } -void __wait_for_test(struct __test_metadata *t) +static void __wait_for_test(struct __test_metadata *t) { struct sigaction action = { .sa_sigaction = __timeout_handler, @@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } -void __run_test(struct __fixture_metadata *f, - struct __fixture_variant_metadata *variant, - struct __test_metadata *t) +static void __run_test(struct __fixture_metadata *f, + struct __fixture_variant_metadata *variant, + struct __test_metadata *t) { struct __test_xfail *xfail; char test_name[1024]; -- 2.39.5

1 month

1
0
0 0

[PATCH AUTOSEL 6.15 082/110] selftests: harness: Mark functions without prototypes static

by Sasha Levin

From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> [ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ] With -Wmissing-prototypes the compiler will warn about non-static functions which don't have a prototype defined. As they are not used from a different compilation unit they don't need to be defined globally. Avoid the issue by marking the functions static. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571… Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- **YES** This commit should be backported to stable kernel trees. **Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real compiler warning issue (`-Wmissing-prototypes`) that affects build cleanliness and code quality. Modern build systems increasingly use stricter warning flags, making this fix valuable for stable trees. 2. **Zero Functional Risk**: The changes are purely cosmetic from a runtime perspective. Adding `static` to functions that were already internal has no impact on functionality, memory layout, or behavior - it only affects compiler symbol visibility and warnings. 3. **Minimal and Contained**: The diff is extremely small (4 function signatures with `static` added) and isolated to the kselftest harness framework. There are no complex logic changes or cross-subsystem impacts. 4. **Testing Infrastructure Improvement**: While the kselftest framework isn't critical runtime code, it's important for kernel testing and validation. Improving build compliance in testing infrastructure benefits stable kernel maintenance. 5. **Standard Practice**: Compiler warning fixes of this nature (adding missing `static` keywords) are routinely backported to stable trees as they represent good coding practices without functional risk. 6. **Different from Similar Commits**: Unlike the referenced similar commits (all marked "NO") which involved feature additions, API changes, or structural modifications, this commit is purely a build compliance fix with no behavioral changes. The commit meets all stable tree criteria: it fixes an issue (compiler warnings), has minimal risk (no functional changes), and improves code quality without introducing new features or architectural changes. Tools like `kselftest_harness.h:241`, `kselftest_harness.h:290`, `kselftest_harness.h:970`, and `kselftest_harness.h:1188` are the specific locations where these low- risk improvements are made. tools/testing/selftests/kselftest_harness.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 666c9fde76da9..7c337b4fa054d 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -258,7 +258,7 @@ * A bare "return;" statement may be used to return early. */ #define FIXTURE_SETUP(fixture_name) \ - void fixture_name##_setup( \ + static void fixture_name##_setup( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -307,7 +307,7 @@ __FIXTURE_TEARDOWN(fixture_name) #define __FIXTURE_TEARDOWN(fixture_name) \ - void fixture_name##_teardown( \ + static void fixture_name##_teardown( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext) kill(-(t->pid), SIGKILL); } -void __wait_for_test(struct __test_metadata *t) +static void __wait_for_test(struct __test_metadata *t) { struct sigaction action = { .sa_sigaction = __timeout_handler, @@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } -void __run_test(struct __fixture_metadata *f, - struct __fixture_variant_metadata *variant, - struct __test_metadata *t) +static void __run_test(struct __fixture_metadata *f, + struct __fixture_variant_metadata *variant, + struct __test_metadata *t) { struct __test_xfail *xfail; char test_name[1024]; -- 2.39.5

1 month

1
0
0 0

[PATCH] selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled

by Enze Li

When CONFIG_DAMON_SYSFS is disabled, the selftests fail with the following outputs, not ok 2 selftests: damon: sysfs_update_schemes_tried_regions_wss_estimation.py # exit=1 not ok 3 selftests: damon: damos_quota.py # exit=1 not ok 4 selftests: damon: damos_quota_goal.py # exit=1 not ok 5 selftests: damon: damos_apply_interval.py # exit=1 not ok 6 selftests: damon: damos_tried_regions.py # exit=1 not ok 7 selftests: damon: damon_nr_regions.py # exit=1 not ok 11 selftests: damon: sysfs_update_schemes_tried_regions_hang.py # exit=1 The root cause of this issue is that all the testcases above do not check the sysfs interface of DAMON whether it exists or not. With this patch applied, all the testcases above now pass successfully. Signed-off-by: Enze Li <lienze(a)kylinos.cn> --- tools/testing/selftests/damon/_damon_sysfs.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/damon/_damon_sysfs.py b/tools/testing/selftests/damon/_damon_sysfs.py index 6e136dc3df19..cab67addfb00 100644 --- a/tools/testing/selftests/damon/_damon_sysfs.py +++ b/tools/testing/selftests/damon/_damon_sysfs.py @@ -15,6 +15,10 @@ if sysfs_root is None: print('Seems sysfs not mounted?') exit(ksft_skip) +if not os.path.exists(sysfs_root): + print('Seems DAMON disabled?') + exit(ksft_skip) + def write_file(path, string): "Returns error string if failed, or None otherwise" string = '%s' % string base-commit: 0f70f5b08a47a3bc1a252e5f451a137cde7c98ce -- 2.43.0

1 month

2
1
0 0

[PATCH net v2] selftests: net: build net/lib dependency in all target

by Bui Quang Minh

We have the logic to include net/lib automatically for net related selftests. However, currently, this logic is only in install target which means only `make install` will have net/lib included. This commit moves the logic to all target so that all `make`, `make run_tests` and `make install` will have net/lib included in net related selftests. Reviewed-by: Jakub Kicinski <kuba(a)kernel.org> Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com> --- Changes in v2: - Make the commit message clearer. tools/testing/selftests/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 6aa11cd3db42..5b04d83ad9a1 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -205,7 +205,7 @@ export KHDR_INCLUDES all: @ret=1; \ - for TARGET in $(TARGETS); do \ + for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ mkdir $$BUILD_TARGET -p; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \ @@ -270,7 +270,7 @@ ifdef INSTALL_PATH install -m 744 run_kselftest.sh $(INSTALL_PATH)/ rm -f $(TEST_LIST) @ret=1; \ - for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ + for TARGET in $(TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install \ INSTALL_PATH=$(INSTALL_PATH)/$$TARGET \ -- 2.43.0

1 month

1
1
0 0

[PATCH v1 1/1] selftests/x86: Add a test to detect infinite sigtrap handler loop

by Xin Li (Intel)

When FRED is enabled, if the Trap Flag (TF) is set without an external debugger attached, it can lead to an infinite loop in the SIGTRAP handler. To avoid this, the software event flag in the augmented SS must be cleared, ensuring that no single-step trap remains pending when ERETU completes. This test checks for that specific scenario—verifying whether the kernel correctly prevents an infinite SIGTRAP loop in this edge case. Signed-off-by: Xin Li (Intel) <xin(a)zytor.com> --- tools/testing/selftests/x86/Makefile | 2 +- .../selftests/x86/test_sigtrap_handler.c | 80 +++++++++++++++++++ 2 files changed, 81 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_sigtrap_handler.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index f703fcfe9f7c..c486fd88ebb1 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -12,7 +12,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh "$(CC)" trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ - test_vsyscall mov_ss_trap \ + test_vsyscall mov_ss_trap test_sigtrap_handler \ syscall_arg_fault fsgsbase_restore sigaltstack TARGETS_C_BOTHBITS += nx_stack TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ diff --git a/tools/testing/selftests/x86/test_sigtrap_handler.c b/tools/testing/selftests/x86/test_sigtrap_handler.c new file mode 100644 index 000000000000..9c5c2cf0cf88 --- /dev/null +++ b/tools/testing/selftests/x86/test_sigtrap_handler.c @@ -0,0 +1,80 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Intel Corporation + */ +#define _GNU_SOURCE + +#include <err.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ucontext.h> + +#ifdef __x86_64__ +# define REG_IP REG_RIP +#else +# define REG_IP REG_EIP +#endif + +static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), int flags) +{ + struct sigaction sa; + + memset(&sa, 0, sizeof(sa)); + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO | flags; + sigemptyset(&sa.sa_mask); + + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); + + return; +} + +static unsigned int loop_count_on_same_ip; + +static void sigtrap(int sig, siginfo_t *info, void *ctx_void) +{ + ucontext_t *ctx = (ucontext_t *)ctx_void; + static unsigned long last_trap_ip; + + if (last_trap_ip == ctx->uc_mcontext.gregs[REG_IP]) { + printf("trapped on %016lx\n", last_trap_ip); + + if (++loop_count_on_same_ip > 10) { + printf("trap loop detected, test failed\n"); + exit(2); + } + + return; + } + + loop_count_on_same_ip = 0; + last_trap_ip = ctx->uc_mcontext.gregs[REG_IP]; + printf("trapped on %016lx\n", last_trap_ip); +} + +int main(int argc, char *argv[]) +{ + sethandler(SIGTRAP, sigtrap, 0); + + asm volatile( +#ifdef __x86_64__ + /* Avoid clobbering the redzone */ + "sub $128, %rsp\n\t" +#endif + "push $0x302\n\t" + "popf\n\t" + "nop\n\t" + "nop\n\t" + "push $0x202\n\t" + "popf\n\t" +#ifdef __x86_64__ + "add $128, %rsp\n\t" +#endif + ); + + printf("test passed\n"); + return 0; +} base-commit: 485d11d84a2452ac16466cc7ae041c93d38929bc -- 2.49.0

1 month

2
1
0 0

[PATCH v4 00/23] iommufd: Add vIOMMU infrastructure (Part-4 HW QUEUE)

by Nicolin Chen

The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature. HW QUEUE introduced by this series as a part of the vIOMMU infrastructure represents a HW accelerated queue/buffer for VM to use exclusively, e.g. - NVIDIA's Virtual Command Queue - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer each of which allows its IOMMU HW to directly access a queue memory owned by a guest VM and allows a guest OS to control the HW queue direclty, to avoid VM Exit overheads to improve the performance. Introduce IOMMUFD_OBJ_HW_QUEUE and its pairing IOMMUFD_CMD_HW_QUEUE_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, and etc. Meanwhile, a guest-owned queue needs the guest kernel to control the queue by reading/writing its consumer and producer indexes, via MMIO acceses to the hardware MMIO registers. Introduce an mmap infrastructure for iommufd to support passing through a piece of MMIO region from the host physical address space to the guest physical address space. The mmap info (offset/ length) used by an mmap syscall must be pre-allocated and returned to the user space via an output driver-data during an IOMMUFD_CMD_HW_QUEUE_ALLOC call. Thus, it requires a driver-specific user data support in the vIOMMU allocation flow. As a real-world use case, this series implements a HW QUEUE support in the tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it is also the Tegra CMDQV series Part-2 (user-space support), reworked from Previous RFCv1: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/ This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS. // Unmap latencies from "dma_map_benchmark -g @granule -t @threads", // by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq" @granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0 4KB 1 35.7 us 5.3 us 16KB 1 41.8 us 6.8 us 64KB 1 68.9 us 9.9 us 128KB 1 109.0 us 12.6 us 256KB 1 187.1 us 18.0 us 4KB 2 96.9 us 6.8 us 16KB 2 97.8 us 7.5 us 64KB 2 151.5 us 10.7 us 128KB 2 257.8 us 12.7 us 256KB 2 443.0 us 17.9 us This is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_hw_queue-v4 Paring QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_hw_queue-v4 Changelog v4 * Rebase on v6.15-rc5 * Add Reviewed-by from Vasant * Rename "vQUEUE" to "HW QUEUE" * Use "offset" and "length" for all mmap-related variables * [iommufd] Use u64 for guest PA * [iommufd] Fix typo in uAPI doc * [iommufd] Rename immap_id to offset * [iommufd] Drop the partial-size mmap support * [iommufd] Do not replace WARN_ON with WARN_ON_ONCE * [iommufd] Use "u64 base_addr" for queue base address * [iommufd] Use u64 base_pfn/num_pfns for immap structure * [iommufd] Correct the size passed in to mtree_alloc_range() * [iommufd] Add IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to viommu_ops v3 https://lore.kernel.org/all/cover.1746139811.git.nicolinc@nvidia.com/ * Add Reviewed-by from Baolu, Pranjal, and Alok * Revise kdocs, uAPI docs, and commit logs * Rename "vCMDQ" back to "vQUEUE" for AMD cases * [tegra] Add tegra241_vcmdq_hw_flush_timeout() * [tegra] Rename vsmmu_alloc to alloc_vintf_user * [tegra] Use writel for SID replacement registers * [tegra] Move mmap removal call to vsmmu_destroy op * [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user() * [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED() * [iommufd] Add an object-type "owner" to immap structure * [iommufd] Drop the ictx input in the new for-driver APIs * [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle * [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers * [iommufd] Rename iommufd_ctx_alloc/free_mmap to _iommufd_alloc/destroy_mmap v2 https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/ * Add Reviewed-by from Jason * [smmu] Fix vsmmu initial value * [smmu] Support impl for hw_info * [tegra] Rename "slot" to "vsid" * [tegra] Update kdocs and commit logs * [tegra] Map/unmap LVCMDQ dynamically * [tegra] Refcount the previous LVCMDQ * [tegra] Return -EEXIST if LVCMDQ exists * [tegra] Simplify VINTF cleanup routine * [tegra] Use vmid and s2_domain in vsmmu * [tegra] Rename "mmap_pgoff" to "immap_id" * [tegra] Add more addr and length validation * [iommufd] Add more narrative to mmap's kdoc * [iommufd] Add iommufd_struct_depend/undepend() * [iommufd] Rename vcmdq_free op to vcmdq_destroy * [iommufd] Fix bug in iommu_copy_struct_to_user() * [iommufd] Drop is_io from iommufd_ctx_alloc_mmap() * [iommufd] Test the queue memory for its contiguity * [iommufd] Return -ENXIO if address or length fails * [iommufd] Do not change @min_last in mock_viommu_alloc() * [iommufd] Generalize TEGRA241_VCMDQ data in core structure * [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC * [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping v1 https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/ Thanks Nicolin Nicolin Chen (23): iommufd/viommu: Add driver-allocated vDEVICE support iommu: Pass in a driver-level user data structure to viommu_alloc op iommufd/viommu: Allow driver-specific user data for a vIOMMU object iommu: Add iommu_copy_struct_to_user helper iommufd/driver: Let iommufd_viommu_alloc helper save ictx to viommu->ictx iommufd/driver: Add iommufd_struct_destroy to revert iommufd_viommu_alloc iommufd/selftest: Support user_data in mock_viommu_alloc iommufd/selftest: Add covearge for viommu data iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC iommufd: Add mmap interface iommufd/selftest: Add coverage for the new mmap interface Documentation: userspace-api: iommufd: Update HW QUEUE iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info iommu/tegra241-cmdqv: Use request_threaded_irq iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +- drivers/iommu/iommufd/io_pagetable.h | 8 + drivers/iommu/iommufd/iommufd_private.h | 28 +- drivers/iommu/iommufd/iommufd_test.h | 20 + include/linux/iommu.h | 43 +- include/linux/iommufd.h | 186 ++++++- include/uapi/linux/iommufd.h | 116 ++++- tools/testing/selftests/iommu/iommufd_utils.h | 52 +- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 43 +- .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 490 +++++++++++++++++- drivers/iommu/iommufd/device.c | 117 +---- drivers/iommu/iommufd/driver.c | 94 ++++ drivers/iommu/iommufd/io_pagetable.c | 95 ++++ drivers/iommu/iommufd/main.c | 80 ++- drivers/iommu/iommufd/selftest.c | 133 ++++- drivers/iommu/iommufd/viommu.c | 121 ++++- tools/testing/selftests/iommu/iommufd.c | 97 +++- .../selftests/iommu/iommufd_fail_nth.c | 11 +- Documentation/userspace-api/iommufd.rst | 12 + 19 files changed, 1577 insertions(+), 194 deletions(-) base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb -- 2.43.0

1 month

5
105
0 0

[PATCH v1 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA

by Jiaqi Yan

Problem ======= When host APEI is unable to claim synchronous external abort (SEA) during stage-2 guest abort, today KVM directly injects an async SError into the VCPU then resumes it. The injected SError usually results in unpleasant guest kernel panic. One of the major situation of guest SEA is when VCPU consumes recoverable uncorrected memory error (UER), which is not uncommon at all in modern datacenter servers with large amounts of physical memory. Although SError and guest panic is sufficient to stop the propagation of corrupted memory there is still room to recover from memory UER in a more graceful manner. Proposed Solution ================= Alternatively KVM can replay the SEA to the faulting VCPU, via existing KVM_SET_VCPU_EVENTS API. If the memory poison consumption or the fault that cause SEA is not from guest kernel, the blast radius can be limited to the consuming or faulting guest userspace process, so the VM can keep running. In addition, instead of doing under the hood without involving userspace, there are benefits to redirect the SEA to VMM: - VM customers care about the disruptions caused by memory errors, and VMM usually has the responsibility to start the process of notifying the customers of memory error events in their VMs. For example some cloud provider emits a critical log in their observability UI [1], and provides playbook for customers on how to mitigate disruptions to their workloads. - VMM can protect future memory error consumption or faults by unmapping the poisoned pages from stage-2 page table with KVM userfault [2], which is more performant than splitting the memslot that contains the poisoned guest pages. - VMM can keep track SEA events in the VM. When VMM thinks the status on the host or the VM is bad enough, e.g. number of distinct SEAs exceeds a threshold, it can restart the VM on another healthy host. - Behavior parity with x86 architecture. When machine check exception (MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to let VMM either recover from the MCE, or terminate itself with VM. The prior RFC proposes to implement SIGBUS on arm64 as well, but Marc preferred VCPU exit over signal [3]. However, implementation aside, returning SEA to VMM is on par with returning MCE to VMM. Once SEA is redirected to VMM, among other actions, VMM is encouraged to inject external aborts into the faulting VCPU, which is already supported by KVM on arm64, although not fully supported by KVM_SET_VCPU_EVENTS but complemented in this patchset. New UAPIs ========= This patchset introduces following userspace-visiable changes to empower VMM to control what happens next for guest SEA: - KVM_CAP_ARM_SEA_TO_USER. If userspace enables this new capability at VM creation, KVM will not inject SError while taking SEA, but VM exit to userspace. - KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details about the SEA is provided in arm_sea as much as possible, including ESR value at EL2, if guest virtual and physical addresses (GPA and GVA) are available and the values if available. - KVM_CAP_ARM_INJECT_EXT_IABT. VMM today can inject external data abort to VCPU via KVM_SET_VCPU_EVENTS API. However, in case of instruction abort, VMM cannot inject it via KVM_SET_VCPU_EVENTS. KVM_CAP_ARM_INJECT_EXT_IABT is just a natural extend to KVM_CAP_ARM_INJECT_EXT_DABT that tells VMM KVM_SET_VCPU_EVENTS now supports external instruction abort. Patchset utilizes commit 26fbdf369227 ("KVM: arm64: Don't translate FAR if invalid/unsafe") from [4], available already in kvmarm/next. [4] makes KVM safely do address translation for HPFAR_EL2, including at the event of SEA, and indicate if HPFAR_EL2 is valid in NS bit. This patchset depends on [4] to tell userspace if GPA is valid and its value if valid. Patchset is based on commit 68ec8b4e84446 ("Merge branch kvm-arm64/pkvm-6.16 into kvmarm-master/next") [1] https://cloud.google.com/solutions/sap/docs/manage-host-errors [2] https://lpc.events/event/18/contributions/1757/attachments/1442/3073/LPC_%2… [3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org [4] https://lore.kernel.org/all/174369514508.3034362.13165690020799838042.b4-ty… Jiaqi Yan (5): KVM: arm64: VM exit to userspace to handle SEA KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT Documentation: kvm: new uAPI for handling SEA Raghavendra Rao Ananta (1): KVM: arm64: Allow userspace to inject external instruction aborts Documentation/virt/kvm/api.rst | 120 ++++++- arch/arm64/include/asm/kvm_emulate.h | 12 + arch/arm64/include/asm/kvm_host.h | 8 + arch/arm64/include/asm/kvm_ras.h | 21 +- arch/arm64/include/uapi/asm/kvm.h | 3 +- arch/arm64/kvm/Makefile | 3 +- arch/arm64/kvm/arm.c | 6 + arch/arm64/kvm/guest.c | 13 +- arch/arm64/kvm/inject_fault.c | 3 + arch/arm64/kvm/kvm_ras.c | 54 +++ arch/arm64/kvm/mmu.c | 12 +- include/uapi/linux/kvm.h | 12 + tools/arch/arm64/include/uapi/asm/kvm.h | 3 +- tools/testing/selftests/kvm/Makefile.kvm | 2 + .../testing/selftests/kvm/arm64/inject_iabt.c | 100 ++++++ .../testing/selftests/kvm/arm64/sea_to_user.c | 324 ++++++++++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 1 + 17 files changed, 654 insertions(+), 43 deletions(-) create mode 100644 arch/arm64/kvm/kvm_ras.c create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c -- 2.49.0.967.g6a0df3ecc3-goog

1 month

3
12
0 0

[PATCH] kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests

by Richard Fitzgerald

Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests.config. This helps to detect use of uninitialized local variables. This option found an uninitialized data bug in the cs_dsp test. Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com> --- tools/testing/kunit/configs/all_tests.config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config index cdd9782f9646..4a60bb71fe72 100644 --- a/tools/testing/kunit/configs/all_tests.config +++ b/tools/testing/kunit/configs/all_tests.config @@ -10,6 +10,7 @@ CONFIG_KUNIT_EXAMPLE_TEST=y CONFIG_KUNIT_ALL_TESTS=y CONFIG_FORTIFY_SOURCE=y +CONFIG_INIT_STACK_ALL_PATTERN=y CONFIG_IIO=y -- 2.39.5

1 month

3
3
0 0

[PATCH v8 0/9] ublk: decouple server threads from ublk_queues/hctxs

by Uday Shankar

This patch set aims to allow ublk server threads to better balance load amongst themselves by decoupling server threads from ublk_queues/hctxs, so that multiple threads can service I/Os that are issued from a single CPU. This can improve performance for workloads in which ublk server CPU is a bottleneck, and for which load is issued from CPUs which are not balanced across ublk_queues/hctxs. Performance ----------- First create two ublk devices with: ublkb0: ./kublk add -t null -q 2 --nthreads 2 ublkb1: ./kublk add -t null -q 2 --nthreads 2 --per_io_tasks Then run load with: taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb0: 1.90M IOPS taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb1: 2.18M IOPS Since ublkb1 has per-io-tasks, the second command is able to make use of both ublk server worker threads and therefore has increased max throughput. Caveats: - This testing was done on a system with 2 numa nodes, but the penalty of having I/O cross a numa (or LLC) boundary in the per_io_tasks case is quite high. So these numbers were obtained after moving all ublk server threads and the application threads to CPUs on the same numa node/LLC. - One might expect the scaling to be linear - because ublkb1 can make use of twice as many ublk server threads, it should be able to drive twice the throughput. However this is not true (the improvement is ~15%), and needs further investigation. Signed-off-by: Uday Shankar <ushankar(a)purestorage.com> --- Changes in v8: - Fix queue_rqs batch dispatch OOPS when dispatching a list of requests associated to > 1 ublk_queue (Ming Lei, Caleb Sander Mateos) - Simplify queue_rqs (Caleb Sander Mateos) - Narrow a couple of types (Ming Lei) - Add stress test for per io daemons (Ming Lei) - Link to v7: https://lore.kernel.org/r/20250527-ublk_task_per_io-v7-0-cbdbaf283baa@pures… Changes in v7: - Fix queue_rqs batch dispatch for per-io daemons - Kick round-robin tag allocation changes to a followup - Add explicit feature flag for per-task daemons (Ming Lei, Caleb Sander Mateos) - Move some variable assignments to avoid redundant computation (Caleb Sander Mateos) - Switch from storing pointers in ublk_io to computing based on address with container_of in a couple places (Ming Lei) - Link to v6: https://lore.kernel.org/r/20250507-ublk_task_per_io-v6-0-a2a298783c01@pures… Changes in v6: - Add a feature flag for this feature, called UBLK_F_RR_TAGS (Ming Lei) - Add test for this feature (Ming Lei) - Add documentation for this feature (Ming Lei) - Link to v5: https://lore.kernel.org/r/20250416-ublk_task_per_io-v5-0-9261ad7bff20@pures… Changes in v5: - Set io->task before ublk_mark_io_ready (Caleb Sander Mateos) - Set io->task atomically, read it atomically when needed - Return 0 on success from command-specific helpers in __ublk_ch_uring_cmd (Caleb Sander Mateos) - Rename ublk_handle_need_get_data to ublk_get_data (Caleb Sander Mateos) - Link to v4: https://lore.kernel.org/r/20250415-ublk_task_per_io-v4-0-54210b91a46f@pures… Changes in v4: - Drop "ublk: properly serialize all FETCH_REQs" since Ming is taking it in another set - Prevent data races by marking data structures which should be read-only in the I/O path as const (Ming Lei) - Link to v3: https://lore.kernel.org/r/20250410-ublk_task_per_io-v3-0-b811e8f4554a@pures… Changes in v3: - Check for UBLK_IO_FLAG_ACTIVE on I/O again after taking lock to ensure that two concurrent FETCH_REQs on the same I/O can't succeed (Caleb Sander Mateos) - Link to v2: https://lore.kernel.org/r/20250408-ublk_task_per_io-v2-0-b97877e6fd50@pures… Changes in v2: - Remove changes split into other patches - To ease error handling/synchronization, associate each I/O (instead of each queue) to the last task that issues a FETCH_REQ against it. Only that task is allowed to operate on the I/O. - Link to v1: https://lore.kernel.org/r/20241002224437.3088981-1-ushankar@purestorage.com --- Uday Shankar (9): ublk: have a per-io daemon instead of a per-queue daemon selftests: ublk: kublk: plumb q_id in io_uring user_data selftests: ublk: kublk: tie sqe allocation to io instead of queue selftests: ublk: kublk: lift queue initialization out of thread selftests: ublk: kublk: move per-thread data out of ublk_queue selftests: ublk: kublk: decouple ublk_queues from ublk server threads selftests: ublk: add functional test for per io daemons selftests: ublk: add stress test for per io daemons Documentation: ublk: document UBLK_F_PER_IO_DAEMON Documentation/block/ublk.rst | 35 ++- drivers/block/ublk_drv.c | 111 +++---- include/uapi/linux/ublk_cmd.h | 9 + tools/testing/selftests/ublk/Makefile | 2 + tools/testing/selftests/ublk/fault_inject.c | 4 +- tools/testing/selftests/ublk/file_backed.c | 20 +- tools/testing/selftests/ublk/kublk.c | 344 ++++++++++++++------- tools/testing/selftests/ublk/kublk.h | 73 +++-- tools/testing/selftests/ublk/null.c | 22 +- tools/testing/selftests/ublk/stripe.c | 17 +- tools/testing/selftests/ublk/test_common.sh | 5 + tools/testing/selftests/ublk/test_generic_12.sh | 55 ++++ tools/testing/selftests/ublk/test_stress_06.sh | 36 +++ .../selftests/ublk/trace/count_ios_per_tid.bt | 11 + 14 files changed, 512 insertions(+), 232 deletions(-) --- base-commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d change-id: 20250408-ublk_task_per_io-c693cf608d7a Best regards, -- Uday Shankar <ushankar(a)purestorage.com>

1 month

3
13
0 0

[PATCH bpf-next 1/2] bpftool: Use appropriate permissions for map access

by Slava Imameev

Modify several functions in tools/bpf/bpftool/common.c to allow specification of requested access for file descriptors, such as read-only access. Update bpftool to request only read access for maps when write access is not required. This fixes errors when reading from maps that are protected from modification via security_bpf_map. Signed-off-by: Slava Imameev <slava.imameev(a)crowdstrike.com> --- tools/bpf/bpftool/btf.c | 3 +- tools/bpf/bpftool/common.c | 57 ++++++++++++++++++++++--------- tools/bpf/bpftool/iter.c | 2 +- tools/bpf/bpftool/link.c | 2 +- tools/bpf/bpftool/main.h | 13 ++++--- tools/bpf/bpftool/map.c | 56 +++++++++++++++++------------- tools/bpf/bpftool/map_perf_ring.c | 3 +- tools/bpf/bpftool/prog.c | 4 +-- 8 files changed, 90 insertions(+), 50 deletions(-) diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c index 6b14cbfa58aa..1ba27cb03348 100644 --- a/tools/bpf/bpftool/btf.c +++ b/tools/bpf/bpftool/btf.c @@ -905,7 +905,8 @@ static int do_dump(int argc, char **argv) return -1; } - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, + BPF_F_RDONLY); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index ecfa790adc13..ff1c99281beb 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -193,7 +193,8 @@ int mount_tracefs(const char *target) return err; } -int open_obj_pinned(const char *path, bool quiet) +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts) { char *pname; int fd = -1; @@ -205,7 +206,7 @@ int open_obj_pinned(const char *path, bool quiet) goto out_ret; } - fd = bpf_obj_get(pname); + fd = bpf_obj_get_opts(pname, opts); if (fd < 0) { if (!quiet) p_err("bpf obj get (%s): %s", pname, @@ -221,12 +222,13 @@ int open_obj_pinned(const char *path, bool quiet) return fd; } -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type) +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts) { enum bpf_obj_type type; int fd; - fd = open_obj_pinned(path, false); + fd = open_obj_pinned(path, false, opts); if (fd < 0) return -1; @@ -555,7 +557,7 @@ static int do_build_table_cb(const char *fpath, const struct stat *sb, if (typeflag != FTW_F) goto out_ret; - fd = open_obj_pinned(fpath, true); + fd = open_obj_pinned(fpath, true, NULL); if (fd < 0) goto out_ret; @@ -928,7 +930,7 @@ int prog_parse_fds(int *argc, char ***argv, int **fds) path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG, NULL); if ((*fds)[0] < 0) return -1; return 1; @@ -965,7 +967,8 @@ int prog_parse_fd(int *argc, char ***argv) return fd; } -static int map_fd_by_name(char *name, int **fds) +static int map_fd_by_name(char *name, int **fds, + const struct bpf_get_fd_by_id_opts *opts) { unsigned int id = 0; int fd, nb_fds = 0; @@ -973,6 +976,7 @@ static int map_fd_by_name(char *name, int **fds) int err; while (true) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts_ro); struct bpf_map_info info = {}; __u32 len = sizeof(info); @@ -985,7 +989,9 @@ static int map_fd_by_name(char *name, int **fds) return nb_fds; } - fd = bpf_map_get_fd_by_id(id); + /* Request a read-only fd to query the map info */ + opts_ro.open_flags = BPF_F_RDONLY; + fd = bpf_map_get_fd_by_id_opts(id, &opts_ro); if (fd < 0) { p_err("can't get map by id (%u): %s", id, strerror(errno)); @@ -1004,6 +1010,15 @@ static int map_fd_by_name(char *name, int **fds) continue; } + /* Get an fd with the requested options. */ + close(fd); + fd = bpf_map_get_fd_by_id_opts(id, opts); + if (fd < 0) { + p_err("can't get map by id (%u): %s", id, + strerror(errno)); + goto err_close_fds; + } + if (nb_fds > 0) { tmp = realloc(*fds, (nb_fds + 1) * sizeof(int)); if (!tmp) { @@ -1023,8 +1038,16 @@ static int map_fd_by_name(char *name, int **fds) return -1; } -int map_parse_fds(int *argc, char ***argv, int **fds) +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); + + if (open_flags & ~BPF_F_RDONLY) { + p_err("invalid open_flags: %x", open_flags); + return -1; + } + opts.open_flags = open_flags; + if (is_prefix(**argv, "id")) { unsigned int id; char *endptr; @@ -1038,7 +1061,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - (*fds)[0] = bpf_map_get_fd_by_id(id); + (*fds)[0] = bpf_map_get_fd_by_id_opts(id, &opts); if ((*fds)[0] < 0) { p_err("get map by id (%u): %s", id, strerror(errno)); return -1; @@ -1056,16 +1079,18 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - return map_fd_by_name(name, fds); + return map_fd_by_name(name, fds, &opts); } else if (is_prefix(**argv, "pinned")) { char *path; + LIBBPF_OPTS(bpf_obj_get_opts, get_opts); + get_opts.file_flags = open_flags; NEXT_ARGP(); path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP, &get_opts); if ((*fds)[0] < 0) return -1; return 1; @@ -1075,7 +1100,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) return -1; } -int map_parse_fd(int *argc, char ***argv) +int map_parse_fd(int *argc, char ***argv, __u32 open_flags) { int *fds = NULL; int nb_fds, fd; @@ -1085,7 +1110,7 @@ int map_parse_fd(int *argc, char ***argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(argc, argv, &fds); + nb_fds = map_parse_fds(argc, argv, &fds, open_flags); if (nb_fds != 1) { if (nb_fds > 1) { p_err("several maps match this handle"); @@ -1103,12 +1128,12 @@ int map_parse_fd(int *argc, char ***argv) } int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len) + __u32 *info_len, __u32 open_flags) { int err; int fd; - fd = map_parse_fd(argc, argv); + fd = map_parse_fd(argc, argv, open_flags); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/iter.c b/tools/bpf/bpftool/iter.c index 5c39c2ed36a2..ad318a8667a4 100644 --- a/tools/bpf/bpftool/iter.c +++ b/tools/bpf/bpftool/iter.c @@ -37,7 +37,7 @@ static int do_pin(int argc, char **argv) return -1; } - map_fd = map_parse_fd(&argc, &argv); + map_fd = map_parse_fd(&argc, &argv, 0); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c index 3535afc80a49..8523be11dcd9 100644 --- a/tools/bpf/bpftool/link.c +++ b/tools/bpf/bpftool/link.c @@ -117,7 +117,7 @@ static int link_parse_fd(int *argc, char ***argv) path = **argv; NEXT_ARGP(); - return open_obj_pinned_any(path, BPF_OBJ_LINK); + return open_obj_pinned_any(path, BPF_OBJ_LINK, NULL); } p_err("expected 'id' or 'pinned', got: '%s'?", **argv); diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 9eb764fe4cc8..6db704fda5c0 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -15,6 +15,7 @@ #include <bpf/hashmap.h> #include <bpf/libbpf.h> +#include <bpf/bpf.h> #include "json_writer.h" @@ -140,8 +141,10 @@ void get_prog_full_name(const struct bpf_prog_info *prog_info, int prog_fd, int get_fd_type(int fd); const char *get_fd_type_name(enum bpf_obj_type type); char *get_fdinfo(int fd, const char *key); -int open_obj_pinned(const char *path, bool quiet); -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type); +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts); +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts); int mount_bpffs_for_file(const char *file_name); int create_and_mount_bpffs_dir(const char *dir_name); int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(int *, char ***)); @@ -167,10 +170,10 @@ int do_iter(int argc, char **argv) __weak; int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what); int prog_parse_fd(int *argc, char ***argv); int prog_parse_fds(int *argc, char ***argv, int **fds); -int map_parse_fd(int *argc, char ***argv); -int map_parse_fds(int *argc, char ***argv, int **fds); +int map_parse_fd(int *argc, char ***argv, __u32 open_flags); +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags); int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len); + __u32 *info_len, __u32 open_flags); struct bpf_prog_linfo; #if defined(HAVE_LLVM_SUPPORT) || defined(HAVE_LIBBFD_SUPPORT) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 81cc668b4b05..c7dc2eae8ba8 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -337,9 +337,9 @@ static void fill_per_cpu_value(struct bpf_map_info *info, void *value) memcpy(value + i * step, value, info->value_size); } -static int parse_elem(char **argv, struct bpf_map_info *info, - void *key, void *value, __u32 key_size, __u32 value_size, - __u32 *flags, __u32 **value_fd) +static int parse_elem(char **argv, struct bpf_map_info *info, void *key, + void *value, __u32 key_size, __u32 value_size, + __u32 *flags, __u32 **value_fd, __u32 open_flags) { if (!*argv) { if (!key && !value) @@ -362,7 +362,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; return parse_elem(argv, info, NULL, value, key_size, value_size, - flags, value_fd); + flags, value_fd, open_flags); } else if (is_prefix(*argv, "value")) { int fd; @@ -388,7 +388,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; } - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, open_flags); if (fd < 0) return -1; @@ -424,7 +424,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, } return parse_elem(argv, info, key, NULL, key_size, value_size, - flags, NULL); + flags, NULL, open_flags); } else if (is_prefix(*argv, "any") || is_prefix(*argv, "noexist") || is_prefix(*argv, "exist")) { if (!flags) { @@ -440,7 +440,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, *flags = BPF_EXIST; return parse_elem(argv + 1, info, key, value, key_size, - value_size, NULL, value_fd); + value_size, NULL, value_fd, open_flags); } p_err("expected key or value, got: %s", *argv); @@ -639,7 +639,7 @@ static int do_show_subset(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -672,12 +672,15 @@ static int do_show_subset(int argc, char **argv) static int do_show(int argc, char **argv) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); struct bpf_map_info info = {}; __u32 len = sizeof(info); __u32 id = 0; int err; int fd; + opts.open_flags = BPF_F_RDONLY; + if (show_pinned) { map_table = hashmap__new(hash_fn_for_key_as_id, equal_fn_for_key_as_id, NULL); @@ -707,7 +710,7 @@ static int do_show(int argc, char **argv) break; } - fd = bpf_map_get_fd_by_id(id); + fd = bpf_map_get_fd_by_id_opts(id, &opts); if (fd < 0) { if (errno == ENOENT) continue; @@ -909,7 +912,7 @@ static int do_dump(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -997,7 +1000,7 @@ static int do_update(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1006,7 +1009,7 @@ static int do_update(int argc, char **argv) goto exit_free; err = parse_elem(argv, &info, key, value, info.key_size, - info.value_size, &flags, &value_fd); + info.value_size, &flags, &value_fd, 0); if (err) goto exit_free; @@ -1076,7 +1079,7 @@ static int do_lookup(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1084,7 +1087,8 @@ static int do_lookup(int argc, char **argv) if (err) goto exit_free; - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + BPF_F_RDONLY); if (err) goto exit_free; @@ -1127,7 +1131,7 @@ static int do_getnext(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1140,8 +1144,8 @@ static int do_getnext(int argc, char **argv) } if (argc) { - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, - NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, + NULL, BPF_F_RDONLY); if (err) goto exit_free; } else { @@ -1198,7 +1202,7 @@ static int do_delete(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1209,7 +1213,8 @@ static int do_delete(int argc, char **argv) goto exit_free; } - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + 0); if (err) goto exit_free; @@ -1226,11 +1231,16 @@ static int do_delete(int argc, char **argv) return err; } +static int map_parse_read_only_fd(int *argc, char ***argv) +{ + return map_parse_fd(argc, argv, BPF_F_RDONLY); +} + static int do_pin(int argc, char **argv) { int err; - err = do_pin_any(argc, argv, map_parse_fd); + err = do_pin_any(argc, argv, map_parse_read_only_fd); if (!err && json_output) jsonw_null(json_wtr); return err; @@ -1319,7 +1329,7 @@ static int do_create(int argc, char **argv) if (!REQ_ARGS(2)) usage(); inner_map_fd = map_parse_fd_and_info(&argc, &argv, - &info, &len); + &info, &len, 0); if (inner_map_fd < 0) return -1; attr.inner_map_fd = inner_map_fd; @@ -1368,7 +1378,7 @@ static int do_pop_dequeue(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1407,7 +1417,7 @@ static int do_freeze(int argc, char **argv) if (!REQ_ARGS(2)) return -1; - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c index 552b4ca40c27..bcb767e2d673 100644 --- a/tools/bpf/bpftool/map_perf_ring.c +++ b/tools/bpf/bpftool/map_perf_ring.c @@ -128,7 +128,8 @@ int do_event_pipe(int argc, char **argv) int err, map_fd; map_info_len = sizeof(map_info); - map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len); + map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len, + 0); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 96eea8a67225..deeaa5c1ed7d 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -1062,7 +1062,7 @@ static int parse_attach_detach_args(int argc, char **argv, int *progfd, if (!REQ_ARGS(2)) return -EINVAL; - *mapfd = map_parse_fd(&argc, &argv); + *mapfd = map_parse_fd(&argc, &argv, 0); if (*mapfd < 0) return *mapfd; @@ -1608,7 +1608,7 @@ static int load_with_options(int argc, char **argv, bool first_prog_only) } NEXT_ARG(); - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) goto err_free_reuse_maps; -- 2.34.1

1 month

1
1
0 0

[PATCH] selftests/filesystems: Fix build of anon_inode_test

by Mark Brown

The anon_inode_test test fails to build due to attempting to include a nonexisting overlayfs/wrapper.h: anon_inode_test.c:10:10: fatal error: overlayfs/wrappers.h: No such file or directory 10 | #include "overlayfs/wrappers.h" | ^~~~~~~~~~~~~~~~~~~~~~ This is due to 0bd92b9fe538 ("selftests/filesystems: move wrapper.h out of overlayfs subdir") which was added in the vfs-6.16.selftests branch which was based on -rc5 and does not contain the newly added test so once things were merged into vfs.all in the build started failing - both parent commits are fine. Fixes: feaa00dbff45a ("Merge branch 'vfs-6.16.selftests' into vfs.all") Signed-off-by: Mark Brown <broonie(a)kernel.org> --- tools/testing/selftests/filesystems/anon_inode_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/filesystems/anon_inode_test.c b/tools/testing/selftests/filesystems/anon_inode_test.c index e8e0ef1460d2..73e0a4d4fb2f 100644 --- a/tools/testing/selftests/filesystems/anon_inode_test.c +++ b/tools/testing/selftests/filesystems/anon_inode_test.c @@ -7,7 +7,7 @@ #include <sys/stat.h> #include "../kselftest_harness.h" -#include "overlayfs/wrappers.h" +#include "wrappers.h" TEST(anon_inode_no_chown) { --- base-commit: feaa00dbff45ad9a0dcd04a92f88c745bf880f55 change-id: 20250516-selftests-anon-inode-build-007e206e8422 Best regards, -- Mark Brown <broonie(a)kernel.org>

1 month

1
2
0 0

[PATCH net] selftests: net: build net/lib dependency in all target

by Bui Quang Minh

Currently, we only build net/lib dependency in install target. This commit moves that to all target so that net/lib is included in in-tree build and run_tests. Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com> --- tools/testing/selftests/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 6aa11cd3db42..5b04d83ad9a1 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -205,7 +205,7 @@ export KHDR_INCLUDES all: @ret=1; \ - for TARGET in $(TARGETS); do \ + for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ mkdir $$BUILD_TARGET -p; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \ @@ -270,7 +270,7 @@ ifdef INSTALL_PATH install -m 744 run_kselftest.sh $(INSTALL_PATH)/ rm -f $(TEST_LIST) @ret=1; \ - for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ + for TARGET in $(TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install \ INSTALL_PATH=$(INSTALL_PATH)/$$TARGET \ -- 2.43.0

1 month

3
5
0 0

[PATCH] selftests/seccomp: report errno and add hints on failure

by Sameeksha Sankpal

Signed-off-by: Sameeksha Sankpal <sameekshasankpal(a)gmail.com> --- tools/testing/selftests/seccomp/seccomp_bpf.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 14ba51b52095..d6a85d7b26da 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -4508,7 +4508,11 @@ static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid) snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid); ASSERT_EQ(get_nth(_metadata, proc_path, 3, &line), 1); - + int rc = get_nth(_metadata, proc_path, 3, &line); + if (rc != 1) { + printf("[ERROR] user_notification_fifo: failed to read stat for PID %d (rc=%d)\n", pid, rc); + } + ASSERT_EQ(rc, 1); status = *line; free(line); @@ -4518,6 +4522,7 @@ static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid) TEST(user_notification_fifo) { struct seccomp_notif_resp resp = {}; + ksft_print_msg("[INFO] Starting FIFO notification test\n"); struct seccomp_notif req = {}; int i, status, listener; pid_t pid, pids[3]; @@ -4535,6 +4540,7 @@ TEST(user_notification_fifo) listener = user_notif_syscall(__NR_getppid, SECCOMP_FILTER_FLAG_NEW_LISTENER); ASSERT_GE(listener, 0); + ksft_print_msg("[INFO] user_notification_fifo: listener PID is %d\n", listener); pid = fork(); ASSERT_GE(pid, 0); -- 2.43.0

1 month

2
4
0 0

[PATCH v10 0/5] rust: replace kernel::str::CStr w/ core::ffi::CStr

by Tamir Duberstein

This picks up from Michal Rostecki's work[0]. Per Michal's guidance I have omitted Co-authored tags, as the end result is quite different. Link: https://lore.kernel.org/rust-for-linux/20240819153656.28807-2-vadorovsky@pr… [0] Closes: https://github.com/Rust-for-Linux/linux/issues/1075 Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v10: - Rebase on cbeaa41dfe26b72639141e87183cb23e00d4b0dd. - Implement Alice's suggestion to use a proc macro to work around orphan rules otherwise preventing `core::ffi::CStr` to be directly printed with `{}`. - Link to v9: https://lore.kernel.org/r/20250317-cstr-core-v9-0-51d6cc522f62@gmail.com Changes in v9: - Rebase on rust-next. - Restore `impl Display for BStr` which exists upstream[1]. - Link: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#impl-Display… [1] - Link to v8: https://lore.kernel.org/r/20250203-cstr-core-v8-0-cb3f26e78686@gmail.com Changes in v8: - Move `{from,as}_char_ptr` back to `CStrExt`. This reduces the diff some. - Restore `from_bytes_with_nul_unchecked_mut`, `to_cstring`. - Link to v7: https://lore.kernel.org/r/20250202-cstr-core-v7-0-da1802520438@gmail.com Changes in v7: - Rebased on mainline. - Restore functionality added in commit a321f3ad0a5d ("rust: str: add {make,to}_{upper,lower}case() to CString"). - Used `diff.algorithm patience` to improve diff readability. - Link to v6: https://lore.kernel.org/r/20250202-cstr-core-v6-0-8469cd6d29fd@gmail.com Changes in v6: - Split the work into several commits for ease of review. - Restore `{from,as}_char_ptr` to allow building on ARM (see commit message). - Add `CStrExt` to `kernel::prelude`. (Alice Ryhl) - Remove `CStrExt::from_bytes_with_nul_unchecked_mut` and restore `DerefMut for CString`. (Alice Ryhl) - Rename and hide `kernel::c_str!` to encourage use of C-String literals. - Drop implementation and invocation changes in kunit.rs. (Trevor Gross) - Drop docs on `Display` impl. (Trevor Gross) - Rewrite docs in the style of the standard library. - Restore the `test_cstr_debug` unit tests to demonstrate that the implementation has changed. Changes in v5: - Keep the `test_cstr_display*` unit tests. Changes in v4: - Provide the `CStrExt` trait with `display()` method, which returns a `CStrDisplay` wrapper with `Display` implementation. This addresses the lack of `Display` implementation for `core::ffi::CStr`. - Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`, which might be useful and is going to prevent manual, unsafe casts. - Fix a typo (s/preffered/prefered/). Changes in v3: - Fix the commit message. - Remove redundant braces in `use`, when only one item is imported. Changes in v2: - Do not remove `c_str` macro. While it's preferred to use C-string literals, there are two cases where `c_str` is helpful: - When working with macros, which already return a Rust string literal (e.g. `stringify!`). - When building macros, where we want to take a Rust string literal as an argument (for caller's convenience), but still use it as a C-string internally. - Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`, `new_mutex`). Use the `c_str` macro to convert these literals to C-string literals. - Use `c_str` in kunit.rs for converting the output of `stringify!` to a `CStr`. - Remove `DerefMut` implementation for `CString`. --- Tamir Duberstein (5): rust: retitle "Example" section as "Examples" rust: support formatting of foreign types rust: replace `CStr` with `core::ffi::CStr` rust: replace `kernel::c_str!` with C-Strings rust: remove core::ffi::CStr reexport drivers/block/rnull.rs | 2 +- drivers/gpu/drm/drm_panic_qr.rs | 5 +- drivers/gpu/nova-core/driver.rs | 2 +- drivers/gpu/nova-core/firmware.rs | 2 +- drivers/net/phy/ax88796b_rust.rs | 8 +- drivers/net/phy/qt2025.rs | 6 +- rust/kernel/block/mq.rs | 2 +- rust/kernel/device.rs | 9 +- rust/kernel/devres.rs | 2 +- rust/kernel/driver.rs | 4 +- rust/kernel/error.rs | 10 +- rust/kernel/faux.rs | 5 +- rust/kernel/firmware.rs | 16 +- rust/kernel/fmt.rs | 77 +++++++ rust/kernel/kunit.rs | 21 +- rust/kernel/lib.rs | 3 +- rust/kernel/miscdevice.rs | 5 +- rust/kernel/net/phy.rs | 12 +- rust/kernel/of.rs | 5 +- rust/kernel/pci.rs | 2 +- rust/kernel/platform.rs | 6 +- rust/kernel/prelude.rs | 5 +- rust/kernel/print.rs | 4 +- rust/kernel/seq_file.rs | 6 +- rust/kernel/str.rs | 415 ++++++++++------------------------- rust/kernel/sync.rs | 7 +- rust/kernel/sync/condvar.rs | 4 +- rust/kernel/sync/lock.rs | 4 +- rust/kernel/sync/lock/global.rs | 6 +- rust/kernel/sync/poll.rs | 1 + rust/kernel/workqueue.rs | 1 + rust/macros/fmt.rs | 118 ++++++++++ rust/macros/kunit.rs | 6 +- rust/macros/lib.rs | 21 +- rust/macros/module.rs | 2 +- samples/rust/rust_driver_faux.rs | 4 +- samples/rust/rust_driver_pci.rs | 4 +- samples/rust/rust_driver_platform.rs | 4 +- samples/rust/rust_misc_device.rs | 3 +- scripts/rustdoc_test_gen.rs | 2 +- 40 files changed, 426 insertions(+), 395 deletions(-) --- base-commit: cbeaa41dfe26b72639141e87183cb23e00d4b0dd change-id: 20250201-cstr-core-d4b9b69120cf Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

1 month

4
33
0 0

[PATCH bpf-next v1 1/2] bpf: Restrict usage scope of bpf_get_cgroup_classid

by Jiayuan Chen

A previous commit expanded the usage scope of bpf_get_cgroup_classid() to all contexts (see Fixes tag), but this was inappropriate. First, syzkaller reported a bug [1]. Second, it uses skb as an argument, but its implementation varies across different bpf prog types. For example, in sock_filter and sock_addr, it retrieves the classid from the current context (bpf_get_cgroup_classid_curr_proto) instead of from skb. In tc egress and lwt, it fetches the classid from skb->sk, but in tc ingress, it returns 0. In summary, the definition of bpf_get_cgroup_classid() is ambiguous and its usage scenarios are limited. It should not be treated as a general-purpose helper. This patch reverts part of the previous commit. [1] https://syzkaller.appspot.com/bug?extid=9767c7ed68b95cfa69e6 Fixes: ee971630f20f ("bpf: Allow some trace helpers for all prog types") Reported-by: syzbot+9767c7ed68b95cfa69e6(a)syzkaller.appspotmail.com Signed-off-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> --- include/linux/bpf-cgroup.h | 8 ++++++++ kernel/bpf/cgroup.c | 25 +++++++++++++++++++++++++ kernel/bpf/helpers.c | 4 ---- 3 files changed, 33 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 4847dcade917..9de7adb68294 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -427,6 +427,8 @@ int cgroup_bpf_prog_query(const union bpf_attr *attr, const struct bpf_func_proto * cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); +const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); #else static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; } @@ -463,6 +465,12 @@ cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return NULL; } +static inline const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + return NULL; +} + static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map) { return 0; } static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc( diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 62a1d8deb3dc..a99b72e6f1c9 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1653,6 +1653,10 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) if (func_proto) return func_proto; + func_proto = cgroup_current_func_proto(func_id, prog); + if (func_proto) + return func_proto; + switch (func_id) { case BPF_FUNC_perf_event_output: return &bpf_event_output_data_proto; @@ -2200,6 +2204,10 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) if (func_proto) return func_proto; + func_proto = cgroup_current_func_proto(func_id, prog); + if (func_proto) + return func_proto; + switch (func_id) { case BPF_FUNC_sysctl_get_name: return &bpf_sysctl_get_name_proto; @@ -2343,6 +2351,10 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) if (func_proto) return func_proto; + func_proto = cgroup_current_func_proto(func_id, prog); + if (func_proto) + return func_proto; + switch (func_id) { #ifdef CONFIG_NET case BPF_FUNC_get_netns_cookie: @@ -2589,3 +2601,16 @@ cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return NULL; } } + +const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + switch (func_id) { +#ifdef CONFIG_CGROUP_NET_CLASSID + case BPF_FUNC_get_cgroup_classid: + return &bpf_get_cgroup_classid_curr_proto; +#endif + default: + return NULL; + } +} diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index b71e428ad936..9d0d54f4f0de 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2024,10 +2024,6 @@ bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_get_current_ancestor_cgroup_id_proto; case BPF_FUNC_current_task_under_cgroup: return &bpf_current_task_under_cgroup_proto; -#endif -#ifdef CONFIG_CGROUP_NET_CLASSID - case BPF_FUNC_get_cgroup_classid: - return &bpf_get_cgroup_classid_curr_proto; #endif case BPF_FUNC_task_storage_get: if (bpf_prog_check_recur(prog)) -- 2.47.1

1 month, 1 week

2
2
0 0

[PATCH v2 0/2] rust: minor idiomatic fixes to doctest generator

by Tamir Duberstein

Please see individual commit messages. Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v2: - rustfmt. - Alice's RB. - Add second patch to emit information in panic rather than separately to stderr. - Link to v1: https://lore.kernel.org/r/20250527-idiomatic-match-slice-v1-1-34b0b1d1d58c@… --- Tamir Duberstein (2): rust: replace length checks with match rust: emit path candidates in panic message scripts/rustdoc_test_gen.rs | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) --- base-commit: 1ce98bb2bb30713ec4374ef11ead0d7d3e856766 change-id: 20250527-idiomatic-match-slice-26a79d100e4d Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

1 month, 1 week

2
5
0 0

[PATCH v7 0/8] ublk: decouple server threads from ublk_queues/hctxs

by Uday Shankar

This patch set aims to allow ublk server threads to better balance load amongst themselves by decoupling server threads from ublk_queues/hctxs, so that multiple threads can service I/Os that are issued from a single CPU. This can improve performance for workloads in which ublk server CPU is a bottleneck, and for which load is issued from CPUs which are not balanced across ublk_queues/hctxs. Performance ----------- First create two ublk devices with: ublkb0: ./kublk add -t null -q 2 --nthreads 2 ublkb1: ./kublk add -t null -q 2 --nthreads 2 --per_io_tasks Then run load with: taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb0: 1.90M IOPS taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb1: 2.18M IOPS Since ublkb1 has per-io-tasks, the second command is able to make use of both ublk server worker threads and therefore has increased max throughput. Caveats: - This testing was done on a system with 2 numa nodes, but the penalty of having I/O cross a numa (or LLC) boundary in the per_io_tasks case is quite high. So these numbers were obtained after moving all ublk server threads and the application threads to CPUs on the same numa node/LLC. - One might expect the scaling to be linear - because ublkb1 can make use of twice as many ublk server threads, it should be able to drive twice the throughput. However this is not true (the improvement is ~15%), and needs further investigation. Signed-off-by: Uday Shankar <ushankar(a)purestorage.com> --- Changes in v7: - Fix queue_rqs batch dispatch for per-io daemons - Kick round-robin tag allocation changes to a followup - Add explicit feature flag for per-task daemons (Ming Lei, Caleb Sander Mateos) - Move some variable assignments to avoid redundant computation (Caleb Sander Mateos) - Switch from storing pointers in ublk_io to computing based on address with container_of in a couple places (Ming Lei) - Link to v6: https://lore.kernel.org/r/20250507-ublk_task_per_io-v6-0-a2a298783c01@pures… Changes in v6: - Add a feature flag for this feature, called UBLK_F_RR_TAGS (Ming Lei) - Add test for this feature (Ming Lei) - Add documentation for this feature (Ming Lei) - Link to v5: https://lore.kernel.org/r/20250416-ublk_task_per_io-v5-0-9261ad7bff20@pures… Changes in v5: - Set io->task before ublk_mark_io_ready (Caleb Sander Mateos) - Set io->task atomically, read it atomically when needed - Return 0 on success from command-specific helpers in __ublk_ch_uring_cmd (Caleb Sander Mateos) - Rename ublk_handle_need_get_data to ublk_get_data (Caleb Sander Mateos) - Link to v4: https://lore.kernel.org/r/20250415-ublk_task_per_io-v4-0-54210b91a46f@pures… Changes in v4: - Drop "ublk: properly serialize all FETCH_REQs" since Ming is taking it in another set - Prevent data races by marking data structures which should be read-only in the I/O path as const (Ming Lei) - Link to v3: https://lore.kernel.org/r/20250410-ublk_task_per_io-v3-0-b811e8f4554a@pures… Changes in v3: - Check for UBLK_IO_FLAG_ACTIVE on I/O again after taking lock to ensure that two concurrent FETCH_REQs on the same I/O can't succeed (Caleb Sander Mateos) - Link to v2: https://lore.kernel.org/r/20250408-ublk_task_per_io-v2-0-b97877e6fd50@pures… Changes in v2: - Remove changes split into other patches - To ease error handling/synchronization, associate each I/O (instead of each queue) to the last task that issues a FETCH_REQ against it. Only that task is allowed to operate on the I/O. - Link to v1: https://lore.kernel.org/r/20241002224437.3088981-1-ushankar@purestorage.com --- Uday Shankar (8): ublk: have a per-io daemon instead of a per-queue daemon selftests: ublk: kublk: plumb q_id in io_uring user_data selftests: ublk: kublk: tie sqe allocation to io instead of queue selftests: ublk: kublk: lift queue initialization out of thread selftests: ublk: kublk: move per-thread data out of ublk_queue selftests: ublk: kublk: decouple ublk_queues from ublk server threads selftests: ublk: add test for per io daemons Documentation: ublk: document UBLK_F_PER_IO_DAEMON Documentation/block/ublk.rst | 35 ++- drivers/block/ublk_drv.c | 108 +++---- include/uapi/linux/ublk_cmd.h | 9 + tools/testing/selftests/ublk/Makefile | 1 + tools/testing/selftests/ublk/fault_inject.c | 4 +- tools/testing/selftests/ublk/file_backed.c | 20 +- tools/testing/selftests/ublk/kublk.c | 345 ++++++++++++++------- tools/testing/selftests/ublk/kublk.h | 73 +++-- tools/testing/selftests/ublk/null.c | 22 +- tools/testing/selftests/ublk/stripe.c | 17 +- tools/testing/selftests/ublk/test_generic_12.sh | 55 ++++ .../selftests/ublk/trace/count_ios_per_tid.bt | 11 + 12 files changed, 470 insertions(+), 230 deletions(-) --- base-commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d change-id: 20250408-ublk_task_per_io-c693cf608d7a Best regards, -- Uday Shankar <ushankar(a)purestorage.com>

1 month, 1 week

3
17
0 0

[PATCH bpf-next 2/2] selftests/bpf: Add selftests for bpf_task_cwd_from_pid()

by Rong Tao

From: Rong Tao <rongtao(a)cestc.cn> Add some selftest testcases that validate the expected behavior of the bpf_task_cwd_from_pid() kfunc that was added in the prior patch. Signed-off-by: Rong Tao <rongtao(a)cestc.cn> --- .../selftests/bpf/prog_tests/task_kfunc.c | 3 ++ .../selftests/bpf/progs/task_kfunc_common.h | 1 + .../selftests/bpf/progs/task_kfunc_success.c | 47 +++++++++++++++++++ 3 files changed, 51 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c index 83b90335967a..c18a0cb67164 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c +++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c @@ -149,6 +149,9 @@ static const char * const success_tests[] = { "task_kfunc_acquire_trusted_walked", "test_task_kfunc_flavor_relo", "test_task_kfunc_flavor_relo_not_found", + "test_task_cwd_from_pid_arg", + "test_task_cwd_from_pid", + "test_task_cwd_from_pid_current", }; static const char * const vpid_success_tests[] = { diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h index e9c4fea7a4bb..b762054a1825 100644 --- a/tools/testing/selftests/bpf/progs/task_kfunc_common.h +++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h @@ -24,6 +24,7 @@ struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym; void bpf_task_release(struct task_struct *p) __ksym; struct task_struct *bpf_task_from_pid(s32 pid) __ksym; struct task_struct *bpf_task_from_vpid(s32 vpid) __ksym; +int bpf_task_cwd_from_pid(s32 pid, char *buf, u32 buf_len) __ksym; void bpf_rcu_read_lock(void) __ksym; void bpf_rcu_read_unlock(void) __ksym; diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_success.c b/tools/testing/selftests/bpf/progs/task_kfunc_success.c index 5fb4fc19d26a..6424cf5c151e 100644 --- a/tools/testing/selftests/bpf/progs/task_kfunc_success.c +++ b/tools/testing/selftests/bpf/progs/task_kfunc_success.c @@ -351,6 +351,53 @@ int BPF_PROG(test_task_from_pid_invalid, struct task_struct *task, u64 clone_fla return 0; } +SEC("tp_btf/task_newtask") +int BPF_PROG(test_task_cwd_from_pid_arg, struct task_struct *task, u64 clone_flags) +{ + char cwd[256]; + + if (!is_test_kfunc_task()) + return 0; + + err = 0; + err += bpf_task_cwd_from_pid(task->pid, NULL, sizeof(cwd)) != -EINVAL; + err += bpf_task_cwd_from_pid(task->pid, cwd, 0) != -EINVAL; + err += bpf_task_cwd_from_pid(-1, cwd, sizeof(cwd)) != -ESRCH; + return 0; +} + +SEC("tp_btf/task_newtask") +int BPF_PROG(test_task_cwd_from_pid, struct task_struct *task, u64 clone_flags) +{ + char cwd[256]; + + if (!is_test_kfunc_task()) + return 0; + + err = bpf_task_cwd_from_pid(task->pid, cwd, sizeof(cwd)); + return 0; +} + +SEC("tp_btf/task_newtask") +int BPF_PROG(test_task_cwd_from_pid_current, struct task_struct *task, u64 clone_flags) +{ + char cwd[128], cwd2[128]; + struct task_struct *current; + + if (!is_test_kfunc_task()) + return 0; + + current = bpf_get_current_task_btf(); + + err = 0; + err += bpf_task_cwd_from_pid(task->pid, cwd, sizeof(cwd)); + err += bpf_task_cwd_from_pid(current->pid, cwd2, sizeof(cwd2)); + + err += bpf_strncmp(cwd, sizeof(cwd), cwd2) != 0; + + return 0; +} + SEC("tp_btf/task_newtask") int BPF_PROG(task_kfunc_acquire_trusted_walked, struct task_struct *task, u64 clone_flags) { -- 2.49.0

1 month, 1 week

1
0
0 0

[PATCH bpf-next 0/2] Add kfunc for doing pid -> task cwd

by Rong Tao

In some application like bpftrace [1], need to get the cwd from the pid. This patch provides a new kfunc that can get the cwd of the process from the pid. [1] https://github.com/bpftrace/bpftrace/issues/3314 Rong Tao (2): bpf: Add bpf_task_cwd_from_pid() kfunc selftests/bpf: Add selftests for bpf_task_cwd_from_pid() kernel/bpf/helpers.c | 45 ++++++++++++++++++ .../selftests/bpf/prog_tests/task_kfunc.c | 3 ++ .../selftests/bpf/progs/task_kfunc_common.h | 1 + .../selftests/bpf/progs/task_kfunc_success.c | 47 +++++++++++++++++++ 4 files changed, 96 insertions(+) -- 2.49.0

1 month, 1 week

1
0
0 0

[PATCH net-next] selftest: Add selftest for multicast address notifications

by Yuyang Huang

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses. The test relies on the iproute2 version to be 6.13+. Tested by the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \ TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze(a)google.com> Cc: Lorenzo Colitti <lorenzo(a)google.com> Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com> --- tools/testing/selftests/net/rtnetlink.sh | 34 ++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 2e8243a65b50..9dbcaaeaf8cd 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -21,6 +21,7 @@ ALL_TESTS=" kci_test_vrf kci_test_encap kci_test_macsec + kci_test_mcast_addr_notification kci_test_ipsec kci_test_ipsec_offload kci_test_fdb_get @@ -1334,6 +1335,39 @@ kci_test_mngtmpaddr() return $ret } +kci_test_mcast_addr_notification() +{ + local tmpfile + local monitor_pid + local match_result + + tmpfile=$(mktemp) + + ip monitor maddr > $tmpfile & + monitor_pid=$! + sleep 1 + + run_cmd ip link add name test-dummy1 type dummy + run_cmd ip link set test-dummy1 up + run_cmd ip link del dev test-dummy1 + sleep 1 + + match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile) + + kill $monitor_pid + rm $tmpfile + # There should be 4 line matches as follows. + # 13: test-dummy1 inet6 mcast ff02::1 scope global + # 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global + if [ $match_result -ne 4 ];then + end_test "FAIL: mcast addr notification" + return 1 + fi + end_test "PASS: mcast addr notification" +} + kci_test_rtnl() { local current_test -- 2.49.0.1204.g71687c7c1d-goog

1 month, 1 week

2
2
0 0

[PATCH v1] selftests/mm: two fixes for the pfnmap test

by David Hildenbrand

When unregistering the signal handler, we have to pass SIG_DFL, and blindly reading from PFN 0 and PFN 1 seems to be problematic on !x86 systems. In particularly, on arm64 tx2 machines where noting resides at these physical memory locations, we can generate RAS errors. Let's fix it by scanning /proc/iomem for actual "System RAM". Reported-by: Ryan Roberts <ryan.roberts(a)arm.com> Closes: https://lore.kernel.org/all/232960c2-81db-47ca-a337-38c4bce5f997@arm.com/T/… Fixes: 2616b370323a ("selftests/mm: add simple VM_PFNMAP tests based on mmap'ing /dev/mem") Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com> Tested-by: Aishwarya TCV <aishwarya.tcv(a)arm.com> Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Signed-off-by: David Hildenbrand <david(a)redhat.com> --- tools/testing/selftests/mm/pfnmap.c | 61 +++++++++++++++++++++++++++-- 1 file changed, 57 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/mm/pfnmap.c b/tools/testing/selftests/mm/pfnmap.c index 8a9d19b6020c7..866ac023baf5c 100644 --- a/tools/testing/selftests/mm/pfnmap.c +++ b/tools/testing/selftests/mm/pfnmap.c @@ -12,6 +12,8 @@ #include <stdint.h> #include <unistd.h> #include <errno.h> +#include <stdio.h> +#include <ctype.h> #include <fcntl.h> #include <signal.h> #include <setjmp.h> @@ -43,14 +45,62 @@ static int test_read_access(char *addr, size_t size, size_t pagesize) /* Force a read that the compiler cannot optimize out. */ *((volatile char *)(addr + offs)); } - if (signal(SIGSEGV, signal_handler) == SIG_ERR) + if (signal(SIGSEGV, SIG_DFL) == SIG_ERR) return -EINVAL; return ret; } +static int find_ram_target(off_t *phys_addr, + unsigned long long pagesize) +{ + unsigned long long start, end; + char line[80], *end_ptr; + FILE *file; + + /* Search /proc/iomem for the first suitable "System RAM" range. */ + file = fopen("/proc/iomem", "r"); + if (!file) + return -errno; + + while (fgets(line, sizeof(line), file)) { + /* Ignore any child nodes. */ + if (!isalnum(line[0])) + continue; + + if (!strstr(line, "System RAM\n")) + continue; + + start = strtoull(line, &end_ptr, 16); + /* Skip over the "-" */ + end_ptr++; + /* Make end "exclusive". */ + end = strtoull(end_ptr, NULL, 16) + 1; + + /* Actual addresses are not exported */ + if (!start && !end) + break; + + /* We need full pages. */ + start = (start + pagesize - 1) & ~(pagesize - 1); + end &= ~(pagesize - 1); + + if (start != (off_t)start) + break; + + /* We need two pages. */ + if (end > start + 2 * pagesize) { + fclose(file); + *phys_addr = start; + return 0; + } + } + return -ENOENT; +} + FIXTURE(pfnmap) { + off_t phys_addr; size_t pagesize; int dev_mem_fd; char *addr1; @@ -63,14 +113,17 @@ FIXTURE_SETUP(pfnmap) { self->pagesize = getpagesize(); + /* We'll require two physical pages throughout our tests ... */ + if (find_ram_target(&self->phys_addr, self->pagesize)) + SKIP(return, "Cannot find ram target in '/proc/iomem'\n"); + self->dev_mem_fd = open("/dev/mem", O_RDONLY); if (self->dev_mem_fd < 0) SKIP(return, "Cannot open '/dev/mem'\n"); - /* We'll require the first two pages throughout our tests ... */ self->size1 = self->pagesize * 2; self->addr1 = mmap(NULL, self->size1, PROT_READ, MAP_SHARED, - self->dev_mem_fd, 0); + self->dev_mem_fd, self->phys_addr); if (self->addr1 == MAP_FAILED) SKIP(return, "Cannot mmap '/dev/mem'\n"); @@ -129,7 +182,7 @@ TEST_F(pfnmap, munmap_split) */ self->size2 = self->pagesize; self->addr2 = mmap(NULL, self->pagesize, PROT_READ, MAP_SHARED, - self->dev_mem_fd, 0); + self->dev_mem_fd, self->phys_addr); ASSERT_NE(self->addr2, MAP_FAILED); } -- 2.49.0

1 month, 1 week

1
0
0 0

[PATCH v2] selftests/mm: add simple VM_PFNMAP tests based on mmap'ing /dev/mem

by David Hildenbrand

Let's test some basic functionality using /dev/mem. These tests will implicitly cover some PAT (Page Attribute Handling) handling on x86. These tests will only run when /dev/mem access to the first two pages in physical address space is possible and allowed; otherwise, the tests are skipped. On current x86-64 with PAT inside a VM, all tests pass: TAP version 13 1..6 # Starting 6 tests from 1 test cases. # RUN pfnmap.madvise_disallowed ... # OK pfnmap.madvise_disallowed ok 1 pfnmap.madvise_disallowed # RUN pfnmap.munmap_split ... # OK pfnmap.munmap_split ok 2 pfnmap.munmap_split # RUN pfnmap.mremap_fixed ... # OK pfnmap.mremap_fixed ok 3 pfnmap.mremap_fixed # RUN pfnmap.mremap_shrink ... # OK pfnmap.mremap_shrink ok 4 pfnmap.mremap_shrink # RUN pfnmap.mremap_expand ... # OK pfnmap.mremap_expand ok 5 pfnmap.mremap_expand # RUN pfnmap.fork ... # OK pfnmap.fork ok 6 pfnmap.fork # PASSED: 6 / 6 tests passed. # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0 However, we are able to trigger: [ 27.888251] x86/PAT: pfnmap:1790 freeing invalid memtype [mem 0x00000000-0x00000fff] There are probably more things worth testing in the future, such as MAP_PRIVATE handling. But this set of tests is sufficient to cover most of the things we will rework regarding PAT handling. Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Dev Jain <dev.jain(a)arm.com> Signed-off-by: David Hildenbrand <david(a)redhat.com> --- Hopefully I didn't miss any review feedback. v1 -> v2: * Rewrite using kselftest_harness, which simplifies a lot of things * Add to .gitignore and run_vmtests.sh * Register signal handler on demand * Use volatile trick to force a read (not factoring out FORCE_READ just yet) * Drop mprotect() test case * Add some more comments why we test certain things * Use NULL for mmap() first parameter instead of 0 * Smaller fixes --- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + tools/testing/selftests/mm/pfnmap.c | 196 ++++++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 4 + 4 files changed, 202 insertions(+) create mode 100644 tools/testing/selftests/mm/pfnmap.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index 91db34941a143..824266982aa36 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -20,6 +20,7 @@ mremap_test on-fault-limit transhuge-stress pagemap_ioctl +pfnmap *.tmp* protection_keys protection_keys_32 diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index ad4d6043a60f0..ae6f994d3add7 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -84,6 +84,7 @@ TEST_GEN_FILES += mremap_test TEST_GEN_FILES += mseal_test TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += pagemap_ioctl +TEST_GEN_FILES += pfnmap TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += uffd-stress diff --git a/tools/testing/selftests/mm/pfnmap.c b/tools/testing/selftests/mm/pfnmap.c new file mode 100644 index 0000000000000..8a9d19b6020c7 --- /dev/null +++ b/tools/testing/selftests/mm/pfnmap.c @@ -0,0 +1,196 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Basic VM_PFNMAP tests relying on mmap() of '/dev/mem' + * + * Copyright 2025, Red Hat, Inc. + * + * Author(s): David Hildenbrand <david(a)redhat.com> + */ +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <unistd.h> +#include <errno.h> +#include <fcntl.h> +#include <signal.h> +#include <setjmp.h> +#include <linux/mman.h> +#include <sys/mman.h> +#include <sys/wait.h> + +#include "../kselftest_harness.h" +#include "vm_util.h" + +static sigjmp_buf sigjmp_buf_env; + +static void signal_handler(int sig) +{ + siglongjmp(sigjmp_buf_env, -EFAULT); +} + +static int test_read_access(char *addr, size_t size, size_t pagesize) +{ + size_t offs; + int ret; + + if (signal(SIGSEGV, signal_handler) == SIG_ERR) + return -EINVAL; + + ret = sigsetjmp(sigjmp_buf_env, 1); + if (!ret) { + for (offs = 0; offs < size; offs += pagesize) + /* Force a read that the compiler cannot optimize out. */ + *((volatile char *)(addr + offs)); + } + if (signal(SIGSEGV, signal_handler) == SIG_ERR) + return -EINVAL; + + return ret; +} + +FIXTURE(pfnmap) +{ + size_t pagesize; + int dev_mem_fd; + char *addr1; + size_t size1; + char *addr2; + size_t size2; +}; + +FIXTURE_SETUP(pfnmap) +{ + self->pagesize = getpagesize(); + + self->dev_mem_fd = open("/dev/mem", O_RDONLY); + if (self->dev_mem_fd < 0) + SKIP(return, "Cannot open '/dev/mem'\n"); + + /* We'll require the first two pages throughout our tests ... */ + self->size1 = self->pagesize * 2; + self->addr1 = mmap(NULL, self->size1, PROT_READ, MAP_SHARED, + self->dev_mem_fd, 0); + if (self->addr1 == MAP_FAILED) + SKIP(return, "Cannot mmap '/dev/mem'\n"); + + /* ... and want to be able to read from them. */ + if (test_read_access(self->addr1, self->size1, self->pagesize)) + SKIP(return, "Cannot read-access mmap'ed '/dev/mem'\n"); + + self->size2 = 0; + self->addr2 = MAP_FAILED; +} + +FIXTURE_TEARDOWN(pfnmap) +{ + if (self->addr2 != MAP_FAILED) + munmap(self->addr2, self->size2); + if (self->addr1 != MAP_FAILED) + munmap(self->addr1, self->size1); + if (self->dev_mem_fd >= 0) + close(self->dev_mem_fd); +} + +TEST_F(pfnmap, madvise_disallowed) +{ + int advices[] = { + MADV_DONTNEED, + MADV_DONTNEED_LOCKED, + MADV_FREE, + MADV_WIPEONFORK, + MADV_COLD, + MADV_PAGEOUT, + MADV_POPULATE_READ, + MADV_POPULATE_WRITE, + }; + int i; + + /* All these advices must be rejected. */ + for (i = 0; i < ARRAY_SIZE(advices); i++) { + EXPECT_LT(madvise(self->addr1, self->pagesize, advices[i]), 0); + EXPECT_EQ(errno, EINVAL); + } +} + +TEST_F(pfnmap, munmap_split) +{ + /* + * Unmap the first page. This munmap() call is not really expected to + * fail, but we might be able to trigger other internal issues. + */ + ASSERT_EQ(munmap(self->addr1, self->pagesize), 0); + + /* + * Remap the first page while the second page is still mapped. This + * makes sure that any PAT tracking on x86 will allow for mmap()'ing + * a page again while some parts of the first mmap() are still + * around. + */ + self->size2 = self->pagesize; + self->addr2 = mmap(NULL, self->pagesize, PROT_READ, MAP_SHARED, + self->dev_mem_fd, 0); + ASSERT_NE(self->addr2, MAP_FAILED); +} + +TEST_F(pfnmap, mremap_fixed) +{ + char *ret; + + /* Reserve a destination area. */ + self->size2 = self->size1; + self->addr2 = mmap(NULL, self->size2, PROT_READ, MAP_ANON | MAP_PRIVATE, + -1, 0); + ASSERT_NE(self->addr2, MAP_FAILED); + + /* mremap() over our destination. */ + ret = mremap(self->addr1, self->size1, self->size2, + MREMAP_FIXED | MREMAP_MAYMOVE, self->addr2); + ASSERT_NE(ret, MAP_FAILED); +} + +TEST_F(pfnmap, mremap_shrink) +{ + char *ret; + + /* Shrinking is expected to work. */ + ret = mremap(self->addr1, self->size1, self->size1 - self->pagesize, 0); + ASSERT_NE(ret, MAP_FAILED); +} + +TEST_F(pfnmap, mremap_expand) +{ + /* + * Growing is not expected to work, and getting it right would + * be challenging. So this test primarily serves as an early warning + * that something that probably should never work suddenly works. + */ + self->size2 = self->size1 + self->pagesize; + self->addr2 = mremap(self->addr1, self->size1, self->size2, MREMAP_MAYMOVE); + ASSERT_EQ(self->addr2, MAP_FAILED); +} + +TEST_F(pfnmap, fork) +{ + pid_t pid; + int ret; + + /* fork() a child and test if the child can access the pages. */ + pid = fork(); + ASSERT_GE(pid, 0); + + if (!pid) { + EXPECT_EQ(test_read_access(self->addr1, self->size1, + self->pagesize), 0); + exit(0); + } + + wait(&ret); + if (WIFEXITED(ret)) + ret = WEXITSTATUS(ret); + else + ret = -EINVAL; + ASSERT_EQ(ret, 0); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 188b125bf1f6b..dddd1dd8af145 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -63,6 +63,8 @@ separated by spaces: test soft dirty page bit semantics - pagemap test pagemap_scan IOCTL +- pfnmap + tests for VM_PFNMAP handling - cow test copy-on-write semantics - thp @@ -472,6 +474,8 @@ fi CATEGORY="pagemap" run_test ./pagemap_ioctl +CATEGORY="pfnmap" run_test ./pfnmap + # COW tests CATEGORY="cow" run_test ./cow -- 2.49.0

1 month, 1 week

4
12
0 0

[PATCH v2] rust: kunit: use crate-level mapping for `c_void`

by Jesung Yang

Use `kernel::ffi::c_void` instead of `core::ffi::c_void` for consistency and to centralize abstraction. Since `kernel::ffi::c_void` is a straightforward re-export of `core::ffi::c_void`, both are functionally equivalent. However, using `kernel::ffi::c_void` improves consistency across the kernel's Rust code and provides a unified reference point in case the definition ever needs to change, even if such a change is unlikely. Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com> Link: https://rust-for-linux.zulipchat.com/#narrow/channel/288089/topic/x/near/52… --- So in sum, I believe it's reasonable to keep the diff unchanged... but I'm happy to adjust if you'd prefer a different approach. --- Changes in v2: - Add "Link" tag to the related discussion on Zulip - Reword the commit message to clarify `kernel::ffi::c_void` is a re-export - Link to v1: https://lore.kernel.org/rust-for-linux/20250526162429.1114862-1-y.j3ms.n@gm… --- rust/kernel/kunit.rs | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs index 81833a687b75..bd6fc712dd79 100644 --- a/rust/kernel/kunit.rs +++ b/rust/kernel/kunit.rs @@ -6,7 +6,8 @@ //! //! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html> -use core::{ffi::c_void, fmt}; +use core::fmt; +use kernel::ffi::c_void; /// Prints a KUnit error-level message. /// base-commit: f4daa80d6be7d3c55ca72a8e560afc4e21f886aa -- 2.39.5

1 month, 1 week

3
2
0 0

[PATCH] rust: replace length checks with match

by Tamir Duberstein

Use a match expression with slice patterns instead of length checks and indexing. The result is more idiomatic, which is a better example for future Rust code authors. Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- scripts/rustdoc_test_gen.rs | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs index 1ca253594d38..a3dc251221e0 100644 --- a/scripts/rustdoc_test_gen.rs +++ b/scripts/rustdoc_test_gen.rs @@ -85,24 +85,25 @@ fn find_candidates( } } - assert!( - valid_paths.len() > 0, - "No path candidates found for `{file}`. This is likely a bug in the build system, or some \ - files went away while compiling." - ); - - if valid_paths.len() > 1 { - eprintln!("Several path candidates found:"); - for path in valid_paths { - eprintln!(" {path:?}"); + match valid_paths.as_slice() { + [] => panic!( + "No path candidates found for `{file}`. This is likely a bug in the build system, or \ + some files went away while compiling." + ), + [valid_path] => { + valid_path.to_str().unwrap() + } + valid_paths => { + eprintln!("Several path candidates found:"); + for path in valid_paths { + eprintln!(" {path:?}"); + } + panic!( + "Several path candidates found for `{file}`, please resolve the ambiguity by \ + renaming a file or folder." + ); } - panic!( - "Several path candidates found for `{file}`, please resolve the ambiguity by renaming \ - a file or folder." - ); } - - valid_paths[0].to_str().unwrap() } fn main() { --- base-commit: bfc3cd87559bc593bb32bb1482f9cae3308b6398 change-id: 20250527-idiomatic-match-slice-26a79d100e4d Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

1 month, 1 week

3
3
0 0

[PATCH next] tools/testing: Check correct variable in open_procmap()

by Dan Carpenter

Check if "procmap_out->fd" is negative instead of "procmap_out" (which is a pointer). Fixes: bd23f293a0d5 ("tools/testing: add PROCMAP_QUERY helper functions in mm self tests") Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org> --- tools/testing/selftests/mm/vm_util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c index 1357e2d6a7b6..61d7bf1f8c62 100644 --- a/tools/testing/selftests/mm/vm_util.c +++ b/tools/testing/selftests/mm/vm_util.c @@ -439,7 +439,7 @@ int open_procmap(pid_t pid, struct procmap_fd *procmap_out) sprintf(path, "/proc/%d/maps", pid); procmap_out->query.size = sizeof(procmap_out->query); procmap_out->fd = open(path, O_RDONLY); - if (procmap_out < 0) + if (procmap_out->fd < 0) ret = -errno; return ret; -- 2.47.2

1 month, 1 week

2
1
0 0

[PATCH net-next v9] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is included in the config file to be built with the -b option, though not used in the tests. Only tested on x86. To run: $ make -C tools/testing/selftests TARGETS=vsock $ tools/testing/selftests/vsock/vmtest.sh or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs (after make -C tools/testing/selftests TARGETS=vsock): $ ./tools/testing/selftests/vsock/vmtest.sh 1..3 ok 0 vm_server_host_client ok 1 vm_client_host_server ok 2 vm_loopback SUMMARY: PASS=3 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_m7DI.log $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback 1..1 ok 0 vm_loopback SUMMARY: PASS=1 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_a1IO.log $ mkdir -p ~/scratch $ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch [... omitted ...] $ cd ~/scratch $ ./run_kselftest.sh TAP version 13 1..1 # timeout set to 300 # selftests: vsock: vmtest.sh # 1..3 # ok 0 vm_server_host_client # ok 1 vm_client_host_server # ok 2 vm_loopback # SUMMARY: PASS=3 SKIP=0 FAIL=0 # Log: /tmp/vsock_vmtest_svEl.log ok 1 selftests: vsock: vmtest.sh Future work can include vsock_diag_test. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com> --- Changes in v9: - make kselftest build target depend on tools/testing/vsock sources (Stefano) - add check_vng() for vng version checking (Stefano) - add virtme_ssh_channel=tcp to kernel cmdline (Stefano) - Link to v8: https://lore.kernel.org/r/20250522-vsock-vmtest-v8-1-367619bef134@gmail.com Changes in v8: - remove NIPA comment from commit msg - remove tap_* functions and TAP_PREFIX - add -b for building kernel - Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com Changes in v7: - fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS - updated commit message with updated output - updated commit message with commands for installing/running as kselftest - Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com Changes in v6: - add make cmd in commit message in vmtest.sh example (Stefano) - check nonzero size of QEMU_PIDFILE using -s conditional (Stefano) - display log file path after tests so it is easier to find amongst other random names - cleanup qemu pidfile if qemu is unable to remove it - make oops/warning failures more obvious with 'FAIL' prefix in log (simply saying 'detected' wasn't clear enough to identify failing condition) - Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change 2222 -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change <tab><backslash> to <space><backslash> - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com Changes in v2: - add kernel oops and warnings checker - change testname variable to use FUNCNAME - fix spacing in test_vm_server_host_client - add -s skip build option to vmtest.sh - add test_vm_loopback - pass port to vm_wait_for_listener - fix indentation in vmtest.sh - add vmci and hyperv to config - changed whitespace from tabs to spaces in help string - Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/vsock/.gitignore | 2 + tools/testing/selftests/vsock/Makefile | 17 ++ tools/testing/selftests/vsock/config | 114 ++++++++ tools/testing/selftests/vsock/settings | 1 + tools/testing/selftests/vsock/vmtest.sh | 487 +++++++++++++++++++++++++++++++ 6 files changed, 622 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h F: include/uapi/linux/vm_sockets_diag.h F: include/uapi/linux/vsockmon.h F: net/vmw_vsock/ +F: tools/testing/selftests/vsock/ F: tools/testing/vsock/ VMALLOC diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84 --- /dev/null +++ b/tools/testing/selftests/vsock/.gitignore @@ -0,0 +1,2 @@ +vmtest.log +vsock_test diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..c407c0afd9388ee692d59a95f304182f067596e9 --- /dev/null +++ b/tools/testing/selftests/vsock/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 + +CURDIR := $(abspath .) +TOOLSDIR := $(abspath ../../..) +VSOCK_TEST_DIR := $(TOOLSDIR)/testing/vsock +VSOCK_TEST_SRCS := $(wildcard $(VSOCK_TEST_DIR)/*.c $(VSOCK_TEST_DIR)/*.h) + +$(OUTPUT)/vsock_test: $(VSOCK_TEST_DIR)/vsock_test + install -m 755 $< $@ + +$(VSOCK_TEST_DIR)/vsock_test: $(VSOCK_TEST_SRCS) + $(MAKE) -C $(VSOCK_TEST_DIR) vsock_test +TEST_PROGS += vmtest.sh +TEST_GEN_FILES := vsock_test + +include ../lib.mk + diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config new file mode 100644 index 0000000000000000000000000000000000000000..3bffaaf98fff92dc0e3bc1286afa3d8d5d52f4c7 --- /dev/null +++ b/tools/testing/selftests/vsock/config @@ -0,0 +1,114 @@ +CONFIG_BLK_DEV_INITRD=y +CONFIG_BPF=y +CONFIG_BPF_SYSCALL=y +CONFIG_BPF_JIT=y +CONFIG_HAVE_EBPF_JIT=y +CONFIG_BPF_EVENTS=y +CONFIG_FTRACE_SYSCALLS=y +CONFIG_FUNCTION_TRACER=y +CONFIG_HAVE_DYNAMIC_FTRACE=y +CONFIG_DYNAMIC_FTRACE=y +CONFIG_HAVE_KPROBES=y +CONFIG_KPROBES=y +CONFIG_KPROBE_EVENTS=y +CONFIG_ARCH_SUPPORTS_UPROBES=y +CONFIG_UPROBES=y +CONFIG_UPROBE_EVENTS=y +CONFIG_DEBUG_FS=y +CONFIG_FW_CFG_SYSFS=y +CONFIG_FW_CFG_SYSFS_CMDLINE=y +CONFIG_DRM=y +CONFIG_DRM_VIRTIO_GPU=y +CONFIG_DRM_VIRTIO_GPU_KMS=y +CONFIG_DRM_BOCHS=y +CONFIG_VIRTIO_IOMMU=y +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_SND_SEQUENCER=y +CONFIG_SND_PCI=y +CONFIG_SND_INTEL8X0=y +CONFIG_SND_HDA_CODEC_REALTEK=y +CONFIG_SECURITYFS=y +CONFIG_CGROUP_BPF=y +CONFIG_SQUASHFS=y +CONFIG_SQUASHFS_XZ=y +CONFIG_SQUASHFS_ZSTD=y +CONFIG_FUSE_FS=y +CONFIG_SERIO=y +CONFIG_PCI=y +CONFIG_INPUT=y +CONFIG_INPUT_KEYBOARD=y +CONFIG_KEYBOARD_ATKBD=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_X86_VERBOSE_BOOTUP=y +CONFIG_VGA_CONSOLE=y +CONFIG_FB=y +CONFIG_FB_VESA=y +CONFIG_FRAMEBUFFER_CONSOLE=y +CONFIG_RTC_CLASS=y +CONFIG_RTC_HCTOSYS=y +CONFIG_RTC_DRV_CMOS=y +CONFIG_HYPERVISOR_GUEST=y +CONFIG_PARAVIRT=y +CONFIG_KVM_GUEST=y +CONFIG_KVM=y +CONFIG_KVM_INTEL=y +CONFIG_KVM_AMD=y +CONFIG_VSOCKETS=y +CONFIG_VSOCKETS_DIAG=y +CONFIG_VSOCKETS_LOOPBACK=y +CONFIG_VMWARE_VMCI_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS_COMMON=y +CONFIG_HYPERV_VSOCKETS=y +CONFIG_VMWARE_VMCI=y +CONFIG_VHOST_VSOCK=y +CONFIG_HYPERV=y +CONFIG_UEVENT_HELPER=n +CONFIG_VIRTIO=y +CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_MMIO=y +CONFIG_VIRTIO_BALLOON=y +CONFIG_NET=y +CONFIG_NET_CORE=y +CONFIG_NETDEVICES=y +CONFIG_NETWORK_FILESYSTEMS=y +CONFIG_INET=y +CONFIG_NET_9P=y +CONFIG_NET_9P_VIRTIO=y +CONFIG_9P_FS=y +CONFIG_VIRTIO_NET=y +CONFIG_CMDLINE_OVERRIDE=n +CONFIG_BINFMT_SCRIPT=y +CONFIG_SHMEM=y +CONFIG_TMPFS=y +CONFIG_UNIX=y +CONFIG_MODULE_SIG_FORCE=n +CONFIG_DEVTMPFS=y +CONFIG_TTY=y +CONFIG_VT=y +CONFIG_UNIX98_PTYS=y +CONFIG_EARLY_PRINTK=y +CONFIG_INOTIFY_USER=y +CONFIG_BLOCK=y +CONFIG_SCSI_LOWLEVEL=y +CONFIG_SCSI=y +CONFIG_SCSI_VIRTIO=y +CONFIG_BLK_DEV_SD=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_WATCHDOG=y +CONFIG_WATCHDOG_CORE=y +CONFIG_I6300ESB_WDT=y +CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y +CONFIG_OVERLAY_FS=y +CONFIG_DAX=y +CONFIG_DAX_DRIVER=y +CONFIG_FS_DAX=y +CONFIG_MEMORY_HOTPLUG=y +CONFIG_MEMORY_HOTREMOVE=y +CONFIG_ZONE_DEVICE=y +CONFIG_FUSE_FS=y +CONFIG_VIRTIO_FS=y +CONFIG_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings new file mode 100644 index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019 --- /dev/null +++ b/tools/testing/selftests/vsock/settings @@ -0,0 +1 @@ +timeout=300 diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh new file mode 100755 index 0000000000000000000000000000000000000000..edacebfc163251eee9cd495eb5e704dc7adc958e --- /dev/null +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -0,0 +1,487 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2025 Meta Platforms, Inc. and affiliates +# +# Dependencies: +# * virtme-ng +# * busybox-static (used by virtme-ng) +# * qemu (used by virtme-ng) + +readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" +readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) + +source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh + +readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test +readonly TEST_GUEST_PORT=51000 +readonly TEST_HOST_PORT=50000 +readonly TEST_HOST_PORT_LISTENER=50001 +readonly SSH_GUEST_PORT=22 +readonly SSH_HOST_PORT=2222 +readonly VSOCK_CID=1234 +readonly WAIT_PERIOD=3 +readonly WAIT_PERIOD_MAX=60 +readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) +readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + +# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a +# control port forwarded for vsock_test. Because virtme-ng doesn't support +# adding an additional port to forward to the device created from "--ssh" and +# virtme-init mistakenly sets identical IPs to the ssh device and additional +# devices, we instead opt out of using --ssh, add the device manually, and also +# add the kernel cmdline options that virtme-init uses to setup the interface. +readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" +readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" +readonly QEMU_OPTS="\ + -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ + -device virtio-net-pci,netdev=n0 \ + -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \ + --pidfile ${QEMU_PIDFILE} \ +" +readonly KERNEL_CMDLINE="\ + virtme.dhcp net.ifnames=0 biosdevname=0 \ + virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \ +" +readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) +readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_DESCS=( + "Run vsock_test in server mode on the VM and in client mode on the host." + "Run vsock_test in client mode on the VM and in server mode on the host." + "Run vsock_test using the loopback transport in the VM." +) + +VERBOSE=0 + +usage() { + local name + local desc + local i + + echo + echo "$0 [OPTIONS] [TEST]..." + echo "If no TEST argument is given, all tests will be run." + echo + echo "Options" + echo " -b: build the kernel from the current source tree and use it for guest VMs" + echo " -q: set the path to or name of qemu binary" + echo " -v: verbose output" + echo + echo "Available tests" + + for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do + name=${TEST_NAMES[${i}]} + desc=${TEST_DESCS[${i}]} + printf "\t%-35s%-35s\n" "${name}" "${desc}" + done + echo + + exit 1 +} + +die() { + echo "$*" >&2 + exit "${KSFT_FAIL}" +} + +vm_ssh() { + ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" + return $? +} + +cleanup() { + if [[ -s "${QEMU_PIDFILE}" ]]; then + pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1 + fi + + # If failure occurred during or before qemu start up, then we need + # to clean this up ourselves. + if [[ -e "${QEMU_PIDFILE}" ]]; then + rm "${QEMU_PIDFILE}" + fi +} + +check_args() { + local found + + for arg in "$@"; do + found=0 + for name in "${TEST_NAMES[@]}"; do + if [[ "${name}" = "${arg}" ]]; then + found=1 + break + fi + done + + if [[ "${found}" -eq 0 ]]; then + echo "${arg} is not an available test" >&2 + usage + fi + done + + for arg in "$@"; do + if ! command -v > /dev/null "test_${arg}"; then + echo "Test ${arg} not found" >&2 + usage + fi + done +} + +check_deps() { + for dep in vng ${QEMU} busybox pkill ssh; do + if [[ ! -x $(command -v "${dep}") ]]; then + echo -e "skip: dependency ${dep} not found!\n" + exit "${KSFT_SKIP}" + fi + done + + if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then + printf "skip: %s not found!" "${VSOCK_TEST}" + printf " Please build the kselftest vsock target.\n" + exit "${KSFT_SKIP}" + fi +} + +check_vng() { + local tested_versions + local version + local ok + + tested_versions=("1.33" "1.36") + version="$(vng --version)" + + ok=0 + for tv in "${tested_versions[@]}"; do + if [[ "${version}" == *"${tv}"* ]]; then + ok=1 + break + fi + done + + if [[ ! "${ok}" -eq 1 ]]; then + printf "warning: vng version '%s' has not been tested and may " "${version}" >&2 + printf "not function properly.\n\tThe following versions have been tested: " >&2 + echo "${tested_versions[@]}" >&2 + fi +} + +handle_build() { + if [[ ! "${BUILD}" -eq 1 ]]; then + return + fi + + if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then + echo "-b requires vmtest.sh called from the kernel source tree" >&2 + exit 1 + fi + + pushd "${KERNEL_CHECKOUT}" &>/dev/null + + if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then + die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})" + fi + + if ! make -j$(nproc); then + die "failed to build kernel from source tree (${KERNEL_CHECKOUT})" + fi + + popd &>/dev/null +} + +vm_start() { + local logfile=/dev/null + local verbose_opt="" + local kernel_opt="" + local qemu + + qemu=$(command -v "${QEMU}") + + if [[ "${VERBOSE}" -eq 1 ]]; then + verbose_opt="--verbose" + logfile=/dev/stdout + fi + + if [[ "${BUILD}" -eq 1 ]]; then + kernel_opt="${KERNEL_CHECKOUT}" + fi + + vng \ + --run \ + ${kernel_opt} \ + ${verbose_opt} \ + --qemu-opts="${QEMU_OPTS}" \ + --qemu="${qemu}" \ + --user root \ + --append "${KERNEL_CMDLINE}" \ + --rw &> ${logfile} & + + if ! timeout ${WAIT_TOTAL} \ + bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then + die "failed to boot VM" + fi +} + +vm_wait_for_ssh() { + local i + + i=0 + while true; do + if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then + die "Timed out waiting for guest ssh" + fi + if vm_ssh -- true; then + break + fi + i=$(( i + 1 )) + sleep ${WAIT_PERIOD} + done +} + +# derived from selftests/net/net_helper.sh +wait_for_listener() +{ + local port=$1 + local interval=$2 + local max_intervals=$3 + local protocol=tcp + local pattern + local i + + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ "${protocol}" = "tcp" ] && pattern="${pattern}0A" + for i in $(seq "${max_intervals}"); do + if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \ + grep -q "${pattern}"; then + break + fi + sleep "${interval}" + done +} + +vm_wait_for_listener() { + local port=$1 + + vm_ssh <<EOF +$(declare -f wait_for_listener) +wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} +EOF +} + +host_wait_for_listener() { + wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" +} + +__log_stdin() { + cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +__log_args() { + echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +log() { + local prefix="$1" + + shift + local redirect= + if [[ ${VERBOSE} -eq 0 ]]; then + redirect=/dev/null + else + redirect=/dev/stdout + fi + + if [[ "$#" -eq 0 ]]; then + __log_stdin | tee -a "${LOG}" > ${redirect} + else + __log_args "$@" | tee -a "${LOG}" > ${redirect} + fi +} + +log_setup() { + log "setup" "$@" +} + +log_host() { + local testname=$1 + + shift + log "test:${testname}:host" "$@" +} + +log_guest() { + local testname=$1 + + shift + log "test:${testname}:guest" "$@" +} + +test_vm_server_host_client() { + local testname="${FUNCNAME[0]#test_}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${TEST_GUEST_PORT}" \ + --peer-cid=2 \ + 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${TEST_GUEST_PORT}" + + ${VSOCK_TEST} \ + --mode=client \ + --control-host=127.0.0.1 \ + --peer-cid="${VSOCK_CID}" \ + --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}" + + return $? +} + +test_vm_client_host_server() { + local testname="${FUNCNAME[0]#test_}" + + ${VSOCK_TEST} \ + --mode "server" \ + --control-port "${TEST_HOST_PORT_LISTENER}" \ + --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" & + + host_wait_for_listener + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host=10.0.2.2 \ + --peer-cid=2 \ + --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}" + + return $? +} + +test_vm_loopback() { + local testname="${FUNCNAME[0]#test_}" + local port=60000 # non-forwarded local port + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${port}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host="127.0.0.1" \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" + + return $? +} + +run_test() { + local host_oops_cnt_before + local host_warn_cnt_before + local vm_oops_cnt_before + local vm_warn_cnt_before + local host_oops_cnt_after + local host_warn_cnt_after + local vm_oops_cnt_after + local vm_warn_cnt_after + local name + local rc + + host_oops_cnt_before=$(dmesg | grep -c -i 'Oops') + host_warn_cnt_before=$(dmesg --level=warn | wc -l) + vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops') + vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l) + + name=$(echo "${1}" | awk '{ print $1 }') + eval test_"${name}" + rc=$? + + host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l) + if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + host_warn_cnt_after=$(dmesg --level=warn | wc -l) + if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l) + if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l) + if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + return "${rc}" +} + +QEMU="qemu-system-$(uname -m)" + +while getopts :hvsq:b o +do + case $o in + v) VERBOSE=1;; + b) BUILD=1;; + q) QEMU=$OPTARG;; + h|*) usage;; + esac +done +shift $((OPTIND-1)) + +trap cleanup EXIT + +if [[ ${#} -eq 0 ]]; then + ARGS=("${TEST_NAMES[@]}") +else + ARGS=("$@") +fi + +check_args "${ARGS[@]}" +check_deps +check_vng +handle_build + +echo "1..${#ARGS[@]}" + +log_setup "Booting up VM" +vm_start +vm_wait_for_ssh +log_setup "VM booted up" + +cnt_pass=0 +cnt_fail=0 +cnt_skip=0 +cnt_total=0 +for arg in "${ARGS[@]}"; do + run_test "${arg}" + rc=$? + if [[ ${rc} -eq $KSFT_PASS ]]; then + cnt_pass=$(( cnt_pass + 1 )) + echo "ok ${cnt_total} ${arg}" + elif [[ ${rc} -eq $KSFT_SKIP ]]; then + cnt_skip=$(( cnt_skip + 1 )) + echo "ok ${cnt_total} ${arg} # SKIP" + elif [[ ${rc} -eq $KSFT_FAIL ]]; then + cnt_fail=$(( cnt_fail + 1 )) + echo "not ok ${cnt_total} ${arg} # exit=$rc" + fi + cnt_total=$(( cnt_total + 1 )) +done + +echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}" +echo "Log: ${LOG}" + +if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then + exit "$KSFT_PASS" +else + exit "$KSFT_FAIL" +fi --- base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a change-id: 20250325-vsock-vmtest-b3a21d2102c2 Best regards, -- Bobby Eshleman <bobbyeshleman(a)gmail.com>

1 month, 1 week

3
2
0 0

[bug report] selftests/landlock: Add tests for audit flags and domain IDs

by Dan Carpenter

Hello Mickaël Salaün, Commit 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs") from Mar 20, 2025 (linux-next), leads to the following Smatch static checker warning: tools/testing/selftests/landlock/audit.h:408 audit_init_filter_exe() warn: unsigned 'filter->exe_len' is never less than zero. tools/testing/selftests/landlock/audit.h 399 static int audit_init_filter_exe(struct audit_filter *filter, const char *path) 400 { 401 char *absolute_path = NULL; 402 403 /* It is assume that there is not already filtering rules. */ 404 filter->record_type = AUDIT_EXE; 405 if (!path) { 406 filter->exe_len = readlink("/proc/self/exe", filter->exe, 407 sizeof(filter->exe) - 1); --> 408 if (filter->exe_len < 0) size_t can't be negative. 409 return -errno; 410 411 return 0; 412 } 413 414 absolute_path = realpath(path, NULL); 415 if (!absolute_path) 416 return -errno; 417 418 /* No need for the terminating NULL byte. */ 419 filter->exe_len = strlen(absolute_path); 420 if (filter->exe_len > sizeof(filter->exe)) 421 return -E2BIG; 422 423 memcpy(filter->exe, absolute_path, filter->exe_len); 424 free(absolute_path); 425 return 0; 426 } regards, dan carpenter

1 month, 1 week

1
0
0 0

[PATCH net-next] selftests/bpf: Fix bpf selftest build warning

by Saket Kumar Bhaskar

On linux-next, build for bpf selftest displays a warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'. Commit 8066e388be48 ("net: add UAPI to the header guard in various network headers") changed the header guard from _LINUX_IF_XDP_H to _UAPI_LINUX_IF_XDP_H in include/uapi/linux/if_xdp.h. To resolve the warning, update tools/include/uapi/linux/if_xdp.h to align with the changes in include/uapi/linux/if_xdp.h Fixes: 8066e388be48 ("net: add UAPI to the header guard in various network headers") Reported-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Closes: https://lore.kernel.org/all/c2bc466d-dff2-4d0d-a797-9af7f676c065@linux.ibm.… Tested-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com> --- tools/include/uapi/linux/if_xdp.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index 42869770776e..44f2bb93e7e6 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -7,8 +7,8 @@ * Magnus Karlsson <magnus.karlsson(a)intel.com> */ -#ifndef _LINUX_IF_XDP_H -#define _LINUX_IF_XDP_H +#ifndef _UAPI_LINUX_IF_XDP_H +#define _UAPI_LINUX_IF_XDP_H #include <linux/types.h> @@ -180,4 +180,4 @@ struct xdp_desc { /* TX packet carries valid metadata. */ #define XDP_TX_METADATA (1 << 1) -#endif /* _LINUX_IF_XDP_H */ +#endif /* _UAPI_LINUX_IF_XDP_H */ -- 2.43.5

1 month, 1 week

3
2
0 0

[PATCH 0/7] Rust KUnit `#[test]` support improvements

by Miguel Ojeda

Improvements that build on top of the very basic `#[test]` support merged in v6.15. They are fairly minimal changes, but they allow us to map `assert*!`s back to KUnit, plus to add support for test functions that return `Result`s. In essence, they get our `#[test]`s essentially on par with the documentation tests. I also took the chance to convert some host `#[test]`s we had to KUnit in order to showcase the feature. Finally, I added documentation that was lacking from the original submission. I hope this helps. Miguel Ojeda (7): rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s rust: kunit: support checked `-> Result`s in KUnit `#[test]`s rust: add `kunit_tests` to the prelude rust: str: convert `rusttest` tests into KUnit rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s Documentation: rust: rename `#[test]`s to "`rusttest` host tests" Documentation: rust: testing: add docs on the new KUnit `#[test]` tests Documentation/rust/testing.rst | 80 ++++++++++++++++++++++++++++++++-- init/Kconfig | 3 ++ rust/Makefile | 3 +- rust/kernel/kunit.rs | 29 ++++++++++-- rust/kernel/prelude.rs | 2 +- rust/kernel/str.rs | 76 ++++++++++++++------------------ rust/macros/helpers.rs | 16 +++++++ rust/macros/kunit.rs | 31 ++++++++++++- rust/macros/lib.rs | 6 ++- 9 files changed, 191 insertions(+), 55 deletions(-) base-commit: b4432656b36e5cc1d50a1f2dc15357543add530e -- 2.49.0

1 month, 1 week

7
39
0 0

[PATCH v7 net-next 00/15] AccECN protocol patch series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the v7: v7 (14-May-2025) - Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>) - Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>) - Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>) - Update commit message for #9 to explain the increase in tcp_sock_write_rx group size - Modify group size of tcp_sock_write_tx in #10 based on pahole results v6 (09-May-2025) - Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>) - Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>) - Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>) - Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>) - Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>) - Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>) - Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>) - Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>) - Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>) - Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>) - Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>) - Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>) - Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15 v5 (22-Apr-2025) - Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>) v4 (18-Apr-2025) - Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>) v3 (14-Apr-2025) - Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Mar-2025) - Add one missing patch from the previous AccECN protocol preparation patch series to this patch series. The full patch series can be found in https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/ The Accurate ECN draft can be found in https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28 Best regards, Chia-Yu Chia-Yu Chang (2): tcp: reorganize tcp_sock_write_txrx group for variables later tcp: accecn: AccECN option failure handling Ilpo Järvinen (13): tcp: reorganize SYN ECN code tcp: fast path functions later tcp: AccECN core tcp: accecn: AccECN negotiation tcp: accecn: add AccECN rx byte counters tcp: accecn: AccECN needs to know delivered bytes tcp: sack option handling improvements tcp: accecn: AccECN option tcp: accecn: AccECN option send control tcp: accecn: AccECN option ceb/cep heuristic tcp: accecn: AccECN ACE field multi-wrap heuristic tcp: accecn: try to fit AccECN option with SACK tcp: try to avoid safer when ACKs are thinned .../networking/net_cachelines/tcp_sock.rst | 14 + include/linux/tcp.h | 34 +- include/net/netns/ipv4.h | 2 + include/net/tcp.h | 221 ++++++- include/uapi/linux/tcp.h | 7 + net/ipv4/syncookies.c | 3 + net/ipv4/sysctl_net_ipv4.c | 19 + net/ipv4/tcp.c | 30 +- net/ipv4/tcp_input.c | 608 +++++++++++++++++- net/ipv4/tcp_ipv4.c | 7 +- net/ipv4/tcp_minisocks.c | 92 ++- net/ipv4/tcp_output.c | 296 ++++++++- net/ipv6/syncookies.c | 1 + net/ipv6/tcp_ipv6.c | 1 + 14 files changed, 1237 insertions(+), 98 deletions(-) -- 2.34.1

1 month, 1 week

4
25
0 0

[PATCH v3] kselftest: x86: Improve MOV SS test result message

by Brigham Campbell

Make test result message more descriptive and grammatically correct. Signed-off-by: Brigham Campbell <me(a)brighamcampbell.com> --- No changes in v3. I'm resending this patch to adjust patch format suggestions made by Shuah. tools/testing/selftests/x86/mov_ss_trap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/x86/mov_ss_trap.c b/tools/testing/selftests/x86/mov_ss_trap.c index f22cb6b382f9..d80033c0d7eb 100644 --- a/tools/testing/selftests/x86/mov_ss_trap.c +++ b/tools/testing/selftests/x86/mov_ss_trap.c @@ -269,6 +269,6 @@ int main() ); } - printf("[OK]\tI aten't dead\n"); + printf("[OK]\tkernel handled MOV SS without crashing test\n"); return 0; } -- 2.49.0

1 month, 1 week

1
0
0 0

[PATCH net-next v2 0/8] Devmem TCP minor cleanups and ksft improvements

by Mina Almasry

v2: https://lore.kernel.org/netdev/20250519023517.4062941-1-almasrymina@google.… Changelog: - Collect acks and tested-bys (Thanks!) - Drop the patch that removed ksft_disruptive. That seems to not have any relation to behavior when test fails. - Address comments. --- Minor cleanups to the devmem tcp code, and not-so-minor improvements to the ksft. For the cleanups: - Address comment from Paolo post-merge. - Fix whitespace. - Add improvement dropped from Taehee's fix patch. For the ksft: - Add support for ipv4 environment. - Add support for drivers that are limited to 5-tuple flow steering. - Improve test by sending 1K data instead of just "hello\nworld" Cc: sdf(a)fomichev.me Cc: ap420073(a)gmail.com Cc: praan(a)google.com Cc: shivajikant(a)google.com Mina Almasry (8): net: devmem: move list_add to net_devmem_bind_dmabuf. page_pool: fix ugly page_pool formatting net: devmem: preserve sockc_err net: devmem: ksft: add ipv4 support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ncdevmem: remove unused variable net/core/devmem.c | 5 +++- net/core/devmem.h | 5 +++- net/core/netdev-genl.c | 8 ++----- net/core/page_pool.c | 4 ++-- net/ipv4/tcp.c | 24 ++++++++----------- .../selftests/drivers/net/hw/devmem.py | 18 +++++++------- .../selftests/drivers/net/hw/ncdevmem.c | 16 ++++++++++--- 7 files changed, 44 insertions(+), 36 deletions(-) base-commit: ea15e046263b19e91ffd827645ae5dfa44ebd044 -- 2.49.0.1151.ga128411c76-goog

1 month, 1 week

3
14
0 0

[PATCH v17 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v17. This patch serise adds DualPI Improved with a Square (DualPI2) with following features: * Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague) * Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332 * Configurable overload strategies * Use of sojourn time to reliably estimate queue delay * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu --- v17 (25-May-2025) - Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>) - Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>) - Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>) - Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>) - Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>) - Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>) - Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>) v16 (16-MAy-2025) - Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>) - Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>) - Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>) - Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>) v15 (09-May-2025) - Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>) - Fix one typo in comment of #2 - Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h v14 (05-May-2025) - Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun - Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>) - Update dualpi2.json to align with the updated variable order in pkt_sched.h - Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>) v13 (26-Apr-2025) - Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>) - Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>) v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning --- Chia-Yu Chang (4): sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc selftests/tc-testing: Add selftests for qdisc DualPI2 Documentation: netlink: specs: tc: Add DualPI2 specification Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 156 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 68 + net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1146 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1645 insertions(+) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

1 month, 1 week

2
6
0 0

[PATCH net-next 1/3] net: devmem: support single IOV with sendmsg

by Stanislav Fomichev

sendmsg() with a single iov becomes ITER_UBUF, sendmsg() with multiple iovs becomes ITER_IOVEC. iter_iov_len does not return correct value for UBUF, so teach to treat UBUF differently. Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: Pavel Begunkov <asml.silence(a)gmail.com> Cc: Mina Almasry <almasrymina(a)google.com> Fixes: bd61848900bf ("net: devmem: Implement TX path") Signed-off-by: Stanislav Fomichev <stfomichev(a)gmail.com> --- include/linux/uio.h | 8 +++++++- net/core/datagram.c | 3 ++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 49ece9e1888f..393d0622cc28 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -99,7 +99,13 @@ static inline const struct iovec *iter_iov(const struct iov_iter *iter) } #define iter_iov_addr(iter) (iter_iov(iter)->iov_base + (iter)->iov_offset) -#define iter_iov_len(iter) (iter_iov(iter)->iov_len - (iter)->iov_offset) + +static inline size_t iter_iov_len(const struct iov_iter *i) +{ + if (i->iter_type == ITER_UBUF) + return i->count; + return iter_iov(i)->iov_len - i->iov_offset; +} static inline enum iter_type iov_iter_type(const struct iov_iter *i) { diff --git a/net/core/datagram.c b/net/core/datagram.c index 9ef5442536f5..c44f1d2b70a4 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -706,7 +706,8 @@ zerocopy_fill_skb_from_devmem(struct sk_buff *skb, struct iov_iter *from, * iov_addrs are interpreted as an offset in bytes into the dma-buf to * send from. We do not support other iter types. */ - if (iov_iter_type(from) != ITER_IOVEC) + if (iov_iter_type(from) != ITER_IOVEC && + iov_iter_type(from) != ITER_UBUF) return -EFAULT; while (length && iov_iter_count(from)) { -- 2.49.0

1 month, 1 week

4
12
0 0

[PATCHv2 net-next] selftests: net: move wait_local_port_listen to lib.sh

by Hangbin Liu

The function wait_local_port_listen() is the only function defined in net_helper.sh. Since some tests source both lib.sh and net_helper.sh, we can simplify the setup by moving wait_local_port_listen() to lib.sh. With this change, net_helper.sh becomes redundant and can be removed. Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com> --- v2: remove net_helper in selftests/drivers (Matthieu Baerts) --- tools/testing/selftests/drivers/net/Makefile | 1 - .../drivers/net/lib/sh/lib_netcons.sh | 1 - .../selftests/drivers/net/netdevsim/peer.sh | 2 +- tools/testing/selftests/net/Makefile | 2 +- tools/testing/selftests/net/busy_poll_test.sh | 2 +- .../net/ipv6_route_update_soft_lockup.sh | 1 - tools/testing/selftests/net/lib.sh | 21 ++++++++++++++++ tools/testing/selftests/net/mptcp/Makefile | 2 +- .../testing/selftests/net/mptcp/mptcp_lib.sh | 1 - tools/testing/selftests/net/net_helper.sh | 25 ------------------- tools/testing/selftests/net/pmtu.sh | 1 - tools/testing/selftests/net/udpgro.sh | 2 +- tools/testing/selftests/net/udpgro_bench.sh | 2 +- tools/testing/selftests/net/udpgro_frglist.sh | 2 +- tools/testing/selftests/net/udpgro_fwd.sh | 2 +- 15 files changed, 29 insertions(+), 38 deletions(-) delete mode 100644 tools/testing/selftests/net/net_helper.sh diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile index 17db31aa58c9..be780bcb73a3 100644 --- a/tools/testing/selftests/drivers/net/Makefile +++ b/tools/testing/selftests/drivers/net/Makefile @@ -3,7 +3,6 @@ CFLAGS += $(KHDR_INCLUDES) TEST_INCLUDES := $(wildcard lib/py/*.py) \ $(wildcard lib/sh/*.sh) \ - ../../net/net_helper.sh \ ../../net/lib.sh \ TEST_GEN_FILES := \ diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh index 3c96b022954d..29b01b8e2215 100644 --- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh +++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh @@ -33,7 +33,6 @@ NSIM_DEV_SYS_NEW="/sys/bus/netdevsim/new_device" # Used to create and delete namespaces source "${LIBDIR}"/../../../../net/lib.sh -source "${LIBDIR}"/../../../../net/net_helper.sh # Create netdevsim interfaces create_ifaces() { diff --git a/tools/testing/selftests/drivers/net/netdevsim/peer.sh b/tools/testing/selftests/drivers/net/netdevsim/peer.sh index aed62d9e6c0a..1bb46ec435d4 100755 --- a/tools/testing/selftests/drivers/net/netdevsim/peer.sh +++ b/tools/testing/selftests/drivers/net/netdevsim/peer.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0-only -source ../../../net/net_helper.sh +source ../../../net/lib.sh NSIM_DEV_1_ID=$((256 + RANDOM % 256)) NSIM_DEV_1_SYS=/sys/bus/netdevsim/devices/netdevsim$NSIM_DEV_1_ID diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 70a38f485d4d..ea84b88bcb30 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -115,7 +115,7 @@ YNL_GEN_FILES := busy_poller netlink-dumps TEST_GEN_FILES += $(YNL_GEN_FILES) TEST_FILES := settings -TEST_FILES += in_netns.sh lib.sh net_helper.sh setup_loopback.sh setup_veth.sh +TEST_FILES += in_netns.sh lib.sh setup_loopback.sh setup_veth.sh TEST_GEN_FILES += $(patsubst %.c,%.o,$(wildcard *.bpf.c)) diff --git a/tools/testing/selftests/net/busy_poll_test.sh b/tools/testing/selftests/net/busy_poll_test.sh index 7db292ec4884..7d2d40812074 100755 --- a/tools/testing/selftests/net/busy_poll_test.sh +++ b/tools/testing/selftests/net/busy_poll_test.sh @@ -1,6 +1,6 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -source net_helper.sh +source lib.sh NSIM_SV_ID=$((256 + RANDOM % 256)) NSIM_SV_SYS=/sys/bus/netdevsim/devices/netdevsim$NSIM_SV_ID diff --git a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh index a6b2b1f9c641..c6866e42f95c 100755 --- a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh +++ b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh @@ -69,7 +69,6 @@ # which can affect the conditions needed to trigger a soft lockup. source lib.sh -source net_helper.sh TEST_DURATION=300 ROUTING_TABLE_REFRESH_PERIOD=0.01 diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh index 7962da06f816..006fdadcc4b9 100644 --- a/tools/testing/selftests/net/lib.sh +++ b/tools/testing/selftests/net/lib.sh @@ -595,3 +595,24 @@ bridge_vlan_add() bridge vlan add "$@" defer bridge vlan del "$@" } + +wait_local_port_listen() +{ + local listener_ns="${1}" + local port="${2}" + local protocol="${3}" + local pattern + local i + + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ ${protocol} = "tcp" ] && pattern="${pattern}0A" + for i in $(seq 10); do + if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \ + /proc/net/"${protocol}"* | grep -q "${pattern}"; then + break + fi + sleep 0.1 + done +} diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index 340e1a777e16..e47788bfa671 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -11,7 +11,7 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq mptcp_diag TEST_FILES := mptcp_lib.sh settings -TEST_INCLUDES := ../lib.sh $(wildcard ../lib/sh/*.sh) ../net_helper.sh +TEST_INCLUDES := ../lib.sh $(wildcard ../lib/sh/*.sh) EXTRA_CLEAN := *.pcap diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh index 55212188871e..09cd24b2ae46 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -2,7 +2,6 @@ # SPDX-License-Identifier: GPL-2.0 . "$(dirname "${0}")/../lib.sh" -. "$(dirname "${0}")/../net_helper.sh" readonly KSFT_PASS=0 readonly KSFT_FAIL=1 diff --git a/tools/testing/selftests/net/net_helper.sh b/tools/testing/selftests/net/net_helper.sh deleted file mode 100644 index 6596fe03c77f..000000000000 --- a/tools/testing/selftests/net/net_helper.sh +++ /dev/null @@ -1,25 +0,0 @@ -#!/bin/bash -# SPDX-License-Identifier: GPL-2.0 -# -# Helper functions - -wait_local_port_listen() -{ - local listener_ns="${1}" - local port="${2}" - local protocol="${3}" - local pattern - local i - - pattern=":$(printf "%04X" "${port}") " - - # for tcp protocol additionally check the socket state - [ ${protocol} = "tcp" ] && pattern="${pattern}0A" - for i in $(seq 10); do - if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \ - /proc/net/"${protocol}"* | grep -q "${pattern}"; then - break - fi - sleep 0.1 - done -} diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 66be7699c72c..88e914c4eef9 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -205,7 +205,6 @@ # Check that PMTU exceptions are created for both paths. source lib.sh -source net_helper.sh PAUSE_ON_FAIL=no VERBOSE=0 diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh index d5ffd8c9172e..1dc337c709f8 100755 --- a/tools/testing/selftests/net/udpgro.sh +++ b/tools/testing/selftests/net/udpgro.sh @@ -3,7 +3,7 @@ # # Run a series of udpgro functional tests. -source net_helper.sh +source lib.sh readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)" diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh index 815fad8c53a8..54fa4821bc5e 100755 --- a/tools/testing/selftests/net/udpgro_bench.sh +++ b/tools/testing/selftests/net/udpgro_bench.sh @@ -3,7 +3,7 @@ # # Run a series of udpgro benchmarks -source net_helper.sh +source lib.sh readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)" diff --git a/tools/testing/selftests/net/udpgro_frglist.sh b/tools/testing/selftests/net/udpgro_frglist.sh index 5f3d1a110d11..9a2cfec1153e 100755 --- a/tools/testing/selftests/net/udpgro_frglist.sh +++ b/tools/testing/selftests/net/udpgro_frglist.sh @@ -3,7 +3,7 @@ # # Run a series of udpgro benchmarks -source net_helper.sh +source lib.sh readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)" diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh index f22f6c66997e..a39fdc4aa2ff 100755 --- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -source net_helper.sh +source lib.sh BPF_FILE="lib/xdp_dummy.bpf.o" readonly BASE="ns-$(mktemp -u XXXXXX)" -- 2.46.0

1 month, 1 week

3
2
0 0

[PATCH] selftests: net: fix spelling and grammar mistakes

by Praveen Balakrishnan

Fix several spelling and grammatical mistakes in output messages from the net selftests to improve readability. Only the message strings for the test output have been modified. No changes to the functional logic of the tests have been made. Signed-off-by: Praveen Balakrishnan <praveen.balakrishnan(a)magd.ox.ac.uk> --- .../testing/selftests/net/netfilter/conntrack_vrf.sh | 4 ++-- tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 2 +- tools/testing/selftests/net/rps_default_mask.sh | 12 ++++++------ 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/net/netfilter/conntrack_vrf.sh b/tools/testing/selftests/net/netfilter/conntrack_vrf.sh index e95ecb37c2b1..806d2bfbd6e7 100755 --- a/tools/testing/selftests/net/netfilter/conntrack_vrf.sh +++ b/tools/testing/selftests/net/netfilter/conntrack_vrf.sh @@ -236,9 +236,9 @@ EOF ip netns exec "$ns1" ping -q -w 1 -c 1 "$DUMMYNET".2 > /dev/null if ip netns exec "$ns0" nft list counter t fibcount | grep -q "packets 1"; then - echo "PASS: fib lookup returned exepected output interface" + echo "PASS: fib lookup returned expected output interface" else - echo "FAIL: fib lookup did not return exepected output interface" + echo "FAIL: fib lookup did not return expected output interface" ret=1 return fi diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 8a0396bfaf99..b521e0dea506 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -1877,7 +1877,7 @@ class OvsPacket(GenericNetlinkSocket): elif msg["cmd"] == OvsPacket.OVS_PACKET_CMD_EXECUTE: up.execute(msg) else: - print("Unkonwn cmd: %d" % msg["cmd"]) + print("Unknown cmd: %d" % msg["cmd"]) except NetlinkError as ne: raise ne diff --git a/tools/testing/selftests/net/rps_default_mask.sh b/tools/testing/selftests/net/rps_default_mask.sh index 4287a8529890..b200019b3c80 100755 --- a/tools/testing/selftests/net/rps_default_mask.sh +++ b/tools/testing/selftests/net/rps_default_mask.sh @@ -54,16 +54,16 @@ cleanup echo 1 > /proc/sys/net/core/rps_default_mask setup -chk_rps "changing rps_default_mask dont affect existing devices" "" lo $INITIAL_RPS_DEFAULT_MASK +chk_rps "changing rps_default_mask doesn't affect existing devices" "" lo $INITIAL_RPS_DEFAULT_MASK echo 3 > /proc/sys/net/core/rps_default_mask -chk_rps "changing rps_default_mask dont affect existing netns" $NETNS lo 0 +chk_rps "changing rps_default_mask doesn't affect existing netns" $NETNS lo 0 ip link add name $VETH type veth peer netns $NETNS name $VETH ip link set dev $VETH up ip -n $NETNS link set dev $VETH up -chk_rps "changing rps_default_mask affect newly created devices" "" $VETH 3 -chk_rps "changing rps_default_mask don't affect newly child netns[II]" $NETNS $VETH 0 +chk_rps "changing rps_default_mask affects newly created devices" "" $VETH 3 +chk_rps "changing rps_default_mask doesn't affect newly child netns[II]" $NETNS $VETH 0 ip link del dev $VETH ip netns del $NETNS @@ -72,8 +72,8 @@ chk_rps "rps_default_mask is 0 by default in child netns" "$NETNS" lo 0 ip netns exec $NETNS sysctl -qw net.core.rps_default_mask=1 ip link add name $VETH type veth peer netns $NETNS name $VETH -chk_rps "changing rps_default_mask in child ns don't affect the main one" "" lo $INITIAL_RPS_DEFAULT_MASK +chk_rps "changing rps_default_mask in child ns doesn't affect the main one" "" lo $INITIAL_RPS_DEFAULT_MASK chk_rps "changing rps_default_mask in child ns affects new childns devices" $NETNS $VETH 1 -chk_rps "changing rps_default_mask in child ns don't affect existing devices" $NETNS lo 0 +chk_rps "changing rps_default_mask in child ns doesn't affect existing devices" $NETNS lo 0 exit $ret -- 2.39.5

1 month, 1 week

5
5
0 0

[PATCH net-next v8] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is included in the config file to be built with the -b option, though not used in the tests. Only tested on x86. To run: $ make -C tools/testing/selftests TARGETS=vsock $ tools/testing/selftests/vsock/vmtest.sh or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs (after make -C tools/testing/selftests TARGETS=vsock): $ ./tools/testing/selftests/vsock/vmtest.sh 1..3 ok 0 vm_server_host_client ok 1 vm_client_host_server ok 2 vm_loopback SUMMARY: PASS=3 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_m7DI.log $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback 1..1 ok 0 vm_loopback SUMMARY: PASS=1 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_a1IO.log $ mkdir -p ~/scratch $ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch [... omitted ...] $ cd ~/scratch $ ./run_kselftest.sh TAP version 13 1..1 # timeout set to 300 # selftests: vsock: vmtest.sh # 1..3 # ok 0 vm_server_host_client # ok 1 vm_client_host_server # ok 2 vm_loopback # SUMMARY: PASS=3 SKIP=0 FAIL=0 # Log: /tmp/vsock_vmtest_svEl.log ok 1 selftests: vsock: vmtest.sh Future work can include vsock_diag_test. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com> --- Changes in v8: - remove NIPA comment from commit msg - remove tap_* functions and TAP_PREFIX - add -b for building kernel - Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com Changes in v7: - fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS - updated commit message with updated output - updated commit message with commands for installing/running as kselftest - Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com Changes in v6: - add make cmd in commit message in vmtest.sh example (Stefano) - check nonzero size of QEMU_PIDFILE using -s conditional (Stefano) - display log file path after tests so it is easier to find amongst other random names - cleanup qemu pidfile if qemu is unable to remove it - make oops/warning failures more obvious with 'FAIL' prefix in log (simply saying 'detected' wasn't clear enough to identify failing condition) - Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change 2222 -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change <tab><backslash> to <space><backslash> - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com Changes in v2: - add kernel oops and warnings checker - change testname variable to use FUNCNAME - fix spacing in test_vm_server_host_client - add -s skip build option to vmtest.sh - add test_vm_loopback - pass port to vm_wait_for_listener - fix indentation in vmtest.sh - add vmci and hyperv to config - changed whitespace from tabs to spaces in help string - Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/vsock/.gitignore | 2 + tools/testing/selftests/vsock/Makefile | 16 ++ tools/testing/selftests/vsock/config | 114 ++++++++ tools/testing/selftests/vsock/settings | 1 + tools/testing/selftests/vsock/vmtest.sh | 460 +++++++++++++++++++++++++++++++ 6 files changed, 594 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h F: include/uapi/linux/vm_sockets_diag.h F: include/uapi/linux/vsockmon.h F: net/vmw_vsock/ +F: tools/testing/selftests/vsock/ F: tools/testing/vsock/ VMALLOC diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84 --- /dev/null +++ b/tools/testing/selftests/vsock/.gitignore @@ -0,0 +1,2 @@ +vmtest.log +vsock_test diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..7ab4970e5e8a019be33f96a36f95c00573d7bfcf --- /dev/null +++ b/tools/testing/selftests/vsock/Makefile @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: GPL-2.0 + +CURDIR := $(abspath .) +TOOLSDIR := $(abspath ../../..) + +$(OUTPUT)/vsock_test: $(TOOLSDIR)/testing/vsock/vsock_test + install -m 755 $< $@ + +$(TOOLSDIR)/testing/vsock/vsock_test: + $(MAKE) -C $(TOOLSDIR)/testing/vsock vsock_test + +TEST_PROGS += vmtest.sh +TEST_GEN_FILES := vsock_test + +include ../lib.mk + diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config new file mode 100644 index 0000000000000000000000000000000000000000..3bffaaf98fff92dc0e3bc1286afa3d8d5d52f4c7 --- /dev/null +++ b/tools/testing/selftests/vsock/config @@ -0,0 +1,114 @@ +CONFIG_BLK_DEV_INITRD=y +CONFIG_BPF=y +CONFIG_BPF_SYSCALL=y +CONFIG_BPF_JIT=y +CONFIG_HAVE_EBPF_JIT=y +CONFIG_BPF_EVENTS=y +CONFIG_FTRACE_SYSCALLS=y +CONFIG_FUNCTION_TRACER=y +CONFIG_HAVE_DYNAMIC_FTRACE=y +CONFIG_DYNAMIC_FTRACE=y +CONFIG_HAVE_KPROBES=y +CONFIG_KPROBES=y +CONFIG_KPROBE_EVENTS=y +CONFIG_ARCH_SUPPORTS_UPROBES=y +CONFIG_UPROBES=y +CONFIG_UPROBE_EVENTS=y +CONFIG_DEBUG_FS=y +CONFIG_FW_CFG_SYSFS=y +CONFIG_FW_CFG_SYSFS_CMDLINE=y +CONFIG_DRM=y +CONFIG_DRM_VIRTIO_GPU=y +CONFIG_DRM_VIRTIO_GPU_KMS=y +CONFIG_DRM_BOCHS=y +CONFIG_VIRTIO_IOMMU=y +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_SND_SEQUENCER=y +CONFIG_SND_PCI=y +CONFIG_SND_INTEL8X0=y +CONFIG_SND_HDA_CODEC_REALTEK=y +CONFIG_SECURITYFS=y +CONFIG_CGROUP_BPF=y +CONFIG_SQUASHFS=y +CONFIG_SQUASHFS_XZ=y +CONFIG_SQUASHFS_ZSTD=y +CONFIG_FUSE_FS=y +CONFIG_SERIO=y +CONFIG_PCI=y +CONFIG_INPUT=y +CONFIG_INPUT_KEYBOARD=y +CONFIG_KEYBOARD_ATKBD=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_X86_VERBOSE_BOOTUP=y +CONFIG_VGA_CONSOLE=y +CONFIG_FB=y +CONFIG_FB_VESA=y +CONFIG_FRAMEBUFFER_CONSOLE=y +CONFIG_RTC_CLASS=y +CONFIG_RTC_HCTOSYS=y +CONFIG_RTC_DRV_CMOS=y +CONFIG_HYPERVISOR_GUEST=y +CONFIG_PARAVIRT=y +CONFIG_KVM_GUEST=y +CONFIG_KVM=y +CONFIG_KVM_INTEL=y +CONFIG_KVM_AMD=y +CONFIG_VSOCKETS=y +CONFIG_VSOCKETS_DIAG=y +CONFIG_VSOCKETS_LOOPBACK=y +CONFIG_VMWARE_VMCI_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS_COMMON=y +CONFIG_HYPERV_VSOCKETS=y +CONFIG_VMWARE_VMCI=y +CONFIG_VHOST_VSOCK=y +CONFIG_HYPERV=y +CONFIG_UEVENT_HELPER=n +CONFIG_VIRTIO=y +CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_MMIO=y +CONFIG_VIRTIO_BALLOON=y +CONFIG_NET=y +CONFIG_NET_CORE=y +CONFIG_NETDEVICES=y +CONFIG_NETWORK_FILESYSTEMS=y +CONFIG_INET=y +CONFIG_NET_9P=y +CONFIG_NET_9P_VIRTIO=y +CONFIG_9P_FS=y +CONFIG_VIRTIO_NET=y +CONFIG_CMDLINE_OVERRIDE=n +CONFIG_BINFMT_SCRIPT=y +CONFIG_SHMEM=y +CONFIG_TMPFS=y +CONFIG_UNIX=y +CONFIG_MODULE_SIG_FORCE=n +CONFIG_DEVTMPFS=y +CONFIG_TTY=y +CONFIG_VT=y +CONFIG_UNIX98_PTYS=y +CONFIG_EARLY_PRINTK=y +CONFIG_INOTIFY_USER=y +CONFIG_BLOCK=y +CONFIG_SCSI_LOWLEVEL=y +CONFIG_SCSI=y +CONFIG_SCSI_VIRTIO=y +CONFIG_BLK_DEV_SD=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_WATCHDOG=y +CONFIG_WATCHDOG_CORE=y +CONFIG_I6300ESB_WDT=y +CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y +CONFIG_OVERLAY_FS=y +CONFIG_DAX=y +CONFIG_DAX_DRIVER=y +CONFIG_FS_DAX=y +CONFIG_MEMORY_HOTPLUG=y +CONFIG_MEMORY_HOTREMOVE=y +CONFIG_ZONE_DEVICE=y +CONFIG_FUSE_FS=y +CONFIG_VIRTIO_FS=y +CONFIG_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings new file mode 100644 index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019 --- /dev/null +++ b/tools/testing/selftests/vsock/settings @@ -0,0 +1 @@ +timeout=300 diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh new file mode 100755 index 0000000000000000000000000000000000000000..d083c444dd8e3a5893f7795ae5e5d90fdede6325 --- /dev/null +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -0,0 +1,460 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2025 Meta Platforms, Inc. and affiliates +# +# Dependencies: +# * virtme-ng +# * busybox-static (used by virtme-ng) +# * qemu (used by virtme-ng) + +readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" +readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) + +source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh + +readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test +readonly TEST_GUEST_PORT=51000 +readonly TEST_HOST_PORT=50000 +readonly TEST_HOST_PORT_LISTENER=50001 +readonly SSH_GUEST_PORT=22 +readonly SSH_HOST_PORT=2222 +readonly VSOCK_CID=1234 +readonly WAIT_PERIOD=3 +readonly WAIT_PERIOD_MAX=60 +readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) +readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + +# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a +# control port forwarded for vsock_test. Because virtme-ng doesn't support +# adding an additional port to forward to the device created from "--ssh" and +# virtme-init mistakenly sets identical IPs to the ssh device and additional +# devices, we instead opt out of using --ssh, add the device manually, and also +# add the kernel cmdline options that virtme-init uses to setup the interface. +readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" +readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" +readonly QEMU_OPTS="\ + -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ + -device virtio-net-pci,netdev=n0 \ + -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \ + --pidfile ${QEMU_PIDFILE} \ +" +readonly KERNEL_CMDLINE="virtme.dhcp net.ifnames=0 biosdevname=0 virtme.ssh virtme_ssh_user=$USER" +readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) +readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_DESCS=( + "Run vsock_test in server mode on the VM and in client mode on the host." + "Run vsock_test in client mode on the VM and in server mode on the host." + "Run vsock_test using the loopback transport in the VM." +) + +VERBOSE=0 + +usage() { + local name + local desc + local i + + echo + echo "$0 [OPTIONS] [TEST]..." + echo "If no TEST argument is given, all tests will be run." + echo + echo "Options" + echo " -b: build the kernel from the current source tree and use it for guest VMs" + echo " -q: set the path to or name of qemu binary" + echo " -v: verbose output" + echo + echo "Available tests" + + for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do + name=${TEST_NAMES[${i}]} + desc=${TEST_DESCS[${i}]} + printf "\t%-35s%-35s\n" "${name}" "${desc}" + done + echo + + exit 1 +} + +die() { + echo "$*" >&2 + exit "${KSFT_FAIL}" +} + +vm_ssh() { + ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" + return $? +} + +cleanup() { + if [[ -s "${QEMU_PIDFILE}" ]]; then + pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1 + fi + + # If failure occurred during or before qemu start up, then we need + # to clean this up ourselves. + if [[ -e "${QEMU_PIDFILE}" ]]; then + rm "${QEMU_PIDFILE}" + fi +} + +check_args() { + local found + + for arg in "$@"; do + found=0 + for name in "${TEST_NAMES[@]}"; do + if [[ "${name}" = "${arg}" ]]; then + found=1 + break + fi + done + + if [[ "${found}" -eq 0 ]]; then + echo "${arg} is not an available test" >&2 + usage + fi + done + + for arg in "$@"; do + if ! command -v > /dev/null "test_${arg}"; then + echo "Test ${arg} not found" >&2 + usage + fi + done +} + +check_deps() { + for dep in vng ${QEMU} busybox pkill ssh; do + if [[ ! -x $(command -v "${dep}") ]]; then + echo -e "skip: dependency ${dep} not found!\n" + exit "${KSFT_SKIP}" + fi + done + + if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then + printf "skip: %s not found!" "${VSOCK_TEST}" + printf " Please build the kselftest vsock target.\n" + exit "${KSFT_SKIP}" + fi +} + +handle_build() { + if [[ ! "${BUILD}" -eq 1 ]]; then + return + fi + + if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then + echo "-b requires vmtest.sh called from the kernel source tree" >&2 + exit 1 + fi + + pushd "${KERNEL_CHECKOUT}" &>/dev/null + + if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then + die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})" + fi + + if ! make -j$(nproc); then + die "failed to build kernel from source tree (${KERNEL_CHECKOUT})" + fi + + popd &>/dev/null +} + +vm_start() { + local logfile=/dev/null + local verbose_opt="" + local kernel_opt="" + local qemu + + qemu=$(command -v "${QEMU}") + + if [[ "${VERBOSE}" -eq 1 ]]; then + verbose_opt="--verbose" + logfile=/dev/stdout + fi + + if [[ "${BUILD}" -eq 1 ]]; then + kernel_opt="${KERNEL_CHECKOUT}" + fi + + vng \ + --run \ + ${kernel_opt} \ + ${verbose_opt} \ + --qemu-opts="${QEMU_OPTS}" \ + --qemu="${qemu}" \ + --user root \ + --append "${KERNEL_CMDLINE}" \ + --rw &> ${logfile} & + + if ! timeout ${WAIT_TOTAL} \ + bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then + die "failed to boot VM" + fi +} + +vm_wait_for_ssh() { + local i + + i=0 + while true; do + if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then + die "Timed out waiting for guest ssh" + fi + if vm_ssh -- true; then + break + fi + i=$(( i + 1 )) + sleep ${WAIT_PERIOD} + done +} + +# derived from selftests/net/net_helper.sh +wait_for_listener() +{ + local port=$1 + local interval=$2 + local max_intervals=$3 + local protocol=tcp + local pattern + local i + + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ "${protocol}" = "tcp" ] && pattern="${pattern}0A" + for i in $(seq "${max_intervals}"); do + if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \ + grep -q "${pattern}"; then + break + fi + sleep "${interval}" + done +} + +vm_wait_for_listener() { + local port=$1 + + vm_ssh <<EOF +$(declare -f wait_for_listener) +wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} +EOF +} + +host_wait_for_listener() { + wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" +} + +__log_stdin() { + cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +__log_args() { + echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +log() { + local prefix="$1" + + shift + local redirect= + if [[ ${VERBOSE} -eq 0 ]]; then + redirect=/dev/null + else + redirect=/dev/stdout + fi + + if [[ "$#" -eq 0 ]]; then + __log_stdin | tee -a "${LOG}" > ${redirect} + else + __log_args "$@" | tee -a "${LOG}" > ${redirect} + fi +} + +log_setup() { + log "setup" "$@" +} + +log_host() { + local testname=$1 + + shift + log "test:${testname}:host" "$@" +} + +log_guest() { + local testname=$1 + + shift + log "test:${testname}:guest" "$@" +} + +test_vm_server_host_client() { + local testname="${FUNCNAME[0]#test_}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${TEST_GUEST_PORT}" \ + --peer-cid=2 \ + 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${TEST_GUEST_PORT}" + + ${VSOCK_TEST} \ + --mode=client \ + --control-host=127.0.0.1 \ + --peer-cid="${VSOCK_CID}" \ + --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}" + + return $? +} + +test_vm_client_host_server() { + local testname="${FUNCNAME[0]#test_}" + + ${VSOCK_TEST} \ + --mode "server" \ + --control-port "${TEST_HOST_PORT_LISTENER}" \ + --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" & + + host_wait_for_listener + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host=10.0.2.2 \ + --peer-cid=2 \ + --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}" + + return $? +} + +test_vm_loopback() { + local testname="${FUNCNAME[0]#test_}" + local port=60000 # non-forwarded local port + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${port}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host="127.0.0.1" \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" + + return $? +} + +run_test() { + local host_oops_cnt_before + local host_warn_cnt_before + local vm_oops_cnt_before + local vm_warn_cnt_before + local host_oops_cnt_after + local host_warn_cnt_after + local vm_oops_cnt_after + local vm_warn_cnt_after + local name + local rc + + host_oops_cnt_before=$(dmesg | grep -c -i 'Oops') + host_warn_cnt_before=$(dmesg --level=warn | wc -l) + vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops') + vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l) + + name=$(echo "${1}" | awk '{ print $1 }') + eval test_"${name}" + rc=$? + + host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l) + if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + host_warn_cnt_after=$(dmesg --level=warn | wc -l) + if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l) + if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l) + if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + return "${rc}" +} + +QEMU="qemu-system-$(uname -m)" + +while getopts :hvsq:b o +do + case $o in + v) VERBOSE=1;; + b) BUILD=1;; + q) QEMU=$OPTARG;; + h|*) usage;; + esac +done +shift $((OPTIND-1)) + +trap cleanup EXIT + +if [[ ${#} -eq 0 ]]; then + ARGS=("${TEST_NAMES[@]}") +else + ARGS=("$@") +fi + +check_args "${ARGS[@]}" +check_deps +handle_build + +echo "1..${#ARGS[@]}" + +log_setup "Booting up VM" +vm_start +vm_wait_for_ssh +log_setup "VM booted up" + +cnt_pass=0 +cnt_fail=0 +cnt_skip=0 +cnt_total=0 +for arg in "${ARGS[@]}"; do + run_test "${arg}" + rc=$? + if [[ ${rc} -eq $KSFT_PASS ]]; then + cnt_pass=$(( cnt_pass + 1 )) + echo "ok ${cnt_total} ${arg}" + elif [[ ${rc} -eq $KSFT_SKIP ]]; then + cnt_skip=$(( cnt_skip + 1 )) + echo "ok ${cnt_total} ${arg} # SKIP" + elif [[ ${rc} -eq $KSFT_FAIL ]]; then + cnt_fail=$(( cnt_fail + 1 )) + echo "not ok ${cnt_total} ${arg} # exit=$rc" + fi + cnt_total=$(( cnt_total + 1 )) +done + +echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}" +echo "Log: ${LOG}" + +if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then + exit "$KSFT_PASS" +else + exit "$KSFT_FAIL" +fi --- base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a change-id: 20250325-vsock-vmtest-b3a21d2102c2 Best regards, -- Bobby Eshleman <bobbyeshleman(a)gmail.com>

1 month, 1 week

2
4
0 0

[PATCH net-next V10 0/6] Support rate management on traffic classes in devlink and mlx5

by Tariq Toukan

Hi, This patch series by Carolina is V10 of the feature that adds rate management support on traffic classes in devlink and mlx5, see full description by Carolina below [0]. V10: - Added netdevsim selftest for tc-bw ops. - Dropped header: field as it’s unnecessary for local constants in devlink.yaml. Regards, Tariq [0] This patch series extends the devlink-rate API to support traffic class (TC) bandwidth management, enabling more granular control over traffic shaping and rate limiting across multiple TCs. The API now allows users to specify bandwidth proportions for different traffic classes in a single command. This is particularly useful for managing Enhanced Transmission Selection (ETS) for groups of Virtual Functions (VFs), allowing precise bandwidth allocation across traffic classes. Additionally the series refines the QoS handling in net/mlx5 to support TC arbitration and bandwidth management on vports and rate nodes. Discussions on traffic class shaping in net-shapers began in V5 [1], where we discussed with maintainers whether net-shapers should support traffic classes and how this could be implemented. Later, after further conversations with Paolo Abeni and Simon Horman, Cosmin provided an update [2], confirming that net-shapers' tree-based hierarchy aligns well with traffic classes when treated as distinct subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX queues and traffic classes, this approach seems feasible, though some open questions remain regarding queue reconfiguration and certain mlx5 scheduling behaviors. Building on that discussion, Cosmin has now shared a concrete implementation plan on the netdev mailing list [3]. The plan, developed in collaboration with Paolo and Simon, outlines how net-shapers can be extended to support the same use cases currently covered by devlink-rate, with the eventual goal of aligning both and simplifying the shaping infrastructure in the kernel. This work was presented at Netdev 0x19 in Zagreb [4]. There we presented how TC scheduling is enforced in mlx5 hardware, which led to discussions on the mailing list. A summary of how things work: Classification means labeling a packet with a traffic class based on the packet's DSCP or VLAN PCP field, then treating packets with different traffic classes differently during transmit processing. In a virtualized setup, VFs are untrusted and do not control classification or shaping. Classification is done by the hardware using a prio-to-TC mapping set by the hypervisor. VFs only select which send queue to use and are expected to respect the classification logic by sending each traffic class on its dedicated queue. As stated in the net-shapers plan [3], each transmit queue should carry only a single traffic class. Mixing classes in a single queue can lead to HOL blocking. In the mlx5 implementation, if the queue used does not match the classified traffic class, the hardware moves the queue to the correct TC scheduler. This movement is not a reclassification; it’s a necessary enforcement step to ensure traffic class isolation is maintained. Extend devlink-rate API to support rate management on TCs: - devlink: Extend the devlink rate API to support traffic class bandwidth management Introduce a no-op implementation: - net/mlx5: Add no-op implementation for setting tc-bw on rate objects Add support for enabling and disabling TC QoS on vports and nodes: - net/mlx5: Add support for setting tc-bw on nodes - net/mlx5: Add traffic class scheduling support for vport QoS Support for setting tc-bw on rate objects: - net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw [1] https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/ [2] https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.cam… [3] https://lore.kernel.org/netdev/d9831d0c940a7b77419abe7c7330e822bbfd1cfb.cam… [4] https://netdevconf.info/0x19/sessions/talk/optimizing-bandwidth-allocation-… Carolina Jubran (6): devlink: Extend devlink rate API with traffic classes bandwidth management selftest: netdevsim: Add devlink rate tc-bw test net/mlx5: Add no-op implementation for setting tc-bw on rate objects net/mlx5: Add support for setting tc-bw on nodes net/mlx5: Add traffic class scheduling support for vport QoS net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw Documentation/netlink/specs/devlink.yaml | 35 +- .../networking/devlink/devlink-port.rst | 7 + .../net/ethernet/mellanox/mlx5/core/devlink.c | 2 + .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1007 ++++++++++++++++- .../net/ethernet/mellanox/mlx5/core/esw/qos.h | 8 + .../net/ethernet/mellanox/mlx5/core/eswitch.h | 14 +- drivers/net/netdevsim/dev.c | 43 + drivers/net/netdevsim/netdevsim.h | 1 + include/net/devlink.h | 6 + include/uapi/linux/devlink.h | 9 + net/devlink/netlink_gen.c | 15 +- net/devlink/netlink_gen.h | 1 + net/devlink/rate.c | 127 +++ .../drivers/net/netdevsim/devlink.sh | 51 + 14 files changed, 1289 insertions(+), 37 deletions(-) base-commit: f685204c57e87d2a88b159c7525426d70ee745c9 -- 2.31.1

1 month, 1 week

5
15
0 0

[PATCH] rust: kunit: use crate-level mapping for `c_void`

by Jesung Yang

Use `kernel::ffi::c_void` instead of `core::ffi::c_void` for consistency and to centralize abstraction. Since `kernel::ffi::c_void` is a transparent wrapper around `core::ffi::c_void`, both are functionally equivalent. However, using `kernel::ffi::c_void` improves consistency across the kernel's Rust code and provides a unified reference point in case the definition ever needs to change, even if such a change is unlikely. Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com> --- rust/kernel/kunit.rs | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs index 81833a687b75..bd6fc712dd79 100644 --- a/rust/kernel/kunit.rs +++ b/rust/kernel/kunit.rs @@ -6,7 +6,8 @@ //! //! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html> -use core::{ffi::c_void, fmt}; +use core::fmt; +use kernel::ffi::c_void; /// Prints a KUnit error-level message. /// -- 2.39.5

1 month, 1 week

3
5
0 0

[PATCH bpf-next v3 0/2] bpf, arm64: support up to 12 arguments

by Alexis Lothoré

Hello, this is the v2 of the many args series for arm64, being itself a revival of Xu Kuhoai's work to enable larger arguments count for BPF programs on ARM64 ([1]). The discussions in v1 shed some light on some issues around specific cases, for example with functions passing struct on stack with custom packing/alignment attributes: those cases can not be properly detected with the current BTF info. So this new revision aims to separate concerns with a simpler implementation, just accepting additional args on stack if we can make sure about the alignment constraints (and so, refusing attachment to functions passing structs on stacks). I then checked if the specific alignment constraints could be checked with larger scalar types rather than structs, but it appears that this use case is in fact rejected at the verifier level (see a9b59159d338 ("bpf: Do not allow btf_ctx_access with __int128 types")). So in the end the specific alignment corner cases raised in [1] can not really happen in the kernel in its current state. This new revision still brings support for the standard cases as a first step, it will then be possible to iterate on top of it to add the more specific cases like struct passed on stack and larger types. [1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com… Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com> --- Changes in v3: - switch back -EOPNOTSUPP to -ENOTSUPP - fix comment style - group intializations for arg_aux - remove some unneeded round_up - Link to v2: https://lore.kernel.org/r/20250522-many_args_arm64-v2-0-d6afdb9cf819@bootli… Changes in v2: - remove alignment computation from btf.c - deduce alignment constraints directly in jit compiler for simple types - deny attachment to functions with "corner-cases" arguments (ie: structs on stack) - remove custom tests, as the corresponding use cases are locked either by the JIT comp or the verifier - drop RFC - Link to v1: https://lore.kernel.org/r/20250411-many_args_arm64-v1-0-0a32fe72339e@bootli… --- Alexis Lothoré (eBPF Foundation) (1): selftests/bpf: enable many-args tests for arm64 Xu Kuohai (1): bpf, arm64: Support up to 12 function arguments arch/arm64/net/bpf_jit_comp.c | 225 ++++++++++++++++++++------- tools/testing/selftests/bpf/DENYLIST.aarch64 | 2 - 2 files changed, 171 insertions(+), 56 deletions(-) --- base-commit: 9435138c069117cd59a4912b5ea2ae44cc2c5ffa change-id: 20250220-many_args_arm64-8bd3747e6948 Best regards, -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com

1 month, 1 week

4
4
0 0

[PATCH bpf-next v7 0/5] Replace CONFIG_DMABUF_SYSFS_STATS with BPF

by T.J. Mercier

Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to perform per-buffer accounting with debugfs which is not suitable for production environments. Eventually we discovered the overhead with per-buffer sysfs file creation/removal was significantly impacting allocation and free times, and exacerbated kernfs lock contention. [2] dma_buf_stats_setup() is responsible for 39% of single-page buffer creation duration, or 74% of single-page dma_buf_export() duration when stressing dmabuf allocations and frees. I prototyped a change from per-buffer to per-exporter statistics with a RCU protected list of exporter allocations that accommodates most (but not all) of our use-cases and avoids almost all of the sysfs overhead. While that adds less overhead than per-buffer sysfs, and less even than the maintenance of the dmabuf debugfs_list, it's still *additional* overhead on top of the debugfs_list and doesn't give us per-buffer info. This series uses the existing dmabuf debugfs_list to implement a BPF dmabuf iterator, which adds no overhead to buffer allocation/free and provides per-buffer info. The list has been moved outside of CONFIG_DEBUG_FS scope so that it is always populated. The BPF program loaded by userspace that extracts per-buffer information gets to define its own interface which avoids the lack of ABI stability with debugfs. This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and the plan is to remove it from the kernel after the next longterm stable release. [1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.… [2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com v1 -> v2: Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian König Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel test robot Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in selftest per Song Liu Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei Starovoitov Add open-coded iterator and selftest per Alexei Starovoitov Add a second test buffer from the system dmabuf heap to selftests Use the BPF program we'll use in production for selftest per Alexei Starovoitov https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.c https://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdm… v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com v2 -> v3: Rebase onto bpf-next/master Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the new get_first_dmabuf(). This avoids having to expose the dmabuf list and mutex to the rest of the kernel, and keeps the dmabuf mutex operations near each other in the same file. (Christian König) Add Christian's RB to dma-buf: Rename debugfs symbols Drop RFC: dma-buf: Remove DMA-BUF statistics v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com v3 -> v4: Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov Fix dma-buf.c kdoc comment style per Alexei Starovoitov Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin / dma_buf_iter_next per Christian König Add Christian's RB to bpf: Add dmabuf iterator v4: https://lore.kernel.org/all/20250508182025.2961555-1-tjmercier@google.com v4 -> v5: Add Christian's Acks to all patches Add Song Liu's Acks Move BTF_ID_LIST_SINGLE and DEFINE_BPF_ITER_FUNC closer to usage per Song Liu Fix open-coded iterator comment style per Song Liu Move iterator termination check to its own subtest per Song Liu Rework selftest buffer creation per Song Liu Fix spacing in sanitize_string per BPF CI v5: https://lore.kernel.org/all/20250512174036.266796-1-tjmercier@google.com v5 -> v6: Song Liu: Init test buffer FDs to -1 Zero-init udmabuf_create for future proofing Bail early for iterator fd/FILE creation failure Dereference char ptr to check for NUL in sanitize_string() Move map insertion from create_test_buffers() to test_dmabuf_iter() Add ACK to selftests/bpf: Add test for open coded dmabuf_iter v6: https://lore.kernel.org/all/20250513163601.812317-1-tjmercier@google.com v6 -> v7: Zero uninitialized name bytes following the end of name strings per s390x BPF CI Reorder sanitize_string bounds checks per Song Liu Add Song's Ack to: selftests/bpf: Add test for dmabuf_iter Rebase onto bpf-next/master per BPF CI T.J. Mercier (5): dma-buf: Rename debugfs symbols bpf: Add dmabuf iterator bpf: Add open coded dmabuf iterator selftests/bpf: Add test for dmabuf_iter selftests/bpf: Add test for open coded dmabuf_iter drivers/dma-buf/dma-buf.c | 98 ++++-- include/linux/dma-buf.h | 4 +- kernel/bpf/Makefile | 3 + kernel/bpf/dmabuf_iter.c | 150 +++++++++ kernel/bpf/helpers.c | 5 + .../testing/selftests/bpf/bpf_experimental.h | 5 + tools/testing/selftests/bpf/config | 3 + .../selftests/bpf/prog_tests/dmabuf_iter.c | 285 ++++++++++++++++++ .../testing/selftests/bpf/progs/dmabuf_iter.c | 101 +++++++ 9 files changed, 632 insertions(+), 22 deletions(-) create mode 100644 kernel/bpf/dmabuf_iter.c create mode 100644 tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c create mode 100644 tools/testing/selftests/bpf/progs/dmabuf_iter.c base-commit: 6888a036cfc3d617d0843ecc9bd8504e91fb9de6 -- 2.49.0.1151.ga128411c76-goog

1 month, 1 week

2
6
0 0

[PATCH v2 00/11] kunit: Introduce UAPI testing framework

by Thomas Weißschuh

Currently testing of userspace and in-kernel API use two different frameworks. kselftests for the userspace ones and Kunit for the in-kernel ones. Besides their different scopes, both have different strengths and limitations: Kunit: * Tests are normal kernel code. * They use the regular kernel toolchain. * They can be packaged and distributed as modules conveniently. Kselftests: * Tests are normal userspace code * They need a userspace toolchain. A kernel cross toolchain is likely not enough. * A fair amout of userland is required to run the tests, which means a full distro or handcrafted rootfs. * There is no way to conveniently package and run kselftests with a given kernel image. * The kselftests makefiles are not as powerful as regular kbuild. For example they are missing proper header dependency tracking or more complex compiler option modifications. Therefore kunit is much easier to run against different kernel configurations and architectures. This series aims to combine kselftests and kunit, avoiding both their limitations. It works by compiling the userspace kselftests as part of the regular kernel build, embedding them into the kunit kernel or module and executing them from there. If the kernel toolchain is not fit to produce userspace because of a missing libc, the kernel's own nolibc can be used instead. The structured TAP output from the kselftest is integrated into the kunit KTAP output transparently, the kunit parser can parse the combined logs together. Further room for improvements: * Call each test in its completely dedicated namespace * Handle additional test files besides the test executable through archives. CPIO, cramfs, etc. * Compatibility with kselftest_harness.h (in progress) * Expose the blobs in debugfs * Provide some convience wrappers around compat userprogs * Figure out a migration path/coexistence solution for kunit UAPI and tools/testing/selftests/ Output from the kunit example testcase, note the output of "example_uapi_tests". $ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit example ... Running tests with: $ .kunit/linux kunit.filter_glob=example kunit.enable=1 mem=1G console=tty kunit_shutdown=halt [11:53:53] ================== example (10 subtests) =================== [11:53:53] [PASSED] example_simple_test [11:53:53] [SKIPPED] example_skip_test [11:53:53] [SKIPPED] example_mark_skipped_test [11:53:53] [PASSED] example_all_expect_macros_test [11:53:53] [PASSED] example_static_stub_test [11:53:53] [PASSED] example_static_stub_using_fn_ptr_test [11:53:53] [PASSED] example_priv_test [11:53:53] =================== example_params_test =================== [11:53:53] [SKIPPED] example value 3 [11:53:53] [PASSED] example value 2 [11:53:53] [PASSED] example value 1 [11:53:53] [SKIPPED] example value 0 [11:53:53] =============== [PASSED] example_params_test =============== [11:53:53] [PASSED] example_slow_test [11:53:53] ======================= (4 subtests) ======================= [11:53:53] [PASSED] procfs [11:53:53] [PASSED] userspace test 2 [11:53:53] [SKIPPED] userspace test 3: some reason [11:53:53] [PASSED] userspace test 4 [11:53:53] ================ [PASSED] example_uapi_test ================ [11:53:53] ===================== [PASSED] example ===================== [11:53:53] ============================================================ [11:53:53] Testing complete. Ran 16 tests: passed: 11, skipped: 5 [11:53:53] Elapsed time: 67.543s total, 1.823s configuring, 65.655s building, 0.058s running Based on v6.15-rc1. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Changes in v2: - Rebase onto v6.15-rc1 - Add documentation and kernel docs - Resolve invalid kconfig breakages - Drop already applied patch "kbuild: implement CONFIG_HEADERS_INSTALL for Usermode Linux" - Drop userprogs CONFIG_WERROR integration, it doesn't need to be part of this series - Replace patch prefix "kconfig" with "kbuild" - Rename kunit_uapi_run_executable() to kunit_uapi_run_kselftest() - Generate private, conflict-free symbols in the blob framework - Handle kselftest exit codes - Handle SIGABRT - Forward output also to kunit debugfs log - Install a fd=0 stdin filedescriptor - Link to v1: https://lore.kernel.org/r/20250217-kunit-kselftests-v1-0-42b4524c3b0a@linut… --- Thomas Weißschuh (11): kbuild: userprogs: add nolibc support kbuild: introduce CONFIG_ARCH_HAS_NOLIBC kbuild: doc: add label for userprogs section kbuild: introduce blob framework kunit: tool: Add test for nested test result reporting kunit: tool: Don't overwrite test status based on subtest counts kunit: tool: Parse skipped tests from kselftest.h kunit: Introduce UAPI testing framework kunit: uapi: Add example for UAPI tests kunit: uapi: Introduce preinit executable kunit: uapi: Validate usability of /proc Documentation/dev-tools/kunit/api/index.rst | 5 + Documentation/dev-tools/kunit/api/uapi.rst | 12 + Documentation/kbuild/makefiles.rst | 37 ++- MAINTAINERS | 2 + include/kunit/uapi.h | 24 ++ include/linux/blob.h | 32 +++ init/Kconfig | 2 + lib/kunit/.kunitconfig | 2 + lib/kunit/Kconfig | 11 + lib/kunit/Makefile | 18 +- lib/kunit/kunit-example-test.c | 15 ++ lib/kunit/kunit-example-uapi.c | 56 ++++ lib/kunit/uapi-preinit.c | 65 +++++ lib/kunit/uapi.c | 294 +++++++++++++++++++++ scripts/Makefile.blobs | 19 ++ scripts/Makefile.build | 6 + scripts/Makefile.clean | 2 +- scripts/Makefile.userprogs | 16 +- scripts/blob-wrap.c | 27 ++ tools/include/nolibc/Kconfig.nolibc | 13 + tools/testing/kunit/kunit_parser.py | 13 +- tools/testing/kunit/kunit_tool_test.py | 9 + .../test_is_test_passed-failure-nested.log | 10 + .../test_data/test_is_test_passed-kselftest.log | 3 +- 24 files changed, 682 insertions(+), 11 deletions(-) --- base-commit: bf9962cc9ec3ac1dae2bf81b126657c1c49c348a change-id: 20241015-kunit-kselftests-56273bc40442 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

1 month, 1 week

2
22
0 0