- Linux-kselftest-mirror - lists.linaro.org

[PATCH v3] selftests/ptrace/get_syscall_info: fix for MIPS n32

by Dmitry V. Levin

MIPS n32 is one of two ILP32 architectures supported by the kernel that have 64-bit syscall arguments (another one is x32). When this test passed 32-bit arguments to syscall(), they were sign-extended in libc, PTRACE_GET_SYSCALL_INFO reported these sign-extended 64-bit values, and the test complained about the mismatch. Fix this by passing arguments of the appropriate type to syscall(), which is "unsigned long long" on MIPS n32, and __kernel_ulong_t on other architectures, the same way as selftests/ptrace/set_syscall_info already does. As a side effect, this also extends the test on all 64-bit architectures by choosing constants that don't fit into 32-bit integers. Signed-off-by: Dmitry V. Levin <ldv(a)strace.io> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> --- v3: Added Acked-by: https://lore.kernel.org/all/b2e62143-fa68-4cd1-bf6c-67f0ad49c670@linuxfound… v2: Fixed MIPS #ifdef: https://lore.kernel.org/all/20250115233747.GA28541@strace.io/ .../selftests/ptrace/get_syscall_info.c | 53 +++++++++++-------- 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/ptrace/get_syscall_info.c b/tools/testing/selftests/ptrace/get_syscall_info.c index 5bcd1c7b5be6..2970f72d66d3 100644 --- a/tools/testing/selftests/ptrace/get_syscall_info.c +++ b/tools/testing/selftests/ptrace/get_syscall_info.c @@ -11,8 +11,19 @@ #include <err.h> #include <signal.h> #include <asm/unistd.h> +#include <linux/types.h> #include "linux/ptrace.h" +#if defined(_MIPS_SIM) && _MIPS_SIM == _MIPS_SIM_NABI32 +/* + * MIPS N32 is the only architecture where __kernel_ulong_t + * does not match the bitness of syscall arguments. + */ +typedef unsigned long long kernel_ulong_t; +#else +typedef __kernel_ulong_t kernel_ulong_t; +#endif + static int kill_tracee(pid_t pid) { @@ -42,37 +53,37 @@ sys_ptrace(int request, pid_t pid, unsigned long addr, unsigned long data) TEST(get_syscall_info) { - static const unsigned long args[][7] = { + const kernel_ulong_t args[][7] = { /* a sequence of architecture-agnostic syscalls */ { __NR_chdir, - (unsigned long) "", - 0xbad1fed1, - 0xbad2fed2, - 0xbad3fed3, - 0xbad4fed4, - 0xbad5fed5 + (uintptr_t) "", + (kernel_ulong_t) 0xdad1bef1bad1fed1ULL, + (kernel_ulong_t) 0xdad2bef2bad2fed2ULL, + (kernel_ulong_t) 0xdad3bef3bad3fed3ULL, + (kernel_ulong_t) 0xdad4bef4bad4fed4ULL, + (kernel_ulong_t) 0xdad5bef5bad5fed5ULL }, { __NR_gettid, - 0xcaf0bea0, - 0xcaf1bea1, - 0xcaf2bea2, - 0xcaf3bea3, - 0xcaf4bea4, - 0xcaf5bea5 + (kernel_ulong_t) 0xdad0bef0caf0bea0ULL, + (kernel_ulong_t) 0xdad1bef1caf1bea1ULL, + (kernel_ulong_t) 0xdad2bef2caf2bea2ULL, + (kernel_ulong_t) 0xdad3bef3caf3bea3ULL, + (kernel_ulong_t) 0xdad4bef4caf4bea4ULL, + (kernel_ulong_t) 0xdad5bef5caf5bea5ULL }, { __NR_exit_group, 0, - 0xfac1c0d1, - 0xfac2c0d2, - 0xfac3c0d3, - 0xfac4c0d4, - 0xfac5c0d5 + (kernel_ulong_t) 0xdad1bef1fac1c0d1ULL, + (kernel_ulong_t) 0xdad2bef2fac2c0d2ULL, + (kernel_ulong_t) 0xdad3bef3fac3c0d3ULL, + (kernel_ulong_t) 0xdad4bef4fac4c0d4ULL, + (kernel_ulong_t) 0xdad5bef5fac5c0d5ULL } }; - const unsigned long *exp_args; + const kernel_ulong_t *exp_args; pid_t pid = fork(); @@ -154,7 +165,7 @@ TEST(get_syscall_info) } ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } ASSERT_EQ(expected_none_size, rc) { @@ -177,7 +188,7 @@ TEST(get_syscall_info) case SIGTRAP | 0x80: ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } switch (ptrace_stop) { -- ldv

7 months, 1 week

1
0
0 0

[PATCH v2] selftests/filesystems: Fix build of anon_inode_test

by Mark Brown

The newly added anon_inode_test test fails to build due to attempting to include a nonexisting overlayfs/wrapper.h: anon_inode_test.c:10:10: fatal error: overlayfs/wrappers.h: No such file or directory 10 | #include "overlayfs/wrappers.h" | ^~~~~~~~~~~~~~~~~~~~~~ This is due to 0bd92b9fe538 ("selftests/filesystems: move wrapper.h out of overlayfs subdir") which was added in the vfs-6.16.selftests branch which was based on -rc5 and did not contain the newly added test so once things were merged into mainline the build started failing - both parent commits are fine. Fixes: 3e406741b1989 ("Merge tag 'vfs-6.16-rc1.selftests' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs") Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Changes in v2: - Rebase onto mainline and adjust fixes commit now the two branches got merged there. - Link to v1: https://lore.kernel.org/r/20250518-selftests-anon-inode-build-v1-1-71eff818… --- tools/testing/selftests/filesystems/anon_inode_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/filesystems/anon_inode_test.c b/tools/testing/selftests/filesystems/anon_inode_test.c index e8e0ef1460d2..73e0a4d4fb2f 100644 --- a/tools/testing/selftests/filesystems/anon_inode_test.c +++ b/tools/testing/selftests/filesystems/anon_inode_test.c @@ -7,7 +7,7 @@ #include <sys/stat.h> #include "../kselftest_harness.h" -#include "overlayfs/wrappers.h" +#include "wrappers.h" TEST(anon_inode_no_chown) { --- base-commit: f66bc387efbee59978e076ce9bf123ac353b389c change-id: 20250516-selftests-anon-inode-build-007e206e8422 Best regards, -- Mark Brown <broonie(a)kernel.org>

7 months, 1 week

2
1
0 0

[PATCH v2] selftests/ptrace/get_syscall_info: fix for MIPS n32

by Dmitry V. Levin

MIPS n32 is one of two ILP32 architectures supported by the kernel that have 64-bit syscall arguments (another one is x32). When this test passed 32-bit arguments to syscall(), they were sign-extended in libc, PTRACE_GET_SYSCALL_INFO reported these sign-extended 64-bit values, and the test complained about the mismatch. Fix this by passing arguments of the appropriate type to syscall(), which is "unsigned long long" on MIPS n32, and __kernel_ulong_t on other architectures. As a side effect, this also extends the test on all 64-bit architectures by choosing constants that don't fit into 32-bit integers. Signed-off-by: Dmitry V. Levin <ldv(a)strace.io> --- v2: Fixed MIPS #ifdef. .../selftests/ptrace/get_syscall_info.c | 53 +++++++++++-------- 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/ptrace/get_syscall_info.c b/tools/testing/selftests/ptrace/get_syscall_info.c index 5bcd1c7b5be6..2970f72d66d3 100644 --- a/tools/testing/selftests/ptrace/get_syscall_info.c +++ b/tools/testing/selftests/ptrace/get_syscall_info.c @@ -11,8 +11,19 @@ #include <err.h> #include <signal.h> #include <asm/unistd.h> +#include <linux/types.h> #include "linux/ptrace.h" +#if defined(_MIPS_SIM) && _MIPS_SIM == _MIPS_SIM_NABI32 +/* + * MIPS N32 is the only architecture where __kernel_ulong_t + * does not match the bitness of syscall arguments. + */ +typedef unsigned long long kernel_ulong_t; +#else +typedef __kernel_ulong_t kernel_ulong_t; +#endif + static int kill_tracee(pid_t pid) { @@ -42,37 +53,37 @@ sys_ptrace(int request, pid_t pid, unsigned long addr, unsigned long data) TEST(get_syscall_info) { - static const unsigned long args[][7] = { + const kernel_ulong_t args[][7] = { /* a sequence of architecture-agnostic syscalls */ { __NR_chdir, - (unsigned long) "", - 0xbad1fed1, - 0xbad2fed2, - 0xbad3fed3, - 0xbad4fed4, - 0xbad5fed5 + (uintptr_t) "", + (kernel_ulong_t) 0xdad1bef1bad1fed1ULL, + (kernel_ulong_t) 0xdad2bef2bad2fed2ULL, + (kernel_ulong_t) 0xdad3bef3bad3fed3ULL, + (kernel_ulong_t) 0xdad4bef4bad4fed4ULL, + (kernel_ulong_t) 0xdad5bef5bad5fed5ULL }, { __NR_gettid, - 0xcaf0bea0, - 0xcaf1bea1, - 0xcaf2bea2, - 0xcaf3bea3, - 0xcaf4bea4, - 0xcaf5bea5 + (kernel_ulong_t) 0xdad0bef0caf0bea0ULL, + (kernel_ulong_t) 0xdad1bef1caf1bea1ULL, + (kernel_ulong_t) 0xdad2bef2caf2bea2ULL, + (kernel_ulong_t) 0xdad3bef3caf3bea3ULL, + (kernel_ulong_t) 0xdad4bef4caf4bea4ULL, + (kernel_ulong_t) 0xdad5bef5caf5bea5ULL }, { __NR_exit_group, 0, - 0xfac1c0d1, - 0xfac2c0d2, - 0xfac3c0d3, - 0xfac4c0d4, - 0xfac5c0d5 + (kernel_ulong_t) 0xdad1bef1fac1c0d1ULL, + (kernel_ulong_t) 0xdad2bef2fac2c0d2ULL, + (kernel_ulong_t) 0xdad3bef3fac3c0d3ULL, + (kernel_ulong_t) 0xdad4bef4fac4c0d4ULL, + (kernel_ulong_t) 0xdad5bef5fac5c0d5ULL } }; - const unsigned long *exp_args; + const kernel_ulong_t *exp_args; pid_t pid = fork(); @@ -154,7 +165,7 @@ TEST(get_syscall_info) } ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } ASSERT_EQ(expected_none_size, rc) { @@ -177,7 +188,7 @@ TEST(get_syscall_info) case SIGTRAP | 0x80: ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } switch (ptrace_stop) { -- ldv

7 months, 1 week

3
7
0 0

[PATCH] lib/tests: Make RANDSTRUCT_KUNIT_TEST depend on RANDSTRUCT

by Geert Uytterhoeven

When CONFIG_RANDSTRUCT is not enabled, all randstruct tests are skipped. Move this logic from run-time to config-time, to avoid people building and running tests that do not do anything. Fixes: b370f7eacdcfe1dd ("lib/tests: Add randstruct KUnit test") Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org> --- FTR, after adding "select HAVE_GCC_PLUGINS" to arch/m68k/Kconfig and installing gcc-13-plugin-dev-m68k-linux-gnu, I could enable CONFIG_RANDSTRUCT_FULL and run the tests (all passed!), so probably I should send a patch to select HAVE_GCC_PLUGINS? --- lib/Kconfig.debug | 4 ++-- lib/tests/randstruct_kunit.c | 9 --------- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index a8b4febad716a4be..407f2ed7fcb3e94c 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2895,10 +2895,10 @@ config OVERFLOW_KUNIT_TEST config RANDSTRUCT_KUNIT_TEST tristate "Test randstruct structure layout randomization at runtime" if !KUNIT_ALL_TESTS depends on KUNIT + depends on RANDSTRUCT default KUNIT_ALL_TESTS help - Builds unit tests for the checking CONFIG_RANDSTRUCT=y, which - randomizes structure layouts. + Builds unit tests for checking structure layout randomization. config STACKINIT_KUNIT_TEST tristate "Test level of stack variable initialization" if !KUNIT_ALL_TESTS diff --git a/lib/tests/randstruct_kunit.c b/lib/tests/randstruct_kunit.c index f3a2d63c4cfbe7dc..2c95eca76d2411bc 100644 --- a/lib/tests/randstruct_kunit.c +++ b/lib/tests/randstruct_kunit.c @@ -305,14 +305,6 @@ static void randstruct_initializers(struct kunit *test) #undef init_members } -static int randstruct_test_init(struct kunit *test) -{ - if (!IS_ENABLED(CONFIG_RANDSTRUCT)) - kunit_skip(test, "Not built with CONFIG_RANDSTRUCT=y"); - - return 0; -} - static struct kunit_case randstruct_test_cases[] = { KUNIT_CASE(randstruct_layout_same), KUNIT_CASE(randstruct_layout_mixed), @@ -324,7 +316,6 @@ static struct kunit_case randstruct_test_cases[] = { static struct kunit_suite randstruct_test_suite = { .name = "randstruct", - .init = randstruct_test_init, .test_cases = randstruct_test_cases, }; -- 2.43.0

7 months, 1 week

2
3
0 0

[PATCH] lib/tests: Make FORTIFY_KUNIT_TEST depend on FORTIFY_SOURCE

by Geert Uytterhoeven

When CONFIG_FORTIFY_SOURCE is not enabled, all fortify tests are skipped. Move this logic from run-time to config-time, to avoid people building and running tests that do not do anything. This basically reverts commit 1a78f8cb5daac774 ("fortify: Allow KUnit test to build without FORTIFY") in v6.9, which was v3 of commit a9dc8d0442294b42 ("fortify: Allow KUnit test to build without FORTIFY") in v6.5, which was quickly reverted in commit 5e2956ee46244ffb ("Revert "fortify: Allow KUnit test to build without FORTIFY""). Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org> --- Let's keep on playing whack-a-mole ;-) --- lib/Kconfig.debug | 1 + lib/tests/fortify_kunit.c | 8 -------- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 407f2ed7fcb3e94c..ca5afd192c9fbf51 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2912,6 +2912,7 @@ config STACKINIT_KUNIT_TEST config FORTIFY_KUNIT_TEST tristate "Test fortified str*() and mem*() function internals at runtime" if !KUNIT_ALL_TESTS depends on KUNIT + depends on FORTIFY_SOURCE default KUNIT_ALL_TESTS help Builds unit tests for checking internals of FORTIFY_SOURCE as used diff --git a/lib/tests/fortify_kunit.c b/lib/tests/fortify_kunit.c index 29ffc62a71e3f968..10b0e1b12cdc3ae2 100644 --- a/lib/tests/fortify_kunit.c +++ b/lib/tests/fortify_kunit.c @@ -48,11 +48,6 @@ void fortify_add_kunit_error(int write); #include <linux/string.h> #include <linux/vmalloc.h> -/* Handle being built without CONFIG_FORTIFY_SOURCE */ -#ifndef __compiletime_strlen -# define __compiletime_strlen __builtin_strlen -#endif - static struct kunit_resource read_resource; static struct kunit_resource write_resource; static int fortify_read_overflows; @@ -1071,9 +1066,6 @@ static void fortify_test_kmemdup(struct kunit *test) static int fortify_test_init(struct kunit *test) { - if (!IS_ENABLED(CONFIG_FORTIFY_SOURCE)) - kunit_skip(test, "Not built with CONFIG_FORTIFY_SOURCE=y"); - fortify_read_overflows = 0; kunit_add_named_resource(test, NULL, NULL, &read_resource, "fortify_read_overflows", -- 2.43.0

7 months, 1 week

2
1
0 0

[PATCH v2 10/10] selftests/sched_ext: Add test for sched_ext dl_server

by Joel Fernandes

From: Andrea Righi <arighi(a)nvidia.com> Add a selftest to validate the correct behavior of the deadline server for the ext_sched_class. [ Joel: Replaced occurences of CFS in the test with EXT. ] Signed-off-by: Andrea Righi <arighi(a)nvidia.com> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- tools/testing/selftests/sched_ext/Makefile | 1 + .../selftests/sched_ext/rt_stall.bpf.c | 23 ++ tools/testing/selftests/sched_ext/rt_stall.c | 213 ++++++++++++++++++ 3 files changed, 237 insertions(+) create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile index f4531327b8e7..dcc803eeab39 100644 --- a/tools/testing/selftests/sched_ext/Makefile +++ b/tools/testing/selftests/sched_ext/Makefile @@ -181,6 +181,7 @@ auto-test-targets := \ select_cpu_dispatch_bad_dsq \ select_cpu_dispatch_dbl_dsp \ select_cpu_vtime \ + rt_stall \ test_example \ testcase-targets := $(addsuffix .o,$(addprefix $(SCXOBJ_DIR)/,$(auto-test-targets))) diff --git a/tools/testing/selftests/sched_ext/rt_stall.bpf.c b/tools/testing/selftests/sched_ext/rt_stall.bpf.c new file mode 100644 index 000000000000..80086779dd1e --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.bpf.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * A scheduler that verified if RT tasks can stall SCHED_EXT tasks. + * + * Copyright (c) 2025 NVIDIA Corporation. + */ + +#include <scx/common.bpf.h> + +char _license[] SEC("license") = "GPL"; + +UEI_DEFINE(uei); + +void BPF_STRUCT_OPS(rt_stall_exit, struct scx_exit_info *ei) +{ + UEI_RECORD(uei, ei); +} + +SEC(".struct_ops.link") +struct sched_ext_ops rt_stall_ops = { + .exit = (void *)rt_stall_exit, + .name = "rt_stall", +}; diff --git a/tools/testing/selftests/sched_ext/rt_stall.c b/tools/testing/selftests/sched_ext/rt_stall.c new file mode 100644 index 000000000000..d4cb545ebfd8 --- /dev/null +++ b/tools/testing/selftests/sched_ext/rt_stall.c @@ -0,0 +1,213 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2025 NVIDIA Corporation. + */ +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <sched.h> +#include <sys/prctl.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <time.h> +#include <linux/sched.h> +#include <signal.h> +#include <bpf/bpf.h> +#include <scx/common.h> +#include <sys/wait.h> +#include <unistd.h> +#include "rt_stall.bpf.skel.h" +#include "scx_test.h" +#include "../kselftest.h" + +#define CORE_ID 0 /* CPU to pin tasks to */ +#define RUN_TIME 5 /* How long to run the test in seconds */ + +/* Simple busy-wait function for test tasks */ +static void process_func(void) +{ + while (1) { + /* Busy wait */ + for (volatile unsigned long i = 0; i < 10000000UL; i++); + } +} + +/* Set CPU affinity to a specific core */ +static void set_affinity(int cpu) +{ + cpu_set_t mask; + + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); + if (sched_setaffinity(0, sizeof(mask), &mask) != 0) { + perror("sched_setaffinity"); + exit(EXIT_FAILURE); + } +} + +/* Set task scheduling policy and priority */ +static void set_sched(int policy, int priority) +{ + struct sched_param param; + + param.sched_priority = priority; + if (sched_setscheduler(0, policy, &param) != 0) { + perror("sched_setscheduler"); + exit(EXIT_FAILURE); + } +} + +/* Get process runtime from /proc/<pid>/stat */ +static float get_process_runtime(int pid) +{ + char path[256]; + FILE *file; + long utime, stime; + int fields; + + snprintf(path, sizeof(path), "/proc/%d/stat", pid); + file = fopen(path, "r"); + if (file == NULL) { + perror("Failed to open stat file"); + return -1; + } + + /* Skip the first 13 fields and read the 14th and 15th */ + fields = fscanf(file, + "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu", + &utime, &stime); + fclose(file); + + if (fields != 2) { + fprintf(stderr, "Failed to read stat file\n"); + return -1; + } + + /* Calculate the total time spent in the process */ + long total_time = utime + stime; + long ticks_per_second = sysconf(_SC_CLK_TCK); + float runtime_seconds = total_time * 1.0 / ticks_per_second; + + return runtime_seconds; +} + +static enum scx_test_status setup(void **ctx) +{ + struct rt_stall *skel; + + skel = rt_stall__open(); + SCX_FAIL_IF(!skel, "Failed to open"); + SCX_ENUM_INIT(skel); + SCX_FAIL_IF(rt_stall__load(skel), "Failed to load skel"); + + *ctx = skel; + + return SCX_TEST_PASS; +} + +static bool sched_stress_test(void) +{ + float cfs_runtime, rt_runtime; + int cfs_pid, rt_pid; + float expected_min_ratio = 0.04; /* 4% */ + + ksft_print_header(); + ksft_set_plan(1); + + /* Create and set up a EXT task */ + cfs_pid = fork(); + if (cfs_pid == 0) { + set_affinity(CORE_ID); + process_func(); + exit(0); + } else if (cfs_pid < 0) { + perror("fork for EXT task"); + ksft_exit_fail(); + } + + /* Create an RT task */ + rt_pid = fork(); + if (rt_pid == 0) { + set_affinity(CORE_ID); + set_sched(SCHED_FIFO, 50); + process_func(); + exit(0); + } else if (rt_pid < 0) { + perror("fork for RT task"); + ksft_exit_fail(); + } + + /* Let the processes run for the specified time */ + sleep(RUN_TIME); + + /* Get runtime for the EXT task */ + cfs_runtime = get_process_runtime(cfs_pid); + if (cfs_runtime != -1) + ksft_print_msg("Runtime of EXT task (PID %d) is %f seconds\n", cfs_pid, cfs_runtime); + else + ksft_exit_fail_msg("Error getting runtime for EXT task (PID %d)\n", cfs_pid); + + /* Get runtime for the RT task */ + rt_runtime = get_process_runtime(rt_pid); + if (rt_runtime != -1) + ksft_print_msg("Runtime of RT task (PID %d) is %f seconds\n", rt_pid, rt_runtime); + else + ksft_exit_fail_msg("Error getting runtime for RT task (PID %d)\n", rt_pid); + + /* Kill the processes */ + kill(cfs_pid, SIGKILL); + kill(rt_pid, SIGKILL); + waitpid(cfs_pid, NULL, 0); + waitpid(rt_pid, NULL, 0); + + /* Verify that the scx task got enough runtime */ + float actual_ratio = cfs_runtime / (cfs_runtime + rt_runtime); + ksft_print_msg("EXT task got %.2f%% of total runtime\n", actual_ratio * 100); + + if (actual_ratio >= expected_min_ratio) { + ksft_test_result_pass("PASS: EXT task got more than %.2f%% of runtime\n", + expected_min_ratio * 100); + return true; + } else { + ksft_test_result_fail("FAIL: EXT task got less than %.2f%% of runtime\n", + expected_min_ratio * 100); + return false; + } +} + +static enum scx_test_status run(void *ctx) +{ + struct rt_stall *skel = ctx; + struct bpf_link *link; + bool res; + + link = bpf_map__attach_struct_ops(skel->maps.rt_stall_ops); + SCX_FAIL_IF(!link, "Failed to attach scheduler"); + + res = sched_stress_test(); + + SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_NONE)); + bpf_link__destroy(link); + + if (!res) + ksft_exit_fail(); + + return SCX_TEST_PASS; +} + +static void cleanup(void *ctx) +{ + struct rt_stall *skel = ctx; + + rt_stall__destroy(skel); +} + +struct scx_test rt_stall = { + .name = "rt_stall", + .description = "Verify that RT tasks cannot stall SCHED_EXT tasks", + .setup = setup, + .run = run, + .cleanup = cleanup, +}; +REGISTER_SCX_TEST(&rt_stall) -- 2.43.0

7 months, 1 week

1
0
0 0

[PATCH] selftests/bpf: Validate UDP length in cls_redirect test

by Suchit

Add validation step to ensure that the UDP payload is long enough to contain the expected GUE and UNIGUE encapsulation headers Signed-off-by: Suchit <suchitkarunakaran(a)gmail.com> --- tools/testing/selftests/bpf/progs/test_cls_redirect.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/progs/test_cls_redirect.c b/tools/testing/selftests/bpf/progs/test_cls_redirect.c index f344c6835e84..c1d2eaee2e77 100644 --- a/tools/testing/selftests/bpf/progs/test_cls_redirect.c +++ b/tools/testing/selftests/bpf/progs/test_cls_redirect.c @@ -978,7 +978,14 @@ int cls_redirect(struct __sk_buff *skb) return TC_ACT_OK; } - /* TODO Check UDP length? */ + uint16_t udp_len = bpf_ntohs(encap->udp.len); + uint16_t min_encap_len = sizeof(encap->udp) + sizeof(encap->gue) + sizeof(encap->unigue); + + if (udp_len < min_encap_len) { + metrics->errors_total_malformed_encapsulation++; + return TC_ACT_SHOT; + } if (encap->udp.dest != ENCAPSULATION_PORT) { return TC_ACT_OK; } -- 2.49.0

7 months, 1 week

1
0
0 0

[PATCH net-next v2 0/4] netconsole: Optimize console registration and improve testing

by Breno Leitao

During performance analysis of console subsystem latency, I discovered that netconsole registers console handlers even when no active targets exist. These orphaned console handlers are invoked on every printk() call, get the lock, iterate through empty target lists, and consume CPU cycles without performing any useful work. This patch series addresses the inefficiency by: 1. Implementing dynamic console registration/unregistration based on target availability, ensuring console handlers are only active when needed 2. Adding automatic cleanup of unused console registrations when targets are disabled or removed 3. Extending the selftest suite to cover non-extended console format, which was previously untested The optimization reduces printk() overhead by eliminating unnecessary function calls and list traversals when netconsole targets are not configured, improving overall system performance during heavy logging scenarios. --- Changes in v2: - Added selftests to test the new mechanism - Unregister the console if the last target got disabled - Sending to net-next instead of net (Jakub) - Link to v1: https://lore.kernel.org/r/20250528-netcons_ext-v1-1-69f71e404e00@debian.org --- Breno Leitao (4): netconsole: Only register console drivers when targets are configured netconsole: Add automatic console unregistration on target removal selftests: netconsole: Do not exit from inside the validation function selftests: netconsole: Add support for basic netconsole target format drivers/net/netconsole.c | 61 +++++++++++++++++++--- .../selftests/drivers/net/lib/sh/lib_netcons.sh | 27 ++++++++-- .../testing/selftests/drivers/net/netcons_basic.sh | 50 +++++++++++------- 3 files changed, 107 insertions(+), 31 deletions(-) --- base-commit: 914873bc7df913db988284876c16257e6ab772c6 change-id: 20250528-netcons_ext-572982619bea Best regards, -- Breno Leitao <leitao(a)debian.org>

7 months, 2 weeks

2
5
0 0

[PATCH v11 0/5] rust: replace kernel::str::CStr w/ core::ffi::CStr

by Tamir Duberstein

This picks up from Michal Rostecki's work[0]. Per Michal's guidance I have omitted Co-authored tags, as the end result is quite different. Link: https://lore.kernel.org/rust-for-linux/20240819153656.28807-2-vadorovsky@pr… [0] Closes: https://github.com/Rust-for-Linux/linux/issues/1075 Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v11: - Use `quote_spanned!` to avoid `use<'a, T>` and generally reduce manual token construction. - Add a commit to simplify `quote_spanned!`. - Drop first commit in favor of https://lore.kernel.org/rust-for-linux/20240906164448.2268368-1-paddymills@…. (Miguel Ojeda) - Correctly handle expressions such as `pr_info!("{a}", a = a = a)`. (Benno Lossin) - Avoid dealing with `}}` escapes, which is not needed. (Benno Lossin) - Revert some unnecessary changes. (Benno Lossin) - Rename `c_str_avoid_literals!` to `str_to_cstr!`. (Benno Lossin & Alice Ryhl). - Link to v10: https://lore.kernel.org/r/20250524-cstr-core-v10-0-6412a94d9d75@gmail.com Changes in v10: - Rebase on cbeaa41dfe26b72639141e87183cb23e00d4b0dd. - Implement Alice's suggestion to use a proc macro to work around orphan rules otherwise preventing `core::ffi::CStr` to be directly printed with `{}`. - Link to v9: https://lore.kernel.org/r/20250317-cstr-core-v9-0-51d6cc522f62@gmail.com Changes in v9: - Rebase on rust-next. - Restore `impl Display for BStr` which exists upstream[1]. - Link: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#impl-Display… [1] - Link to v8: https://lore.kernel.org/r/20250203-cstr-core-v8-0-cb3f26e78686@gmail.com Changes in v8: - Move `{from,as}_char_ptr` back to `CStrExt`. This reduces the diff some. - Restore `from_bytes_with_nul_unchecked_mut`, `to_cstring`. - Link to v7: https://lore.kernel.org/r/20250202-cstr-core-v7-0-da1802520438@gmail.com Changes in v7: - Rebased on mainline. - Restore functionality added in commit a321f3ad0a5d ("rust: str: add {make,to}_{upper,lower}case() to CString"). - Used `diff.algorithm patience` to improve diff readability. - Link to v6: https://lore.kernel.org/r/20250202-cstr-core-v6-0-8469cd6d29fd@gmail.com Changes in v6: - Split the work into several commits for ease of review. - Restore `{from,as}_char_ptr` to allow building on ARM (see commit message). - Add `CStrExt` to `kernel::prelude`. (Alice Ryhl) - Remove `CStrExt::from_bytes_with_nul_unchecked_mut` and restore `DerefMut for CString`. (Alice Ryhl) - Rename and hide `kernel::c_str!` to encourage use of C-String literals. - Drop implementation and invocation changes in kunit.rs. (Trevor Gross) - Drop docs on `Display` impl. (Trevor Gross) - Rewrite docs in the style of the standard library. - Restore the `test_cstr_debug` unit tests to demonstrate that the implementation has changed. Changes in v5: - Keep the `test_cstr_display*` unit tests. Changes in v4: - Provide the `CStrExt` trait with `display()` method, which returns a `CStrDisplay` wrapper with `Display` implementation. This addresses the lack of `Display` implementation for `core::ffi::CStr`. - Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`, which might be useful and is going to prevent manual, unsafe casts. - Fix a typo (s/preffered/prefered/). Changes in v3: - Fix the commit message. - Remove redundant braces in `use`, when only one item is imported. Changes in v2: - Do not remove `c_str` macro. While it's preferred to use C-string literals, there are two cases where `c_str` is helpful: - When working with macros, which already return a Rust string literal (e.g. `stringify!`). - When building macros, where we want to take a Rust string literal as an argument (for caller's convenience), but still use it as a C-string internally. - Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`, `new_mutex`). Use the `c_str` macro to convert these literals to C-string literals. - Use `c_str` in kunit.rs for converting the output of `stringify!` to a `CStr`. - Remove `DerefMut` implementation for `CString`. --- Tamir Duberstein (5): rust: macros: reduce collections in `quote!` macro rust: support formatting of foreign types rust: replace `CStr` with `core::ffi::CStr` rust: replace `kernel::c_str!` with C-Strings rust: remove core::ffi::CStr reexport drivers/block/rnull.rs | 4 +- drivers/gpu/drm/drm_panic_qr.rs | 5 +- drivers/gpu/nova-core/driver.rs | 2 +- drivers/gpu/nova-core/firmware.rs | 2 +- drivers/net/phy/ax88796b_rust.rs | 8 +- drivers/net/phy/qt2025.rs | 6 +- rust/kernel/block/mq.rs | 2 +- rust/kernel/device.rs | 9 +- rust/kernel/devres.rs | 2 +- rust/kernel/driver.rs | 4 +- rust/kernel/error.rs | 10 +- rust/kernel/faux.rs | 5 +- rust/kernel/firmware.rs | 16 +- rust/kernel/fmt.rs | 77 ++++++ rust/kernel/kunit.rs | 21 +- rust/kernel/lib.rs | 3 +- rust/kernel/miscdevice.rs | 5 +- rust/kernel/net/phy.rs | 12 +- rust/kernel/of.rs | 5 +- rust/kernel/pci.rs | 2 +- rust/kernel/platform.rs | 6 +- rust/kernel/prelude.rs | 5 +- rust/kernel/print.rs | 4 +- rust/kernel/seq_file.rs | 6 +- rust/kernel/str.rs | 443 ++++++++++------------------------- rust/kernel/sync.rs | 7 +- rust/kernel/sync/condvar.rs | 4 +- rust/kernel/sync/lock.rs | 4 +- rust/kernel/sync/lock/global.rs | 6 +- rust/kernel/sync/poll.rs | 1 + rust/kernel/workqueue.rs | 9 +- rust/macros/fmt.rs | 99 ++++++++ rust/macros/kunit.rs | 10 +- rust/macros/lib.rs | 19 ++ rust/macros/module.rs | 2 +- rust/macros/quote.rs | 111 ++++----- samples/rust/rust_driver_faux.rs | 4 +- samples/rust/rust_driver_pci.rs | 4 +- samples/rust/rust_driver_platform.rs | 4 +- samples/rust/rust_misc_device.rs | 3 +- scripts/rustdoc_test_gen.rs | 6 +- 41 files changed, 485 insertions(+), 472 deletions(-) --- base-commit: 7a17bbc1d952057898cb0739e60665908fbb8c72 change-id: 20250201-cstr-core-d4b9b69120cf Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

7 months, 2 weeks

3
11
0 0

[PATCH bpf-next 1/2] bpf: Add bpf_task_cwd_from_pid() kfunc

by Rong Tao

From: Rong Tao <rongtao(a)cestc.cn> It is a bit troublesome to get cwd based on pid in bpf program, such as bpftrace example [1]. This patch therefore adds a new bpf_task_cwd_from_pid() kfunc which allows BPF programs to get cwd from a pid. [1] https://github.com/bpftrace/bpftrace/issues/3314 Signed-off-by: Rong Tao <rongtao(a)cestc.cn> --- kernel/bpf/helpers.c | 45 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index b71e428ad936..0f32fbc997bb 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -24,6 +24,10 @@ #include <linux/bpf_mem_alloc.h> #include <linux/kasan.h> #include <linux/bpf_verifier.h> +#include <linux/fs.h> +#include <linux/fs_struct.h> +#include <linux/path.h> +#include <linux/string.h> #include "../../lib/kstrtox.h" @@ -2643,6 +2647,46 @@ __bpf_kfunc struct task_struct *bpf_task_from_vpid(s32 vpid) return p; } +/** + * bpf_task_cwd_from_pid - Get a task's absolute pathname of the current + * working directory from its pid. + * @pid: The pid of the task being looked up. + * @buf: The array pointed to by buf. + * @buf_len: buf length. + */ +__bpf_kfunc int bpf_task_cwd_from_pid(s32 pid, char *buf, u32 buf_len) +{ + struct path pwd; + char kpath[256], *path; + struct task_struct *task; + + if (!buf || buf_len == 0) + return -EINVAL; + + rcu_read_lock(); + task = pid_task(find_vpid(pid), PIDTYPE_PID); + if (!task) { + rcu_read_unlock(); + return -ESRCH; + } + task_lock(task); + if (!task->fs) { + task_unlock(task); + return -ENOENT; + } + get_fs_pwd(task->fs, &pwd); + task_unlock(task); + rcu_read_unlock(); + + path = d_path(&pwd, kpath, sizeof(kpath)); + path_put(&pwd); + if (IS_ERR(path)) + return PTR_ERR(path); + + strncpy(buf, path, buf_len); + return 0; +} + /** * bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data. * @p: The dynptr whose data slice to retrieve @@ -3314,6 +3358,7 @@ BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL) #endif BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_task_from_vpid, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_task_cwd_from_pid, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_throw) #ifdef CONFIG_BPF_EVENTS BTF_ID_FLAGS(func, bpf_send_signal_task, KF_TRUSTED_ARGS) -- 2.49.0

7 months, 2 weeks

4
5
0 0

[PATCH AUTOSEL 6.12 69/93] selftests: harness: Mark functions without prototypes static

by Sasha Levin

From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> [ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ] With -Wmissing-prototypes the compiler will warn about non-static functions which don't have a prototype defined. As they are not used from a different compilation unit they don't need to be defined globally. Avoid the issue by marking the functions static. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571… Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- **YES** This commit should be backported to stable kernel trees. **Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real compiler warning issue (`-Wmissing-prototypes`) that affects build cleanliness and code quality. Modern build systems increasingly use stricter warning flags, making this fix valuable for stable trees. 2. **Zero Functional Risk**: The changes are purely cosmetic from a runtime perspective. Adding `static` to functions that were already internal has no impact on functionality, memory layout, or behavior - it only affects compiler symbol visibility and warnings. 3. **Minimal and Contained**: The diff is extremely small (4 function signatures with `static` added) and isolated to the kselftest harness framework. There are no complex logic changes or cross-subsystem impacts. 4. **Testing Infrastructure Improvement**: While the kselftest framework isn't critical runtime code, it's important for kernel testing and validation. Improving build compliance in testing infrastructure benefits stable kernel maintenance. 5. **Standard Practice**: Compiler warning fixes of this nature (adding missing `static` keywords) are routinely backported to stable trees as they represent good coding practices without functional risk. 6. **Different from Similar Commits**: Unlike the referenced similar commits (all marked "NO") which involved feature additions, API changes, or structural modifications, this commit is purely a build compliance fix with no behavioral changes. The commit meets all stable tree criteria: it fixes an issue (compiler warnings), has minimal risk (no functional changes), and improves code quality without introducing new features or architectural changes. Tools like `kselftest_harness.h:241`, `kselftest_harness.h:290`, `kselftest_harness.h:970`, and `kselftest_harness.h:1188` are the specific locations where these low- risk improvements are made. tools/testing/selftests/kselftest_harness.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 666c9fde76da9..7c337b4fa054d 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -258,7 +258,7 @@ * A bare "return;" statement may be used to return early. */ #define FIXTURE_SETUP(fixture_name) \ - void fixture_name##_setup( \ + static void fixture_name##_setup( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -307,7 +307,7 @@ __FIXTURE_TEARDOWN(fixture_name) #define __FIXTURE_TEARDOWN(fixture_name) \ - void fixture_name##_teardown( \ + static void fixture_name##_teardown( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext) kill(-(t->pid), SIGKILL); } -void __wait_for_test(struct __test_metadata *t) +static void __wait_for_test(struct __test_metadata *t) { struct sigaction action = { .sa_sigaction = __timeout_handler, @@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } -void __run_test(struct __fixture_metadata *f, - struct __fixture_variant_metadata *variant, - struct __test_metadata *t) +static void __run_test(struct __fixture_metadata *f, + struct __fixture_variant_metadata *variant, + struct __test_metadata *t) { struct __test_xfail *xfail; char test_name[1024]; -- 2.39.5

7 months, 2 weeks

1
0
0 0

[PATCH AUTOSEL 6.14 076/102] selftests: harness: Mark functions without prototypes static

by Sasha Levin

From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> [ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ] With -Wmissing-prototypes the compiler will warn about non-static functions which don't have a prototype defined. As they are not used from a different compilation unit they don't need to be defined globally. Avoid the issue by marking the functions static. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571… Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- **YES** This commit should be backported to stable kernel trees. **Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real compiler warning issue (`-Wmissing-prototypes`) that affects build cleanliness and code quality. Modern build systems increasingly use stricter warning flags, making this fix valuable for stable trees. 2. **Zero Functional Risk**: The changes are purely cosmetic from a runtime perspective. Adding `static` to functions that were already internal has no impact on functionality, memory layout, or behavior - it only affects compiler symbol visibility and warnings. 3. **Minimal and Contained**: The diff is extremely small (4 function signatures with `static` added) and isolated to the kselftest harness framework. There are no complex logic changes or cross-subsystem impacts. 4. **Testing Infrastructure Improvement**: While the kselftest framework isn't critical runtime code, it's important for kernel testing and validation. Improving build compliance in testing infrastructure benefits stable kernel maintenance. 5. **Standard Practice**: Compiler warning fixes of this nature (adding missing `static` keywords) are routinely backported to stable trees as they represent good coding practices without functional risk. 6. **Different from Similar Commits**: Unlike the referenced similar commits (all marked "NO") which involved feature additions, API changes, or structural modifications, this commit is purely a build compliance fix with no behavioral changes. The commit meets all stable tree criteria: it fixes an issue (compiler warnings), has minimal risk (no functional changes), and improves code quality without introducing new features or architectural changes. Tools like `kselftest_harness.h:241`, `kselftest_harness.h:290`, `kselftest_harness.h:970`, and `kselftest_harness.h:1188` are the specific locations where these low- risk improvements are made. tools/testing/selftests/kselftest_harness.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 666c9fde76da9..7c337b4fa054d 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -258,7 +258,7 @@ * A bare "return;" statement may be used to return early. */ #define FIXTURE_SETUP(fixture_name) \ - void fixture_name##_setup( \ + static void fixture_name##_setup( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -307,7 +307,7 @@ __FIXTURE_TEARDOWN(fixture_name) #define __FIXTURE_TEARDOWN(fixture_name) \ - void fixture_name##_teardown( \ + static void fixture_name##_teardown( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext) kill(-(t->pid), SIGKILL); } -void __wait_for_test(struct __test_metadata *t) +static void __wait_for_test(struct __test_metadata *t) { struct sigaction action = { .sa_sigaction = __timeout_handler, @@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } -void __run_test(struct __fixture_metadata *f, - struct __fixture_variant_metadata *variant, - struct __test_metadata *t) +static void __run_test(struct __fixture_metadata *f, + struct __fixture_variant_metadata *variant, + struct __test_metadata *t) { struct __test_xfail *xfail; char test_name[1024]; -- 2.39.5

7 months, 2 weeks

1
0
0 0

[PATCH AUTOSEL 6.15 082/110] selftests: harness: Mark functions without prototypes static

by Sasha Levin

From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> [ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ] With -Wmissing-prototypes the compiler will warn about non-static functions which don't have a prototype defined. As they are not used from a different compilation unit they don't need to be defined globally. Avoid the issue by marking the functions static. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Acked-by: Shuah Khan <skhan(a)linuxfoundation.org> Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571… Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- **YES** This commit should be backported to stable kernel trees. **Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real compiler warning issue (`-Wmissing-prototypes`) that affects build cleanliness and code quality. Modern build systems increasingly use stricter warning flags, making this fix valuable for stable trees. 2. **Zero Functional Risk**: The changes are purely cosmetic from a runtime perspective. Adding `static` to functions that were already internal has no impact on functionality, memory layout, or behavior - it only affects compiler symbol visibility and warnings. 3. **Minimal and Contained**: The diff is extremely small (4 function signatures with `static` added) and isolated to the kselftest harness framework. There are no complex logic changes or cross-subsystem impacts. 4. **Testing Infrastructure Improvement**: While the kselftest framework isn't critical runtime code, it's important for kernel testing and validation. Improving build compliance in testing infrastructure benefits stable kernel maintenance. 5. **Standard Practice**: Compiler warning fixes of this nature (adding missing `static` keywords) are routinely backported to stable trees as they represent good coding practices without functional risk. 6. **Different from Similar Commits**: Unlike the referenced similar commits (all marked "NO") which involved feature additions, API changes, or structural modifications, this commit is purely a build compliance fix with no behavioral changes. The commit meets all stable tree criteria: it fixes an issue (compiler warnings), has minimal risk (no functional changes), and improves code quality without introducing new features or architectural changes. Tools like `kselftest_harness.h:241`, `kselftest_harness.h:290`, `kselftest_harness.h:970`, and `kselftest_harness.h:1188` are the specific locations where these low- risk improvements are made. tools/testing/selftests/kselftest_harness.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 666c9fde76da9..7c337b4fa054d 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -258,7 +258,7 @@ * A bare "return;" statement may be used to return early. */ #define FIXTURE_SETUP(fixture_name) \ - void fixture_name##_setup( \ + static void fixture_name##_setup( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -307,7 +307,7 @@ __FIXTURE_TEARDOWN(fixture_name) #define __FIXTURE_TEARDOWN(fixture_name) \ - void fixture_name##_teardown( \ + static void fixture_name##_teardown( \ struct __test_metadata __attribute__((unused)) *_metadata, \ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ const FIXTURE_VARIANT(fixture_name) \ @@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext) kill(-(t->pid), SIGKILL); } -void __wait_for_test(struct __test_metadata *t) +static void __wait_for_test(struct __test_metadata *t) { struct sigaction action = { .sa_sigaction = __timeout_handler, @@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } -void __run_test(struct __fixture_metadata *f, - struct __fixture_variant_metadata *variant, - struct __test_metadata *t) +static void __run_test(struct __fixture_metadata *f, + struct __fixture_variant_metadata *variant, + struct __test_metadata *t) { struct __test_xfail *xfail; char test_name[1024]; -- 2.39.5

7 months, 2 weeks

1
0
0 0

[PATCH] selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled

by Enze Li

When CONFIG_DAMON_SYSFS is disabled, the selftests fail with the following outputs, not ok 2 selftests: damon: sysfs_update_schemes_tried_regions_wss_estimation.py # exit=1 not ok 3 selftests: damon: damos_quota.py # exit=1 not ok 4 selftests: damon: damos_quota_goal.py # exit=1 not ok 5 selftests: damon: damos_apply_interval.py # exit=1 not ok 6 selftests: damon: damos_tried_regions.py # exit=1 not ok 7 selftests: damon: damon_nr_regions.py # exit=1 not ok 11 selftests: damon: sysfs_update_schemes_tried_regions_hang.py # exit=1 The root cause of this issue is that all the testcases above do not check the sysfs interface of DAMON whether it exists or not. With this patch applied, all the testcases above now pass successfully. Signed-off-by: Enze Li <lienze(a)kylinos.cn> --- tools/testing/selftests/damon/_damon_sysfs.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/damon/_damon_sysfs.py b/tools/testing/selftests/damon/_damon_sysfs.py index 6e136dc3df19..cab67addfb00 100644 --- a/tools/testing/selftests/damon/_damon_sysfs.py +++ b/tools/testing/selftests/damon/_damon_sysfs.py @@ -15,6 +15,10 @@ if sysfs_root is None: print('Seems sysfs not mounted?') exit(ksft_skip) +if not os.path.exists(sysfs_root): + print('Seems DAMON disabled?') + exit(ksft_skip) + def write_file(path, string): "Returns error string if failed, or None otherwise" string = '%s' % string base-commit: 0f70f5b08a47a3bc1a252e5f451a137cde7c98ce -- 2.43.0

7 months, 2 weeks

2
1
0 0

[PATCH net v2] selftests: net: build net/lib dependency in all target

by Bui Quang Minh

We have the logic to include net/lib automatically for net related selftests. However, currently, this logic is only in install target which means only `make install` will have net/lib included. This commit moves the logic to all target so that all `make`, `make run_tests` and `make install` will have net/lib included in net related selftests. Reviewed-by: Jakub Kicinski <kuba(a)kernel.org> Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com> --- Changes in v2: - Make the commit message clearer. tools/testing/selftests/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 6aa11cd3db42..5b04d83ad9a1 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -205,7 +205,7 @@ export KHDR_INCLUDES all: @ret=1; \ - for TARGET in $(TARGETS); do \ + for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ mkdir $$BUILD_TARGET -p; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \ @@ -270,7 +270,7 @@ ifdef INSTALL_PATH install -m 744 run_kselftest.sh $(INSTALL_PATH)/ rm -f $(TEST_LIST) @ret=1; \ - for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ + for TARGET in $(TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install \ INSTALL_PATH=$(INSTALL_PATH)/$$TARGET \ -- 2.43.0

7 months, 2 weeks

1
1
0 0

[PATCH v1 1/1] selftests/x86: Add a test to detect infinite sigtrap handler loop

by Xin Li (Intel)

When FRED is enabled, if the Trap Flag (TF) is set without an external debugger attached, it can lead to an infinite loop in the SIGTRAP handler. To avoid this, the software event flag in the augmented SS must be cleared, ensuring that no single-step trap remains pending when ERETU completes. This test checks for that specific scenario—verifying whether the kernel correctly prevents an infinite SIGTRAP loop in this edge case. Signed-off-by: Xin Li (Intel) <xin(a)zytor.com> --- tools/testing/selftests/x86/Makefile | 2 +- .../selftests/x86/test_sigtrap_handler.c | 80 +++++++++++++++++++ 2 files changed, 81 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_sigtrap_handler.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index f703fcfe9f7c..c486fd88ebb1 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -12,7 +12,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh "$(CC)" trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ - test_vsyscall mov_ss_trap \ + test_vsyscall mov_ss_trap test_sigtrap_handler \ syscall_arg_fault fsgsbase_restore sigaltstack TARGETS_C_BOTHBITS += nx_stack TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ diff --git a/tools/testing/selftests/x86/test_sigtrap_handler.c b/tools/testing/selftests/x86/test_sigtrap_handler.c new file mode 100644 index 000000000000..9c5c2cf0cf88 --- /dev/null +++ b/tools/testing/selftests/x86/test_sigtrap_handler.c @@ -0,0 +1,80 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Intel Corporation + */ +#define _GNU_SOURCE + +#include <err.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ucontext.h> + +#ifdef __x86_64__ +# define REG_IP REG_RIP +#else +# define REG_IP REG_EIP +#endif + +static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), int flags) +{ + struct sigaction sa; + + memset(&sa, 0, sizeof(sa)); + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO | flags; + sigemptyset(&sa.sa_mask); + + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); + + return; +} + +static unsigned int loop_count_on_same_ip; + +static void sigtrap(int sig, siginfo_t *info, void *ctx_void) +{ + ucontext_t *ctx = (ucontext_t *)ctx_void; + static unsigned long last_trap_ip; + + if (last_trap_ip == ctx->uc_mcontext.gregs[REG_IP]) { + printf("trapped on %016lx\n", last_trap_ip); + + if (++loop_count_on_same_ip > 10) { + printf("trap loop detected, test failed\n"); + exit(2); + } + + return; + } + + loop_count_on_same_ip = 0; + last_trap_ip = ctx->uc_mcontext.gregs[REG_IP]; + printf("trapped on %016lx\n", last_trap_ip); +} + +int main(int argc, char *argv[]) +{ + sethandler(SIGTRAP, sigtrap, 0); + + asm volatile( +#ifdef __x86_64__ + /* Avoid clobbering the redzone */ + "sub $128, %rsp\n\t" +#endif + "push $0x302\n\t" + "popf\n\t" + "nop\n\t" + "nop\n\t" + "push $0x202\n\t" + "popf\n\t" +#ifdef __x86_64__ + "add $128, %rsp\n\t" +#endif + ); + + printf("test passed\n"); + return 0; +} base-commit: 485d11d84a2452ac16466cc7ae041c93d38929bc -- 2.49.0

7 months, 2 weeks

2
1
0 0

[PATCH v4 00/23] iommufd: Add vIOMMU infrastructure (Part-4 HW QUEUE)

by Nicolin Chen

The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature. HW QUEUE introduced by this series as a part of the vIOMMU infrastructure represents a HW accelerated queue/buffer for VM to use exclusively, e.g. - NVIDIA's Virtual Command Queue - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer each of which allows its IOMMU HW to directly access a queue memory owned by a guest VM and allows a guest OS to control the HW queue direclty, to avoid VM Exit overheads to improve the performance. Introduce IOMMUFD_OBJ_HW_QUEUE and its pairing IOMMUFD_CMD_HW_QUEUE_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, and etc. Meanwhile, a guest-owned queue needs the guest kernel to control the queue by reading/writing its consumer and producer indexes, via MMIO acceses to the hardware MMIO registers. Introduce an mmap infrastructure for iommufd to support passing through a piece of MMIO region from the host physical address space to the guest physical address space. The mmap info (offset/ length) used by an mmap syscall must be pre-allocated and returned to the user space via an output driver-data during an IOMMUFD_CMD_HW_QUEUE_ALLOC call. Thus, it requires a driver-specific user data support in the vIOMMU allocation flow. As a real-world use case, this series implements a HW QUEUE support in the tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it is also the Tegra CMDQV series Part-2 (user-space support), reworked from Previous RFCv1: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/ This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS. // Unmap latencies from "dma_map_benchmark -g @granule -t @threads", // by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq" @granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0 4KB 1 35.7 us 5.3 us 16KB 1 41.8 us 6.8 us 64KB 1 68.9 us 9.9 us 128KB 1 109.0 us 12.6 us 256KB 1 187.1 us 18.0 us 4KB 2 96.9 us 6.8 us 16KB 2 97.8 us 7.5 us 64KB 2 151.5 us 10.7 us 128KB 2 257.8 us 12.7 us 256KB 2 443.0 us 17.9 us This is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_hw_queue-v4 Paring QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_hw_queue-v4 Changelog v4 * Rebase on v6.15-rc5 * Add Reviewed-by from Vasant * Rename "vQUEUE" to "HW QUEUE" * Use "offset" and "length" for all mmap-related variables * [iommufd] Use u64 for guest PA * [iommufd] Fix typo in uAPI doc * [iommufd] Rename immap_id to offset * [iommufd] Drop the partial-size mmap support * [iommufd] Do not replace WARN_ON with WARN_ON_ONCE * [iommufd] Use "u64 base_addr" for queue base address * [iommufd] Use u64 base_pfn/num_pfns for immap structure * [iommufd] Correct the size passed in to mtree_alloc_range() * [iommufd] Add IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to viommu_ops v3 https://lore.kernel.org/all/cover.1746139811.git.nicolinc@nvidia.com/ * Add Reviewed-by from Baolu, Pranjal, and Alok * Revise kdocs, uAPI docs, and commit logs * Rename "vCMDQ" back to "vQUEUE" for AMD cases * [tegra] Add tegra241_vcmdq_hw_flush_timeout() * [tegra] Rename vsmmu_alloc to alloc_vintf_user * [tegra] Use writel for SID replacement registers * [tegra] Move mmap removal call to vsmmu_destroy op * [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user() * [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED() * [iommufd] Add an object-type "owner" to immap structure * [iommufd] Drop the ictx input in the new for-driver APIs * [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle * [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers * [iommufd] Rename iommufd_ctx_alloc/free_mmap to _iommufd_alloc/destroy_mmap v2 https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/ * Add Reviewed-by from Jason * [smmu] Fix vsmmu initial value * [smmu] Support impl for hw_info * [tegra] Rename "slot" to "vsid" * [tegra] Update kdocs and commit logs * [tegra] Map/unmap LVCMDQ dynamically * [tegra] Refcount the previous LVCMDQ * [tegra] Return -EEXIST if LVCMDQ exists * [tegra] Simplify VINTF cleanup routine * [tegra] Use vmid and s2_domain in vsmmu * [tegra] Rename "mmap_pgoff" to "immap_id" * [tegra] Add more addr and length validation * [iommufd] Add more narrative to mmap's kdoc * [iommufd] Add iommufd_struct_depend/undepend() * [iommufd] Rename vcmdq_free op to vcmdq_destroy * [iommufd] Fix bug in iommu_copy_struct_to_user() * [iommufd] Drop is_io from iommufd_ctx_alloc_mmap() * [iommufd] Test the queue memory for its contiguity * [iommufd] Return -ENXIO if address or length fails * [iommufd] Do not change @min_last in mock_viommu_alloc() * [iommufd] Generalize TEGRA241_VCMDQ data in core structure * [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC * [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping v1 https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/ Thanks Nicolin Nicolin Chen (23): iommufd/viommu: Add driver-allocated vDEVICE support iommu: Pass in a driver-level user data structure to viommu_alloc op iommufd/viommu: Allow driver-specific user data for a vIOMMU object iommu: Add iommu_copy_struct_to_user helper iommufd/driver: Let iommufd_viommu_alloc helper save ictx to viommu->ictx iommufd/driver: Add iommufd_struct_destroy to revert iommufd_viommu_alloc iommufd/selftest: Support user_data in mock_viommu_alloc iommufd/selftest: Add covearge for viommu data iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC iommufd: Add mmap interface iommufd/selftest: Add coverage for the new mmap interface Documentation: userspace-api: iommufd: Update HW QUEUE iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info iommu/tegra241-cmdqv: Use request_threaded_irq iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +- drivers/iommu/iommufd/io_pagetable.h | 8 + drivers/iommu/iommufd/iommufd_private.h | 28 +- drivers/iommu/iommufd/iommufd_test.h | 20 + include/linux/iommu.h | 43 +- include/linux/iommufd.h | 186 ++++++- include/uapi/linux/iommufd.h | 116 ++++- tools/testing/selftests/iommu/iommufd_utils.h | 52 +- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 43 +- .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 490 +++++++++++++++++- drivers/iommu/iommufd/device.c | 117 +---- drivers/iommu/iommufd/driver.c | 94 ++++ drivers/iommu/iommufd/io_pagetable.c | 95 ++++ drivers/iommu/iommufd/main.c | 80 ++- drivers/iommu/iommufd/selftest.c | 133 ++++- drivers/iommu/iommufd/viommu.c | 121 ++++- tools/testing/selftests/iommu/iommufd.c | 97 +++- .../selftests/iommu/iommufd_fail_nth.c | 11 +- Documentation/userspace-api/iommufd.rst | 12 + 19 files changed, 1577 insertions(+), 194 deletions(-) base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb -- 2.43.0

7 months, 2 weeks

5
105
0 0

[PATCH v1 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA

by Jiaqi Yan

Problem ======= When host APEI is unable to claim synchronous external abort (SEA) during stage-2 guest abort, today KVM directly injects an async SError into the VCPU then resumes it. The injected SError usually results in unpleasant guest kernel panic. One of the major situation of guest SEA is when VCPU consumes recoverable uncorrected memory error (UER), which is not uncommon at all in modern datacenter servers with large amounts of physical memory. Although SError and guest panic is sufficient to stop the propagation of corrupted memory there is still room to recover from memory UER in a more graceful manner. Proposed Solution ================= Alternatively KVM can replay the SEA to the faulting VCPU, via existing KVM_SET_VCPU_EVENTS API. If the memory poison consumption or the fault that cause SEA is not from guest kernel, the blast radius can be limited to the consuming or faulting guest userspace process, so the VM can keep running. In addition, instead of doing under the hood without involving userspace, there are benefits to redirect the SEA to VMM: - VM customers care about the disruptions caused by memory errors, and VMM usually has the responsibility to start the process of notifying the customers of memory error events in their VMs. For example some cloud provider emits a critical log in their observability UI [1], and provides playbook for customers on how to mitigate disruptions to their workloads. - VMM can protect future memory error consumption or faults by unmapping the poisoned pages from stage-2 page table with KVM userfault [2], which is more performant than splitting the memslot that contains the poisoned guest pages. - VMM can keep track SEA events in the VM. When VMM thinks the status on the host or the VM is bad enough, e.g. number of distinct SEAs exceeds a threshold, it can restart the VM on another healthy host. - Behavior parity with x86 architecture. When machine check exception (MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to let VMM either recover from the MCE, or terminate itself with VM. The prior RFC proposes to implement SIGBUS on arm64 as well, but Marc preferred VCPU exit over signal [3]. However, implementation aside, returning SEA to VMM is on par with returning MCE to VMM. Once SEA is redirected to VMM, among other actions, VMM is encouraged to inject external aborts into the faulting VCPU, which is already supported by KVM on arm64, although not fully supported by KVM_SET_VCPU_EVENTS but complemented in this patchset. New UAPIs ========= This patchset introduces following userspace-visiable changes to empower VMM to control what happens next for guest SEA: - KVM_CAP_ARM_SEA_TO_USER. If userspace enables this new capability at VM creation, KVM will not inject SError while taking SEA, but VM exit to userspace. - KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details about the SEA is provided in arm_sea as much as possible, including ESR value at EL2, if guest virtual and physical addresses (GPA and GVA) are available and the values if available. - KVM_CAP_ARM_INJECT_EXT_IABT. VMM today can inject external data abort to VCPU via KVM_SET_VCPU_EVENTS API. However, in case of instruction abort, VMM cannot inject it via KVM_SET_VCPU_EVENTS. KVM_CAP_ARM_INJECT_EXT_IABT is just a natural extend to KVM_CAP_ARM_INJECT_EXT_DABT that tells VMM KVM_SET_VCPU_EVENTS now supports external instruction abort. Patchset utilizes commit 26fbdf369227 ("KVM: arm64: Don't translate FAR if invalid/unsafe") from [4], available already in kvmarm/next. [4] makes KVM safely do address translation for HPFAR_EL2, including at the event of SEA, and indicate if HPFAR_EL2 is valid in NS bit. This patchset depends on [4] to tell userspace if GPA is valid and its value if valid. Patchset is based on commit 68ec8b4e84446 ("Merge branch kvm-arm64/pkvm-6.16 into kvmarm-master/next") [1] https://cloud.google.com/solutions/sap/docs/manage-host-errors [2] https://lpc.events/event/18/contributions/1757/attachments/1442/3073/LPC_%2… [3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org [4] https://lore.kernel.org/all/174369514508.3034362.13165690020799838042.b4-ty… Jiaqi Yan (5): KVM: arm64: VM exit to userspace to handle SEA KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT Documentation: kvm: new uAPI for handling SEA Raghavendra Rao Ananta (1): KVM: arm64: Allow userspace to inject external instruction aborts Documentation/virt/kvm/api.rst | 120 ++++++- arch/arm64/include/asm/kvm_emulate.h | 12 + arch/arm64/include/asm/kvm_host.h | 8 + arch/arm64/include/asm/kvm_ras.h | 21 +- arch/arm64/include/uapi/asm/kvm.h | 3 +- arch/arm64/kvm/Makefile | 3 +- arch/arm64/kvm/arm.c | 6 + arch/arm64/kvm/guest.c | 13 +- arch/arm64/kvm/inject_fault.c | 3 + arch/arm64/kvm/kvm_ras.c | 54 +++ arch/arm64/kvm/mmu.c | 12 +- include/uapi/linux/kvm.h | 12 + tools/arch/arm64/include/uapi/asm/kvm.h | 3 +- tools/testing/selftests/kvm/Makefile.kvm | 2 + .../testing/selftests/kvm/arm64/inject_iabt.c | 100 ++++++ .../testing/selftests/kvm/arm64/sea_to_user.c | 324 ++++++++++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 1 + 17 files changed, 654 insertions(+), 43 deletions(-) create mode 100644 arch/arm64/kvm/kvm_ras.c create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c -- 2.49.0.967.g6a0df3ecc3-goog

7 months, 2 weeks

3
12
0 0

[PATCH] kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests

by Richard Fitzgerald

Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests.config. This helps to detect use of uninitialized local variables. This option found an uninitialized data bug in the cs_dsp test. Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com> --- tools/testing/kunit/configs/all_tests.config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config index cdd9782f9646..4a60bb71fe72 100644 --- a/tools/testing/kunit/configs/all_tests.config +++ b/tools/testing/kunit/configs/all_tests.config @@ -10,6 +10,7 @@ CONFIG_KUNIT_EXAMPLE_TEST=y CONFIG_KUNIT_ALL_TESTS=y CONFIG_FORTIFY_SOURCE=y +CONFIG_INIT_STACK_ALL_PATTERN=y CONFIG_IIO=y -- 2.39.5

7 months, 2 weeks

3
3
0 0

[PATCH v8 0/9] ublk: decouple server threads from ublk_queues/hctxs

by Uday Shankar

This patch set aims to allow ublk server threads to better balance load amongst themselves by decoupling server threads from ublk_queues/hctxs, so that multiple threads can service I/Os that are issued from a single CPU. This can improve performance for workloads in which ublk server CPU is a bottleneck, and for which load is issued from CPUs which are not balanced across ublk_queues/hctxs. Performance ----------- First create two ublk devices with: ublkb0: ./kublk add -t null -q 2 --nthreads 2 ublkb1: ./kublk add -t null -q 2 --nthreads 2 --per_io_tasks Then run load with: taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb0: 1.90M IOPS taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb1: 2.18M IOPS Since ublkb1 has per-io-tasks, the second command is able to make use of both ublk server worker threads and therefore has increased max throughput. Caveats: - This testing was done on a system with 2 numa nodes, but the penalty of having I/O cross a numa (or LLC) boundary in the per_io_tasks case is quite high. So these numbers were obtained after moving all ublk server threads and the application threads to CPUs on the same numa node/LLC. - One might expect the scaling to be linear - because ublkb1 can make use of twice as many ublk server threads, it should be able to drive twice the throughput. However this is not true (the improvement is ~15%), and needs further investigation. Signed-off-by: Uday Shankar <ushankar(a)purestorage.com> --- Changes in v8: - Fix queue_rqs batch dispatch OOPS when dispatching a list of requests associated to > 1 ublk_queue (Ming Lei, Caleb Sander Mateos) - Simplify queue_rqs (Caleb Sander Mateos) - Narrow a couple of types (Ming Lei) - Add stress test for per io daemons (Ming Lei) - Link to v7: https://lore.kernel.org/r/20250527-ublk_task_per_io-v7-0-cbdbaf283baa@pures… Changes in v7: - Fix queue_rqs batch dispatch for per-io daemons - Kick round-robin tag allocation changes to a followup - Add explicit feature flag for per-task daemons (Ming Lei, Caleb Sander Mateos) - Move some variable assignments to avoid redundant computation (Caleb Sander Mateos) - Switch from storing pointers in ublk_io to computing based on address with container_of in a couple places (Ming Lei) - Link to v6: https://lore.kernel.org/r/20250507-ublk_task_per_io-v6-0-a2a298783c01@pures… Changes in v6: - Add a feature flag for this feature, called UBLK_F_RR_TAGS (Ming Lei) - Add test for this feature (Ming Lei) - Add documentation for this feature (Ming Lei) - Link to v5: https://lore.kernel.org/r/20250416-ublk_task_per_io-v5-0-9261ad7bff20@pures… Changes in v5: - Set io->task before ublk_mark_io_ready (Caleb Sander Mateos) - Set io->task atomically, read it atomically when needed - Return 0 on success from command-specific helpers in __ublk_ch_uring_cmd (Caleb Sander Mateos) - Rename ublk_handle_need_get_data to ublk_get_data (Caleb Sander Mateos) - Link to v4: https://lore.kernel.org/r/20250415-ublk_task_per_io-v4-0-54210b91a46f@pures… Changes in v4: - Drop "ublk: properly serialize all FETCH_REQs" since Ming is taking it in another set - Prevent data races by marking data structures which should be read-only in the I/O path as const (Ming Lei) - Link to v3: https://lore.kernel.org/r/20250410-ublk_task_per_io-v3-0-b811e8f4554a@pures… Changes in v3: - Check for UBLK_IO_FLAG_ACTIVE on I/O again after taking lock to ensure that two concurrent FETCH_REQs on the same I/O can't succeed (Caleb Sander Mateos) - Link to v2: https://lore.kernel.org/r/20250408-ublk_task_per_io-v2-0-b97877e6fd50@pures… Changes in v2: - Remove changes split into other patches - To ease error handling/synchronization, associate each I/O (instead of each queue) to the last task that issues a FETCH_REQ against it. Only that task is allowed to operate on the I/O. - Link to v1: https://lore.kernel.org/r/20241002224437.3088981-1-ushankar@purestorage.com --- Uday Shankar (9): ublk: have a per-io daemon instead of a per-queue daemon selftests: ublk: kublk: plumb q_id in io_uring user_data selftests: ublk: kublk: tie sqe allocation to io instead of queue selftests: ublk: kublk: lift queue initialization out of thread selftests: ublk: kublk: move per-thread data out of ublk_queue selftests: ublk: kublk: decouple ublk_queues from ublk server threads selftests: ublk: add functional test for per io daemons selftests: ublk: add stress test for per io daemons Documentation: ublk: document UBLK_F_PER_IO_DAEMON Documentation/block/ublk.rst | 35 ++- drivers/block/ublk_drv.c | 111 +++---- include/uapi/linux/ublk_cmd.h | 9 + tools/testing/selftests/ublk/Makefile | 2 + tools/testing/selftests/ublk/fault_inject.c | 4 +- tools/testing/selftests/ublk/file_backed.c | 20 +- tools/testing/selftests/ublk/kublk.c | 344 ++++++++++++++------- tools/testing/selftests/ublk/kublk.h | 73 +++-- tools/testing/selftests/ublk/null.c | 22 +- tools/testing/selftests/ublk/stripe.c | 17 +- tools/testing/selftests/ublk/test_common.sh | 5 + tools/testing/selftests/ublk/test_generic_12.sh | 55 ++++ tools/testing/selftests/ublk/test_stress_06.sh | 36 +++ .../selftests/ublk/trace/count_ios_per_tid.bt | 11 + 14 files changed, 512 insertions(+), 232 deletions(-) --- base-commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d change-id: 20250408-ublk_task_per_io-c693cf608d7a Best regards, -- Uday Shankar <ushankar(a)purestorage.com>

7 months, 2 weeks

3
13
0 0

[PATCH bpf-next 1/2] bpftool: Use appropriate permissions for map access

by Slava Imameev

Modify several functions in tools/bpf/bpftool/common.c to allow specification of requested access for file descriptors, such as read-only access. Update bpftool to request only read access for maps when write access is not required. This fixes errors when reading from maps that are protected from modification via security_bpf_map. Signed-off-by: Slava Imameev <slava.imameev(a)crowdstrike.com> --- tools/bpf/bpftool/btf.c | 3 +- tools/bpf/bpftool/common.c | 57 ++++++++++++++++++++++--------- tools/bpf/bpftool/iter.c | 2 +- tools/bpf/bpftool/link.c | 2 +- tools/bpf/bpftool/main.h | 13 ++++--- tools/bpf/bpftool/map.c | 56 +++++++++++++++++------------- tools/bpf/bpftool/map_perf_ring.c | 3 +- tools/bpf/bpftool/prog.c | 4 +-- 8 files changed, 90 insertions(+), 50 deletions(-) diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c index 6b14cbfa58aa..1ba27cb03348 100644 --- a/tools/bpf/bpftool/btf.c +++ b/tools/bpf/bpftool/btf.c @@ -905,7 +905,8 @@ static int do_dump(int argc, char **argv) return -1; } - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, + BPF_F_RDONLY); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index ecfa790adc13..ff1c99281beb 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -193,7 +193,8 @@ int mount_tracefs(const char *target) return err; } -int open_obj_pinned(const char *path, bool quiet) +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts) { char *pname; int fd = -1; @@ -205,7 +206,7 @@ int open_obj_pinned(const char *path, bool quiet) goto out_ret; } - fd = bpf_obj_get(pname); + fd = bpf_obj_get_opts(pname, opts); if (fd < 0) { if (!quiet) p_err("bpf obj get (%s): %s", pname, @@ -221,12 +222,13 @@ int open_obj_pinned(const char *path, bool quiet) return fd; } -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type) +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts) { enum bpf_obj_type type; int fd; - fd = open_obj_pinned(path, false); + fd = open_obj_pinned(path, false, opts); if (fd < 0) return -1; @@ -555,7 +557,7 @@ static int do_build_table_cb(const char *fpath, const struct stat *sb, if (typeflag != FTW_F) goto out_ret; - fd = open_obj_pinned(fpath, true); + fd = open_obj_pinned(fpath, true, NULL); if (fd < 0) goto out_ret; @@ -928,7 +930,7 @@ int prog_parse_fds(int *argc, char ***argv, int **fds) path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_PROG, NULL); if ((*fds)[0] < 0) return -1; return 1; @@ -965,7 +967,8 @@ int prog_parse_fd(int *argc, char ***argv) return fd; } -static int map_fd_by_name(char *name, int **fds) +static int map_fd_by_name(char *name, int **fds, + const struct bpf_get_fd_by_id_opts *opts) { unsigned int id = 0; int fd, nb_fds = 0; @@ -973,6 +976,7 @@ static int map_fd_by_name(char *name, int **fds) int err; while (true) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts_ro); struct bpf_map_info info = {}; __u32 len = sizeof(info); @@ -985,7 +989,9 @@ static int map_fd_by_name(char *name, int **fds) return nb_fds; } - fd = bpf_map_get_fd_by_id(id); + /* Request a read-only fd to query the map info */ + opts_ro.open_flags = BPF_F_RDONLY; + fd = bpf_map_get_fd_by_id_opts(id, &opts_ro); if (fd < 0) { p_err("can't get map by id (%u): %s", id, strerror(errno)); @@ -1004,6 +1010,15 @@ static int map_fd_by_name(char *name, int **fds) continue; } + /* Get an fd with the requested options. */ + close(fd); + fd = bpf_map_get_fd_by_id_opts(id, opts); + if (fd < 0) { + p_err("can't get map by id (%u): %s", id, + strerror(errno)); + goto err_close_fds; + } + if (nb_fds > 0) { tmp = realloc(*fds, (nb_fds + 1) * sizeof(int)); if (!tmp) { @@ -1023,8 +1038,16 @@ static int map_fd_by_name(char *name, int **fds) return -1; } -int map_parse_fds(int *argc, char ***argv, int **fds) +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); + + if (open_flags & ~BPF_F_RDONLY) { + p_err("invalid open_flags: %x", open_flags); + return -1; + } + opts.open_flags = open_flags; + if (is_prefix(**argv, "id")) { unsigned int id; char *endptr; @@ -1038,7 +1061,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - (*fds)[0] = bpf_map_get_fd_by_id(id); + (*fds)[0] = bpf_map_get_fd_by_id_opts(id, &opts); if ((*fds)[0] < 0) { p_err("get map by id (%u): %s", id, strerror(errno)); return -1; @@ -1056,16 +1079,18 @@ int map_parse_fds(int *argc, char ***argv, int **fds) } NEXT_ARGP(); - return map_fd_by_name(name, fds); + return map_fd_by_name(name, fds, &opts); } else if (is_prefix(**argv, "pinned")) { char *path; + LIBBPF_OPTS(bpf_obj_get_opts, get_opts); + get_opts.file_flags = open_flags; NEXT_ARGP(); path = **argv; NEXT_ARGP(); - (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP); + (*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_MAP, &get_opts); if ((*fds)[0] < 0) return -1; return 1; @@ -1075,7 +1100,7 @@ int map_parse_fds(int *argc, char ***argv, int **fds) return -1; } -int map_parse_fd(int *argc, char ***argv) +int map_parse_fd(int *argc, char ***argv, __u32 open_flags) { int *fds = NULL; int nb_fds, fd; @@ -1085,7 +1110,7 @@ int map_parse_fd(int *argc, char ***argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(argc, argv, &fds); + nb_fds = map_parse_fds(argc, argv, &fds, open_flags); if (nb_fds != 1) { if (nb_fds > 1) { p_err("several maps match this handle"); @@ -1103,12 +1128,12 @@ int map_parse_fd(int *argc, char ***argv) } int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len) + __u32 *info_len, __u32 open_flags) { int err; int fd; - fd = map_parse_fd(argc, argv); + fd = map_parse_fd(argc, argv, open_flags); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/iter.c b/tools/bpf/bpftool/iter.c index 5c39c2ed36a2..ad318a8667a4 100644 --- a/tools/bpf/bpftool/iter.c +++ b/tools/bpf/bpftool/iter.c @@ -37,7 +37,7 @@ static int do_pin(int argc, char **argv) return -1; } - map_fd = map_parse_fd(&argc, &argv); + map_fd = map_parse_fd(&argc, &argv, 0); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c index 3535afc80a49..8523be11dcd9 100644 --- a/tools/bpf/bpftool/link.c +++ b/tools/bpf/bpftool/link.c @@ -117,7 +117,7 @@ static int link_parse_fd(int *argc, char ***argv) path = **argv; NEXT_ARGP(); - return open_obj_pinned_any(path, BPF_OBJ_LINK); + return open_obj_pinned_any(path, BPF_OBJ_LINK, NULL); } p_err("expected 'id' or 'pinned', got: '%s'?", **argv); diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 9eb764fe4cc8..6db704fda5c0 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -15,6 +15,7 @@ #include <bpf/hashmap.h> #include <bpf/libbpf.h> +#include <bpf/bpf.h> #include "json_writer.h" @@ -140,8 +141,10 @@ void get_prog_full_name(const struct bpf_prog_info *prog_info, int prog_fd, int get_fd_type(int fd); const char *get_fd_type_name(enum bpf_obj_type type); char *get_fdinfo(int fd, const char *key); -int open_obj_pinned(const char *path, bool quiet); -int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type); +int open_obj_pinned(const char *path, bool quiet, + const struct bpf_obj_get_opts *opts); +int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type, + const struct bpf_obj_get_opts *opts); int mount_bpffs_for_file(const char *file_name); int create_and_mount_bpffs_dir(const char *dir_name); int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(int *, char ***)); @@ -167,10 +170,10 @@ int do_iter(int argc, char **argv) __weak; int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what); int prog_parse_fd(int *argc, char ***argv); int prog_parse_fds(int *argc, char ***argv, int **fds); -int map_parse_fd(int *argc, char ***argv); -int map_parse_fds(int *argc, char ***argv, int **fds); +int map_parse_fd(int *argc, char ***argv, __u32 open_flags); +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags); int map_parse_fd_and_info(int *argc, char ***argv, struct bpf_map_info *info, - __u32 *info_len); + __u32 *info_len, __u32 open_flags); struct bpf_prog_linfo; #if defined(HAVE_LLVM_SUPPORT) || defined(HAVE_LIBBFD_SUPPORT) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 81cc668b4b05..c7dc2eae8ba8 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -337,9 +337,9 @@ static void fill_per_cpu_value(struct bpf_map_info *info, void *value) memcpy(value + i * step, value, info->value_size); } -static int parse_elem(char **argv, struct bpf_map_info *info, - void *key, void *value, __u32 key_size, __u32 value_size, - __u32 *flags, __u32 **value_fd) +static int parse_elem(char **argv, struct bpf_map_info *info, void *key, + void *value, __u32 key_size, __u32 value_size, + __u32 *flags, __u32 **value_fd, __u32 open_flags) { if (!*argv) { if (!key && !value) @@ -362,7 +362,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; return parse_elem(argv, info, NULL, value, key_size, value_size, - flags, value_fd); + flags, value_fd, open_flags); } else if (is_prefix(*argv, "value")) { int fd; @@ -388,7 +388,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, return -1; } - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, open_flags); if (fd < 0) return -1; @@ -424,7 +424,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, } return parse_elem(argv, info, key, NULL, key_size, value_size, - flags, NULL); + flags, NULL, open_flags); } else if (is_prefix(*argv, "any") || is_prefix(*argv, "noexist") || is_prefix(*argv, "exist")) { if (!flags) { @@ -440,7 +440,7 @@ static int parse_elem(char **argv, struct bpf_map_info *info, *flags = BPF_EXIST; return parse_elem(argv + 1, info, key, value, key_size, - value_size, NULL, value_fd); + value_size, NULL, value_fd, open_flags); } p_err("expected key or value, got: %s", *argv); @@ -639,7 +639,7 @@ static int do_show_subset(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -672,12 +672,15 @@ static int do_show_subset(int argc, char **argv) static int do_show(int argc, char **argv) { + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts); struct bpf_map_info info = {}; __u32 len = sizeof(info); __u32 id = 0; int err; int fd; + opts.open_flags = BPF_F_RDONLY; + if (show_pinned) { map_table = hashmap__new(hash_fn_for_key_as_id, equal_fn_for_key_as_id, NULL); @@ -707,7 +710,7 @@ static int do_show(int argc, char **argv) break; } - fd = bpf_map_get_fd_by_id(id); + fd = bpf_map_get_fd_by_id_opts(id, &opts); if (fd < 0) { if (errno == ENOENT) continue; @@ -909,7 +912,7 @@ static int do_dump(int argc, char **argv) p_err("mem alloc failed"); return -1; } - nb_fds = map_parse_fds(&argc, &argv, &fds); + nb_fds = map_parse_fds(&argc, &argv, &fds, BPF_F_RDONLY); if (nb_fds < 1) goto exit_free; @@ -997,7 +1000,7 @@ static int do_update(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1006,7 +1009,7 @@ static int do_update(int argc, char **argv) goto exit_free; err = parse_elem(argv, &info, key, value, info.key_size, - info.value_size, &flags, &value_fd); + info.value_size, &flags, &value_fd, 0); if (err) goto exit_free; @@ -1076,7 +1079,7 @@ static int do_lookup(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1084,7 +1087,8 @@ static int do_lookup(int argc, char **argv) if (err) goto exit_free; - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + BPF_F_RDONLY); if (err) goto exit_free; @@ -1127,7 +1131,7 @@ static int do_getnext(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1140,8 +1144,8 @@ static int do_getnext(int argc, char **argv) } if (argc) { - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, - NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, + NULL, BPF_F_RDONLY); if (err) goto exit_free; } else { @@ -1198,7 +1202,7 @@ static int do_delete(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY); if (fd < 0) return -1; @@ -1209,7 +1213,8 @@ static int do_delete(int argc, char **argv) goto exit_free; } - err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL); + err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL, + 0); if (err) goto exit_free; @@ -1226,11 +1231,16 @@ static int do_delete(int argc, char **argv) return err; } +static int map_parse_read_only_fd(int *argc, char ***argv) +{ + return map_parse_fd(argc, argv, BPF_F_RDONLY); +} + static int do_pin(int argc, char **argv) { int err; - err = do_pin_any(argc, argv, map_parse_fd); + err = do_pin_any(argc, argv, map_parse_read_only_fd); if (!err && json_output) jsonw_null(json_wtr); return err; @@ -1319,7 +1329,7 @@ static int do_create(int argc, char **argv) if (!REQ_ARGS(2)) usage(); inner_map_fd = map_parse_fd_and_info(&argc, &argv, - &info, &len); + &info, &len, 0); if (inner_map_fd < 0) return -1; attr.inner_map_fd = inner_map_fd; @@ -1368,7 +1378,7 @@ static int do_pop_dequeue(int argc, char **argv) if (argc < 2) usage(); - fd = map_parse_fd_and_info(&argc, &argv, &info, &len); + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, 0); if (fd < 0) return -1; @@ -1407,7 +1417,7 @@ static int do_freeze(int argc, char **argv) if (!REQ_ARGS(2)) return -1; - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) return -1; diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c index 552b4ca40c27..bcb767e2d673 100644 --- a/tools/bpf/bpftool/map_perf_ring.c +++ b/tools/bpf/bpftool/map_perf_ring.c @@ -128,7 +128,8 @@ int do_event_pipe(int argc, char **argv) int err, map_fd; map_info_len = sizeof(map_info); - map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len); + map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len, + 0); if (map_fd < 0) return -1; diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 96eea8a67225..deeaa5c1ed7d 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -1062,7 +1062,7 @@ static int parse_attach_detach_args(int argc, char **argv, int *progfd, if (!REQ_ARGS(2)) return -EINVAL; - *mapfd = map_parse_fd(&argc, &argv); + *mapfd = map_parse_fd(&argc, &argv, 0); if (*mapfd < 0) return *mapfd; @@ -1608,7 +1608,7 @@ static int load_with_options(int argc, char **argv, bool first_prog_only) } NEXT_ARG(); - fd = map_parse_fd(&argc, &argv); + fd = map_parse_fd(&argc, &argv, 0); if (fd < 0) goto err_free_reuse_maps; -- 2.34.1

7 months, 2 weeks

1
1
0 0

[PATCH] selftests/filesystems: Fix build of anon_inode_test

by Mark Brown

The anon_inode_test test fails to build due to attempting to include a nonexisting overlayfs/wrapper.h: anon_inode_test.c:10:10: fatal error: overlayfs/wrappers.h: No such file or directory 10 | #include "overlayfs/wrappers.h" | ^~~~~~~~~~~~~~~~~~~~~~ This is due to 0bd92b9fe538 ("selftests/filesystems: move wrapper.h out of overlayfs subdir") which was added in the vfs-6.16.selftests branch which was based on -rc5 and does not contain the newly added test so once things were merged into vfs.all in the build started failing - both parent commits are fine. Fixes: feaa00dbff45a ("Merge branch 'vfs-6.16.selftests' into vfs.all") Signed-off-by: Mark Brown <broonie(a)kernel.org> --- tools/testing/selftests/filesystems/anon_inode_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/filesystems/anon_inode_test.c b/tools/testing/selftests/filesystems/anon_inode_test.c index e8e0ef1460d2..73e0a4d4fb2f 100644 --- a/tools/testing/selftests/filesystems/anon_inode_test.c +++ b/tools/testing/selftests/filesystems/anon_inode_test.c @@ -7,7 +7,7 @@ #include <sys/stat.h> #include "../kselftest_harness.h" -#include "overlayfs/wrappers.h" +#include "wrappers.h" TEST(anon_inode_no_chown) { --- base-commit: feaa00dbff45ad9a0dcd04a92f88c745bf880f55 change-id: 20250516-selftests-anon-inode-build-007e206e8422 Best regards, -- Mark Brown <broonie(a)kernel.org>

7 months, 2 weeks

1
2
0 0

[PATCH net] selftests: net: build net/lib dependency in all target

by Bui Quang Minh

Currently, we only build net/lib dependency in install target. This commit moves that to all target so that net/lib is included in in-tree build and run_tests. Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com> --- tools/testing/selftests/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 6aa11cd3db42..5b04d83ad9a1 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -205,7 +205,7 @@ export KHDR_INCLUDES all: @ret=1; \ - for TARGET in $(TARGETS); do \ + for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ mkdir $$BUILD_TARGET -p; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \ @@ -270,7 +270,7 @@ ifdef INSTALL_PATH install -m 744 run_kselftest.sh $(INSTALL_PATH)/ rm -f $(TEST_LIST) @ret=1; \ - for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \ + for TARGET in $(TARGETS); do \ BUILD_TARGET=$$BUILD/$$TARGET; \ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install \ INSTALL_PATH=$(INSTALL_PATH)/$$TARGET \ -- 2.43.0

7 months, 2 weeks

3
5
0 0

[PATCH] selftests/seccomp: report errno and add hints on failure

by Sameeksha Sankpal

Signed-off-by: Sameeksha Sankpal <sameekshasankpal(a)gmail.com> --- tools/testing/selftests/seccomp/seccomp_bpf.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 14ba51b52095..d6a85d7b26da 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -4508,7 +4508,11 @@ static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid) snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid); ASSERT_EQ(get_nth(_metadata, proc_path, 3, &line), 1); - + int rc = get_nth(_metadata, proc_path, 3, &line); + if (rc != 1) { + printf("[ERROR] user_notification_fifo: failed to read stat for PID %d (rc=%d)\n", pid, rc); + } + ASSERT_EQ(rc, 1); status = *line; free(line); @@ -4518,6 +4522,7 @@ static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid) TEST(user_notification_fifo) { struct seccomp_notif_resp resp = {}; + ksft_print_msg("[INFO] Starting FIFO notification test\n"); struct seccomp_notif req = {}; int i, status, listener; pid_t pid, pids[3]; @@ -4535,6 +4540,7 @@ TEST(user_notification_fifo) listener = user_notif_syscall(__NR_getppid, SECCOMP_FILTER_FLAG_NEW_LISTENER); ASSERT_GE(listener, 0); + ksft_print_msg("[INFO] user_notification_fifo: listener PID is %d\n", listener); pid = fork(); ASSERT_GE(pid, 0); -- 2.43.0

7 months, 2 weeks

2
4
0 0

[PATCH v10 0/5] rust: replace kernel::str::CStr w/ core::ffi::CStr

by Tamir Duberstein

This picks up from Michal Rostecki's work[0]. Per Michal's guidance I have omitted Co-authored tags, as the end result is quite different. Link: https://lore.kernel.org/rust-for-linux/20240819153656.28807-2-vadorovsky@pr… [0] Closes: https://github.com/Rust-for-Linux/linux/issues/1075 Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v10: - Rebase on cbeaa41dfe26b72639141e87183cb23e00d4b0dd. - Implement Alice's suggestion to use a proc macro to work around orphan rules otherwise preventing `core::ffi::CStr` to be directly printed with `{}`. - Link to v9: https://lore.kernel.org/r/20250317-cstr-core-v9-0-51d6cc522f62@gmail.com Changes in v9: - Rebase on rust-next. - Restore `impl Display for BStr` which exists upstream[1]. - Link: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#impl-Display… [1] - Link to v8: https://lore.kernel.org/r/20250203-cstr-core-v8-0-cb3f26e78686@gmail.com Changes in v8: - Move `{from,as}_char_ptr` back to `CStrExt`. This reduces the diff some. - Restore `from_bytes_with_nul_unchecked_mut`, `to_cstring`. - Link to v7: https://lore.kernel.org/r/20250202-cstr-core-v7-0-da1802520438@gmail.com Changes in v7: - Rebased on mainline. - Restore functionality added in commit a321f3ad0a5d ("rust: str: add {make,to}_{upper,lower}case() to CString"). - Used `diff.algorithm patience` to improve diff readability. - Link to v6: https://lore.kernel.org/r/20250202-cstr-core-v6-0-8469cd6d29fd@gmail.com Changes in v6: - Split the work into several commits for ease of review. - Restore `{from,as}_char_ptr` to allow building on ARM (see commit message). - Add `CStrExt` to `kernel::prelude`. (Alice Ryhl) - Remove `CStrExt::from_bytes_with_nul_unchecked_mut` and restore `DerefMut for CString`. (Alice Ryhl) - Rename and hide `kernel::c_str!` to encourage use of C-String literals. - Drop implementation and invocation changes in kunit.rs. (Trevor Gross) - Drop docs on `Display` impl. (Trevor Gross) - Rewrite docs in the style of the standard library. - Restore the `test_cstr_debug` unit tests to demonstrate that the implementation has changed. Changes in v5: - Keep the `test_cstr_display*` unit tests. Changes in v4: - Provide the `CStrExt` trait with `display()` method, which returns a `CStrDisplay` wrapper with `Display` implementation. This addresses the lack of `Display` implementation for `core::ffi::CStr`. - Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`, which might be useful and is going to prevent manual, unsafe casts. - Fix a typo (s/preffered/prefered/). Changes in v3: - Fix the commit message. - Remove redundant braces in `use`, when only one item is imported. Changes in v2: - Do not remove `c_str` macro. While it's preferred to use C-string literals, there are two cases where `c_str` is helpful: - When working with macros, which already return a Rust string literal (e.g. `stringify!`). - When building macros, where we want to take a Rust string literal as an argument (for caller's convenience), but still use it as a C-string internally. - Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`, `new_mutex`). Use the `c_str` macro to convert these literals to C-string literals. - Use `c_str` in kunit.rs for converting the output of `stringify!` to a `CStr`. - Remove `DerefMut` implementation for `CString`. --- Tamir Duberstein (5): rust: retitle "Example" section as "Examples" rust: support formatting of foreign types rust: replace `CStr` with `core::ffi::CStr` rust: replace `kernel::c_str!` with C-Strings rust: remove core::ffi::CStr reexport drivers/block/rnull.rs | 2 +- drivers/gpu/drm/drm_panic_qr.rs | 5 +- drivers/gpu/nova-core/driver.rs | 2 +- drivers/gpu/nova-core/firmware.rs | 2 +- drivers/net/phy/ax88796b_rust.rs | 8 +- drivers/net/phy/qt2025.rs | 6 +- rust/kernel/block/mq.rs | 2 +- rust/kernel/device.rs | 9 +- rust/kernel/devres.rs | 2 +- rust/kernel/driver.rs | 4 +- rust/kernel/error.rs | 10 +- rust/kernel/faux.rs | 5 +- rust/kernel/firmware.rs | 16 +- rust/kernel/fmt.rs | 77 +++++++ rust/kernel/kunit.rs | 21 +- rust/kernel/lib.rs | 3 +- rust/kernel/miscdevice.rs | 5 +- rust/kernel/net/phy.rs | 12 +- rust/kernel/of.rs | 5 +- rust/kernel/pci.rs | 2 +- rust/kernel/platform.rs | 6 +- rust/kernel/prelude.rs | 5 +- rust/kernel/print.rs | 4 +- rust/kernel/seq_file.rs | 6 +- rust/kernel/str.rs | 415 ++++++++++------------------------- rust/kernel/sync.rs | 7 +- rust/kernel/sync/condvar.rs | 4 +- rust/kernel/sync/lock.rs | 4 +- rust/kernel/sync/lock/global.rs | 6 +- rust/kernel/sync/poll.rs | 1 + rust/kernel/workqueue.rs | 1 + rust/macros/fmt.rs | 118 ++++++++++ rust/macros/kunit.rs | 6 +- rust/macros/lib.rs | 21 +- rust/macros/module.rs | 2 +- samples/rust/rust_driver_faux.rs | 4 +- samples/rust/rust_driver_pci.rs | 4 +- samples/rust/rust_driver_platform.rs | 4 +- samples/rust/rust_misc_device.rs | 3 +- scripts/rustdoc_test_gen.rs | 2 +- 40 files changed, 426 insertions(+), 395 deletions(-) --- base-commit: cbeaa41dfe26b72639141e87183cb23e00d4b0dd change-id: 20250201-cstr-core-d4b9b69120cf Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

7 months, 2 weeks

4
33
0 0

[PATCH bpf-next v1 1/2] bpf: Restrict usage scope of bpf_get_cgroup_classid

by Jiayuan Chen

A previous commit expanded the usage scope of bpf_get_cgroup_classid() to all contexts (see Fixes tag), but this was inappropriate. First, syzkaller reported a bug [1]. Second, it uses skb as an argument, but its implementation varies across different bpf prog types. For example, in sock_filter and sock_addr, it retrieves the classid from the current context (bpf_get_cgroup_classid_curr_proto) instead of from skb. In tc egress and lwt, it fetches the classid from skb->sk, but in tc ingress, it returns 0. In summary, the definition of bpf_get_cgroup_classid() is ambiguous and its usage scenarios are limited. It should not be treated as a general-purpose helper. This patch reverts part of the previous commit. [1] https://syzkaller.appspot.com/bug?extid=9767c7ed68b95cfa69e6 Fixes: ee971630f20f ("bpf: Allow some trace helpers for all prog types") Reported-by: syzbot+9767c7ed68b95cfa69e6(a)syzkaller.appspotmail.com Signed-off-by: Jiayuan Chen <jiayuan.chen(a)linux.dev> --- include/linux/bpf-cgroup.h | 8 ++++++++ kernel/bpf/cgroup.c | 25 +++++++++++++++++++++++++ kernel/bpf/helpers.c | 4 ---- 3 files changed, 33 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 4847dcade917..9de7adb68294 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -427,6 +427,8 @@ int cgroup_bpf_prog_query(const union bpf_attr *attr, const struct bpf_func_proto * cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); +const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); #else static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; } @@ -463,6 +465,12 @@ cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return NULL; } +static inline const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + return NULL; +} + static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map) { return 0; } static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc( diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 62a1d8deb3dc..a99b72e6f1c9 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1653,6 +1653,10 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) if (func_proto) return func_proto; + func_proto = cgroup_current_func_proto(func_id, prog); + if (func_proto) + return func_proto; + switch (func_id) { case BPF_FUNC_perf_event_output: return &bpf_event_output_data_proto; @@ -2200,6 +2204,10 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) if (func_proto) return func_proto; + func_proto = cgroup_current_func_proto(func_id, prog); + if (func_proto) + return func_proto; + switch (func_id) { case BPF_FUNC_sysctl_get_name: return &bpf_sysctl_get_name_proto; @@ -2343,6 +2351,10 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) if (func_proto) return func_proto; + func_proto = cgroup_current_func_proto(func_id, prog); + if (func_proto) + return func_proto; + switch (func_id) { #ifdef CONFIG_NET case BPF_FUNC_get_netns_cookie: @@ -2589,3 +2601,16 @@ cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return NULL; } } + +const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + switch (func_id) { +#ifdef CONFIG_CGROUP_NET_CLASSID + case BPF_FUNC_get_cgroup_classid: + return &bpf_get_cgroup_classid_curr_proto; +#endif + default: + return NULL; + } +} diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index b71e428ad936..9d0d54f4f0de 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2024,10 +2024,6 @@ bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_get_current_ancestor_cgroup_id_proto; case BPF_FUNC_current_task_under_cgroup: return &bpf_current_task_under_cgroup_proto; -#endif -#ifdef CONFIG_CGROUP_NET_CLASSID - case BPF_FUNC_get_cgroup_classid: - return &bpf_get_cgroup_classid_curr_proto; #endif case BPF_FUNC_task_storage_get: if (bpf_prog_check_recur(prog)) -- 2.47.1

7 months, 2 weeks

2
2
0 0

[PATCH v7 0/8] ublk: decouple server threads from ublk_queues/hctxs

by Uday Shankar

This patch set aims to allow ublk server threads to better balance load amongst themselves by decoupling server threads from ublk_queues/hctxs, so that multiple threads can service I/Os that are issued from a single CPU. This can improve performance for workloads in which ublk server CPU is a bottleneck, and for which load is issued from CPUs which are not balanced across ublk_queues/hctxs. Performance ----------- First create two ublk devices with: ublkb0: ./kublk add -t null -q 2 --nthreads 2 ublkb1: ./kublk add -t null -q 2 --nthreads 2 --per_io_tasks Then run load with: taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb0: 1.90M IOPS taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb1: 2.18M IOPS Since ublkb1 has per-io-tasks, the second command is able to make use of both ublk server worker threads and therefore has increased max throughput. Caveats: - This testing was done on a system with 2 numa nodes, but the penalty of having I/O cross a numa (or LLC) boundary in the per_io_tasks case is quite high. So these numbers were obtained after moving all ublk server threads and the application threads to CPUs on the same numa node/LLC. - One might expect the scaling to be linear - because ublkb1 can make use of twice as many ublk server threads, it should be able to drive twice the throughput. However this is not true (the improvement is ~15%), and needs further investigation. Signed-off-by: Uday Shankar <ushankar(a)purestorage.com> --- Changes in v7: - Fix queue_rqs batch dispatch for per-io daemons - Kick round-robin tag allocation changes to a followup - Add explicit feature flag for per-task daemons (Ming Lei, Caleb Sander Mateos) - Move some variable assignments to avoid redundant computation (Caleb Sander Mateos) - Switch from storing pointers in ublk_io to computing based on address with container_of in a couple places (Ming Lei) - Link to v6: https://lore.kernel.org/r/20250507-ublk_task_per_io-v6-0-a2a298783c01@pures… Changes in v6: - Add a feature flag for this feature, called UBLK_F_RR_TAGS (Ming Lei) - Add test for this feature (Ming Lei) - Add documentation for this feature (Ming Lei) - Link to v5: https://lore.kernel.org/r/20250416-ublk_task_per_io-v5-0-9261ad7bff20@pures… Changes in v5: - Set io->task before ublk_mark_io_ready (Caleb Sander Mateos) - Set io->task atomically, read it atomically when needed - Return 0 on success from command-specific helpers in __ublk_ch_uring_cmd (Caleb Sander Mateos) - Rename ublk_handle_need_get_data to ublk_get_data (Caleb Sander Mateos) - Link to v4: https://lore.kernel.org/r/20250415-ublk_task_per_io-v4-0-54210b91a46f@pures… Changes in v4: - Drop "ublk: properly serialize all FETCH_REQs" since Ming is taking it in another set - Prevent data races by marking data structures which should be read-only in the I/O path as const (Ming Lei) - Link to v3: https://lore.kernel.org/r/20250410-ublk_task_per_io-v3-0-b811e8f4554a@pures… Changes in v3: - Check for UBLK_IO_FLAG_ACTIVE on I/O again after taking lock to ensure that two concurrent FETCH_REQs on the same I/O can't succeed (Caleb Sander Mateos) - Link to v2: https://lore.kernel.org/r/20250408-ublk_task_per_io-v2-0-b97877e6fd50@pures… Changes in v2: - Remove changes split into other patches - To ease error handling/synchronization, associate each I/O (instead of each queue) to the last task that issues a FETCH_REQ against it. Only that task is allowed to operate on the I/O. - Link to v1: https://lore.kernel.org/r/20241002224437.3088981-1-ushankar@purestorage.com --- Uday Shankar (8): ublk: have a per-io daemon instead of a per-queue daemon selftests: ublk: kublk: plumb q_id in io_uring user_data selftests: ublk: kublk: tie sqe allocation to io instead of queue selftests: ublk: kublk: lift queue initialization out of thread selftests: ublk: kublk: move per-thread data out of ublk_queue selftests: ublk: kublk: decouple ublk_queues from ublk server threads selftests: ublk: add test for per io daemons Documentation: ublk: document UBLK_F_PER_IO_DAEMON Documentation/block/ublk.rst | 35 ++- drivers/block/ublk_drv.c | 108 +++---- include/uapi/linux/ublk_cmd.h | 9 + tools/testing/selftests/ublk/Makefile | 1 + tools/testing/selftests/ublk/fault_inject.c | 4 +- tools/testing/selftests/ublk/file_backed.c | 20 +- tools/testing/selftests/ublk/kublk.c | 345 ++++++++++++++------- tools/testing/selftests/ublk/kublk.h | 73 +++-- tools/testing/selftests/ublk/null.c | 22 +- tools/testing/selftests/ublk/stripe.c | 17 +- tools/testing/selftests/ublk/test_generic_12.sh | 55 ++++ .../selftests/ublk/trace/count_ios_per_tid.bt | 11 + 12 files changed, 470 insertions(+), 230 deletions(-) --- base-commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d change-id: 20250408-ublk_task_per_io-c693cf608d7a Best regards, -- Uday Shankar <ushankar(a)purestorage.com>

7 months, 2 weeks

3
17
0 0

[PATCH bpf-next 2/2] selftests/bpf: Add selftests for bpf_task_cwd_from_pid()

by Rong Tao

From: Rong Tao <rongtao(a)cestc.cn> Add some selftest testcases that validate the expected behavior of the bpf_task_cwd_from_pid() kfunc that was added in the prior patch. Signed-off-by: Rong Tao <rongtao(a)cestc.cn> --- .../selftests/bpf/prog_tests/task_kfunc.c | 3 ++ .../selftests/bpf/progs/task_kfunc_common.h | 1 + .../selftests/bpf/progs/task_kfunc_success.c | 47 +++++++++++++++++++ 3 files changed, 51 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c index 83b90335967a..c18a0cb67164 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c +++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c @@ -149,6 +149,9 @@ static const char * const success_tests[] = { "task_kfunc_acquire_trusted_walked", "test_task_kfunc_flavor_relo", "test_task_kfunc_flavor_relo_not_found", + "test_task_cwd_from_pid_arg", + "test_task_cwd_from_pid", + "test_task_cwd_from_pid_current", }; static const char * const vpid_success_tests[] = { diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h index e9c4fea7a4bb..b762054a1825 100644 --- a/tools/testing/selftests/bpf/progs/task_kfunc_common.h +++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h @@ -24,6 +24,7 @@ struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym; void bpf_task_release(struct task_struct *p) __ksym; struct task_struct *bpf_task_from_pid(s32 pid) __ksym; struct task_struct *bpf_task_from_vpid(s32 vpid) __ksym; +int bpf_task_cwd_from_pid(s32 pid, char *buf, u32 buf_len) __ksym; void bpf_rcu_read_lock(void) __ksym; void bpf_rcu_read_unlock(void) __ksym; diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_success.c b/tools/testing/selftests/bpf/progs/task_kfunc_success.c index 5fb4fc19d26a..6424cf5c151e 100644 --- a/tools/testing/selftests/bpf/progs/task_kfunc_success.c +++ b/tools/testing/selftests/bpf/progs/task_kfunc_success.c @@ -351,6 +351,53 @@ int BPF_PROG(test_task_from_pid_invalid, struct task_struct *task, u64 clone_fla return 0; } +SEC("tp_btf/task_newtask") +int BPF_PROG(test_task_cwd_from_pid_arg, struct task_struct *task, u64 clone_flags) +{ + char cwd[256]; + + if (!is_test_kfunc_task()) + return 0; + + err = 0; + err += bpf_task_cwd_from_pid(task->pid, NULL, sizeof(cwd)) != -EINVAL; + err += bpf_task_cwd_from_pid(task->pid, cwd, 0) != -EINVAL; + err += bpf_task_cwd_from_pid(-1, cwd, sizeof(cwd)) != -ESRCH; + return 0; +} + +SEC("tp_btf/task_newtask") +int BPF_PROG(test_task_cwd_from_pid, struct task_struct *task, u64 clone_flags) +{ + char cwd[256]; + + if (!is_test_kfunc_task()) + return 0; + + err = bpf_task_cwd_from_pid(task->pid, cwd, sizeof(cwd)); + return 0; +} + +SEC("tp_btf/task_newtask") +int BPF_PROG(test_task_cwd_from_pid_current, struct task_struct *task, u64 clone_flags) +{ + char cwd[128], cwd2[128]; + struct task_struct *current; + + if (!is_test_kfunc_task()) + return 0; + + current = bpf_get_current_task_btf(); + + err = 0; + err += bpf_task_cwd_from_pid(task->pid, cwd, sizeof(cwd)); + err += bpf_task_cwd_from_pid(current->pid, cwd2, sizeof(cwd2)); + + err += bpf_strncmp(cwd, sizeof(cwd), cwd2) != 0; + + return 0; +} + SEC("tp_btf/task_newtask") int BPF_PROG(task_kfunc_acquire_trusted_walked, struct task_struct *task, u64 clone_flags) { -- 2.49.0

7 months, 2 weeks

1
0
0 0

[PATCH bpf-next 0/2] Add kfunc for doing pid -> task cwd

by Rong Tao

In some application like bpftrace [1], need to get the cwd from the pid. This patch provides a new kfunc that can get the cwd of the process from the pid. [1] https://github.com/bpftrace/bpftrace/issues/3314 Rong Tao (2): bpf: Add bpf_task_cwd_from_pid() kfunc selftests/bpf: Add selftests for bpf_task_cwd_from_pid() kernel/bpf/helpers.c | 45 ++++++++++++++++++ .../selftests/bpf/prog_tests/task_kfunc.c | 3 ++ .../selftests/bpf/progs/task_kfunc_common.h | 1 + .../selftests/bpf/progs/task_kfunc_success.c | 47 +++++++++++++++++++ 4 files changed, 96 insertions(+) -- 2.49.0

7 months, 2 weeks

1
0
0 0

[PATCH net-next] selftest: Add selftest for multicast address notifications

by Yuyang Huang

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses. The test relies on the iproute2 version to be 6.13+. Tested by the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \ TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze(a)google.com> Cc: Lorenzo Colitti <lorenzo(a)google.com> Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com> --- tools/testing/selftests/net/rtnetlink.sh | 34 ++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 2e8243a65b50..9dbcaaeaf8cd 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -21,6 +21,7 @@ ALL_TESTS=" kci_test_vrf kci_test_encap kci_test_macsec + kci_test_mcast_addr_notification kci_test_ipsec kci_test_ipsec_offload kci_test_fdb_get @@ -1334,6 +1335,39 @@ kci_test_mngtmpaddr() return $ret } +kci_test_mcast_addr_notification() +{ + local tmpfile + local monitor_pid + local match_result + + tmpfile=$(mktemp) + + ip monitor maddr > $tmpfile & + monitor_pid=$! + sleep 1 + + run_cmd ip link add name test-dummy1 type dummy + run_cmd ip link set test-dummy1 up + run_cmd ip link del dev test-dummy1 + sleep 1 + + match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile) + + kill $monitor_pid + rm $tmpfile + # There should be 4 line matches as follows. + # 13: test-dummy1 inet6 mcast ff02::1 scope global + # 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global + # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global + if [ $match_result -ne 4 ];then + end_test "FAIL: mcast addr notification" + return 1 + fi + end_test "PASS: mcast addr notification" +} + kci_test_rtnl() { local current_test -- 2.49.0.1204.g71687c7c1d-goog

7 months, 2 weeks

2
2
0 0

[PATCH v1] selftests/mm: two fixes for the pfnmap test

by David Hildenbrand

When unregistering the signal handler, we have to pass SIG_DFL, and blindly reading from PFN 0 and PFN 1 seems to be problematic on !x86 systems. In particularly, on arm64 tx2 machines where noting resides at these physical memory locations, we can generate RAS errors. Let's fix it by scanning /proc/iomem for actual "System RAM". Reported-by: Ryan Roberts <ryan.roberts(a)arm.com> Closes: https://lore.kernel.org/all/232960c2-81db-47ca-a337-38c4bce5f997@arm.com/T/… Fixes: 2616b370323a ("selftests/mm: add simple VM_PFNMAP tests based on mmap'ing /dev/mem") Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com> Tested-by: Aishwarya TCV <aishwarya.tcv(a)arm.com> Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Signed-off-by: David Hildenbrand <david(a)redhat.com> --- tools/testing/selftests/mm/pfnmap.c | 61 +++++++++++++++++++++++++++-- 1 file changed, 57 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/mm/pfnmap.c b/tools/testing/selftests/mm/pfnmap.c index 8a9d19b6020c7..866ac023baf5c 100644 --- a/tools/testing/selftests/mm/pfnmap.c +++ b/tools/testing/selftests/mm/pfnmap.c @@ -12,6 +12,8 @@ #include <stdint.h> #include <unistd.h> #include <errno.h> +#include <stdio.h> +#include <ctype.h> #include <fcntl.h> #include <signal.h> #include <setjmp.h> @@ -43,14 +45,62 @@ static int test_read_access(char *addr, size_t size, size_t pagesize) /* Force a read that the compiler cannot optimize out. */ *((volatile char *)(addr + offs)); } - if (signal(SIGSEGV, signal_handler) == SIG_ERR) + if (signal(SIGSEGV, SIG_DFL) == SIG_ERR) return -EINVAL; return ret; } +static int find_ram_target(off_t *phys_addr, + unsigned long long pagesize) +{ + unsigned long long start, end; + char line[80], *end_ptr; + FILE *file; + + /* Search /proc/iomem for the first suitable "System RAM" range. */ + file = fopen("/proc/iomem", "r"); + if (!file) + return -errno; + + while (fgets(line, sizeof(line), file)) { + /* Ignore any child nodes. */ + if (!isalnum(line[0])) + continue; + + if (!strstr(line, "System RAM\n")) + continue; + + start = strtoull(line, &end_ptr, 16); + /* Skip over the "-" */ + end_ptr++; + /* Make end "exclusive". */ + end = strtoull(end_ptr, NULL, 16) + 1; + + /* Actual addresses are not exported */ + if (!start && !end) + break; + + /* We need full pages. */ + start = (start + pagesize - 1) & ~(pagesize - 1); + end &= ~(pagesize - 1); + + if (start != (off_t)start) + break; + + /* We need two pages. */ + if (end > start + 2 * pagesize) { + fclose(file); + *phys_addr = start; + return 0; + } + } + return -ENOENT; +} + FIXTURE(pfnmap) { + off_t phys_addr; size_t pagesize; int dev_mem_fd; char *addr1; @@ -63,14 +113,17 @@ FIXTURE_SETUP(pfnmap) { self->pagesize = getpagesize(); + /* We'll require two physical pages throughout our tests ... */ + if (find_ram_target(&self->phys_addr, self->pagesize)) + SKIP(return, "Cannot find ram target in '/proc/iomem'\n"); + self->dev_mem_fd = open("/dev/mem", O_RDONLY); if (self->dev_mem_fd < 0) SKIP(return, "Cannot open '/dev/mem'\n"); - /* We'll require the first two pages throughout our tests ... */ self->size1 = self->pagesize * 2; self->addr1 = mmap(NULL, self->size1, PROT_READ, MAP_SHARED, - self->dev_mem_fd, 0); + self->dev_mem_fd, self->phys_addr); if (self->addr1 == MAP_FAILED) SKIP(return, "Cannot mmap '/dev/mem'\n"); @@ -129,7 +182,7 @@ TEST_F(pfnmap, munmap_split) */ self->size2 = self->pagesize; self->addr2 = mmap(NULL, self->pagesize, PROT_READ, MAP_SHARED, - self->dev_mem_fd, 0); + self->dev_mem_fd, self->phys_addr); ASSERT_NE(self->addr2, MAP_FAILED); } -- 2.49.0

7 months, 2 weeks

1
0
0 0

[PATCH v2] selftests/mm: add simple VM_PFNMAP tests based on mmap'ing /dev/mem

by David Hildenbrand

Let's test some basic functionality using /dev/mem. These tests will implicitly cover some PAT (Page Attribute Handling) handling on x86. These tests will only run when /dev/mem access to the first two pages in physical address space is possible and allowed; otherwise, the tests are skipped. On current x86-64 with PAT inside a VM, all tests pass: TAP version 13 1..6 # Starting 6 tests from 1 test cases. # RUN pfnmap.madvise_disallowed ... # OK pfnmap.madvise_disallowed ok 1 pfnmap.madvise_disallowed # RUN pfnmap.munmap_split ... # OK pfnmap.munmap_split ok 2 pfnmap.munmap_split # RUN pfnmap.mremap_fixed ... # OK pfnmap.mremap_fixed ok 3 pfnmap.mremap_fixed # RUN pfnmap.mremap_shrink ... # OK pfnmap.mremap_shrink ok 4 pfnmap.mremap_shrink # RUN pfnmap.mremap_expand ... # OK pfnmap.mremap_expand ok 5 pfnmap.mremap_expand # RUN pfnmap.fork ... # OK pfnmap.fork ok 6 pfnmap.fork # PASSED: 6 / 6 tests passed. # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0 However, we are able to trigger: [ 27.888251] x86/PAT: pfnmap:1790 freeing invalid memtype [mem 0x00000000-0x00000fff] There are probably more things worth testing in the future, such as MAP_PRIVATE handling. But this set of tests is sufficient to cover most of the things we will rework regarding PAT handling. Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Dev Jain <dev.jain(a)arm.com> Signed-off-by: David Hildenbrand <david(a)redhat.com> --- Hopefully I didn't miss any review feedback. v1 -> v2: * Rewrite using kselftest_harness, which simplifies a lot of things * Add to .gitignore and run_vmtests.sh * Register signal handler on demand * Use volatile trick to force a read (not factoring out FORCE_READ just yet) * Drop mprotect() test case * Add some more comments why we test certain things * Use NULL for mmap() first parameter instead of 0 * Smaller fixes --- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + tools/testing/selftests/mm/pfnmap.c | 196 ++++++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 4 + 4 files changed, 202 insertions(+) create mode 100644 tools/testing/selftests/mm/pfnmap.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index 91db34941a143..824266982aa36 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -20,6 +20,7 @@ mremap_test on-fault-limit transhuge-stress pagemap_ioctl +pfnmap *.tmp* protection_keys protection_keys_32 diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index ad4d6043a60f0..ae6f994d3add7 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -84,6 +84,7 @@ TEST_GEN_FILES += mremap_test TEST_GEN_FILES += mseal_test TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += pagemap_ioctl +TEST_GEN_FILES += pfnmap TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += uffd-stress diff --git a/tools/testing/selftests/mm/pfnmap.c b/tools/testing/selftests/mm/pfnmap.c new file mode 100644 index 0000000000000..8a9d19b6020c7 --- /dev/null +++ b/tools/testing/selftests/mm/pfnmap.c @@ -0,0 +1,196 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Basic VM_PFNMAP tests relying on mmap() of '/dev/mem' + * + * Copyright 2025, Red Hat, Inc. + * + * Author(s): David Hildenbrand <david(a)redhat.com> + */ +#define _GNU_SOURCE +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <unistd.h> +#include <errno.h> +#include <fcntl.h> +#include <signal.h> +#include <setjmp.h> +#include <linux/mman.h> +#include <sys/mman.h> +#include <sys/wait.h> + +#include "../kselftest_harness.h" +#include "vm_util.h" + +static sigjmp_buf sigjmp_buf_env; + +static void signal_handler(int sig) +{ + siglongjmp(sigjmp_buf_env, -EFAULT); +} + +static int test_read_access(char *addr, size_t size, size_t pagesize) +{ + size_t offs; + int ret; + + if (signal(SIGSEGV, signal_handler) == SIG_ERR) + return -EINVAL; + + ret = sigsetjmp(sigjmp_buf_env, 1); + if (!ret) { + for (offs = 0; offs < size; offs += pagesize) + /* Force a read that the compiler cannot optimize out. */ + *((volatile char *)(addr + offs)); + } + if (signal(SIGSEGV, signal_handler) == SIG_ERR) + return -EINVAL; + + return ret; +} + +FIXTURE(pfnmap) +{ + size_t pagesize; + int dev_mem_fd; + char *addr1; + size_t size1; + char *addr2; + size_t size2; +}; + +FIXTURE_SETUP(pfnmap) +{ + self->pagesize = getpagesize(); + + self->dev_mem_fd = open("/dev/mem", O_RDONLY); + if (self->dev_mem_fd < 0) + SKIP(return, "Cannot open '/dev/mem'\n"); + + /* We'll require the first two pages throughout our tests ... */ + self->size1 = self->pagesize * 2; + self->addr1 = mmap(NULL, self->size1, PROT_READ, MAP_SHARED, + self->dev_mem_fd, 0); + if (self->addr1 == MAP_FAILED) + SKIP(return, "Cannot mmap '/dev/mem'\n"); + + /* ... and want to be able to read from them. */ + if (test_read_access(self->addr1, self->size1, self->pagesize)) + SKIP(return, "Cannot read-access mmap'ed '/dev/mem'\n"); + + self->size2 = 0; + self->addr2 = MAP_FAILED; +} + +FIXTURE_TEARDOWN(pfnmap) +{ + if (self->addr2 != MAP_FAILED) + munmap(self->addr2, self->size2); + if (self->addr1 != MAP_FAILED) + munmap(self->addr1, self->size1); + if (self->dev_mem_fd >= 0) + close(self->dev_mem_fd); +} + +TEST_F(pfnmap, madvise_disallowed) +{ + int advices[] = { + MADV_DONTNEED, + MADV_DONTNEED_LOCKED, + MADV_FREE, + MADV_WIPEONFORK, + MADV_COLD, + MADV_PAGEOUT, + MADV_POPULATE_READ, + MADV_POPULATE_WRITE, + }; + int i; + + /* All these advices must be rejected. */ + for (i = 0; i < ARRAY_SIZE(advices); i++) { + EXPECT_LT(madvise(self->addr1, self->pagesize, advices[i]), 0); + EXPECT_EQ(errno, EINVAL); + } +} + +TEST_F(pfnmap, munmap_split) +{ + /* + * Unmap the first page. This munmap() call is not really expected to + * fail, but we might be able to trigger other internal issues. + */ + ASSERT_EQ(munmap(self->addr1, self->pagesize), 0); + + /* + * Remap the first page while the second page is still mapped. This + * makes sure that any PAT tracking on x86 will allow for mmap()'ing + * a page again while some parts of the first mmap() are still + * around. + */ + self->size2 = self->pagesize; + self->addr2 = mmap(NULL, self->pagesize, PROT_READ, MAP_SHARED, + self->dev_mem_fd, 0); + ASSERT_NE(self->addr2, MAP_FAILED); +} + +TEST_F(pfnmap, mremap_fixed) +{ + char *ret; + + /* Reserve a destination area. */ + self->size2 = self->size1; + self->addr2 = mmap(NULL, self->size2, PROT_READ, MAP_ANON | MAP_PRIVATE, + -1, 0); + ASSERT_NE(self->addr2, MAP_FAILED); + + /* mremap() over our destination. */ + ret = mremap(self->addr1, self->size1, self->size2, + MREMAP_FIXED | MREMAP_MAYMOVE, self->addr2); + ASSERT_NE(ret, MAP_FAILED); +} + +TEST_F(pfnmap, mremap_shrink) +{ + char *ret; + + /* Shrinking is expected to work. */ + ret = mremap(self->addr1, self->size1, self->size1 - self->pagesize, 0); + ASSERT_NE(ret, MAP_FAILED); +} + +TEST_F(pfnmap, mremap_expand) +{ + /* + * Growing is not expected to work, and getting it right would + * be challenging. So this test primarily serves as an early warning + * that something that probably should never work suddenly works. + */ + self->size2 = self->size1 + self->pagesize; + self->addr2 = mremap(self->addr1, self->size1, self->size2, MREMAP_MAYMOVE); + ASSERT_EQ(self->addr2, MAP_FAILED); +} + +TEST_F(pfnmap, fork) +{ + pid_t pid; + int ret; + + /* fork() a child and test if the child can access the pages. */ + pid = fork(); + ASSERT_GE(pid, 0); + + if (!pid) { + EXPECT_EQ(test_read_access(self->addr1, self->size1, + self->pagesize), 0); + exit(0); + } + + wait(&ret); + if (WIFEXITED(ret)) + ret = WEXITSTATUS(ret); + else + ret = -EINVAL; + ASSERT_EQ(ret, 0); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 188b125bf1f6b..dddd1dd8af145 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -63,6 +63,8 @@ separated by spaces: test soft dirty page bit semantics - pagemap test pagemap_scan IOCTL +- pfnmap + tests for VM_PFNMAP handling - cow test copy-on-write semantics - thp @@ -472,6 +474,8 @@ fi CATEGORY="pagemap" run_test ./pagemap_ioctl +CATEGORY="pfnmap" run_test ./pfnmap + # COW tests CATEGORY="cow" run_test ./cow -- 2.49.0

7 months, 2 weeks

4
12
0 0

[PATCH v2] rust: kunit: use crate-level mapping for `c_void`

by Jesung Yang

Use `kernel::ffi::c_void` instead of `core::ffi::c_void` for consistency and to centralize abstraction. Since `kernel::ffi::c_void` is a straightforward re-export of `core::ffi::c_void`, both are functionally equivalent. However, using `kernel::ffi::c_void` improves consistency across the kernel's Rust code and provides a unified reference point in case the definition ever needs to change, even if such a change is unlikely. Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com> Link: https://rust-for-linux.zulipchat.com/#narrow/channel/288089/topic/x/near/52… --- So in sum, I believe it's reasonable to keep the diff unchanged... but I'm happy to adjust if you'd prefer a different approach. --- Changes in v2: - Add "Link" tag to the related discussion on Zulip - Reword the commit message to clarify `kernel::ffi::c_void` is a re-export - Link to v1: https://lore.kernel.org/rust-for-linux/20250526162429.1114862-1-y.j3ms.n@gm… --- rust/kernel/kunit.rs | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs index 81833a687b75..bd6fc712dd79 100644 --- a/rust/kernel/kunit.rs +++ b/rust/kernel/kunit.rs @@ -6,7 +6,8 @@ //! //! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html> -use core::{ffi::c_void, fmt}; +use core::fmt; +use kernel::ffi::c_void; /// Prints a KUnit error-level message. /// base-commit: f4daa80d6be7d3c55ca72a8e560afc4e21f886aa -- 2.39.5

7 months, 2 weeks

3
2
0 0

[PATCH] rust: replace length checks with match

by Tamir Duberstein

Use a match expression with slice patterns instead of length checks and indexing. The result is more idiomatic, which is a better example for future Rust code authors. Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- scripts/rustdoc_test_gen.rs | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs index 1ca253594d38..a3dc251221e0 100644 --- a/scripts/rustdoc_test_gen.rs +++ b/scripts/rustdoc_test_gen.rs @@ -85,24 +85,25 @@ fn find_candidates( } } - assert!( - valid_paths.len() > 0, - "No path candidates found for `{file}`. This is likely a bug in the build system, or some \ - files went away while compiling." - ); - - if valid_paths.len() > 1 { - eprintln!("Several path candidates found:"); - for path in valid_paths { - eprintln!(" {path:?}"); + match valid_paths.as_slice() { + [] => panic!( + "No path candidates found for `{file}`. This is likely a bug in the build system, or \ + some files went away while compiling." + ), + [valid_path] => { + valid_path.to_str().unwrap() + } + valid_paths => { + eprintln!("Several path candidates found:"); + for path in valid_paths { + eprintln!(" {path:?}"); + } + panic!( + "Several path candidates found for `{file}`, please resolve the ambiguity by \ + renaming a file or folder." + ); } - panic!( - "Several path candidates found for `{file}`, please resolve the ambiguity by renaming \ - a file or folder." - ); } - - valid_paths[0].to_str().unwrap() } fn main() { --- base-commit: bfc3cd87559bc593bb32bb1482f9cae3308b6398 change-id: 20250527-idiomatic-match-slice-26a79d100e4d Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

7 months, 2 weeks

3
3
0 0

[PATCH next] tools/testing: Check correct variable in open_procmap()

by Dan Carpenter

Check if "procmap_out->fd" is negative instead of "procmap_out" (which is a pointer). Fixes: bd23f293a0d5 ("tools/testing: add PROCMAP_QUERY helper functions in mm self tests") Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org> --- tools/testing/selftests/mm/vm_util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c index 1357e2d6a7b6..61d7bf1f8c62 100644 --- a/tools/testing/selftests/mm/vm_util.c +++ b/tools/testing/selftests/mm/vm_util.c @@ -439,7 +439,7 @@ int open_procmap(pid_t pid, struct procmap_fd *procmap_out) sprintf(path, "/proc/%d/maps", pid); procmap_out->query.size = sizeof(procmap_out->query); procmap_out->fd = open(path, O_RDONLY); - if (procmap_out < 0) + if (procmap_out->fd < 0) ret = -errno; return ret; -- 2.47.2

7 months, 2 weeks

2
1
0 0

[PATCH net-next v9] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is included in the config file to be built with the -b option, though not used in the tests. Only tested on x86. To run: $ make -C tools/testing/selftests TARGETS=vsock $ tools/testing/selftests/vsock/vmtest.sh or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs (after make -C tools/testing/selftests TARGETS=vsock): $ ./tools/testing/selftests/vsock/vmtest.sh 1..3 ok 0 vm_server_host_client ok 1 vm_client_host_server ok 2 vm_loopback SUMMARY: PASS=3 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_m7DI.log $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback 1..1 ok 0 vm_loopback SUMMARY: PASS=1 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_a1IO.log $ mkdir -p ~/scratch $ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch [... omitted ...] $ cd ~/scratch $ ./run_kselftest.sh TAP version 13 1..1 # timeout set to 300 # selftests: vsock: vmtest.sh # 1..3 # ok 0 vm_server_host_client # ok 1 vm_client_host_server # ok 2 vm_loopback # SUMMARY: PASS=3 SKIP=0 FAIL=0 # Log: /tmp/vsock_vmtest_svEl.log ok 1 selftests: vsock: vmtest.sh Future work can include vsock_diag_test. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com> --- Changes in v9: - make kselftest build target depend on tools/testing/vsock sources (Stefano) - add check_vng() for vng version checking (Stefano) - add virtme_ssh_channel=tcp to kernel cmdline (Stefano) - Link to v8: https://lore.kernel.org/r/20250522-vsock-vmtest-v8-1-367619bef134@gmail.com Changes in v8: - remove NIPA comment from commit msg - remove tap_* functions and TAP_PREFIX - add -b for building kernel - Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com Changes in v7: - fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS - updated commit message with updated output - updated commit message with commands for installing/running as kselftest - Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com Changes in v6: - add make cmd in commit message in vmtest.sh example (Stefano) - check nonzero size of QEMU_PIDFILE using -s conditional (Stefano) - display log file path after tests so it is easier to find amongst other random names - cleanup qemu pidfile if qemu is unable to remove it - make oops/warning failures more obvious with 'FAIL' prefix in log (simply saying 'detected' wasn't clear enough to identify failing condition) - Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change 2222 -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change <tab><backslash> to <space><backslash> - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com Changes in v2: - add kernel oops and warnings checker - change testname variable to use FUNCNAME - fix spacing in test_vm_server_host_client - add -s skip build option to vmtest.sh - add test_vm_loopback - pass port to vm_wait_for_listener - fix indentation in vmtest.sh - add vmci and hyperv to config - changed whitespace from tabs to spaces in help string - Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/vsock/.gitignore | 2 + tools/testing/selftests/vsock/Makefile | 17 ++ tools/testing/selftests/vsock/config | 114 ++++++++ tools/testing/selftests/vsock/settings | 1 + tools/testing/selftests/vsock/vmtest.sh | 487 +++++++++++++++++++++++++++++++ 6 files changed, 622 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h F: include/uapi/linux/vm_sockets_diag.h F: include/uapi/linux/vsockmon.h F: net/vmw_vsock/ +F: tools/testing/selftests/vsock/ F: tools/testing/vsock/ VMALLOC diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84 --- /dev/null +++ b/tools/testing/selftests/vsock/.gitignore @@ -0,0 +1,2 @@ +vmtest.log +vsock_test diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..c407c0afd9388ee692d59a95f304182f067596e9 --- /dev/null +++ b/tools/testing/selftests/vsock/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 + +CURDIR := $(abspath .) +TOOLSDIR := $(abspath ../../..) +VSOCK_TEST_DIR := $(TOOLSDIR)/testing/vsock +VSOCK_TEST_SRCS := $(wildcard $(VSOCK_TEST_DIR)/*.c $(VSOCK_TEST_DIR)/*.h) + +$(OUTPUT)/vsock_test: $(VSOCK_TEST_DIR)/vsock_test + install -m 755 $< $@ + +$(VSOCK_TEST_DIR)/vsock_test: $(VSOCK_TEST_SRCS) + $(MAKE) -C $(VSOCK_TEST_DIR) vsock_test +TEST_PROGS += vmtest.sh +TEST_GEN_FILES := vsock_test + +include ../lib.mk + diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config new file mode 100644 index 0000000000000000000000000000000000000000..3bffaaf98fff92dc0e3bc1286afa3d8d5d52f4c7 --- /dev/null +++ b/tools/testing/selftests/vsock/config @@ -0,0 +1,114 @@ +CONFIG_BLK_DEV_INITRD=y +CONFIG_BPF=y +CONFIG_BPF_SYSCALL=y +CONFIG_BPF_JIT=y +CONFIG_HAVE_EBPF_JIT=y +CONFIG_BPF_EVENTS=y +CONFIG_FTRACE_SYSCALLS=y +CONFIG_FUNCTION_TRACER=y +CONFIG_HAVE_DYNAMIC_FTRACE=y +CONFIG_DYNAMIC_FTRACE=y +CONFIG_HAVE_KPROBES=y +CONFIG_KPROBES=y +CONFIG_KPROBE_EVENTS=y +CONFIG_ARCH_SUPPORTS_UPROBES=y +CONFIG_UPROBES=y +CONFIG_UPROBE_EVENTS=y +CONFIG_DEBUG_FS=y +CONFIG_FW_CFG_SYSFS=y +CONFIG_FW_CFG_SYSFS_CMDLINE=y +CONFIG_DRM=y +CONFIG_DRM_VIRTIO_GPU=y +CONFIG_DRM_VIRTIO_GPU_KMS=y +CONFIG_DRM_BOCHS=y +CONFIG_VIRTIO_IOMMU=y +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_SND_SEQUENCER=y +CONFIG_SND_PCI=y +CONFIG_SND_INTEL8X0=y +CONFIG_SND_HDA_CODEC_REALTEK=y +CONFIG_SECURITYFS=y +CONFIG_CGROUP_BPF=y +CONFIG_SQUASHFS=y +CONFIG_SQUASHFS_XZ=y +CONFIG_SQUASHFS_ZSTD=y +CONFIG_FUSE_FS=y +CONFIG_SERIO=y +CONFIG_PCI=y +CONFIG_INPUT=y +CONFIG_INPUT_KEYBOARD=y +CONFIG_KEYBOARD_ATKBD=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_X86_VERBOSE_BOOTUP=y +CONFIG_VGA_CONSOLE=y +CONFIG_FB=y +CONFIG_FB_VESA=y +CONFIG_FRAMEBUFFER_CONSOLE=y +CONFIG_RTC_CLASS=y +CONFIG_RTC_HCTOSYS=y +CONFIG_RTC_DRV_CMOS=y +CONFIG_HYPERVISOR_GUEST=y +CONFIG_PARAVIRT=y +CONFIG_KVM_GUEST=y +CONFIG_KVM=y +CONFIG_KVM_INTEL=y +CONFIG_KVM_AMD=y +CONFIG_VSOCKETS=y +CONFIG_VSOCKETS_DIAG=y +CONFIG_VSOCKETS_LOOPBACK=y +CONFIG_VMWARE_VMCI_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS_COMMON=y +CONFIG_HYPERV_VSOCKETS=y +CONFIG_VMWARE_VMCI=y +CONFIG_VHOST_VSOCK=y +CONFIG_HYPERV=y +CONFIG_UEVENT_HELPER=n +CONFIG_VIRTIO=y +CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_MMIO=y +CONFIG_VIRTIO_BALLOON=y +CONFIG_NET=y +CONFIG_NET_CORE=y +CONFIG_NETDEVICES=y +CONFIG_NETWORK_FILESYSTEMS=y +CONFIG_INET=y +CONFIG_NET_9P=y +CONFIG_NET_9P_VIRTIO=y +CONFIG_9P_FS=y +CONFIG_VIRTIO_NET=y +CONFIG_CMDLINE_OVERRIDE=n +CONFIG_BINFMT_SCRIPT=y +CONFIG_SHMEM=y +CONFIG_TMPFS=y +CONFIG_UNIX=y +CONFIG_MODULE_SIG_FORCE=n +CONFIG_DEVTMPFS=y +CONFIG_TTY=y +CONFIG_VT=y +CONFIG_UNIX98_PTYS=y +CONFIG_EARLY_PRINTK=y +CONFIG_INOTIFY_USER=y +CONFIG_BLOCK=y +CONFIG_SCSI_LOWLEVEL=y +CONFIG_SCSI=y +CONFIG_SCSI_VIRTIO=y +CONFIG_BLK_DEV_SD=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_WATCHDOG=y +CONFIG_WATCHDOG_CORE=y +CONFIG_I6300ESB_WDT=y +CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y +CONFIG_OVERLAY_FS=y +CONFIG_DAX=y +CONFIG_DAX_DRIVER=y +CONFIG_FS_DAX=y +CONFIG_MEMORY_HOTPLUG=y +CONFIG_MEMORY_HOTREMOVE=y +CONFIG_ZONE_DEVICE=y +CONFIG_FUSE_FS=y +CONFIG_VIRTIO_FS=y +CONFIG_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings new file mode 100644 index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019 --- /dev/null +++ b/tools/testing/selftests/vsock/settings @@ -0,0 +1 @@ +timeout=300 diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh new file mode 100755 index 0000000000000000000000000000000000000000..edacebfc163251eee9cd495eb5e704dc7adc958e --- /dev/null +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -0,0 +1,487 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2025 Meta Platforms, Inc. and affiliates +# +# Dependencies: +# * virtme-ng +# * busybox-static (used by virtme-ng) +# * qemu (used by virtme-ng) + +readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" +readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) + +source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh + +readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test +readonly TEST_GUEST_PORT=51000 +readonly TEST_HOST_PORT=50000 +readonly TEST_HOST_PORT_LISTENER=50001 +readonly SSH_GUEST_PORT=22 +readonly SSH_HOST_PORT=2222 +readonly VSOCK_CID=1234 +readonly WAIT_PERIOD=3 +readonly WAIT_PERIOD_MAX=60 +readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) +readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + +# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a +# control port forwarded for vsock_test. Because virtme-ng doesn't support +# adding an additional port to forward to the device created from "--ssh" and +# virtme-init mistakenly sets identical IPs to the ssh device and additional +# devices, we instead opt out of using --ssh, add the device manually, and also +# add the kernel cmdline options that virtme-init uses to setup the interface. +readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" +readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" +readonly QEMU_OPTS="\ + -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ + -device virtio-net-pci,netdev=n0 \ + -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \ + --pidfile ${QEMU_PIDFILE} \ +" +readonly KERNEL_CMDLINE="\ + virtme.dhcp net.ifnames=0 biosdevname=0 \ + virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \ +" +readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) +readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_DESCS=( + "Run vsock_test in server mode on the VM and in client mode on the host." + "Run vsock_test in client mode on the VM and in server mode on the host." + "Run vsock_test using the loopback transport in the VM." +) + +VERBOSE=0 + +usage() { + local name + local desc + local i + + echo + echo "$0 [OPTIONS] [TEST]..." + echo "If no TEST argument is given, all tests will be run." + echo + echo "Options" + echo " -b: build the kernel from the current source tree and use it for guest VMs" + echo " -q: set the path to or name of qemu binary" + echo " -v: verbose output" + echo + echo "Available tests" + + for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do + name=${TEST_NAMES[${i}]} + desc=${TEST_DESCS[${i}]} + printf "\t%-35s%-35s\n" "${name}" "${desc}" + done + echo + + exit 1 +} + +die() { + echo "$*" >&2 + exit "${KSFT_FAIL}" +} + +vm_ssh() { + ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" + return $? +} + +cleanup() { + if [[ -s "${QEMU_PIDFILE}" ]]; then + pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1 + fi + + # If failure occurred during or before qemu start up, then we need + # to clean this up ourselves. + if [[ -e "${QEMU_PIDFILE}" ]]; then + rm "${QEMU_PIDFILE}" + fi +} + +check_args() { + local found + + for arg in "$@"; do + found=0 + for name in "${TEST_NAMES[@]}"; do + if [[ "${name}" = "${arg}" ]]; then + found=1 + break + fi + done + + if [[ "${found}" -eq 0 ]]; then + echo "${arg} is not an available test" >&2 + usage + fi + done + + for arg in "$@"; do + if ! command -v > /dev/null "test_${arg}"; then + echo "Test ${arg} not found" >&2 + usage + fi + done +} + +check_deps() { + for dep in vng ${QEMU} busybox pkill ssh; do + if [[ ! -x $(command -v "${dep}") ]]; then + echo -e "skip: dependency ${dep} not found!\n" + exit "${KSFT_SKIP}" + fi + done + + if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then + printf "skip: %s not found!" "${VSOCK_TEST}" + printf " Please build the kselftest vsock target.\n" + exit "${KSFT_SKIP}" + fi +} + +check_vng() { + local tested_versions + local version + local ok + + tested_versions=("1.33" "1.36") + version="$(vng --version)" + + ok=0 + for tv in "${tested_versions[@]}"; do + if [[ "${version}" == *"${tv}"* ]]; then + ok=1 + break + fi + done + + if [[ ! "${ok}" -eq 1 ]]; then + printf "warning: vng version '%s' has not been tested and may " "${version}" >&2 + printf "not function properly.\n\tThe following versions have been tested: " >&2 + echo "${tested_versions[@]}" >&2 + fi +} + +handle_build() { + if [[ ! "${BUILD}" -eq 1 ]]; then + return + fi + + if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then + echo "-b requires vmtest.sh called from the kernel source tree" >&2 + exit 1 + fi + + pushd "${KERNEL_CHECKOUT}" &>/dev/null + + if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then + die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})" + fi + + if ! make -j$(nproc); then + die "failed to build kernel from source tree (${KERNEL_CHECKOUT})" + fi + + popd &>/dev/null +} + +vm_start() { + local logfile=/dev/null + local verbose_opt="" + local kernel_opt="" + local qemu + + qemu=$(command -v "${QEMU}") + + if [[ "${VERBOSE}" -eq 1 ]]; then + verbose_opt="--verbose" + logfile=/dev/stdout + fi + + if [[ "${BUILD}" -eq 1 ]]; then + kernel_opt="${KERNEL_CHECKOUT}" + fi + + vng \ + --run \ + ${kernel_opt} \ + ${verbose_opt} \ + --qemu-opts="${QEMU_OPTS}" \ + --qemu="${qemu}" \ + --user root \ + --append "${KERNEL_CMDLINE}" \ + --rw &> ${logfile} & + + if ! timeout ${WAIT_TOTAL} \ + bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then + die "failed to boot VM" + fi +} + +vm_wait_for_ssh() { + local i + + i=0 + while true; do + if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then + die "Timed out waiting for guest ssh" + fi + if vm_ssh -- true; then + break + fi + i=$(( i + 1 )) + sleep ${WAIT_PERIOD} + done +} + +# derived from selftests/net/net_helper.sh +wait_for_listener() +{ + local port=$1 + local interval=$2 + local max_intervals=$3 + local protocol=tcp + local pattern + local i + + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ "${protocol}" = "tcp" ] && pattern="${pattern}0A" + for i in $(seq "${max_intervals}"); do + if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \ + grep -q "${pattern}"; then + break + fi + sleep "${interval}" + done +} + +vm_wait_for_listener() { + local port=$1 + + vm_ssh <<EOF +$(declare -f wait_for_listener) +wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} +EOF +} + +host_wait_for_listener() { + wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" +} + +__log_stdin() { + cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +__log_args() { + echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +log() { + local prefix="$1" + + shift + local redirect= + if [[ ${VERBOSE} -eq 0 ]]; then + redirect=/dev/null + else + redirect=/dev/stdout + fi + + if [[ "$#" -eq 0 ]]; then + __log_stdin | tee -a "${LOG}" > ${redirect} + else + __log_args "$@" | tee -a "${LOG}" > ${redirect} + fi +} + +log_setup() { + log "setup" "$@" +} + +log_host() { + local testname=$1 + + shift + log "test:${testname}:host" "$@" +} + +log_guest() { + local testname=$1 + + shift + log "test:${testname}:guest" "$@" +} + +test_vm_server_host_client() { + local testname="${FUNCNAME[0]#test_}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${TEST_GUEST_PORT}" \ + --peer-cid=2 \ + 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${TEST_GUEST_PORT}" + + ${VSOCK_TEST} \ + --mode=client \ + --control-host=127.0.0.1 \ + --peer-cid="${VSOCK_CID}" \ + --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}" + + return $? +} + +test_vm_client_host_server() { + local testname="${FUNCNAME[0]#test_}" + + ${VSOCK_TEST} \ + --mode "server" \ + --control-port "${TEST_HOST_PORT_LISTENER}" \ + --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" & + + host_wait_for_listener + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host=10.0.2.2 \ + --peer-cid=2 \ + --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}" + + return $? +} + +test_vm_loopback() { + local testname="${FUNCNAME[0]#test_}" + local port=60000 # non-forwarded local port + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${port}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host="127.0.0.1" \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" + + return $? +} + +run_test() { + local host_oops_cnt_before + local host_warn_cnt_before + local vm_oops_cnt_before + local vm_warn_cnt_before + local host_oops_cnt_after + local host_warn_cnt_after + local vm_oops_cnt_after + local vm_warn_cnt_after + local name + local rc + + host_oops_cnt_before=$(dmesg | grep -c -i 'Oops') + host_warn_cnt_before=$(dmesg --level=warn | wc -l) + vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops') + vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l) + + name=$(echo "${1}" | awk '{ print $1 }') + eval test_"${name}" + rc=$? + + host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l) + if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + host_warn_cnt_after=$(dmesg --level=warn | wc -l) + if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l) + if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l) + if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + return "${rc}" +} + +QEMU="qemu-system-$(uname -m)" + +while getopts :hvsq:b o +do + case $o in + v) VERBOSE=1;; + b) BUILD=1;; + q) QEMU=$OPTARG;; + h|*) usage;; + esac +done +shift $((OPTIND-1)) + +trap cleanup EXIT + +if [[ ${#} -eq 0 ]]; then + ARGS=("${TEST_NAMES[@]}") +else + ARGS=("$@") +fi + +check_args "${ARGS[@]}" +check_deps +check_vng +handle_build + +echo "1..${#ARGS[@]}" + +log_setup "Booting up VM" +vm_start +vm_wait_for_ssh +log_setup "VM booted up" + +cnt_pass=0 +cnt_fail=0 +cnt_skip=0 +cnt_total=0 +for arg in "${ARGS[@]}"; do + run_test "${arg}" + rc=$? + if [[ ${rc} -eq $KSFT_PASS ]]; then + cnt_pass=$(( cnt_pass + 1 )) + echo "ok ${cnt_total} ${arg}" + elif [[ ${rc} -eq $KSFT_SKIP ]]; then + cnt_skip=$(( cnt_skip + 1 )) + echo "ok ${cnt_total} ${arg} # SKIP" + elif [[ ${rc} -eq $KSFT_FAIL ]]; then + cnt_fail=$(( cnt_fail + 1 )) + echo "not ok ${cnt_total} ${arg} # exit=$rc" + fi + cnt_total=$(( cnt_total + 1 )) +done + +echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}" +echo "Log: ${LOG}" + +if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then + exit "$KSFT_PASS" +else + exit "$KSFT_FAIL" +fi --- base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a change-id: 20250325-vsock-vmtest-b3a21d2102c2 Best regards, -- Bobby Eshleman <bobbyeshleman(a)gmail.com>

7 months, 2 weeks

3
2
0 0

[bug report] selftests/landlock: Add tests for audit flags and domain IDs

by Dan Carpenter

Hello Mickaël Salaün, Commit 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs") from Mar 20, 2025 (linux-next), leads to the following Smatch static checker warning: tools/testing/selftests/landlock/audit.h:408 audit_init_filter_exe() warn: unsigned 'filter->exe_len' is never less than zero. tools/testing/selftests/landlock/audit.h 399 static int audit_init_filter_exe(struct audit_filter *filter, const char *path) 400 { 401 char *absolute_path = NULL; 402 403 /* It is assume that there is not already filtering rules. */ 404 filter->record_type = AUDIT_EXE; 405 if (!path) { 406 filter->exe_len = readlink("/proc/self/exe", filter->exe, 407 sizeof(filter->exe) - 1); --> 408 if (filter->exe_len < 0) size_t can't be negative. 409 return -errno; 410 411 return 0; 412 } 413 414 absolute_path = realpath(path, NULL); 415 if (!absolute_path) 416 return -errno; 417 418 /* No need for the terminating NULL byte. */ 419 filter->exe_len = strlen(absolute_path); 420 if (filter->exe_len > sizeof(filter->exe)) 421 return -E2BIG; 422 423 memcpy(filter->exe, absolute_path, filter->exe_len); 424 free(absolute_path); 425 return 0; 426 } regards, dan carpenter

7 months, 2 weeks

1
0
0 0

[PATCH net-next] selftests/bpf: Fix bpf selftest build warning

by Saket Kumar Bhaskar

On linux-next, build for bpf selftest displays a warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'. Commit 8066e388be48 ("net: add UAPI to the header guard in various network headers") changed the header guard from _LINUX_IF_XDP_H to _UAPI_LINUX_IF_XDP_H in include/uapi/linux/if_xdp.h. To resolve the warning, update tools/include/uapi/linux/if_xdp.h to align with the changes in include/uapi/linux/if_xdp.h Fixes: 8066e388be48 ("net: add UAPI to the header guard in various network headers") Reported-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Closes: https://lore.kernel.org/all/c2bc466d-dff2-4d0d-a797-9af7f676c065@linux.ibm.… Tested-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com> --- tools/include/uapi/linux/if_xdp.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index 42869770776e..44f2bb93e7e6 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -7,8 +7,8 @@ * Magnus Karlsson <magnus.karlsson(a)intel.com> */ -#ifndef _LINUX_IF_XDP_H -#define _LINUX_IF_XDP_H +#ifndef _UAPI_LINUX_IF_XDP_H +#define _UAPI_LINUX_IF_XDP_H #include <linux/types.h> @@ -180,4 +180,4 @@ struct xdp_desc { /* TX packet carries valid metadata. */ #define XDP_TX_METADATA (1 << 1) -#endif /* _LINUX_IF_XDP_H */ +#endif /* _UAPI_LINUX_IF_XDP_H */ -- 2.43.5

7 months, 2 weeks

3
2
0 0

[PATCH 0/7] Rust KUnit `#[test]` support improvements

by Miguel Ojeda

Improvements that build on top of the very basic `#[test]` support merged in v6.15. They are fairly minimal changes, but they allow us to map `assert*!`s back to KUnit, plus to add support for test functions that return `Result`s. In essence, they get our `#[test]`s essentially on par with the documentation tests. I also took the chance to convert some host `#[test]`s we had to KUnit in order to showcase the feature. Finally, I added documentation that was lacking from the original submission. I hope this helps. Miguel Ojeda (7): rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s rust: kunit: support checked `-> Result`s in KUnit `#[test]`s rust: add `kunit_tests` to the prelude rust: str: convert `rusttest` tests into KUnit rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s Documentation: rust: rename `#[test]`s to "`rusttest` host tests" Documentation: rust: testing: add docs on the new KUnit `#[test]` tests Documentation/rust/testing.rst | 80 ++++++++++++++++++++++++++++++++-- init/Kconfig | 3 ++ rust/Makefile | 3 +- rust/kernel/kunit.rs | 29 ++++++++++-- rust/kernel/prelude.rs | 2 +- rust/kernel/str.rs | 76 ++++++++++++++------------------ rust/macros/helpers.rs | 16 +++++++ rust/macros/kunit.rs | 31 ++++++++++++- rust/macros/lib.rs | 6 ++- 9 files changed, 191 insertions(+), 55 deletions(-) base-commit: b4432656b36e5cc1d50a1f2dc15357543add530e -- 2.49.0

7 months, 2 weeks

7
39
0 0

[PATCH v7 net-next 00/15] AccECN protocol patch series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the v7: v7 (14-May-2025) - Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>) - Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>) - Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>) - Update commit message for #9 to explain the increase in tcp_sock_write_rx group size - Modify group size of tcp_sock_write_tx in #10 based on pahole results v6 (09-May-2025) - Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>) - Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>) - Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>) - Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>) - Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>) - Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>) - Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>) - Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>) - Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>) - Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>) - Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>) - Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>) - Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15 v5 (22-Apr-2025) - Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>) v4 (18-Apr-2025) - Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>) v3 (14-Apr-2025) - Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Mar-2025) - Add one missing patch from the previous AccECN protocol preparation patch series to this patch series. The full patch series can be found in https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/ The Accurate ECN draft can be found in https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28 Best regards, Chia-Yu Chia-Yu Chang (2): tcp: reorganize tcp_sock_write_txrx group for variables later tcp: accecn: AccECN option failure handling Ilpo Järvinen (13): tcp: reorganize SYN ECN code tcp: fast path functions later tcp: AccECN core tcp: accecn: AccECN negotiation tcp: accecn: add AccECN rx byte counters tcp: accecn: AccECN needs to know delivered bytes tcp: sack option handling improvements tcp: accecn: AccECN option tcp: accecn: AccECN option send control tcp: accecn: AccECN option ceb/cep heuristic tcp: accecn: AccECN ACE field multi-wrap heuristic tcp: accecn: try to fit AccECN option with SACK tcp: try to avoid safer when ACKs are thinned .../networking/net_cachelines/tcp_sock.rst | 14 + include/linux/tcp.h | 34 +- include/net/netns/ipv4.h | 2 + include/net/tcp.h | 221 ++++++- include/uapi/linux/tcp.h | 7 + net/ipv4/syncookies.c | 3 + net/ipv4/sysctl_net_ipv4.c | 19 + net/ipv4/tcp.c | 30 +- net/ipv4/tcp_input.c | 608 +++++++++++++++++- net/ipv4/tcp_ipv4.c | 7 +- net/ipv4/tcp_minisocks.c | 92 ++- net/ipv4/tcp_output.c | 296 ++++++++- net/ipv6/syncookies.c | 1 + net/ipv6/tcp_ipv6.c | 1 + 14 files changed, 1237 insertions(+), 98 deletions(-) -- 2.34.1

7 months, 2 weeks

4
25
0 0

[PATCH v3] kselftest: x86: Improve MOV SS test result message

by Brigham Campbell

Make test result message more descriptive and grammatically correct. Signed-off-by: Brigham Campbell <me(a)brighamcampbell.com> --- No changes in v3. I'm resending this patch to adjust patch format suggestions made by Shuah. tools/testing/selftests/x86/mov_ss_trap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/x86/mov_ss_trap.c b/tools/testing/selftests/x86/mov_ss_trap.c index f22cb6b382f9..d80033c0d7eb 100644 --- a/tools/testing/selftests/x86/mov_ss_trap.c +++ b/tools/testing/selftests/x86/mov_ss_trap.c @@ -269,6 +269,6 @@ int main() ); } - printf("[OK]\tI aten't dead\n"); + printf("[OK]\tkernel handled MOV SS without crashing test\n"); return 0; } -- 2.49.0

7 months, 2 weeks

1
0
0 0

[PATCH net-next v2 0/8] Devmem TCP minor cleanups and ksft improvements

by Mina Almasry

v2: https://lore.kernel.org/netdev/20250519023517.4062941-1-almasrymina@google.… Changelog: - Collect acks and tested-bys (Thanks!) - Drop the patch that removed ksft_disruptive. That seems to not have any relation to behavior when test fails. - Address comments. --- Minor cleanups to the devmem tcp code, and not-so-minor improvements to the ksft. For the cleanups: - Address comment from Paolo post-merge. - Fix whitespace. - Add improvement dropped from Taehee's fix patch. For the ksft: - Add support for ipv4 environment. - Add support for drivers that are limited to 5-tuple flow steering. - Improve test by sending 1K data instead of just "hello\nworld" Cc: sdf(a)fomichev.me Cc: ap420073(a)gmail.com Cc: praan(a)google.com Cc: shivajikant(a)google.com Mina Almasry (8): net: devmem: move list_add to net_devmem_bind_dmabuf. page_pool: fix ugly page_pool formatting net: devmem: preserve sockc_err net: devmem: ksft: add ipv4 support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ncdevmem: remove unused variable net/core/devmem.c | 5 +++- net/core/devmem.h | 5 +++- net/core/netdev-genl.c | 8 ++----- net/core/page_pool.c | 4 ++-- net/ipv4/tcp.c | 24 ++++++++----------- .../selftests/drivers/net/hw/devmem.py | 18 +++++++------- .../selftests/drivers/net/hw/ncdevmem.c | 16 ++++++++++--- 7 files changed, 44 insertions(+), 36 deletions(-) base-commit: ea15e046263b19e91ffd827645ae5dfa44ebd044 -- 2.49.0.1151.ga128411c76-goog

7 months, 2 weeks

3
14
0 0

[PATCH v17 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find the DualPI2 patch v17. This patch serise adds DualPI Improved with a Square (DualPI2) with following features: * Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague) * Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332 * Configurable overload strategies * Use of sojourn time to reliably estimate queue delay * Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu --- v17 (25-May-2025) - Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>) - Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>) - Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>) - Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>) - Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>) - Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>) - Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>) v16 (16-MAy-2025) - Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>) - Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>) - Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>) - Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>) v15 (09-May-2025) - Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>) - Fix one typo in comment of #2 - Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h v14 (05-May-2025) - Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun - Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>) - Update dualpi2.json to align with the updated variable order in pkt_sched.h - Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>) v13 (26-Apr-2025) - Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>) - Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>) - Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>) v12 (22-Apr-2025) - Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>) - Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>) - Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>) - Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>) v11 (15-Apr-2025) - Replace hstimer_init with hstimer_setup in sch_dualpi2.c v10 (25-Mar-2025) - Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>) - Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>) - Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>) v9 (16-Mar-2025) - Fix mem_usage error in previous version - Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking. In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets. This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0. Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 11.55 11.70 ms 350 TCP upload avg : 18.96 N/A Mbits/s 350 TCP upload sum : 18.96 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 10.81 10.70 ms 350 TCP upload avg : 18.91 N/A Mbits/s 350 TCP upload sum : 18.91 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 12.61 12.80 ms 350 TCP upload avg : 9.48 N/A Mbits/s 350 TCP upload sum : 9.48 N/A Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.06 10.80 ms 350 TCP upload avg : 9.43 N/A Mbits/s 350 TCP upload sum : 9.43 N/A Mbits/s 350 Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay' Old versions: avg median # data pts Ping (ms) ICMP : 40.86 37.45 ms 350 TCP upload avg : 0.88 N/A Mbits/s 350 TCP upload sum : 0.88 N/A Mbits/s 350 TCP upload::1 : 0.88 0.97 Mbits/s 350 New version (v9): avg median # data pts Ping (ms) ICMP : 11.07 10.40 ms 350 TCP upload avg : 0.55 N/A Mbits/s 350 TCP upload sum : 0.55 N/A Mbits/s 350 TCP upload::1 : 0.55 0.59 Mbits/s 350 v8 (11-Mar-2025) - Fix warning messages in v7 v7 (07-Mar-2025) - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 (04-Mar-2025) - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 (22-Feb-2025) - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 (22-Oct-2024) - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning. v3 (19-Oct-2024) - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 (18-Oct-2024) - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning --- Chia-Yu Chang (4): sched: Struct definition and parsing of dualpi2 qdisc sched: Dump configuration and statistics of dualpi2 qdisc selftests/tc-testing: Add selftests for qdisc DualPI2 Documentation: netlink: specs: tc: Add DualPI2 specification Koen De Schepper (1): sched: Add enqueue/dequeue of dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 156 +++ include/net/dropreason-core.h | 6 + include/uapi/linux/pkt_sched.h | 68 + net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1146 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1645 insertions(+) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

7 months, 2 weeks

2
6
0 0

[PATCH net-next 1/3] net: devmem: support single IOV with sendmsg

by Stanislav Fomichev

sendmsg() with a single iov becomes ITER_UBUF, sendmsg() with multiple iovs becomes ITER_IOVEC. iter_iov_len does not return correct value for UBUF, so teach to treat UBUF differently. Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: Pavel Begunkov <asml.silence(a)gmail.com> Cc: Mina Almasry <almasrymina(a)google.com> Fixes: bd61848900bf ("net: devmem: Implement TX path") Signed-off-by: Stanislav Fomichev <stfomichev(a)gmail.com> --- include/linux/uio.h | 8 +++++++- net/core/datagram.c | 3 ++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 49ece9e1888f..393d0622cc28 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -99,7 +99,13 @@ static inline const struct iovec *iter_iov(const struct iov_iter *iter) } #define iter_iov_addr(iter) (iter_iov(iter)->iov_base + (iter)->iov_offset) -#define iter_iov_len(iter) (iter_iov(iter)->iov_len - (iter)->iov_offset) + +static inline size_t iter_iov_len(const struct iov_iter *i) +{ + if (i->iter_type == ITER_UBUF) + return i->count; + return iter_iov(i)->iov_len - i->iov_offset; +} static inline enum iter_type iov_iter_type(const struct iov_iter *i) { diff --git a/net/core/datagram.c b/net/core/datagram.c index 9ef5442536f5..c44f1d2b70a4 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -706,7 +706,8 @@ zerocopy_fill_skb_from_devmem(struct sk_buff *skb, struct iov_iter *from, * iov_addrs are interpreted as an offset in bytes into the dma-buf to * send from. We do not support other iter types. */ - if (iov_iter_type(from) != ITER_IOVEC) + if (iov_iter_type(from) != ITER_IOVEC && + iov_iter_type(from) != ITER_UBUF) return -EFAULT; while (length && iov_iter_count(from)) { -- 2.49.0

7 months, 2 weeks

4
12
0 0

[PATCHv2 net-next] selftests: net: move wait_local_port_listen to lib.sh

by Hangbin Liu

The function wait_local_port_listen() is the only function defined in net_helper.sh. Since some tests source both lib.sh and net_helper.sh, we can simplify the setup by moving wait_local_port_listen() to lib.sh. With this change, net_helper.sh becomes redundant and can be removed. Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com> --- v2: remove net_helper in selftests/drivers (Matthieu Baerts) --- tools/testing/selftests/drivers/net/Makefile | 1 - .../drivers/net/lib/sh/lib_netcons.sh | 1 - .../selftests/drivers/net/netdevsim/peer.sh | 2 +- tools/testing/selftests/net/Makefile | 2 +- tools/testing/selftests/net/busy_poll_test.sh | 2 +- .../net/ipv6_route_update_soft_lockup.sh | 1 - tools/testing/selftests/net/lib.sh | 21 ++++++++++++++++ tools/testing/selftests/net/mptcp/Makefile | 2 +- .../testing/selftests/net/mptcp/mptcp_lib.sh | 1 - tools/testing/selftests/net/net_helper.sh | 25 ------------------- tools/testing/selftests/net/pmtu.sh | 1 - tools/testing/selftests/net/udpgro.sh | 2 +- tools/testing/selftests/net/udpgro_bench.sh | 2 +- tools/testing/selftests/net/udpgro_frglist.sh | 2 +- tools/testing/selftests/net/udpgro_fwd.sh | 2 +- 15 files changed, 29 insertions(+), 38 deletions(-) delete mode 100644 tools/testing/selftests/net/net_helper.sh diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile index 17db31aa58c9..be780bcb73a3 100644 --- a/tools/testing/selftests/drivers/net/Makefile +++ b/tools/testing/selftests/drivers/net/Makefile @@ -3,7 +3,6 @@ CFLAGS += $(KHDR_INCLUDES) TEST_INCLUDES := $(wildcard lib/py/*.py) \ $(wildcard lib/sh/*.sh) \ - ../../net/net_helper.sh \ ../../net/lib.sh \ TEST_GEN_FILES := \ diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh index 3c96b022954d..29b01b8e2215 100644 --- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh +++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh @@ -33,7 +33,6 @@ NSIM_DEV_SYS_NEW="/sys/bus/netdevsim/new_device" # Used to create and delete namespaces source "${LIBDIR}"/../../../../net/lib.sh -source "${LIBDIR}"/../../../../net/net_helper.sh # Create netdevsim interfaces create_ifaces() { diff --git a/tools/testing/selftests/drivers/net/netdevsim/peer.sh b/tools/testing/selftests/drivers/net/netdevsim/peer.sh index aed62d9e6c0a..1bb46ec435d4 100755 --- a/tools/testing/selftests/drivers/net/netdevsim/peer.sh +++ b/tools/testing/selftests/drivers/net/netdevsim/peer.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0-only -source ../../../net/net_helper.sh +source ../../../net/lib.sh NSIM_DEV_1_ID=$((256 + RANDOM % 256)) NSIM_DEV_1_SYS=/sys/bus/netdevsim/devices/netdevsim$NSIM_DEV_1_ID diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 70a38f485d4d..ea84b88bcb30 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -115,7 +115,7 @@ YNL_GEN_FILES := busy_poller netlink-dumps TEST_GEN_FILES += $(YNL_GEN_FILES) TEST_FILES := settings -TEST_FILES += in_netns.sh lib.sh net_helper.sh setup_loopback.sh setup_veth.sh +TEST_FILES += in_netns.sh lib.sh setup_loopback.sh setup_veth.sh TEST_GEN_FILES += $(patsubst %.c,%.o,$(wildcard *.bpf.c)) diff --git a/tools/testing/selftests/net/busy_poll_test.sh b/tools/testing/selftests/net/busy_poll_test.sh index 7db292ec4884..7d2d40812074 100755 --- a/tools/testing/selftests/net/busy_poll_test.sh +++ b/tools/testing/selftests/net/busy_poll_test.sh @@ -1,6 +1,6 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -source net_helper.sh +source lib.sh NSIM_SV_ID=$((256 + RANDOM % 256)) NSIM_SV_SYS=/sys/bus/netdevsim/devices/netdevsim$NSIM_SV_ID diff --git a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh index a6b2b1f9c641..c6866e42f95c 100755 --- a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh +++ b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh @@ -69,7 +69,6 @@ # which can affect the conditions needed to trigger a soft lockup. source lib.sh -source net_helper.sh TEST_DURATION=300 ROUTING_TABLE_REFRESH_PERIOD=0.01 diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh index 7962da06f816..006fdadcc4b9 100644 --- a/tools/testing/selftests/net/lib.sh +++ b/tools/testing/selftests/net/lib.sh @@ -595,3 +595,24 @@ bridge_vlan_add() bridge vlan add "$@" defer bridge vlan del "$@" } + +wait_local_port_listen() +{ + local listener_ns="${1}" + local port="${2}" + local protocol="${3}" + local pattern + local i + + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ ${protocol} = "tcp" ] && pattern="${pattern}0A" + for i in $(seq 10); do + if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \ + /proc/net/"${protocol}"* | grep -q "${pattern}"; then + break + fi + sleep 0.1 + done +} diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index 340e1a777e16..e47788bfa671 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -11,7 +11,7 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq mptcp_diag TEST_FILES := mptcp_lib.sh settings -TEST_INCLUDES := ../lib.sh $(wildcard ../lib/sh/*.sh) ../net_helper.sh +TEST_INCLUDES := ../lib.sh $(wildcard ../lib/sh/*.sh) EXTRA_CLEAN := *.pcap diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh index 55212188871e..09cd24b2ae46 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -2,7 +2,6 @@ # SPDX-License-Identifier: GPL-2.0 . "$(dirname "${0}")/../lib.sh" -. "$(dirname "${0}")/../net_helper.sh" readonly KSFT_PASS=0 readonly KSFT_FAIL=1 diff --git a/tools/testing/selftests/net/net_helper.sh b/tools/testing/selftests/net/net_helper.sh deleted file mode 100644 index 6596fe03c77f..000000000000 --- a/tools/testing/selftests/net/net_helper.sh +++ /dev/null @@ -1,25 +0,0 @@ -#!/bin/bash -# SPDX-License-Identifier: GPL-2.0 -# -# Helper functions - -wait_local_port_listen() -{ - local listener_ns="${1}" - local port="${2}" - local protocol="${3}" - local pattern - local i - - pattern=":$(printf "%04X" "${port}") " - - # for tcp protocol additionally check the socket state - [ ${protocol} = "tcp" ] && pattern="${pattern}0A" - for i in $(seq 10); do - if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \ - /proc/net/"${protocol}"* | grep -q "${pattern}"; then - break - fi - sleep 0.1 - done -} diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 66be7699c72c..88e914c4eef9 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -205,7 +205,6 @@ # Check that PMTU exceptions are created for both paths. source lib.sh -source net_helper.sh PAUSE_ON_FAIL=no VERBOSE=0 diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh index d5ffd8c9172e..1dc337c709f8 100755 --- a/tools/testing/selftests/net/udpgro.sh +++ b/tools/testing/selftests/net/udpgro.sh @@ -3,7 +3,7 @@ # # Run a series of udpgro functional tests. -source net_helper.sh +source lib.sh readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)" diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh index 815fad8c53a8..54fa4821bc5e 100755 --- a/tools/testing/selftests/net/udpgro_bench.sh +++ b/tools/testing/selftests/net/udpgro_bench.sh @@ -3,7 +3,7 @@ # # Run a series of udpgro benchmarks -source net_helper.sh +source lib.sh readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)" diff --git a/tools/testing/selftests/net/udpgro_frglist.sh b/tools/testing/selftests/net/udpgro_frglist.sh index 5f3d1a110d11..9a2cfec1153e 100755 --- a/tools/testing/selftests/net/udpgro_frglist.sh +++ b/tools/testing/selftests/net/udpgro_frglist.sh @@ -3,7 +3,7 @@ # # Run a series of udpgro benchmarks -source net_helper.sh +source lib.sh readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)" diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh index f22f6c66997e..a39fdc4aa2ff 100755 --- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -source net_helper.sh +source lib.sh BPF_FILE="lib/xdp_dummy.bpf.o" readonly BASE="ns-$(mktemp -u XXXXXX)" -- 2.46.0

7 months, 2 weeks

3
2
0 0

[PATCH] selftests: net: fix spelling and grammar mistakes

by Praveen Balakrishnan

Fix several spelling and grammatical mistakes in output messages from the net selftests to improve readability. Only the message strings for the test output have been modified. No changes to the functional logic of the tests have been made. Signed-off-by: Praveen Balakrishnan <praveen.balakrishnan(a)magd.ox.ac.uk> --- .../testing/selftests/net/netfilter/conntrack_vrf.sh | 4 ++-- tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 2 +- tools/testing/selftests/net/rps_default_mask.sh | 12 ++++++------ 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/net/netfilter/conntrack_vrf.sh b/tools/testing/selftests/net/netfilter/conntrack_vrf.sh index e95ecb37c2b1..806d2bfbd6e7 100755 --- a/tools/testing/selftests/net/netfilter/conntrack_vrf.sh +++ b/tools/testing/selftests/net/netfilter/conntrack_vrf.sh @@ -236,9 +236,9 @@ EOF ip netns exec "$ns1" ping -q -w 1 -c 1 "$DUMMYNET".2 > /dev/null if ip netns exec "$ns0" nft list counter t fibcount | grep -q "packets 1"; then - echo "PASS: fib lookup returned exepected output interface" + echo "PASS: fib lookup returned expected output interface" else - echo "FAIL: fib lookup did not return exepected output interface" + echo "FAIL: fib lookup did not return expected output interface" ret=1 return fi diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 8a0396bfaf99..b521e0dea506 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -1877,7 +1877,7 @@ class OvsPacket(GenericNetlinkSocket): elif msg["cmd"] == OvsPacket.OVS_PACKET_CMD_EXECUTE: up.execute(msg) else: - print("Unkonwn cmd: %d" % msg["cmd"]) + print("Unknown cmd: %d" % msg["cmd"]) except NetlinkError as ne: raise ne diff --git a/tools/testing/selftests/net/rps_default_mask.sh b/tools/testing/selftests/net/rps_default_mask.sh index 4287a8529890..b200019b3c80 100755 --- a/tools/testing/selftests/net/rps_default_mask.sh +++ b/tools/testing/selftests/net/rps_default_mask.sh @@ -54,16 +54,16 @@ cleanup echo 1 > /proc/sys/net/core/rps_default_mask setup -chk_rps "changing rps_default_mask dont affect existing devices" "" lo $INITIAL_RPS_DEFAULT_MASK +chk_rps "changing rps_default_mask doesn't affect existing devices" "" lo $INITIAL_RPS_DEFAULT_MASK echo 3 > /proc/sys/net/core/rps_default_mask -chk_rps "changing rps_default_mask dont affect existing netns" $NETNS lo 0 +chk_rps "changing rps_default_mask doesn't affect existing netns" $NETNS lo 0 ip link add name $VETH type veth peer netns $NETNS name $VETH ip link set dev $VETH up ip -n $NETNS link set dev $VETH up -chk_rps "changing rps_default_mask affect newly created devices" "" $VETH 3 -chk_rps "changing rps_default_mask don't affect newly child netns[II]" $NETNS $VETH 0 +chk_rps "changing rps_default_mask affects newly created devices" "" $VETH 3 +chk_rps "changing rps_default_mask doesn't affect newly child netns[II]" $NETNS $VETH 0 ip link del dev $VETH ip netns del $NETNS @@ -72,8 +72,8 @@ chk_rps "rps_default_mask is 0 by default in child netns" "$NETNS" lo 0 ip netns exec $NETNS sysctl -qw net.core.rps_default_mask=1 ip link add name $VETH type veth peer netns $NETNS name $VETH -chk_rps "changing rps_default_mask in child ns don't affect the main one" "" lo $INITIAL_RPS_DEFAULT_MASK +chk_rps "changing rps_default_mask in child ns doesn't affect the main one" "" lo $INITIAL_RPS_DEFAULT_MASK chk_rps "changing rps_default_mask in child ns affects new childns devices" $NETNS $VETH 1 -chk_rps "changing rps_default_mask in child ns don't affect existing devices" $NETNS lo 0 +chk_rps "changing rps_default_mask in child ns doesn't affect existing devices" $NETNS lo 0 exit $ret -- 2.39.5

7 months, 2 weeks

5
5
0 0

[PATCH net-next v8] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is included in the config file to be built with the -b option, though not used in the tests. Only tested on x86. To run: $ make -C tools/testing/selftests TARGETS=vsock $ tools/testing/selftests/vsock/vmtest.sh or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs (after make -C tools/testing/selftests TARGETS=vsock): $ ./tools/testing/selftests/vsock/vmtest.sh 1..3 ok 0 vm_server_host_client ok 1 vm_client_host_server ok 2 vm_loopback SUMMARY: PASS=3 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_m7DI.log $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback 1..1 ok 0 vm_loopback SUMMARY: PASS=1 SKIP=0 FAIL=0 Log: /tmp/vsock_vmtest_a1IO.log $ mkdir -p ~/scratch $ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch [... omitted ...] $ cd ~/scratch $ ./run_kselftest.sh TAP version 13 1..1 # timeout set to 300 # selftests: vsock: vmtest.sh # 1..3 # ok 0 vm_server_host_client # ok 1 vm_client_host_server # ok 2 vm_loopback # SUMMARY: PASS=3 SKIP=0 FAIL=0 # Log: /tmp/vsock_vmtest_svEl.log ok 1 selftests: vsock: vmtest.sh Future work can include vsock_diag_test. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com> --- Changes in v8: - remove NIPA comment from commit msg - remove tap_* functions and TAP_PREFIX - add -b for building kernel - Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com Changes in v7: - fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS - updated commit message with updated output - updated commit message with commands for installing/running as kselftest - Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com Changes in v6: - add make cmd in commit message in vmtest.sh example (Stefano) - check nonzero size of QEMU_PIDFILE using -s conditional (Stefano) - display log file path after tests so it is easier to find amongst other random names - cleanup qemu pidfile if qemu is unable to remove it - make oops/warning failures more obvious with 'FAIL' prefix in log (simply saying 'detected' wasn't clear enough to identify failing condition) - Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change 2222 -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change <tab><backslash> to <space><backslash> - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com Changes in v2: - add kernel oops and warnings checker - change testname variable to use FUNCNAME - fix spacing in test_vm_server_host_client - add -s skip build option to vmtest.sh - add test_vm_loopback - pass port to vm_wait_for_listener - fix indentation in vmtest.sh - add vmci and hyperv to config - changed whitespace from tabs to spaces in help string - Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/vsock/.gitignore | 2 + tools/testing/selftests/vsock/Makefile | 16 ++ tools/testing/selftests/vsock/config | 114 ++++++++ tools/testing/selftests/vsock/settings | 1 + tools/testing/selftests/vsock/vmtest.sh | 460 +++++++++++++++++++++++++++++++ 6 files changed, 594 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h F: include/uapi/linux/vm_sockets_diag.h F: include/uapi/linux/vsockmon.h F: net/vmw_vsock/ +F: tools/testing/selftests/vsock/ F: tools/testing/vsock/ VMALLOC diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84 --- /dev/null +++ b/tools/testing/selftests/vsock/.gitignore @@ -0,0 +1,2 @@ +vmtest.log +vsock_test diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..7ab4970e5e8a019be33f96a36f95c00573d7bfcf --- /dev/null +++ b/tools/testing/selftests/vsock/Makefile @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: GPL-2.0 + +CURDIR := $(abspath .) +TOOLSDIR := $(abspath ../../..) + +$(OUTPUT)/vsock_test: $(TOOLSDIR)/testing/vsock/vsock_test + install -m 755 $< $@ + +$(TOOLSDIR)/testing/vsock/vsock_test: + $(MAKE) -C $(TOOLSDIR)/testing/vsock vsock_test + +TEST_PROGS += vmtest.sh +TEST_GEN_FILES := vsock_test + +include ../lib.mk + diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config new file mode 100644 index 0000000000000000000000000000000000000000..3bffaaf98fff92dc0e3bc1286afa3d8d5d52f4c7 --- /dev/null +++ b/tools/testing/selftests/vsock/config @@ -0,0 +1,114 @@ +CONFIG_BLK_DEV_INITRD=y +CONFIG_BPF=y +CONFIG_BPF_SYSCALL=y +CONFIG_BPF_JIT=y +CONFIG_HAVE_EBPF_JIT=y +CONFIG_BPF_EVENTS=y +CONFIG_FTRACE_SYSCALLS=y +CONFIG_FUNCTION_TRACER=y +CONFIG_HAVE_DYNAMIC_FTRACE=y +CONFIG_DYNAMIC_FTRACE=y +CONFIG_HAVE_KPROBES=y +CONFIG_KPROBES=y +CONFIG_KPROBE_EVENTS=y +CONFIG_ARCH_SUPPORTS_UPROBES=y +CONFIG_UPROBES=y +CONFIG_UPROBE_EVENTS=y +CONFIG_DEBUG_FS=y +CONFIG_FW_CFG_SYSFS=y +CONFIG_FW_CFG_SYSFS_CMDLINE=y +CONFIG_DRM=y +CONFIG_DRM_VIRTIO_GPU=y +CONFIG_DRM_VIRTIO_GPU_KMS=y +CONFIG_DRM_BOCHS=y +CONFIG_VIRTIO_IOMMU=y +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_SND_SEQUENCER=y +CONFIG_SND_PCI=y +CONFIG_SND_INTEL8X0=y +CONFIG_SND_HDA_CODEC_REALTEK=y +CONFIG_SECURITYFS=y +CONFIG_CGROUP_BPF=y +CONFIG_SQUASHFS=y +CONFIG_SQUASHFS_XZ=y +CONFIG_SQUASHFS_ZSTD=y +CONFIG_FUSE_FS=y +CONFIG_SERIO=y +CONFIG_PCI=y +CONFIG_INPUT=y +CONFIG_INPUT_KEYBOARD=y +CONFIG_KEYBOARD_ATKBD=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_X86_VERBOSE_BOOTUP=y +CONFIG_VGA_CONSOLE=y +CONFIG_FB=y +CONFIG_FB_VESA=y +CONFIG_FRAMEBUFFER_CONSOLE=y +CONFIG_RTC_CLASS=y +CONFIG_RTC_HCTOSYS=y +CONFIG_RTC_DRV_CMOS=y +CONFIG_HYPERVISOR_GUEST=y +CONFIG_PARAVIRT=y +CONFIG_KVM_GUEST=y +CONFIG_KVM=y +CONFIG_KVM_INTEL=y +CONFIG_KVM_AMD=y +CONFIG_VSOCKETS=y +CONFIG_VSOCKETS_DIAG=y +CONFIG_VSOCKETS_LOOPBACK=y +CONFIG_VMWARE_VMCI_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS_COMMON=y +CONFIG_HYPERV_VSOCKETS=y +CONFIG_VMWARE_VMCI=y +CONFIG_VHOST_VSOCK=y +CONFIG_HYPERV=y +CONFIG_UEVENT_HELPER=n +CONFIG_VIRTIO=y +CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_MMIO=y +CONFIG_VIRTIO_BALLOON=y +CONFIG_NET=y +CONFIG_NET_CORE=y +CONFIG_NETDEVICES=y +CONFIG_NETWORK_FILESYSTEMS=y +CONFIG_INET=y +CONFIG_NET_9P=y +CONFIG_NET_9P_VIRTIO=y +CONFIG_9P_FS=y +CONFIG_VIRTIO_NET=y +CONFIG_CMDLINE_OVERRIDE=n +CONFIG_BINFMT_SCRIPT=y +CONFIG_SHMEM=y +CONFIG_TMPFS=y +CONFIG_UNIX=y +CONFIG_MODULE_SIG_FORCE=n +CONFIG_DEVTMPFS=y +CONFIG_TTY=y +CONFIG_VT=y +CONFIG_UNIX98_PTYS=y +CONFIG_EARLY_PRINTK=y +CONFIG_INOTIFY_USER=y +CONFIG_BLOCK=y +CONFIG_SCSI_LOWLEVEL=y +CONFIG_SCSI=y +CONFIG_SCSI_VIRTIO=y +CONFIG_BLK_DEV_SD=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_WATCHDOG=y +CONFIG_WATCHDOG_CORE=y +CONFIG_I6300ESB_WDT=y +CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y +CONFIG_OVERLAY_FS=y +CONFIG_DAX=y +CONFIG_DAX_DRIVER=y +CONFIG_FS_DAX=y +CONFIG_MEMORY_HOTPLUG=y +CONFIG_MEMORY_HOTREMOVE=y +CONFIG_ZONE_DEVICE=y +CONFIG_FUSE_FS=y +CONFIG_VIRTIO_FS=y +CONFIG_VSOCKETS=y +CONFIG_VIRTIO_VSOCKETS=y diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings new file mode 100644 index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019 --- /dev/null +++ b/tools/testing/selftests/vsock/settings @@ -0,0 +1 @@ +timeout=300 diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh new file mode 100755 index 0000000000000000000000000000000000000000..d083c444dd8e3a5893f7795ae5e5d90fdede6325 --- /dev/null +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -0,0 +1,460 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2025 Meta Platforms, Inc. and affiliates +# +# Dependencies: +# * virtme-ng +# * busybox-static (used by virtme-ng) +# * qemu (used by virtme-ng) + +readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" +readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) + +source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh + +readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test +readonly TEST_GUEST_PORT=51000 +readonly TEST_HOST_PORT=50000 +readonly TEST_HOST_PORT_LISTENER=50001 +readonly SSH_GUEST_PORT=22 +readonly SSH_HOST_PORT=2222 +readonly VSOCK_CID=1234 +readonly WAIT_PERIOD=3 +readonly WAIT_PERIOD_MAX=60 +readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) +readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + +# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a +# control port forwarded for vsock_test. Because virtme-ng doesn't support +# adding an additional port to forward to the device created from "--ssh" and +# virtme-init mistakenly sets identical IPs to the ssh device and additional +# devices, we instead opt out of using --ssh, add the device manually, and also +# add the kernel cmdline options that virtme-init uses to setup the interface. +readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" +readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" +readonly QEMU_OPTS="\ + -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ + -device virtio-net-pci,netdev=n0 \ + -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \ + --pidfile ${QEMU_PIDFILE} \ +" +readonly KERNEL_CMDLINE="virtme.dhcp net.ifnames=0 biosdevname=0 virtme.ssh virtme_ssh_user=$USER" +readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) +readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_DESCS=( + "Run vsock_test in server mode on the VM and in client mode on the host." + "Run vsock_test in client mode on the VM and in server mode on the host." + "Run vsock_test using the loopback transport in the VM." +) + +VERBOSE=0 + +usage() { + local name + local desc + local i + + echo + echo "$0 [OPTIONS] [TEST]..." + echo "If no TEST argument is given, all tests will be run." + echo + echo "Options" + echo " -b: build the kernel from the current source tree and use it for guest VMs" + echo " -q: set the path to or name of qemu binary" + echo " -v: verbose output" + echo + echo "Available tests" + + for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do + name=${TEST_NAMES[${i}]} + desc=${TEST_DESCS[${i}]} + printf "\t%-35s%-35s\n" "${name}" "${desc}" + done + echo + + exit 1 +} + +die() { + echo "$*" >&2 + exit "${KSFT_FAIL}" +} + +vm_ssh() { + ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" + return $? +} + +cleanup() { + if [[ -s "${QEMU_PIDFILE}" ]]; then + pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1 + fi + + # If failure occurred during or before qemu start up, then we need + # to clean this up ourselves. + if [[ -e "${QEMU_PIDFILE}" ]]; then + rm "${QEMU_PIDFILE}" + fi +} + +check_args() { + local found + + for arg in "$@"; do + found=0 + for name in "${TEST_NAMES[@]}"; do + if [[ "${name}" = "${arg}" ]]; then + found=1 + break + fi + done + + if [[ "${found}" -eq 0 ]]; then + echo "${arg} is not an available test" >&2 + usage + fi + done + + for arg in "$@"; do + if ! command -v > /dev/null "test_${arg}"; then + echo "Test ${arg} not found" >&2 + usage + fi + done +} + +check_deps() { + for dep in vng ${QEMU} busybox pkill ssh; do + if [[ ! -x $(command -v "${dep}") ]]; then + echo -e "skip: dependency ${dep} not found!\n" + exit "${KSFT_SKIP}" + fi + done + + if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then + printf "skip: %s not found!" "${VSOCK_TEST}" + printf " Please build the kselftest vsock target.\n" + exit "${KSFT_SKIP}" + fi +} + +handle_build() { + if [[ ! "${BUILD}" -eq 1 ]]; then + return + fi + + if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then + echo "-b requires vmtest.sh called from the kernel source tree" >&2 + exit 1 + fi + + pushd "${KERNEL_CHECKOUT}" &>/dev/null + + if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then + die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})" + fi + + if ! make -j$(nproc); then + die "failed to build kernel from source tree (${KERNEL_CHECKOUT})" + fi + + popd &>/dev/null +} + +vm_start() { + local logfile=/dev/null + local verbose_opt="" + local kernel_opt="" + local qemu + + qemu=$(command -v "${QEMU}") + + if [[ "${VERBOSE}" -eq 1 ]]; then + verbose_opt="--verbose" + logfile=/dev/stdout + fi + + if [[ "${BUILD}" -eq 1 ]]; then + kernel_opt="${KERNEL_CHECKOUT}" + fi + + vng \ + --run \ + ${kernel_opt} \ + ${verbose_opt} \ + --qemu-opts="${QEMU_OPTS}" \ + --qemu="${qemu}" \ + --user root \ + --append "${KERNEL_CMDLINE}" \ + --rw &> ${logfile} & + + if ! timeout ${WAIT_TOTAL} \ + bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then + die "failed to boot VM" + fi +} + +vm_wait_for_ssh() { + local i + + i=0 + while true; do + if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then + die "Timed out waiting for guest ssh" + fi + if vm_ssh -- true; then + break + fi + i=$(( i + 1 )) + sleep ${WAIT_PERIOD} + done +} + +# derived from selftests/net/net_helper.sh +wait_for_listener() +{ + local port=$1 + local interval=$2 + local max_intervals=$3 + local protocol=tcp + local pattern + local i + + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ "${protocol}" = "tcp" ] && pattern="${pattern}0A" + for i in $(seq "${max_intervals}"); do + if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \ + grep -q "${pattern}"; then + break + fi + sleep "${interval}" + done +} + +vm_wait_for_listener() { + local port=$1 + + vm_ssh <<EOF +$(declare -f wait_for_listener) +wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} +EOF +} + +host_wait_for_listener() { + wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" +} + +__log_stdin() { + cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +__log_args() { + echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' +} + +log() { + local prefix="$1" + + shift + local redirect= + if [[ ${VERBOSE} -eq 0 ]]; then + redirect=/dev/null + else + redirect=/dev/stdout + fi + + if [[ "$#" -eq 0 ]]; then + __log_stdin | tee -a "${LOG}" > ${redirect} + else + __log_args "$@" | tee -a "${LOG}" > ${redirect} + fi +} + +log_setup() { + log "setup" "$@" +} + +log_host() { + local testname=$1 + + shift + log "test:${testname}:host" "$@" +} + +log_guest() { + local testname=$1 + + shift + log "test:${testname}:guest" "$@" +} + +test_vm_server_host_client() { + local testname="${FUNCNAME[0]#test_}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${TEST_GUEST_PORT}" \ + --peer-cid=2 \ + 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${TEST_GUEST_PORT}" + + ${VSOCK_TEST} \ + --mode=client \ + --control-host=127.0.0.1 \ + --peer-cid="${VSOCK_CID}" \ + --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}" + + return $? +} + +test_vm_client_host_server() { + local testname="${FUNCNAME[0]#test_}" + + ${VSOCK_TEST} \ + --mode "server" \ + --control-port "${TEST_HOST_PORT_LISTENER}" \ + --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" & + + host_wait_for_listener + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host=10.0.2.2 \ + --peer-cid=2 \ + --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}" + + return $? +} + +test_vm_loopback() { + local testname="${FUNCNAME[0]#test_}" + local port=60000 # non-forwarded local port + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" & + + vm_wait_for_listener "${port}" + + vm_ssh -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host="127.0.0.1" \ + --control-port="${port}" \ + --peer-cid=1 2>&1 | log_guest "${testname}" + + return $? +} + +run_test() { + local host_oops_cnt_before + local host_warn_cnt_before + local vm_oops_cnt_before + local vm_warn_cnt_before + local host_oops_cnt_after + local host_warn_cnt_after + local vm_oops_cnt_after + local vm_warn_cnt_after + local name + local rc + + host_oops_cnt_before=$(dmesg | grep -c -i 'Oops') + host_warn_cnt_before=$(dmesg --level=warn | wc -l) + vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops') + vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l) + + name=$(echo "${1}" | awk '{ print $1 }') + eval test_"${name}" + rc=$? + + host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l) + if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + host_warn_cnt_after=$(dmesg --level=warn | wc -l) + if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on host" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l) + if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then + echo "FAIL: kernel oops detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l) + if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then + echo "FAIL: kernel warning detected on vm" | log_host "${name}" + rc=$KSFT_FAIL + fi + + return "${rc}" +} + +QEMU="qemu-system-$(uname -m)" + +while getopts :hvsq:b o +do + case $o in + v) VERBOSE=1;; + b) BUILD=1;; + q) QEMU=$OPTARG;; + h|*) usage;; + esac +done +shift $((OPTIND-1)) + +trap cleanup EXIT + +if [[ ${#} -eq 0 ]]; then + ARGS=("${TEST_NAMES[@]}") +else + ARGS=("$@") +fi + +check_args "${ARGS[@]}" +check_deps +handle_build + +echo "1..${#ARGS[@]}" + +log_setup "Booting up VM" +vm_start +vm_wait_for_ssh +log_setup "VM booted up" + +cnt_pass=0 +cnt_fail=0 +cnt_skip=0 +cnt_total=0 +for arg in "${ARGS[@]}"; do + run_test "${arg}" + rc=$? + if [[ ${rc} -eq $KSFT_PASS ]]; then + cnt_pass=$(( cnt_pass + 1 )) + echo "ok ${cnt_total} ${arg}" + elif [[ ${rc} -eq $KSFT_SKIP ]]; then + cnt_skip=$(( cnt_skip + 1 )) + echo "ok ${cnt_total} ${arg} # SKIP" + elif [[ ${rc} -eq $KSFT_FAIL ]]; then + cnt_fail=$(( cnt_fail + 1 )) + echo "not ok ${cnt_total} ${arg} # exit=$rc" + fi + cnt_total=$(( cnt_total + 1 )) +done + +echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}" +echo "Log: ${LOG}" + +if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then + exit "$KSFT_PASS" +else + exit "$KSFT_FAIL" +fi --- base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a change-id: 20250325-vsock-vmtest-b3a21d2102c2 Best regards, -- Bobby Eshleman <bobbyeshleman(a)gmail.com>

7 months, 2 weeks

2
4
0 0

[PATCH net-next V10 0/6] Support rate management on traffic classes in devlink and mlx5

by Tariq Toukan

Hi, This patch series by Carolina is V10 of the feature that adds rate management support on traffic classes in devlink and mlx5, see full description by Carolina below [0]. V10: - Added netdevsim selftest for tc-bw ops. - Dropped header: field as it’s unnecessary for local constants in devlink.yaml. Regards, Tariq [0] This patch series extends the devlink-rate API to support traffic class (TC) bandwidth management, enabling more granular control over traffic shaping and rate limiting across multiple TCs. The API now allows users to specify bandwidth proportions for different traffic classes in a single command. This is particularly useful for managing Enhanced Transmission Selection (ETS) for groups of Virtual Functions (VFs), allowing precise bandwidth allocation across traffic classes. Additionally the series refines the QoS handling in net/mlx5 to support TC arbitration and bandwidth management on vports and rate nodes. Discussions on traffic class shaping in net-shapers began in V5 [1], where we discussed with maintainers whether net-shapers should support traffic classes and how this could be implemented. Later, after further conversations with Paolo Abeni and Simon Horman, Cosmin provided an update [2], confirming that net-shapers' tree-based hierarchy aligns well with traffic classes when treated as distinct subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX queues and traffic classes, this approach seems feasible, though some open questions remain regarding queue reconfiguration and certain mlx5 scheduling behaviors. Building on that discussion, Cosmin has now shared a concrete implementation plan on the netdev mailing list [3]. The plan, developed in collaboration with Paolo and Simon, outlines how net-shapers can be extended to support the same use cases currently covered by devlink-rate, with the eventual goal of aligning both and simplifying the shaping infrastructure in the kernel. This work was presented at Netdev 0x19 in Zagreb [4]. There we presented how TC scheduling is enforced in mlx5 hardware, which led to discussions on the mailing list. A summary of how things work: Classification means labeling a packet with a traffic class based on the packet's DSCP or VLAN PCP field, then treating packets with different traffic classes differently during transmit processing. In a virtualized setup, VFs are untrusted and do not control classification or shaping. Classification is done by the hardware using a prio-to-TC mapping set by the hypervisor. VFs only select which send queue to use and are expected to respect the classification logic by sending each traffic class on its dedicated queue. As stated in the net-shapers plan [3], each transmit queue should carry only a single traffic class. Mixing classes in a single queue can lead to HOL blocking. In the mlx5 implementation, if the queue used does not match the classified traffic class, the hardware moves the queue to the correct TC scheduler. This movement is not a reclassification; it’s a necessary enforcement step to ensure traffic class isolation is maintained. Extend devlink-rate API to support rate management on TCs: - devlink: Extend the devlink rate API to support traffic class bandwidth management Introduce a no-op implementation: - net/mlx5: Add no-op implementation for setting tc-bw on rate objects Add support for enabling and disabling TC QoS on vports and nodes: - net/mlx5: Add support for setting tc-bw on nodes - net/mlx5: Add traffic class scheduling support for vport QoS Support for setting tc-bw on rate objects: - net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw [1] https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/ [2] https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.cam… [3] https://lore.kernel.org/netdev/d9831d0c940a7b77419abe7c7330e822bbfd1cfb.cam… [4] https://netdevconf.info/0x19/sessions/talk/optimizing-bandwidth-allocation-… Carolina Jubran (6): devlink: Extend devlink rate API with traffic classes bandwidth management selftest: netdevsim: Add devlink rate tc-bw test net/mlx5: Add no-op implementation for setting tc-bw on rate objects net/mlx5: Add support for setting tc-bw on nodes net/mlx5: Add traffic class scheduling support for vport QoS net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw Documentation/netlink/specs/devlink.yaml | 35 +- .../networking/devlink/devlink-port.rst | 7 + .../net/ethernet/mellanox/mlx5/core/devlink.c | 2 + .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1007 ++++++++++++++++- .../net/ethernet/mellanox/mlx5/core/esw/qos.h | 8 + .../net/ethernet/mellanox/mlx5/core/eswitch.h | 14 +- drivers/net/netdevsim/dev.c | 43 + drivers/net/netdevsim/netdevsim.h | 1 + include/net/devlink.h | 6 + include/uapi/linux/devlink.h | 9 + net/devlink/netlink_gen.c | 15 +- net/devlink/netlink_gen.h | 1 + net/devlink/rate.c | 127 +++ .../drivers/net/netdevsim/devlink.sh | 51 + 14 files changed, 1289 insertions(+), 37 deletions(-) base-commit: f685204c57e87d2a88b159c7525426d70ee745c9 -- 2.31.1

7 months, 2 weeks

5
15
0 0

[PATCH] rust: kunit: use crate-level mapping for `c_void`

by Jesung Yang

Use `kernel::ffi::c_void` instead of `core::ffi::c_void` for consistency and to centralize abstraction. Since `kernel::ffi::c_void` is a transparent wrapper around `core::ffi::c_void`, both are functionally equivalent. However, using `kernel::ffi::c_void` improves consistency across the kernel's Rust code and provides a unified reference point in case the definition ever needs to change, even if such a change is unlikely. Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com> --- rust/kernel/kunit.rs | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs index 81833a687b75..bd6fc712dd79 100644 --- a/rust/kernel/kunit.rs +++ b/rust/kernel/kunit.rs @@ -6,7 +6,8 @@ //! //! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html> -use core::{ffi::c_void, fmt}; +use core::fmt; +use kernel::ffi::c_void; /// Prints a KUnit error-level message. /// -- 2.39.5

7 months, 2 weeks

3
5
0 0

[PATCH bpf-next v3 0/2] bpf, arm64: support up to 12 arguments

by Alexis Lothoré

Hello, this is the v2 of the many args series for arm64, being itself a revival of Xu Kuhoai's work to enable larger arguments count for BPF programs on ARM64 ([1]). The discussions in v1 shed some light on some issues around specific cases, for example with functions passing struct on stack with custom packing/alignment attributes: those cases can not be properly detected with the current BTF info. So this new revision aims to separate concerns with a simpler implementation, just accepting additional args on stack if we can make sure about the alignment constraints (and so, refusing attachment to functions passing structs on stacks). I then checked if the specific alignment constraints could be checked with larger scalar types rather than structs, but it appears that this use case is in fact rejected at the verifier level (see a9b59159d338 ("bpf: Do not allow btf_ctx_access with __int128 types")). So in the end the specific alignment corner cases raised in [1] can not really happen in the kernel in its current state. This new revision still brings support for the standard cases as a first step, it will then be possible to iterate on top of it to add the more specific cases like struct passed on stack and larger types. [1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com… Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com> --- Changes in v3: - switch back -EOPNOTSUPP to -ENOTSUPP - fix comment style - group intializations for arg_aux - remove some unneeded round_up - Link to v2: https://lore.kernel.org/r/20250522-many_args_arm64-v2-0-d6afdb9cf819@bootli… Changes in v2: - remove alignment computation from btf.c - deduce alignment constraints directly in jit compiler for simple types - deny attachment to functions with "corner-cases" arguments (ie: structs on stack) - remove custom tests, as the corresponding use cases are locked either by the JIT comp or the verifier - drop RFC - Link to v1: https://lore.kernel.org/r/20250411-many_args_arm64-v1-0-0a32fe72339e@bootli… --- Alexis Lothoré (eBPF Foundation) (1): selftests/bpf: enable many-args tests for arm64 Xu Kuohai (1): bpf, arm64: Support up to 12 function arguments arch/arm64/net/bpf_jit_comp.c | 225 ++++++++++++++++++++------- tools/testing/selftests/bpf/DENYLIST.aarch64 | 2 - 2 files changed, 171 insertions(+), 56 deletions(-) --- base-commit: 9435138c069117cd59a4912b5ea2ae44cc2c5ffa change-id: 20250220-many_args_arm64-8bd3747e6948 Best regards, -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com

7 months, 2 weeks

4
4
0 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror