February 2025 - Linux-kselftest-mirror

[PATCH net 0/4] sockmap, vsock: For connectible sockets allow only connected

by Michal Luczaj

Series deals with one more case of vsock surprising BPF/sockmap by being
inconsistent about (having an) assigned transport.

KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
CPU: 7 UID: 0 PID: 56 Comm: kworker/7:0 Not tainted 6.14.0-rc1+
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_read_skb+0x4b/0x90
Call Trace:
 sk_psock_verdict_data_ready+0xa4/0x2e0
 virtio_transport_recv_pkt+0x1ca8/0x2acc
 vsock_loopback_work+0x27d/0x3f0
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x35a/0x700
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30

This bug, similarly to commit f6abafcd32f9 ("vsock/bpf: return early if
transport is not assigned"), could be fixed with a single NULL check. But
instead, let's explore another approach: take a hint from
vsock_bpf_update_proto() and teach sockmap to accept only vsocks that are
already connected (no risk of transport being dropped or reassigned). At the
same time straight reject the listeners (vsock listening sockets do not carry
any transport anyway). This way BPF does not have to worry about
vsk->transport becoming NULL.

Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Michal Luczaj (4):
      sockmap, vsock: For connectible sockets allow only connected
      vsock/bpf: Warn on socket without transport
      selftest/bpf: Adapt vsock_delete_on_close to sockmap rejecting unconnected
      selftest/bpf: Add vsock test for sockmap rejecting unconnected

 net/core/sock_map.c                                |  3 +
 net/vmw_vsock/af_vsock.c                           |  3 +
 net/vmw_vsock/vsock_bpf.c                          |  2 +-
 .../selftests/bpf/prog_tests/sockmap_basic.c       | 70 ++++++++++++++++------
 4 files changed, 59 insertions(+), 19 deletions(-)
---
base-commit: 9c01a177c2e4b55d2bcce8a1f6bdd1d46a8320e3
change-id: 20250210-vsock-listen-sockmap-nullptr-e6e82ca79611

Best regards,
--
Michal Luczaj <mhal(a)rbox.co>
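
The acceptance rule described in the cover letter can be modelled in a few lines. What follows is a minimal userspace sketch in Python, not the kernel code: the `VSock` class, the `SockType` and `State` enums and the `sockmap_accepts()` helper are hypothetical names made up for illustration, and the sketch assumes that sockets which are not connectible fall outside the new check.

```python
from dataclasses import dataclass
from enum import Enum


class SockType(Enum):
    STREAM = "stream"
    SEQPACKET = "seqpacket"
    DGRAM = "dgram"


class State(Enum):
    UNCONNECTED = "unconnected"
    LISTEN = "listen"
    ESTABLISHED = "established"


# Socket types treated as connectible in this model.
CONNECTIBLE = {SockType.STREAM, SockType.SEQPACKET}


@dataclass
class VSock:
    """Hypothetical stand-in for a vsock; not a kernel structure."""
    type: SockType
    state: State


def sockmap_accepts(sock: VSock) -> bool:
    """Return True if the modelled sockmap would take this vsock."""
    if sock.type in CONNECTIBLE:
        # Only a connected socket is certain to have a transport assigned,
        # so listeners and unconnected sockets are rejected outright.
        return sock.state is State.ESTABLISHED
    # Not connectible: the rule does not apply (assumption of this sketch).
    return True
```

In this model a listening or unconnected stream socket is refused at insertion time, which is why later code would no longer need to consider a missing transport.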

10 months, 1 week


[RESEND v4 0/3] mm/pkey: Add PKEY_UNRESTRICTED macro

by Yury Khrustalev

Add PKEY_UNRESTRICTED macro to mman.h and use it in selftests.

For context, this change will also allow for more consistent update of the
Glibc manual which in turn will help with introducing memory protection keys
on AArch64 targets.

Applies to 5bc55a333a2f (tag: v6.13-rc7).

Note that I couldn't build ppc tests so I would appreciate if someone could
check the 3rd patch. Thank you!

Signed-off-by: Yury Khrustalev <yury.khrustalev(a)arm.com>
---
Changes in v4:
- Removed change to tools/include/uapi/asm-generic/mman-common.h as it is not
  necessary.

Link to v3: https://lore.kernel.org/all/20241028090715.509527-1-yury.khrustalev@arm.com/

Changes in v3:
- Replaced previously missed 0-s tools/testing/selftests/mm/mseal_test.c
- Replaced previously missed 0-s in tools/testing/selftests/mm/mseal_test.c

Link to v2: https://lore.kernel.org/linux-arch/20241027170006.464252-2-yury.khrustalev@…

Changes in v2:
- Update tools/include/uapi/asm-generic/mman-common.h as well
- Add usages of the new macro to selftests.

Link to v1: https://lore.kernel.org/linux-arch/20241022120128.359652-1-yury.khrustalev@…
---
Yury Khrustalev (3):
      mm/pkey: Add PKEY_UNRESTRICTED macro
      selftests/mm: Use PKEY_UNRESTRICTED macro
      selftests/powerpc: Use PKEY_UNRESTRICTED macro

 include/uapi/asm-generic/mman-common.h               | 1 +
 tools/testing/selftests/mm/mseal_test.c              | 6 +++---
 tools/testing/selftests/mm/pkey-helpers.h            | 3 ++-
 tools/testing/selftests/mm/pkey_sighandler_tests.c   | 4 ++--
 tools/testing/selftests/mm/protection_keys.c         | 2 +-
 tools/testing/selftests/powerpc/include/pkeys.h      | 2 +-
 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c  | 2 +-
 tools/testing/selftests/powerpc/mm/pkey_siginfo.c    | 2 +-
 tools/testing/selftests/powerpc/ptrace/core-pkey.c   | 6 +++---
 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 6 +++---
 10 files changed, 18 insertions(+), 16 deletions(-)
--
2.39.5

10 months, 1 week


[PATCH] kunit: Clarify kunit_skip() argument name

by Kevin Brodsky

kunit_skip() and kunit_mark_skipped() can only be passed a pointer to a
struct kunit, not struct kunit_suite (only kunit_log() actually supports
both). Rename their first argument accordingly.

Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
---
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Cc: linux-kselftest(a)vger.kernel.org
---
 include/kunit/test.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/kunit/test.h b/include/kunit/test.h
index 58dbab60f853..0ffb97c78566 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -553,9 +553,9 @@ void kunit_cleanup(struct kunit *test);
 void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt, ...);
 
 /**
- * kunit_mark_skipped() - Marks @test_or_suite as skipped
+ * kunit_mark_skipped() - Marks @test as skipped
  *
- * @test_or_suite: The test context object.
+ * @test: The test context object.
  * @fmt: A printk() style format string.
  *
  * Marks the test as skipped. @fmt is given output as the test status
@@ -563,18 +563,18 @@ void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt,
  *
  * Test execution continues after kunit_mark_skipped() is called.
  */
-#define kunit_mark_skipped(test_or_suite, fmt, ...)			\
+#define kunit_mark_skipped(test, fmt, ...)				\
 	do {								\
-		WRITE_ONCE((test_or_suite)->status, KUNIT_SKIPPED);	\
-		scnprintf((test_or_suite)->status_comment,		\
+		WRITE_ONCE((test)->status, KUNIT_SKIPPED);		\
+		scnprintf((test)->status_comment,			\
 			  KUNIT_STATUS_COMMENT_SIZE,			\
 			  fmt, ##__VA_ARGS__);				\
 	} while (0)
 
 /**
- * kunit_skip() - Marks @test_or_suite as skipped
+ * kunit_skip() - Marks @test as skipped
  *
- * @test_or_suite: The test context object.
+ * @test: The test context object.
  * @fmt: A printk() style format string.
  *
  * Skips the test. @fmt is given output as the test status
@@ -582,10 +582,10 @@ void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt,
  *
  * Test execution is halted after kunit_skip() is called.
  */
-#define kunit_skip(test_or_suite, fmt, ...)				\
+#define kunit_skip(test, fmt, ...)					\
 	do {								\
-		kunit_mark_skipped((test_or_suite), fmt, ##__VA_ARGS__);\
-		kunit_try_catch_throw(&((test_or_suite)->try_catch));	\
+		kunit_mark_skipped((test), fmt, ##__VA_ARGS__);		\
+		kunit_try_catch_throw(&((test)->try_catch));		\
 	} while (0)
 
 /*

base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
--
2.47.0

10 months, 1 week


[PATCH] kunit: tool: Build GDB scripts

by Brendan Jackman

Following a similar rationale as commit e4835f1da425f ("kunit: tool: Build
compile_commands.json"), make a common developer tool available by default
for KUnit users.

Compared to compile_commands.json, there is a little more work to be done to
build the GDB scripts. Is it enough to affect development cycle duration?
Unscientific evaluation:

  rm -rf .kunit; time tools/testing/kunit/kunit.py build --kunitconfig ./lib/kunit/.kunitconfig --jobs 96

Without this patch it took 14.77s, with this patch it took 14.83. So,
although `make scripts_gdb` is pretty slow, presumably most of that is just
the overhead of running Kbuild at all, actually building the scripts is
approximately free.

Note also, to actually get the GDB scripts the user needs to enable
CONFIG_SCRIPTS_GDB, but building the scripts_gdb target without that is still
harmless.

Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
 tools/testing/kunit/kunit_kernel.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index e76d7894b6c5195ece49f0d8c7ac35130df428a9..33b5f7351cbb5d0be240cb52db2bc1fa94aeb75e 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -72,8 +72,8 @@ class LinuxSourceTreeOperations:
 			raise ConfigError(e.output.decode())
 
 	def make(self, jobs: int, build_dir: str, make_options: Optional[List[str]]) -> None:
-		command = ['make', 'all', 'compile_commands.json', 'ARCH=' + self._linux_arch,
-			   'O=' + build_dir, '--jobs=' + str(jobs)]
+		command = ['make', 'all', 'compile_commands.json', 'scripts_gdb',
+			   'ARCH=' + self._linux_arch, 'O=' + build_dir, '--jobs=' + str(jobs)]
 		if make_options:
 			command.extend(make_options)
 		if self._cross_compile:

---
base-commit: 521d60e196ecb215f425e04e9ab33e02beaffbe3
change-id: 20250121-kunit-gdb-b27315b4f2d8

Best regards,
--
Brendan Jackman <jackmanb(a)google.com>
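
To see the effect of the change in isolation, the command assembly from the patched `make()` method can be lifted into a free function. This is only a sketch: `build_make_command()` is a hypothetical standalone helper, not part of kunit_kernel.py, and it leaves out the cross-compile handling and the actual subprocess call.

```python
from typing import List, Optional


def build_make_command(linux_arch: str, build_dir: str, jobs: int,
                       make_options: Optional[List[str]] = None) -> List[str]:
    """Assemble the make invocation, with scripts_gdb as an extra target."""
    command = ['make', 'all', 'compile_commands.json', 'scripts_gdb',
               'ARCH=' + linux_arch, 'O=' + build_dir, '--jobs=' + str(jobs)]
    if make_options:
        # Caller-supplied options are appended after the fixed arguments.
        command.extend(make_options)
    return command


if __name__ == '__main__':
    # Print the command line that would be run for a UML build.
    print(' '.join(build_make_command('um', '.kunit', 8)))
```

Because `scripts_gdb` is just one more make target in the same invocation, Kbuild is still started only once, which fits the observation above that the added cost is small.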

10 months, 1 week


[PATCH] tools/nolibc: add support for N64 and N32 ABIs

by Thomas Weißschuh

Add support for the MIPS 64bit N64 and ILP32 N32 ABIs. In addition to different byte orders and ABIs there are also different releases of the MIPS architecture. To avoid blowing up the test matrix, only add a subset of all possible test combinations. Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> --- tools/include/nolibc/arch-mips.h | 119 ++++++++++++++++++++++++---- tools/testing/selftests/nolibc/Makefile | 28 ++++++- tools/testing/selftests/nolibc/run-tests.sh | 2 +- 3 files changed, 131 insertions(+), 18 deletions(-) diff --git a/tools/include/nolibc/arch-mips.h b/tools/include/nolibc/arch-mips.h index 753a8ed2cf695f0b5eac4b5e4d317fdb383ebf93..638520a3427a985fdbd5f5a49b55853bbadeee75 100644 --- a/tools/include/nolibc/arch-mips.h +++ b/tools/include/nolibc/arch-mips.h @@ -10,7 +10,7 @@ #include "compiler.h" #include "crt.h" -#if !defined(_ABIO32) +#if !defined(_ABIO32) && !defined(_ABIN32) && !defined(_ABI64) #error Unsupported MIPS ABI #endif @@ -32,11 +32,32 @@ * - the arguments are cast to long and assigned into the target registers * which are then simply passed as registers to the asm code, so that we * don't have to experience issues with register constraints. + * + * Syscalls for MIPS ABI N32, same as ABI O32 with the following differences : + * - arguments are in a0, a1, a2, a3, t0, t1, t2, t3. + * t0..t3 are also known as a4..a7. + * - stack is 16-byte aligned */ +#if defined(_ABIO32) + #define _NOLIBC_SYSCALL_CLOBBERLIST \ "memory", "cc", "at", "v1", "hi", "lo", \ "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" +#define _NOLIBC_SYSCALL_STACK_RESERVE "addiu $sp, $sp, -32\n" +#define _NOLIBC_SYSCALL_STACK_UNRESERVE "addiu $sp, $sp, 32\n" + +#elif defined(_ABIN32) || defined(_ABI64) + +/* binutils, GCC and clang disagree about register aliases, use numbers instead. 
*/ +#define _NOLIBC_SYSCALL_CLOBBERLIST \ + "memory", "cc", "at", "v1", \ + "10", "11", "12", "13", "14", "15", "24", "25" + +#define _NOLIBC_SYSCALL_STACK_RESERVE +#define _NOLIBC_SYSCALL_STACK_UNRESERVE + +#endif #define my_syscall0(num) \ ({ \ @@ -44,9 +65,9 @@ register long _arg4 __asm__ ("a3"); \ \ __asm__ volatile ( \ - "addiu $sp, $sp, -32\n" \ + _NOLIBC_SYSCALL_STACK_RESERVE \ "syscall\n" \ - "addiu $sp, $sp, 32\n" \ + _NOLIBC_SYSCALL_STACK_UNRESERVE \ : "=r"(_num), "=r"(_arg4) \ : "r"(_num) \ : _NOLIBC_SYSCALL_CLOBBERLIST \ @@ -61,9 +82,9 @@ register long _arg4 __asm__ ("a3"); \ \ __asm__ volatile ( \ - "addiu $sp, $sp, -32\n" \ + _NOLIBC_SYSCALL_STACK_RESERVE \ "syscall\n" \ - "addiu $sp, $sp, 32\n" \ + _NOLIBC_SYSCALL_STACK_UNRESERVE \ : "=r"(_num), "=r"(_arg4) \ : "0"(_num), \ "r"(_arg1) \ @@ -80,9 +101,9 @@ register long _arg4 __asm__ ("a3"); \ \ __asm__ volatile ( \ - "addiu $sp, $sp, -32\n" \ + _NOLIBC_SYSCALL_STACK_RESERVE \ "syscall\n" \ - "addiu $sp, $sp, 32\n" \ + _NOLIBC_SYSCALL_STACK_UNRESERVE \ : "=r"(_num), "=r"(_arg4) \ : "0"(_num), \ "r"(_arg1), "r"(_arg2) \ @@ -100,9 +121,9 @@ register long _arg4 __asm__ ("a3"); \ \ __asm__ volatile ( \ - "addiu $sp, $sp, -32\n" \ + _NOLIBC_SYSCALL_STACK_RESERVE \ "syscall\n" \ - "addiu $sp, $sp, 32\n" \ + _NOLIBC_SYSCALL_STACK_UNRESERVE \ : "=r"(_num), "=r"(_arg4) \ : "0"(_num), \ "r"(_arg1), "r"(_arg2), "r"(_arg3) \ @@ -120,9 +141,9 @@ register long _arg4 __asm__ ("a3") = (long)(arg4); \ \ __asm__ volatile ( \ - "addiu $sp, $sp, -32\n" \ + _NOLIBC_SYSCALL_STACK_RESERVE \ "syscall\n" \ - "addiu $sp, $sp, 32\n" \ + _NOLIBC_SYSCALL_STACK_UNRESERVE \ : "=r" (_num), "=r"(_arg4) \ : "0"(_num), \ "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4) \ @@ -131,6 +152,8 @@ _arg4 ? 
-_num : _num; \ }) +#if defined(_ABIO32) + #define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ ({ \ register long _num __asm__ ("v0") = (num); \ @@ -141,10 +164,10 @@ register long _arg5 = (long)(arg5); \ \ __asm__ volatile ( \ - "addiu $sp, $sp, -32\n" \ + _NOLIBC_SYSCALL_STACK_RESERVE \ "sw %7, 16($sp)\n" \ "syscall\n" \ - "addiu $sp, $sp, 32\n" \ + _NOLIBC_SYSCALL_STACK_UNRESERVE \ : "=r" (_num), "=r"(_arg4) \ : "0"(_num), \ "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5) \ @@ -164,11 +187,11 @@ register long _arg6 = (long)(arg6); \ \ __asm__ volatile ( \ - "addiu $sp, $sp, -32\n" \ + _NOLIBC_SYSCALL_STACK_RESERVE \ "sw %7, 16($sp)\n" \ "sw %8, 20($sp)\n" \ "syscall\n" \ - "addiu $sp, $sp, 32\n" \ + _NOLIBC_SYSCALL_STACK_UNRESERVE \ : "=r" (_num), "=r"(_arg4) \ : "0"(_num), \ "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ @@ -178,6 +201,50 @@ _arg4 ? -_num : _num; \ }) +#else + +#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ +({ \ + register long _num __asm__ ("v0") = (num); \ + register long _arg1 __asm__ ("$4") = (long)(arg1); \ + register long _arg2 __asm__ ("$5") = (long)(arg2); \ + register long _arg3 __asm__ ("$6") = (long)(arg3); \ + register long _arg4 __asm__ ("$7") = (long)(arg4); \ + register long _arg5 __asm__ ("$8") = (long)(arg5); \ + \ + __asm__ volatile ( \ + "syscall\n" \ + : "=r" (_num), "=r"(_arg4) \ + : "0"(_num), \ + "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5) \ + : _NOLIBC_SYSCALL_CLOBBERLIST \ + ); \ + _arg4 ? 
-_num : _num; \ +}) + +#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \ +({ \ + register long _num __asm__ ("v0") = (num); \ + register long _arg1 __asm__ ("$4") = (long)(arg1); \ + register long _arg2 __asm__ ("$5") = (long)(arg2); \ + register long _arg3 __asm__ ("$6") = (long)(arg3); \ + register long _arg4 __asm__ ("$7") = (long)(arg4); \ + register long _arg5 __asm__ ("$8") = (long)(arg5); \ + register long _arg6 __asm__ ("$9") = (long)(arg6); \ + \ + __asm__ volatile ( \ + "syscall\n" \ + : "=r" (_num), "=r"(_arg4) \ + : "0"(_num), \ + "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ + "r"(_arg6) \ + : _NOLIBC_SYSCALL_CLOBBERLIST \ + ); \ + _arg4 ? -_num : _num; \ +}) + +#endif + /* startup code, note that it's called __start on MIPS */ void __start(void); void __attribute__((weak, noreturn)) __nolibc_entrypoint __no_stack_protector __start(void) @@ -190,13 +257,33 @@ void __attribute__((weak, noreturn)) __nolibc_entrypoint __no_stack_protector __ "1:\n" ".cpload $ra\n" "move $a0, $sp\n" /* save stack pointer to $a0, as arg1 of _start_c */ + +#if defined(_ABIO32) "addiu $sp, $sp, -4\n" /* space for .cprestore to store $gp */ ".cprestore 0\n" "li $t0, -8\n" "and $sp, $sp, $t0\n" /* $sp must be 8-byte aligned */ "addiu $sp, $sp, -16\n" /* the callee expects to save a0..a3 there */ - "lui $t9, %hi(_start_c)\n" /* ABI requires current function address in $t9 */ +#else + "daddiu $sp, $sp, -8\n" /* space for .cprestore to store $gp */ + ".cpsetup $ra, 0, 1b\n" + "li $t0, -16\n" + "and $sp, $sp, $t0\n" /* $sp must be 16-byte aligned */ +#endif + + /* ABI requires current function address in $t9 */ +#if defined(_ABIO32) || defined(_ABIN32) + "lui $t9, %hi(_start_c)\n" "ori $t9, %lo(_start_c)\n" +#else + "lui $t9, %highest(_start_c)\n" + "ori $t9, %higher(_start_c)\n" + "dsll $t9, 0x10\n" + "ori $t9, %hi(_start_c)\n" + "dsll $t9, 0x10\n" + "ori $t9, %lo(_start_c)\n" +#endif + "jalr $t9\n" /* transfer to c runtime */ " nop\n" /* delayed slot 
*/ ".set pop\n" diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile index 983985b7529b65b7ce4a00c28f3f915d83974eea..2dec6ab9596c974b6aac439685e17f5c10a76948 100644 --- a/tools/testing/selftests/nolibc/Makefile +++ b/tools/testing/selftests/nolibc/Makefile @@ -52,6 +52,10 @@ ARCH_ppc64 = powerpc ARCH_ppc64le = powerpc ARCH_mips32le = mips ARCH_mips32be = mips +ARCH_mipsn32le = mips +ARCH_mipsn32be = mips +ARCH_mips64le = mips +ARCH_mips64be = mips ARCH_riscv32 = riscv ARCH_riscv64 = riscv ARCH := $(or $(ARCH_$(XARCH)),$(XARCH)) @@ -64,6 +68,10 @@ IMAGE_arm64 = arch/arm64/boot/Image IMAGE_arm = arch/arm/boot/zImage IMAGE_mips32le = vmlinuz IMAGE_mips32be = vmlinuz +IMAGE_mipsn32le = vmlinuz +IMAGE_mipsn32be = vmlinuz +IMAGE_mips64le = vmlinuz +IMAGE_mips64be = vmlinuz IMAGE_ppc = vmlinux IMAGE_ppc64 = vmlinux IMAGE_ppc64le = arch/powerpc/boot/zImage @@ -83,6 +91,10 @@ DEFCONFIG_arm64 = defconfig DEFCONFIG_arm = multi_v7_defconfig DEFCONFIG_mips32le = malta_defconfig DEFCONFIG_mips32be = malta_defconfig generic/eb.config +DEFCONFIG_mipsn32le = malta_defconfig generic/64r2.config +DEFCONFIG_mipsn32be = malta_defconfig generic/64r6.config generic/eb.config +DEFCONFIG_mips64le = malta_defconfig generic/64r6.config +DEFCONFIG_mips64be = malta_defconfig generic/64r2.config generic/eb.config DEFCONFIG_ppc = pmac32_defconfig DEFCONFIG_ppc64 = powernv_be_defconfig DEFCONFIG_ppc64le = powernv_defconfig @@ -105,7 +117,11 @@ QEMU_ARCH_x86 = x86_64 QEMU_ARCH_arm64 = aarch64 QEMU_ARCH_arm = arm QEMU_ARCH_mips32le = mipsel # works with malta_defconfig -QEMU_ARCH_mips32be = mips +QEMU_ARCH_mips32be = mips +QEMU_ARCH_mipsn32le = mips64el +QEMU_ARCH_mipsn32be = mips64 +QEMU_ARCH_mips64le = mips64el +QEMU_ARCH_mips64be = mips64 QEMU_ARCH_ppc = ppc QEMU_ARCH_ppc64 = ppc64 QEMU_ARCH_ppc64le = ppc64 @@ -117,6 +133,8 @@ QEMU_ARCH_loongarch = loongarch64 QEMU_ARCH = $(QEMU_ARCH_$(XARCH)) QEMU_ARCH_USER_ppc64le = ppc64le +QEMU_ARCH_USER_mipsn32le = 
mipsn32el +QEMU_ARCH_USER_mipsn32be = mipsn32 QEMU_ARCH_USER = $(or $(QEMU_ARCH_USER_$(XARCH)),$(QEMU_ARCH_$(XARCH))) QEMU_BIOS_DIR = /usr/share/edk2/ @@ -134,6 +152,10 @@ QEMU_ARGS_arm64 = -M virt -cpu cortex-a53 -append "panic=-1 $(TEST:%=NOLIBC QEMU_ARGS_arm = -M virt -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)" QEMU_ARGS_mips32le = -M malta -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)" QEMU_ARGS_mips32be = -M malta -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)" +QEMU_ARGS_mipsn32le = -M malta -cpu 5KEc -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)" +QEMU_ARGS_mipsn32be = -M malta -cpu I6400 -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)" +QEMU_ARGS_mips64le = -M malta -cpu I6400 -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)" +QEMU_ARGS_mips64be = -M malta -cpu 5KEc -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)" QEMU_ARGS_ppc = -M g3beige -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)" QEMU_ARGS_ppc64 = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC_TEST=%)" QEMU_ARGS_ppc64le = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC_TEST=%)" @@ -161,6 +183,10 @@ CFLAGS_ppc64le = -m64 -mlittle-endian -mno-vsx $(call cc-option,-mabi=elfv2) CFLAGS_s390 = -m64 CFLAGS_mips32le = -EL -mabi=32 -fPIC CFLAGS_mips32be = -EB -mabi=32 +CFLAGS_mipsn32le = -EL -mabi=n32 -fPIC -march=mips64r2 +CFLAGS_mipsn32be = -EB -mabi=n32 -march=mips64r6 +CFLAGS_mips64le = -EL -mabi=64 -march=mips64r6 +CFLAGS_mips64be = -EB -mabi=64 -march=mips64r2 CFLAGS_STACKPROTECTOR ?= $(call cc-option,-mstack-protector-guard=global $(call cc-option,-fstack-protector-all)) CFLAGS ?= -Os -fno-ident -fno-asynchronous-unwind-tables -std=c89 -W -Wall -Wextra \ $(call cc-option,-fno-stack-protector) $(call cc-option,-Wmissing-prototypes) \ diff --git a/tools/testing/selftests/nolibc/run-tests.sh b/tools/testing/selftests/nolibc/run-tests.sh index 6db01115276888bc89f6ec5532153c37e55c83d3..f0f3890fb5fa8196cd33aa8681ed30b00d8f474e 100755 --- a/tools/testing/selftests/nolibc/run-tests.sh +++ 
b/tools/testing/selftests/nolibc/run-tests.sh @@ -20,7 +20,7 @@ llvm= all_archs=( i386 x86_64 arm64 arm - mips32le mips32be + mips32le mips32be mipsn32le mipsn32be mips64le mips64be ppc ppc64 ppc64le riscv32 riscv64 s390 --- base-commit: 16681bea9a80080765c98b545ad74c17de2d513c change-id: 20231105-nolibc-mips-n32-234901bd910d Best regards, -- Thomas Weißschuh <linux(a)weissschuh.net>
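
The "subset of all possible test combinations" can be checked mechanically. The sketch below copies the ABI, byte order and architecture release of each new XARCH value from the CFLAGS_* lines in the patch; the `coverage()` and `untested_combinations()` helpers are hypothetical and exist only for this illustration.

```python
from collections import Counter
from itertools import product

# (ABI, byte order, architecture release) per new XARCH value, taken from
# the CFLAGS_* assignments in the Makefile hunk.
NEW_XARCHS = {
    "mipsn32le": ("n32", "little", "mips64r2"),
    "mipsn32be": ("n32", "big", "mips64r6"),
    "mips64le": ("64", "little", "mips64r6"),
    "mips64be": ("64", "big", "mips64r2"),
}


def coverage(axis: int) -> Counter:
    """Count how often each value on one axis is exercised.

    Axis 0 is the ABI, 1 the byte order, 2 the architecture release.
    """
    return Counter(combo[axis] for combo in NEW_XARCHS.values())


def untested_combinations() -> set:
    """Return the combinations of the full cross product that are left out."""
    axes = [set(combo[i] for combo in NEW_XARCHS.values()) for i in range(3)]
    return set(product(*axes)) - set(NEW_XARCHS.values())


if __name__ == "__main__":
    for axis, name in enumerate(("ABI", "byte order", "release")):
        print(name, dict(coverage(axis)))
    print("untested:", sorted(untested_combinations()))
```

Four of the eight possible combinations are built, and every ABI, byte order and release still appears in two of them.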

10 months, 1 week


[PATCH net-next v3] selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY

by Anna Emese Nyiri

Introduce tests to verify the correct functionality of the SO_RCVMARK and
SO_RCVPRIORITY socket options.

Key changes include:

- so_rcv_listener.c: Implements a receiver application to test the correct
  behavior of the SO_RCVMARK and SO_RCVPRIORITY options.
- test_so_rcv.sh: Provides a shell script to automate testing for these
  options.
- Makefile: Integrates test_so_rcv.sh into the kernel selftests.

v3:

- Add the C part to TEST_GEN_FILES.
- Ensure the test fails if no cmsg of type opt.name is received in
  so_rcv_listener.c
- Rebased on net-next.

v2:

https://lore.kernel.org/netdev/20250210192216.37756-1-annaemesenyiri@gmail.…
- Add the C part to TEST_GEN_PROGS and .gitignore.
- Modify buffer space and add IPv6 testing option in so_rcv_listener.c.
- Add IPv6 testing, remove unnecessary comment, add kselftest exit codes,
  run both binaries in a namespace, and add sleep in test_so_rcv.sh. The
  sleep was added to ensure that the listener process has enough time to
  start before the sender attempts to connect.
- Rebased on net-next.

v1: https://lore.kernel.org/netdev/20250129143601.16035-2-annaemesenyiri@gmail.… Suggested-by: Jakub Kicinski <kuba(a)kernel.org> Suggested-by: Ferenc Fejes <fejes(a)inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri(a)gmail.com> --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 2 + tools/testing/selftests/net/so_rcv_listener.c | 168 ++++++++++++++++++ tools/testing/selftests/net/test_so_rcv.sh | 73 ++++++++ 4 files changed, 244 insertions(+) create mode 100644 tools/testing/selftests/net/so_rcv_listener.c create mode 100755 tools/testing/selftests/net/test_so_rcv.sh diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 28a715a8ef2b..80dcae53ef55 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -42,6 +42,7 @@ socket so_incoming_cpu so_netns_cookie so_txtime +so_rcv_listener stress_reuseport_listen tap tcp_fastopen_backup_key diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index b6271714504d..8d6116b80cf1 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -33,6 +33,7 @@ TEST_PROGS += gro.sh TEST_PROGS += gre_gso.sh TEST_PROGS += cmsg_so_mark.sh TEST_PROGS += cmsg_so_priority.sh +TEST_PROGS += test_so_rcv.sh TEST_PROGS += cmsg_time.sh cmsg_ipv6.sh TEST_PROGS += netns-name.sh TEST_PROGS += nl_netdev.py @@ -76,6 +77,7 @@ TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap epoll_busy_ TEST_GEN_FILES += toeplitz TEST_GEN_FILES += cmsg_sender TEST_GEN_FILES += stress_reuseport_listen +TEST_GEN_FILES += so_rcv_listener TEST_PROGS += test_vxlan_vnifiltering.sh TEST_GEN_FILES += io_uring_zerocopy_tx TEST_PROGS += io_uring_zerocopy_tx.sh diff --git a/tools/testing/selftests/net/so_rcv_listener.c b/tools/testing/selftests/net/so_rcv_listener.c new file mode 100644 index 000000000000..4b0b14edce61 --- /dev/null +++ 
b/tools/testing/selftests/net/so_rcv_listener.c @@ -0,0 +1,168 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <errno.h> +#include <netdb.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <linux/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <arpa/inet.h> + +#ifndef SO_RCVPRIORITY +#define SO_RCVPRIORITY 82 +#endif + +struct options { + __u32 val; + int name; + int rcvname; + const char *host; + const char *service; +} opt; + +static void __attribute__((noreturn)) usage(const char *bin) +{ + printf("Usage: %s [opts] <dst host> <dst port / service>\n", bin); + printf("Options:\n" + "\t\t-M val Test SO_RCVMARK\n" + "\t\t-P val Test SO_RCVPRIORITY\n" + ""); + exit(EXIT_FAILURE); +} + +static void parse_args(int argc, char *argv[]) +{ + int o; + + while ((o = getopt(argc, argv, "M:P:")) != -1) { + switch (o) { + case 'M': + opt.val = atoi(optarg); + opt.name = SO_MARK; + opt.rcvname = SO_RCVMARK; + break; + case 'P': + opt.val = atoi(optarg); + opt.name = SO_PRIORITY; + opt.rcvname = SO_RCVPRIORITY; + break; + default: + usage(argv[0]); + break; + } + } + + if (optind != argc - 2) + usage(argv[0]); + + opt.host = argv[optind]; + opt.service = argv[optind + 1]; +} + +int main(int argc, char *argv[]) +{ + int err = 0; + int recv_fd = -1; + int ret_value = 0; + __u32 recv_val; + struct cmsghdr *cmsg; + char cbuf[CMSG_SPACE(sizeof(__u32))]; + char recv_buf[CMSG_SPACE(sizeof(__u32))]; + struct iovec iov[1]; + struct msghdr msg; + struct sockaddr_in recv_addr4; + struct sockaddr_in6 recv_addr6; + + parse_args(argc, argv); + + int family = strchr(opt.host, ':') ? 
AF_INET6 : AF_INET; + + recv_fd = socket(family, SOCK_DGRAM, IPPROTO_UDP); + if (recv_fd < 0) { + perror("Can't open recv socket"); + ret_value = -errno; + goto cleanup; + } + + err = setsockopt(recv_fd, SOL_SOCKET, opt.rcvname, &opt.val, sizeof(opt.val)); + if (err < 0) { + perror("Recv setsockopt error"); + ret_value = -errno; + goto cleanup; + } + + if (family == AF_INET) { + memset(&recv_addr4, 0, sizeof(recv_addr4)); + recv_addr4.sin_family = family; + recv_addr4.sin_port = htons(atoi(opt.service)); + + if (inet_pton(family, opt.host, &recv_addr4.sin_addr) <= 0) { + perror("Invalid IPV4 address"); + ret_value = -errno; + goto cleanup; + } + + err = bind(recv_fd, (struct sockaddr *)&recv_addr4, sizeof(recv_addr4)); + } else { + memset(&recv_addr6, 0, sizeof(recv_addr6)); + recv_addr6.sin6_family = family; + recv_addr6.sin6_port = htons(atoi(opt.service)); + + if (inet_pton(family, opt.host, &recv_addr6.sin6_addr) <= 0) { + perror("Invalid IPV6 address"); + ret_value = -errno; + goto cleanup; + } + + err = bind(recv_fd, (struct sockaddr *)&recv_addr6, sizeof(recv_addr6)); + } + + if (err < 0) { + perror("Recv bind error"); + ret_value = -errno; + goto cleanup; + } + + iov[0].iov_base = recv_buf; + iov[0].iov_len = sizeof(recv_buf); + + memset(&msg, 0, sizeof(msg)); + msg.msg_iov = iov; + msg.msg_iovlen = 1; + msg.msg_control = cbuf; + msg.msg_controllen = sizeof(cbuf); + + err = recvmsg(recv_fd, &msg, 0); + if (err < 0) { + perror("Message receive error"); + ret_value = -errno; + goto cleanup; + } + + for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) { + if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == opt.name) { + recv_val = *(__u32 *)CMSG_DATA(cmsg); + printf("Received value: %u\n", recv_val); + + if (recv_val != opt.val) { + fprintf(stderr, "Error: expected value: %u, got: %u\n", + opt.val, recv_val); + ret_value = -EINVAL; + } + goto cleanup; + } + } + + fprintf(stderr, "Error: No matching cmsg received\n"); + ret_value 
= -ENOMSG; + +cleanup: + if (recv_fd >= 0) + close(recv_fd); + + return ret_value; +} diff --git a/tools/testing/selftests/net/test_so_rcv.sh b/tools/testing/selftests/net/test_so_rcv.sh new file mode 100755 index 000000000000..d8aa4362879d --- /dev/null +++ b/tools/testing/selftests/net/test_so_rcv.sh @@ -0,0 +1,73 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source lib.sh + +HOSTS=("127.0.0.1" "::1") +PORT=1234 +TOTAL_TESTS=0 +FAILED_TESTS=0 + +declare -A TESTS=( + ["SO_RCVPRIORITY"]="-P 2" + ["SO_RCVMARK"]="-M 3" +) + +check_result() { + ((TOTAL_TESTS++)) + if [ "$1" -ne 0 ]; then + ((FAILED_TESTS++)) + fi +} + +cleanup() +{ + cleanup_ns $NS +} + +trap cleanup EXIT + +setup_ns NS + +for HOST in "${HOSTS[@]}"; do + PROTOCOL="IPv4" + if [[ "$HOST" == "::1" ]]; then + PROTOCOL="IPv6" + fi + for test_name in "${!TESTS[@]}"; do + echo "Running $test_name test, $PROTOCOL" + arg=${TESTS[$test_name]} + + ip netns exec $NS ./so_rcv_listener $arg $HOST $PORT & + LISTENER_PID=$! + + sleep 0.5 + + if ! ip netns exec $NS ./cmsg_sender $arg $HOST $PORT; then + echo "Sender failed for $test_name, $PROTOCOL" + kill "$LISTENER_PID" 2>/dev/null + wait "$LISTENER_PID" + check_result 1 + continue + fi + + wait "$LISTENER_PID" + LISTENER_EXIT_CODE=$? + + if [ "$LISTENER_EXIT_CODE" -eq 0 ]; then + echo "Rcv test OK for $test_name, $PROTOCOL" + check_result 0 + else + echo "Rcv test FAILED for $test_name, $PROTOCOL" + check_result 1 + fi + done +done + +if [ "$FAILED_TESTS" -ne 0 ]; then + echo "FAIL - $FAILED_TESTS/$TOTAL_TESTS tests failed" + exit ${KSFT_FAIL} +else + echo "OK - All $TOTAL_TESTS tests passed" + exit ${KSFT_PASS} +fi -- 2.43.0
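
The core of the listener is the loop over the received control messages. The same check can be sketched in Python against the `(level, type, data)` tuples that `socket.recvmsg()` returns as ancillary data. The helper names `find_u32_cmsg()` and `check_received()` are hypothetical, and the fallback value 36 for `SO_MARK` is an assumption that holds on common Linux architectures but not on all of them.

```python
import socket
import struct
from typing import Iterable, Optional, Tuple

# Assumption: fall back to the value used by most Linux architectures if
# this Python build does not export the constant.
SO_MARK = getattr(socket, "SO_MARK", 36)

Cmsg = Tuple[int, int, bytes]


def find_u32_cmsg(ancdata: Iterable[Cmsg], level: int,
                  cmsg_type: int) -> Optional[int]:
    """Return the first matching control message payload as an unsigned 32-bit value."""
    for cmsg_level, ctype, data in ancdata:
        if cmsg_level == level and ctype == cmsg_type and len(data) >= 4:
            # The kernel stores the value in native byte order.
            return struct.unpack("=I", data[:4])[0]
    return None


def check_received(ancdata: Iterable[Cmsg], cmsg_type: int,
                   expected: int) -> bool:
    """Mirror the listener: fail if no cmsg arrived or the value differs."""
    value = find_u32_cmsg(ancdata, socket.SOL_SOCKET, cmsg_type)
    return value is not None and value == expected
```

With a real socket the ancillary data would come from something like `data, ancdata, flags, addr = sock.recvmsg(64, socket.CMSG_SPACE(4))`. As in the C listener, the option enabled on the socket (SO_RCVMARK) differs from the type the control message is matched against (SO_MARK).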

10 months, 1 week


[PATCH net-next v8 0/3] netdev-genl: Add an xsk attribute to queues

by Joe Damato

Greetings:

Welcome to v8. Minor change, see changelog below. Re-tested on my mlx5
system both with and without CONFIG_XDP_SOCKETS enabled and both with and
without NETIF set.

This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which queues
are linked to NAPIs...

After the RFC [2], Jakub suggested creating an empty nest for queues which
have a pool, so I've adjusted this version to work that way.

The nest can be extended in the future to express attributes about XSK as
needed. Queues which are not used for AF_XDP do not have the xsk attribute
present.

I've run the included test on:
  - my mlx5 machine (via NETIF=)
  - without setting NETIF

And the test seems to pass in both cases.

Thanks,
Joe

[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/

v8:
  - Update the Makefile in patch 3 to use TEST_GEN_FILES instead of
    TEST_GEN_PROGS.
  - Fix a codespell complaint in xdp_helper.c.

v7: https://lore.kernel.org/netdev/20250213192336.42156-1-jdamato@fastly.com/
  - Added CONFIG_XDP_SOCKETS to selftests/driver/net/config as suggested by
    Stanislav.
  - Updated xdp_helper.c to return -1 for AF_XDP non-existence, but 1 for
    other failures.
  - Updated queues.py to mark test as skipped if AF_XDP does not exist.

v6: https://lore.kernel.org/bpf/20250210193903.16235-1-jdamato@fastly.com/
  - Added ifdefs for CONFIG_XDP_SOCKETS in patch 2 as Stanislav suggested.

v5: https://lore.kernel.org/bpf/20250208041248.111118-1-jdamato@fastly.com/
  - Removed unused ret variable from patch 2 as Simon suggested.

v4: https://lore.kernel.org/lkml/20250207030916.32751-1-jdamato@fastly.com/
  - Add patch 1, as suggested by Jakub, which adds an empty nest helper.
  - Use the helper in patch 2, which makes the code cleaner and prevents a
    possible bug.

v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
  - Change comment format in patch 2 to avoid kdoc warnings. No other
    changes.

v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
  - Switched from RFC to actual submission now that net-next is open
  - Adjusted patch 1 to include an empty nest as suggested by Jakub
  - Adjusted patch 2 to update the test based on changes to patch 1, and
    to incorporate some Python feedback from Jakub :)

rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/

Joe Damato (3):
  netlink: Add nla_put_empty_nest helper
  netdev-genl: Add an XSK attribute to queues
  selftests: drv-net: Test queue xsk attribute

 Documentation/netlink/specs/netdev.yaml       | 13 ++-
 include/net/netlink.h                         | 15 +++
 include/uapi/linux/netdev.h                   |  6 ++
 net/core/netdev-genl.c                        | 12 +++
 tools/include/uapi/linux/netdev.h             |  6 ++
 .../testing/selftests/drivers/net/.gitignore  |  2 +
 tools/testing/selftests/drivers/net/Makefile  |  3 +
 tools/testing/selftests/drivers/net/config    |  1 +
 tools/testing/selftests/drivers/net/queues.py | 42 +++++++-
 .../selftests/drivers/net/xdp_helper.c        | 98 +++++++++++++++++++
 10 files changed, 194 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/drivers/net/.gitignore
 create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c

base-commit: 7a7e0197133d18cfd9931e7d3a842d0f5730223f
--
2.43.0
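
Since the attribute is an empty nest, a consumer has to test for its presence, not for its contents. The sketch below models a queue dump as plain Python dicts. Both the sample data and the `queues_with_xsk()` helper are hypothetical, and representing the empty nest as an empty dict is an assumption about how a client would decode it.

```python
from typing import Dict, List


def queues_with_xsk(queues: List[Dict]) -> List[Dict]:
    """Return the queues that carry the xsk attribute.

    An empty dict is falsy in Python, so the check looks at key presence
    and deliberately avoids truthiness.
    """
    return [q for q in queues if "xsk" in q]


# Hypothetical dump: queue 1 (rx and tx) is in use by an AF_XDP socket.
EXAMPLE_QUEUES = [
    {"id": 0, "type": "rx"},
    {"id": 1, "type": "rx", "xsk": {}},
    {"id": 0, "type": "tx"},
    {"id": 1, "type": "tx", "xsk": {}},
]

if __name__ == "__main__":
    for q in queues_with_xsk(EXAMPLE_QUEUES):
        print(q["type"], q["id"])
```

A check written as `q.get("xsk")` would report no AF_XDP queues at all in this model, because the empty nest evaluates as false.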

[PATCH 0/2] selftests/mm: Allow execution on systems without huge pages

by Mark Brown

Currently the mm selftests refuse to run if we don't have huge page support, but there are plenty of tests that don't depend on this feature. Relax this requirement to allow coverage on relevant systems (e.g., most 32 bit arm ones).

While doing this I noticed a bug with an existing check for whether we're running THP tests. The fix overlaps with the above change, so it is sent as part of the same series.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
      selftests/mm: Fix check for running THP tests
      selftests/mm: Allow tests to run with no huge pages support

 tools/testing/selftests/mm/run_vmtests.sh | 68 +++++++++++++++++++------------
 1 file changed, 43 insertions(+), 25 deletions(-)
---
base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
change-id: 20250211-kselftest-mm-no-hugepages-ee5917a170eb

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>
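
To make the idea of the series concrete, here is a small sketch of the gating logic it describes: detect whether huge pages are available, then skip only the tests that need them. The series itself changes a shell script (run_vmtests.sh) whose contents are not shown here, so this C version is only an illustration. The function names are hypothetical, and reading "Hugepagesize:" from /proc/meminfo-style text is an assumption about how availability might be detected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan meminfo-style text line by line for "Hugepagesize:" and return
 * its value in kB, or -1 if no such line exists.
 */
static long hugepage_size_kb(const char *meminfo)
{
	const char *p = meminfo;
	long kb;

	assert(meminfo != NULL);
	while (p && *p) {
		if (sscanf(p, "Hugepagesize: %ld kB", &kb) == 1)
			return kb;
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	return -1;
}

/* Tests that need huge pages are skipped; everything else still runs. */
static int should_run(int needs_hugepages, long hugepage_kb)
{
	return !needs_hugepages || hugepage_kb > 0;
}
```

With this shape, a system without huge page support still gets coverage from every test for which needs_hugepages is 0, instead of the whole suite refusing to start.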

[PATCH 2/2] rseq/selftests: Add test for mm_cid compaction

by Gabriele Monaco

A task in the kernel (task_mm_cid_work) runs somewhat periodically to compact the mm_cid for each process. Add a test to validate that it runs correctly and in a timely manner.

The test spawns 1 thread pinned to each CPU, then each thread, including the main one, runs in short bursts for some time. During this period, the mm_cids should be spanning all numbers between 0 and nproc.

At the end of this phase, a thread with high enough mm_cid (>= nproc/2) is selected to be the new leader, and all other threads terminate.

After some time, the only running thread should see 0 as mm_cid. If that doesn't happen, the compaction mechanism didn't work and the test fails.

Since mm_cid compaction is less likely for tasks running in short bursts, we increase the likelihood by just running a busy loop at every iteration. This compaction is best-effort work and this behaviour is currently acceptable.

The test never fails if only 1 core is available, in which case, we cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 208 ++++++++++++++++++
 3 files changed, 210 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c

diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
 basic_percpu_ops_mm_cid_test
 basic_test
 basic_rseq_op_test
+mm_cid_compaction_test
 param_test
 param_test_benchmark
 param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
 
 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
-		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
 
 TEST_GEN_PROGS_EXTENDED = librseq.so
 
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..8808500466d02
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...)                    \
+	do {                                        \
+		if (VERBOSE)                        \
+			printf(fmt, ##__VA_ARGS__); \
+	} while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+	int cpu;
+	int num_cpus;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	pthread_t *tinfo;
+	struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+	struct thread_args *args = arg;
+	int i, ret, curr_mm_cid;
+	cpu_set_t cpumask;
+
+	CPU_ZERO(&cpumask);
+	CPU_SET(args->cpu, &cpumask);
+	ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to set affinity");
+		abort();
+	}
+	pthread_barrier_wait(args->barrier);
+
+	for (i = 0; i < THREAD_RUNS; i++)
+		usleep(RUNNER_PERIOD);
+	curr_mm_cid = rseq_current_mm_cid();
+	/*
+	 * We select one thread with high enough mm_cid to be the new leader.
+	 * All other threads (including the main thread) will terminate.
+	 * After some time, the mm_cid of the only remaining thread should
+	 * converge to 0, if not, the test fails.
+	 */
+	if (curr_mm_cid >= args->num_cpus / 2 &&
+	    !pthread_mutex_trylock(args->token)) {
+		printf_verbose(
+			"cpu%d has mm_cid=%d and will be the new leader.\n",
+			sched_getcpu(), curr_mm_cid);
+		for (i = 0; i < args->num_cpus; i++) {
+			if (args->tinfo[i] == pthread_self())
+				continue;
+			ret = pthread_join(args->tinfo[i], NULL);
+			if (ret) {
+				errno = ret;
+				perror("Error: failed to join thread");
+				abort();
+			}
+		}
+		pthread_barrier_destroy(args->barrier);
+		free(args->tinfo);
+		free(args->token);
+		free(args->barrier);
+		free(args->args_head);
+
+		for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+			curr_mm_cid = rseq_current_mm_cid();
+			printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+				       curr_mm_cid, sched_getcpu());
+			if (curr_mm_cid == 0)
+				exit(EXIT_SUCCESS);
+			/*
+			 * Currently mm_cid compaction is less likely for tasks
+			 * running in short bursts: increase likelihood by just
+			 * running for some time doing nothing.
+			 */
+			for (int j = 0; j < 0xffff; j++)
+				for (int k = 0; k < 0xffff; k++)
+					asm("");
+			usleep(RUNNER_PERIOD);
+		}
+		exit(EXIT_FAILURE);
+	}
+	printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+		       sched_getcpu(), curr_mm_cid);
+	pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+	cpu_set_t affinity;
+	int i, j, ret = 0, num_threads;
+	pthread_t *tinfo;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	struct thread_args *args;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	num_threads = CPU_COUNT(&affinity);
+	tinfo = calloc(num_threads, sizeof(*tinfo));
+	if (!tinfo) {
+		perror("Error: failed to allocate tinfo");
+		return -1;
+	}
+	args = calloc(num_threads, sizeof(*args));
+	if (!args) {
+		perror("Error: failed to allocate args");
+		ret = -1;
+		goto out_free_tinfo;
+	}
+	token = malloc(sizeof(*token));
+	if (!token) {
+		perror("Error: failed to allocate token");
+		ret = -1;
+		goto out_free_args;
+	}
+	barrier = malloc(sizeof(*barrier));
+	if (!barrier) {
+		perror("Error: failed to allocate barrier");
+		ret = -1;
+		goto out_free_token;
+	}
+	if (num_threads == 1) {
+		fprintf(stderr, "Cannot test on a single cpu. "
+				"Skipping mm_cid_compaction test.\n");
+		/* only skipping the test, this is not a failure */
+		goto out_free_barrier;
+	}
+	pthread_mutex_init(token, NULL);
+	ret = pthread_barrier_init(barrier, NULL, num_threads);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to initialise barrier");
+		goto out_free_barrier;
+	}
+	for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+		if (!CPU_ISSET(i, &affinity))
+			continue;
+		args[j].num_cpus = num_threads;
+		args[j].tinfo = tinfo;
+		args[j].token = token;
+		args[j].barrier = barrier;
+		args[j].cpu = i;
+		args[j].args_head = args;
+		if (!j) {
+			/* The first thread is the main one */
+			tinfo[0] = pthread_self();
+			++j;
+			continue;
+		}
+		ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+		if (ret) {
+			errno = ret;
+			perror("Error: failed to create thread");
+			abort();
+		}
+		++j;
+	}
+	printf_verbose("Started %d threads.\n", num_threads);
+
+	/* Also main thread will terminate if it is not selected as leader */
+	thread_runner(&args[0]);
+
+	/* only reached in case of errors */
+out_free_barrier:
+	free(barrier);
+out_free_token:
+	free(token);
+out_free_args:
+	free(args);
+out_free_tinfo:
+	free(tinfo);
+
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	if (!rseq_mm_cid_available()) {
+		fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+		return -1;
+	}
+	if (test_mm_cid_compaction())
+		return -1;
+	return 0;
+}
-- 
2.48.1
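
As a side illustration of the two conditions the commit message relies on, the following sketch spells them out as plain predicates. It is not part of the patch: the patch never checks that the observed mm_cids cover the whole range, and both function names here are hypothetical. The sketch reads "between 0 and nproc" as the values 0 through nproc - 1, which is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Return 1 if cids[0..n-1] contains each value in [0, n) exactly once. */
static int cids_span_all(const int *cids, int n)
{
	unsigned char *seen;
	int i, ok = 1;

	assert(cids && n > 0);
	seen = calloc(n, 1);
	if (!seen)
		return 0;
	for (i = 0; i < n; i++) {
		/* out of range or already seen: the set is not compact */
		if (cids[i] < 0 || cids[i] >= n || seen[cids[i]]) {
			ok = 0;
			break;
		}
		seen[cids[i]] = 1;
	}
	free(seen);
	return ok;
}

/* Leader eligibility as described: mm_cid in the upper half of the range. */
static int can_lead(int mm_cid, int num_cpus)
{
	return mm_cid >= num_cpus / 2;
}
```

Picking a leader with a high mm_cid is what makes the later check meaningful: a thread that already held mm_cid 0 would pass even if compaction never ran.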

[RFC v2 0/5] mm: introduce THP deferred setting

by Nico Pache

This series is a follow-up to [1], which adds mTHP support to khugepaged. mTHP khugepaged support was necessary for the global="defer" and mTHP="inherit" case (and others) to make sense.

We've seen cases where customers switching from RHEL7 to RHEL8 see a significant increase in the memory footprint for the same workloads.

Through our investigations we found that a large contributing factor to the increase in RSS was an increase in THP usage.

For workloads like MySQL, or when using allocators like jemalloc, it is often recommended to set /transparent_hugepages/enabled=never. This is in part due to performance degradations and increased memory waste.

This series introduces enabled=defer. This setting acts as a middle ground between always and madvise. If the mapping is MADV_HUGEPAGE, the page fault handler will act normally, making a hugepage if possible. If the allocation is not MADV_HUGEPAGE, then the page fault handler will default to the base size allocation. The caveat is that khugepaged can still operate on pages that are not MADV_HUGEPAGE.

This allows for two things: one, applications specifically designed to use hugepages will get them, and two, applications that don't use hugepages can still benefit from them without aggressively inserting THPs at every possible chance. This curbs the memory waste, and defers the use of hugepages to khugepaged. Khugepaged can then scan the memory for ranges eligible for collapsing.

Admins may want to lower max_ptes_none. If they do not, khugepaged may aggressively collapse single allocations into hugepages.

TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use. These changes have been running in my VM for some time
- redis testing. This test was my original case for the defer mode. What I was able to prove was that THP=always leads to increased max_latency cases, which is why it is recommended to disable THPs for redis servers. However, with 'defer' we don't have the max_latency spikes and can still get the system to utilize THPs. I further tested this with the mTHP defer setting and found that redis (and probably other jemalloc users) can utilize THPs via defer (+mTHP defer) without a large latency penalty and with some potential gains. I uploaded some mmtest results here [3] which compare:
    stock+thp=never
    stock+(m)thp=always
    khugepaged-mthp + defer (max_ptes_none=64)

  The results show that (m)THPs can cause some throughput regression in some cases, but also have gains in other cases. The mTHP+defer results have more gains and fewer losses than the (m)THP=always case.

V2 Changes:
- base changes on mTHP khugepaged support
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation

[1] - https://lkml.org/lkml/2025/2/10/1982
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.h…

Nico Pache (5):
  mm: defer THP insertion to khugepaged
  mm: document transparent_hugepage=defer usage
  selftests: mm: add defer to thp setting parser
  khugepaged: add defer option to mTHP options
  mm: document mTHP defer setting

 Documentation/admin-guide/mm/transhuge.rst | 40 ++++++++++---
 include/linux/huge_mm.h                    | 18 +++++-
 mm/huge_memory.c                           | 69 +++++++++++++++++++---
 mm/khugepaged.c                            | 10 ++--
 tools/testing/selftests/mm/thp_settings.c  |  1 +
 tools/testing/selftests/mm/thp_settings.h  |  1 +
 6 files changed, 115 insertions(+), 24 deletions(-)

-- 
2.48.1
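
The difference between the modes can be summarised as a small decision table. The sketch below is a simplified model written for this digest, not kernel code: the enum and function names are hypothetical, and it ignores everything except the global mode and whether the mapping was marked with MADV_HUGEPAGE.

```c
#include <assert.h>

/* Hypothetical names for the global THP "enabled" settings. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_DEFER, THP_ALWAYS };

/* May the page fault handler allocate a huge page directly? */
static int fault_may_use_thp(enum thp_mode mode, int madv_hugepage)
{
	assert(madv_hugepage == 0 || madv_hugepage == 1);
	switch (mode) {
	case THP_ALWAYS:
		return 1;
	case THP_MADVISE:
	case THP_DEFER:
		/* only mappings that asked for huge pages get them at fault */
		return madv_hugepage;
	default:
		return 0;
	}
}

/* May khugepaged later collapse the range into a huge page? */
static int khugepaged_may_collapse(enum thp_mode mode, int madv_hugepage)
{
	assert(madv_hugepage == 0 || madv_hugepage == 1);
	switch (mode) {
	case THP_ALWAYS:
	case THP_DEFER:
		/* defer: khugepaged also works on non-madvised ranges */
		return 1;
	case THP_MADVISE:
		return madv_hugepage;
	default:
		return 0;
	}
}
```

In this model, defer behaves like madvise at fault time and like always for khugepaged, which is the "middle ground" the cover letter describes.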
