April 2025 - Linux-kselftest-mirror

[PATCH net-next v3] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock.

It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G,
and loopback. The testing tools from tools/testing/vsock/ are reused.
Currently, only vsock_test is used.

VMCI and hyperv support is automatically built, though not used.

Only tested on x86.

To run:

  $ tools/testing/selftests/vsock/vmtest.sh

or

  $ make -C tools/testing/selftests TARGETS=vsock run_tests

Results:
	# linux/tools/testing/selftests/vsock/vmtest.log
	setup: Building kernel and tests
	setup: Booting up VM
	setup: VM booted up
	test:vm_server_host_client:guest: Control socket listening on 0.0.0.0:51000
	test:vm_server_host_client:guest: Control socket connection accepted...
	[...]
	test:vm_loopback:guest: 30 - SOCK_STREAM retry failed connect()...ok
	test:vm_loopback:guest: 31 - SOCK_STREAM SO_LINGER null-ptr-deref...ok
	test:vm_loopback:guest: 31 - SOCK_STREAM SO_LINGER null-ptr-deref...ok

Future work can include vsock_diag_test.

vmtest.sh is loosely based on tools/testing/selftests/net/pmtu.sh, which
was picked out of the bag of tests I knew to work with NIPA.

Because vsock requires a VM to test anything other than loopback, this
patch adds vmtest.sh as a kselftest itself. This is different from other
systems that have a "vmtest.sh", where it is used as a utility script to
spin up a VM to run the selftests as a guest (but isn't hooked into
kselftest). This aspect is worth review, as I'm not aware of all of the
environments where this would run.

Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
---
Changes in v3:
- use common conditional syntax for checking variables
- use return value instead of global rc
- fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER
- use SIGTERM instead of SIGKILL on cleanup
- use peer-cid=1 for loopback
- change sleep delay times into globals
- fix test_vm_loopback logging
- add test selection in arguments
- make QEMU an argument
- check that vng binary is on path
- use QEMU variable
- change <tab><backslash> to <space><backslash>
- fix hardcoded file paths
- add comment in commit msg about script that vmtest.sh was based off of
- Add tools/testing/selftest/vsock/Makefile for kselftest
- Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com

Changes in v2:
- add kernel oops and warnings checker
- change testname variable to use FUNCNAME
- fix spacing in test_vm_server_host_client
- add -s skip build option to vmtest.sh
- add test_vm_loopback
- pass port to vm_wait_for_listener
- fix indentation in vmtest.sh
- add vmci and hyperv to config
- changed whitespace from tabs to spaces in help string
- Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com
---
 MAINTAINERS                                |   1 +
 tools/testing/selftests/vsock/.gitignore   |   1 +
 tools/testing/selftests/vsock/Makefile     |   9 +
 tools/testing/selftests/vsock/config.vsock |  10 +
 tools/testing/selftests/vsock/settings     |   1 +
 tools/testing/selftests/vsock/vmtest.sh    | 354 +++++++++++++++++++++++++++++
 6 files changed, 376 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25751,6 +25751,7 @@ F:	include/uapi/linux/vm_sockets.h
 F:	include/uapi/linux/vm_sockets_diag.h
 F:	include/uapi/linux/vsockmon.h
 F:	net/vmw_vsock/
+F:	tools/testing/selftests/vsock/
 F:	tools/testing/vsock/
 
 VMALLOC
diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..1950aa8ac68c0831c12c1aaa429da45bbe41e60f
--- /dev/null
+++ b/tools/testing/selftests/vsock/.gitignore
@@ -0,0 +1 @@
+vsock_selftests.log
diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..6fded8c4d593541a6f7462147bffcb719def378f
--- /dev/null
+++ b/tools/testing/selftests/vsock/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+.PHONY: all
+all:
+
+TEST_PROGS := vmtest.sh
+EXTRA_CLEAN := vmtest.log
+
+include ../lib.mk
+
diff --git a/tools/testing/selftests/vsock/config.vsock b/tools/testing/selftests/vsock/config.vsock
new file mode 100644
index 0000000000000000000000000000000000000000..9e0fb2270e6a2fc0beb5f0d9f0bc37158d0a9d23
--- /dev/null
+++ b/tools/testing/selftests/vsock/config.vsock
@@ -0,0 +1,10 @@
+CONFIG_VSOCKETS=y
+CONFIG_VSOCKETS_DIAG=y
+CONFIG_VSOCKETS_LOOPBACK=y
+CONFIG_VMWARE_VMCI_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS_COMMON=y
+CONFIG_HYPERV_VSOCKETS=y
+CONFIG_VMWARE_VMCI=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_HYPERV=y
diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings
new file mode 100644
index 0000000000000000000000000000000000000000..e7b9417537fbc4626153b72e8f295ab4594c844b
--- /dev/null
+++ b/tools/testing/selftests/vsock/settings
@@ -0,0 +1 @@
+timeout=0
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
new file mode 100755
index 0000000000000000000000000000000000000000..d70b9446e531d6d20beb24ddeda2cf0a9f7e9a39
--- /dev/null
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -0,0 +1,354 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
+#
+# Dependencies:
+#	* virtme-ng
+#	* busybox-static (used by virtme-ng)
+#	* qemu (used by virtme-ng)
+
+SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
+KERNEL_CHECKOUT=$(realpath ${SCRIPT_DIR}/../../../..)
+QEMU=$(command -v qemu-system-$(uname -m))
+VERBOSE=0
+SKIP_BUILD=0
+VSOCK_TEST=${KERNEL_CHECKOUT}/tools/testing/vsock/vsock_test
+
+TEST_GUEST_PORT=51000
+TEST_HOST_PORT=50000
+TEST_HOST_PORT_LISTENER=50001
+SSH_GUEST_PORT=22
+SSH_HOST_PORT=2222
+VSOCK_CID=1234
+WAIT_PERIOD=3
+WAIT_PERIOD_MAX=20
+
+QEMU_PIDFILE=/tmp/qemu.pid
+
+# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
+# control port forwarded for vsock_test. Because virtme-ng doesn't support
+# adding an additional port to forward to the device created from "--ssh" and
+# virtme-init mistakenly sets identical IPs to the ssh device and additional
+# devices, we instead opt out of using --ssh, add the device manually, and also
+# add the kernel cmdline options that virtme-init uses to setup the interface.
+QEMU_OPTS=""
+QEMU_OPTS="${QEMU_OPTS} -netdev user,id=n0,hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
+QEMU_OPTS="${QEMU_OPTS},hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
+QEMU_OPTS="${QEMU_OPTS} -device virtio-net-pci,netdev=n0"
+QEMU_OPTS="${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${VSOCK_CID}"
+QEMU_OPTS="${QEMU_OPTS} --pidfile ${QEMU_PIDFILE}"
+KERNEL_CMDLINE="virtme.dhcp net.ifnames=0 biosdevname=0 virtme.ssh virtme_ssh_user=$USER"
+
+LOG=${SCRIPT_DIR}/vmtest.log
+
+#	Name				Description
+avail_tests="
+	vm_server_host_client	Run vsock_test in server mode on the VM and in client mode on the host.
+	vm_client_host_server	Run vsock_test in client mode on the VM and in server mode on the host.
+	vm_loopback		Run vsock_test using the loopback transport in the VM.
+"
+
+usage() {
+	echo
+	echo "$0 [OPTIONS] [TEST]..."
+	echo "If no TEST argument is given, all tests will be run."
+	echo
+	echo "Options"
+	echo " -v: verbose output"
+	echo " -s: skip build"
+	echo
+	echo "Available tests${avail_tests}"
+	exit 1
+}
+
+die() {
+	echo "$*" >&2
+	exit 1
+}
+
+vm_ssh() {
+	ssh -q -o UserKnownHostsFile=/dev/null -p 2222 localhost $*
+	return $?
+}
+
+cleanup() {
+	if [[ -f "${QEMU_PIDFILE}" ]]; then
+		pkill -SIGTERM -F ${QEMU_PIDFILE} 2>&1 >/dev/null
+	fi
+}
+
+build() {
+	log_setup "Building kernel and tests"
+
+	pushd ${KERNEL_CHECKOUT} >/dev/null
+	vng \
+		--kconfig \
+		--config ${KERNEL_CHECKOUT}/tools/testing/selftests/vsock/config.vsock
+	make -j$(nproc)
+	make -C ${KERNEL_CHECKOUT}/tools/testing/vsock
+	popd >/dev/null
+	echo
+}
+
+vm_setup() {
+	local VNG_OPTS=""
+	if [[ "${VERBOSE}" = 1 ]]; then
+		VNG_OPTS="--verbose"
+	fi
+	vng \
+		$VNG_OPTS \
+		--run ${KERNEL_CHECKOUT} \
+		--qemu-opts="${QEMU_OPTS}" \
+		--qemu="${QEMU}" \
+		--user root \
+		--append "${KERNEL_CMDLINE}" \
+		--rw 2>&1 >/dev/null &
+}
+
+vm_wait_for_ssh() {
+	i=0
+	while [[ true ]]; do
+		if [[ ${i} > ${WAIT_PERIOD_MAX} ]]; then
+			die "Timed out waiting for guest ssh"
+		fi
+		vm_ssh -- true
+		if [[ $? -eq 0 ]]; then
+			break
+		fi
+		i=$(( i + 1 ))
+		sleep ${WAIT_PERIOD}
+	done
+}
+
+wait_for_listener() {
+	local PORT=$1
+	local i=0
+	while ! ss -ltn | grep -q ":${PORT}"; do
+		if [[ ${i} > ${WAIT_PERIOD_MAX} ]]; then
+			die "Timed out waiting for listener on port ${PORT}"
+		fi
+		i=$(( i + 1 ))
+		sleep ${WAIT_PERIOD}
+	done
+}
+
+vm_wait_for_listener() {
+	local port=$1
+	vm_ssh -- "$(declare -f wait_for_listener); wait_for_listener ${port}"
+}
+
+host_wait_for_listener() {
+	wait_for_listener ${TEST_HOST_PORT_LISTENER}
+}
+
+log() {
+	local prefix="$1"
+	shift
+
+	if [[ "$#" -eq 0 ]]; then
+		cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' | tee -a ${LOG}
+	else
+		echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' | tee -a ${LOG}
+	fi
+}
+
+log_setup() {
+	log "setup" "$@"
+}
+
+log_host() {
+	testname=$1
+	shift
+	log "test:${testname}:host" "$@"
+}
+
+log_guest() {
+	testname=$1
+	shift
+	log "test:${testname}:guest" "$@"
+}
+
+test_vm_server_host_client() {
+	local testname="${FUNCNAME[0]#test_}"
+
+	vm_ssh -- "${VSOCK_TEST}" \
+		--mode=server \
+		--control-port="${TEST_GUEST_PORT}" \
+		--peer-cid=2 \
+		2>&1 | log_guest "${testname}" &
+
+	vm_wait_for_listener ${TEST_GUEST_PORT}
+
+	${VSOCK_TEST} \
+		--mode=client \
+		--control-host=127.0.0.1 \
+		--peer-cid="${VSOCK_CID}" \
+		--control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
+
+	return $?
+}
+
+test_vm_client_host_server() {
+	local testname="${FUNCNAME[0]#test_}"
+
+	${VSOCK_TEST} \
+		--mode "server" \
+		--control-port "${TEST_HOST_PORT_LISTENER}" \
+		--peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" &
+
+	host_wait_for_listener
+
+	vm_ssh -- "${VSOCK_TEST}" \
+		--mode=client \
+		--control-host=10.0.2.2 \
+		--peer-cid=2 \
+		--control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}"
+
+	return $?
+}
+
+test_vm_loopback() {
+	local testname="${FUNCNAME[0]#test_}"
+	local port=60000 # non-forwarded local port
+
+	vm_ssh -- ${VSOCK_TEST} \
+		--mode=server \
+		--control-port="${port}" \
+		--peer-cid=1 2>&1 | log_guest "${testname}" &
+
+	vm_wait_for_listener ${port}
+
+	vm_ssh -- ${VSOCK_TEST} \
+		--mode=client \
+		--control-host="127.0.0.1" \
+		--control-port="${port}" \
+		--peer-cid=1 2>&1 | log_guest "${testname}"
+
+	return $?
+}
+
+run_test() {
+	unset IFS
+	local host_oops_cnt_before
+	local host_warn_cnt_before
+	local vm_oops_cnt_before
+	local vm_warn_cnt_before
+	local host_oops_cnt_after
+	local host_warn_cnt_after
+	local vm_oops_cnt_after
+	local vm_warn_cnt_after
+	local name
+	local rc
+
+	host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
+	host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+	vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
+	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+
+	name=$(echo "${1}" | awk '{ print $1 }')
+	eval test_"${name}"
+
+	host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+	if [[ ${host_oops_cnt_after} > ${host_oops_cnt_before} ]]; then
+		echo "${name}: kernel oops detected on host" | log_host ${name}
+		rc=1
+	fi
+
+	host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+	if [[ ${host_warn_cnt_after} > ${host_warn_cnt_before} ]]; then
+		echo "${name}: kernel warning detected on host" | log_host ${name}
+		rc=1
+	fi
+
+	vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+	if [[ ${vm_oops_cnt_after} > ${vm_oops_cnt_before} ]]; then
+		echo "${name}: kernel oops detected on vm" | log_host ${name}
+		rc=1
+	fi
+
+	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+	if [[ ${vm_warn_cnt_after} > ${vm_warn_cnt_before} ]]; then
+		echo "${name}: kernel warning detected on vm" | log_host ${name}
+		rc=1
+	fi
+
+	return ${rc}
+}
+
+while getopts :hvsq: o
+do
+	case $o in
+	v) VERBOSE=1;;
+	s) SKIP_BUILD=1;;
+	q) QEMU=$OPTARG;;
+	h|*) usage;;
+	esac
+done
+shift $((OPTIND-1))
+
+trap cleanup EXIT
+
+if [[ ! -x "$(command -v vng)" ]]; then
+	die "vng not found."
+fi
+
+if [[ ! -x "${QEMU}" ]]; then
+	die "${QEMU} not found."
+fi
+
+rm -f "${LOG}"
+if [[ "${SKIP_BUILD}" != 1 ]]; then
+	build
+fi
+log_setup "Booting up VM"
+vm_setup
+vm_wait_for_ssh
+log_setup "VM booted up"
+
+for arg in "$@"; do
+	if ! command -v > /dev/null "test_${arg}"; then
+		echo "Test ${arg} not found"
+		die "${usage}"
+	fi
+done
+
+IFS="	
+"
+cnt=0
+name=""
+desc=""
+for t in ${avail_tests}; do
+	[ "${name}" = "" ] && name="${t}" && continue
+	# desc is unused, but we need to eat it.
+	[ "${desc}" = "" ] && desc="${t}"
+
+	run_this=0
+	if [[ "${#}" -eq 0 ]]; then
+		run_this=1
+	else
+		for arg in "$@"; do
+			if [[ "${arg}" = "${name}" ]]; then
+				run_this=1
+			fi
+		done
+	fi
+
+	if [[ "${run_this}" = 1 ]]; then
+		run_test "${name}"
+		rc=$?
+		if [[ ${rc} != 0 ]]; then
+			cnt=$(( cnt + 1 ))
+		fi
+	fi
+	name=""
+	desc=""
+done
+
+if [[ ${cnt} = 0 ]]; then
+	echo OK
+else
+	echo FAILED: ${cnt}
+fi
+echo "Log: ${LOG}"
+exit ${cnt}

---
base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a
change-id: 20250325-vsock-vmtest-b3a21d2102c2

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)gmail.com>
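
The wait helpers in the script (vm_wait_for_ssh, wait_for_listener) poll a condition with a retry counter and a fixed sleep. The following is a minimal sketch of that pattern, assuming bash; `wait_until` and its argument order are hypothetical names for illustration and are not part of the patch. It bounds the loop with `-ge`, because inside `[[ ]]` the `>` operator compares strings rather than integers, so a test such as `[[ 3 > 20 ]]` evaluates as true.

```shell
#!/bin/bash
# Hypothetical helper (not from the patch): run a command repeatedly until it
# succeeds, giving up after max_tries failed attempts.
#   usage: wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
	local max_tries=$1 delay=$2
	shift 2
	local i=0

	until "$@"; do
		# -ge is an integer comparison; ">" inside [[ ]] would compare
		# the two operands as strings.
		if [[ ${i} -ge ${max_tries} ]]; then
			return 1
		fi
		i=$(( i + 1 ))
		sleep "${delay}"
	done
	return 0
}
```

A caller could then write something like `wait_until 20 3 vm_ssh -- true || die "Timed out waiting for guest ssh"`, reusing the patch's own vm_ssh and die functions.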

7 months, 2 weeks


[RFC PATCH 00/11] New KVM ioctl to link a gmem inode to a new gmem file

by Ackerley Tng

Hello,

This patchset builds upon the code at
https://lore.kernel.org/lkml/20230718234512.1690985-1-seanjc@google.com/T/.

This code is available at
https://github.com/googleprodkernel/linux-cc/tree/kvm-gmem-link-migrate-rfc….

In guest_mem v11, a split file/inode model was proposed, where memslot
bindings belong to the file and pages belong to the inode. This model
lends itself well to having different VMs use separate files pointing
to the same inode.

This RFC proposes an ioctl, KVM_LINK_GUEST_MEMFD, that takes a VM and
a gmem fd, and returns another gmem fd referencing a different file
and associated with VM. This RFC also includes an update to
KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to migrate memory context
(slot->arch.lpage_info and kvm->mem_attr_array) from source to
destination vm, intra-host.

Intended usage of the two ioctls:

1. Source VM’s fd is passed to destination VM via unix sockets
2. Destination VM uses new ioctl KVM_LINK_GUEST_MEMFD to link source
   VM’s fd to a new fd.
3. Destination VM will pass new fds to KVM_SET_USER_MEMORY_REGION,
   which will bind the new file, pointing to the same inode that the
   source VM’s file points to, to memslots
4. Use KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to move kvm->mem_attr_array
   and slot->arch.lpage_info to the destination VM.
5. Run the destination VM as per normal

Some other approaches considered were:

+ Using the linkat() syscall, but that requires a mount/directory for
  a source fd to be linked to
+ Using the dup() syscall, but that only duplicates the fd, and both
  fds point to the same file

---

Ackerley Tng (11):
  KVM: guest_mem: Refactor out kvm_gmem_alloc_file()
  KVM: guest_mem: Add ioctl KVM_LINK_GUEST_MEMFD
  KVM: selftests: Add tests for KVM_LINK_GUEST_MEMFD ioctl
  KVM: selftests: Test transferring private memory to another VM
  KVM: x86: Refactor sev's flag migration_in_progress to kvm struct
  KVM: x86: Refactor common code out of sev.c
  KVM: x86: Refactor common migration preparation code out of
    sev_vm_move_enc_context_from
  KVM: x86: Let moving encryption context be configurable
  KVM: x86: Handle moving of memory context for intra-host migration
  KVM: selftests: Generalize migration functions from
    sev_migrate_tests.c
  KVM: selftests: Add tests for migration of private mem

 arch/x86/include/asm/kvm_host.h               |   4 +-
 arch/x86/kvm/svm/sev.c                        |  85 ++-----
 arch/x86/kvm/svm/svm.h                        |   3 +-
 arch/x86/kvm/x86.c                            | 221 +++++++++++++++++-
 arch/x86/kvm/x86.h                            |   6 +
 include/linux/kvm_host.h                      |  18 ++
 include/uapi/linux/kvm.h                      |   8 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  42 ++++
 .../selftests/kvm/include/kvm_util_base.h     |  31 +++
 .../kvm/x86_64/private_mem_migrate_tests.c    |  93 ++++++++
 .../selftests/kvm/x86_64/sev_migrate_tests.c  |  48 ++--
 virt/kvm/guest_mem.c                          | 151 ++++++++++--
 virt/kvm/kvm_main.c                           |  10 +
 virt/kvm/kvm_mm.h                             |   7 +
 15 files changed, 596 insertions(+), 132 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_migrate_tests.c

--
2.41.0.640.ga95def55d0-goog

7 months, 2 weeks


[PATCH] selftests/run_kselftest.sh: Use readlink if realpath is not available

by Yosry Ahmed

'realpath' is not always available, fall back to 'readlink -f' if it is
not available. They seem to work equally well in this context.

Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
 tools/testing/selftests/run_kselftest.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 50e03eefe7ac7..0443beacf3621 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -3,7 +3,14 @@
 #
 # Run installed kselftest tests.
 #
-BASE_DIR=$(realpath $(dirname $0))
+
+# Fallback to readlink if realpath is not available
+if which realpath > /dev/null; then
+	BASE_DIR=$(realpath $(dirname $0))
+else
+	BASE_DIR=$(readlink -f $(dirname $0))
+fi
+
 cd $BASE_DIR
 TESTS="$BASE_DIR"/kselftest-list.txt
 if [ ! -r "$TESTS" ] ; then
--
2.49.0.rc1.451.g8f38331e32-goog
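
The fallback described above can also be wrapped in a small function. This is only a sketch under stated assumptions: `resolve_dir` is a hypothetical helper name that does not appear in the patch, the availability check uses the POSIX `command -v` builtin where the patch uses `which`, and `readlink -f` is assumed to be present (as in GNU coreutils and busybox) whenever `realpath` is missing.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): print the canonical absolute
# path of its argument, preferring realpath and falling back to readlink -f.
resolve_dir() {
	if command -v realpath >/dev/null 2>&1; then
		realpath "$1"
	else
		# Assumption: a readlink that supports -f is installed.
		readlink -f "$1"
	fi
}

# Quoting keeps the result intact when the script lives in a path with spaces.
BASE_DIR=$(resolve_dir "$(dirname -- "$0")")
```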

7 months, 3 weeks


[PATCH] KVM: selftests: add test for SVE host corruption

by Mark Brown

This test program, originally written by Mark Rutland and lightly modified
by me for upstream, verifies that we do not have the issues with host SVE
state being discarded which were fixed in

   fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state")

by running a simple VM while checking the SVE register state for
corruption.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/selftests/kvm/Makefile.kvm     |   1 +
 tools/testing/selftests/kvm/arm64/host_sve.c | 127 +++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..d37072054a3d 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -147,6 +147,7 @@ TEST_GEN_PROGS_arm64 = $(TEST_GEN_PROGS_COMMON)
 TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs
 TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
 TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
+TEST_GEN_PROGS_arm64 += arm64/host_sve
 TEST_GEN_PROGS_arm64 += arm64/hypercalls
 TEST_GEN_PROGS_arm64 += arm64/mmio_abort
 TEST_GEN_PROGS_arm64 += arm64/page_fault_test
diff --git a/tools/testing/selftests/kvm/arm64/host_sve.c b/tools/testing/selftests/kvm/arm64/host_sve.c
new file mode 100644
index 000000000000..3826772fd470
--- /dev/null
+++ b/tools/testing/selftests/kvm/arm64/host_sve.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Host SVE: Check FPSIMD/SVE/SME save/restore over KVM_RUN ioctls.
+ *
+ * Copyright 2025 Arm, Ltd
+ */
+
+#include <errno.h>
+#include <signal.h>
+#include <sys/auxv.h>
+#include <asm/kvm.h>
+#include <kvm_util.h>
+
+#include "ucall_common.h"
+
+static void guest_code(void)
+{
+	for (int i = 0; i < 10; i++) {
+		GUEST_UCALL_NONE();
+	}
+
+	GUEST_DONE();
+}
+
+void handle_sigill(int sig, siginfo_t *info, void *ctx)
+{
+	ucontext_t *uctx = ctx;
+
+	printf(" < host signal %d >\n", sig);
+
+	/*
+	 * Skip the UDF
+	 */
+	uctx->uc_mcontext.pc += 4;
+}
+
+void register_sigill_handler(void)
+{
+	struct sigaction sa = {
+		.sa_sigaction = handle_sigill,
+		.sa_flags = SA_SIGINFO,
+	};
+	sigaction(SIGILL, &sa, NULL);
+}
+
+static void do_sve_roundtrip(void)
+{
+	unsigned long before, after;
+
+	/*
+	 * Set all bits in a predicate register, force a save/restore via a
+	 * SIGILL (which handle_sigill() will recover from), then report
+	 * whether the value has changed.
+	 */
+	asm volatile(
+	" .arch_extension sve\n"
+	" ptrue p0.B\n"
+	" cntp %[before], p0, p0.B\n"
+	" udf #0\n"
+	" cntp %[after], p0, p0.B\n"
+	: [before] "=r" (before),
+	  [after] "=r" (after)
+	:
+	: "p0"
+	);
+
+	if (before != after) {
+		TEST_FAIL("Signal roundtrip discarded predicate bits (%ld => %ld)\n",
+			  before, after);
+	} else {
+		printf("Signal roundtrip preserved predicate bits (%ld => %ld)\n",
+		       before, after);
+	}
+}
+
+static void test_run(void)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	bool guest_done = false;
+
+	register_sigill_handler();
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+	do_sve_roundtrip();
+
+	while (!guest_done) {
+
+		printf("Running VCPU...\n");
+		vcpu_run(vcpu);
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_NONE:
+			do_sve_roundtrip();
+			do_sve_roundtrip();
+			break;
+		case UCALL_DONE:
+			guest_done = true;
+			break;
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			break;
+		default:
+			TEST_FAIL("Unexpected guest exit");
+		}
+	}
+
+	kvm_vm_free(vm);
+}
+
+int main(void)
+{
+	/*
+	 * This is testing the host environment, we don't care about
+	 * guest SVE support.
+	 */
+	if (!(getauxval(AT_HWCAP) & HWCAP_SVE)) {
+		printf("SVE not supported\n");
+		return KSFT_SKIP;
+	}
+
+	test_run();
+	return 0;
+}

---
base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
change-id: 20250226-kvm-selftest-sve-signal-1add0d9d716c

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>

7 months, 3 weeks


[PATCH net-next v13 0/9] Device memory TCP TX

by Mina Almasry

v13: https://lore.kernel.org/netdev/20250425204743.617260-1-almasrymina@google.c…
===

Changelog:
- Fix unneeded error label pointed out by Christoph, and addressed
  nitpick.

v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====

No changes in v12, just restored the selftests patch I accidentally
dropped in v11

v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====

Addressed a couple of nits and collected Acked-by from Harshitha
(thanks!)

v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.…
====

Addressed comments following conversations with Pavel, Stan, and
Harshitha. Thank you guys for the reviews again. Overall minor changes:

Changelog:
- Check for !niov->pp in io_zcrx_recv_frag, just in case we end up with
  a TX niov in that path (Pavel).
- Fix locking case in !netif_device_present (Jakub/Stan).

v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…
===

Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed
  during the rebase as the bindings have been updated to use
  priv->bindings recently (thanks Stan!)

v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…
===

Only address minor comments on V7

Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of
  NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully
  initialized (Stan).

v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===

Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going
  out on. (Jakub)
- Provide detailed inspection of callsites of
  __skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)

v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===

v6 has no major changes. Addressed a few issues from Paolo and David,
and collected Acks from Stan. Thank you everyone for the review!

Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is
  invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label).
  (Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the
  same one which implements the TX path so we don't run into future
  errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works
  (David).

v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===

v5 has no major changes; it clears up the relatively minor issues
pointed out in v4, and rebases the series on top of net-next to resolve
the conflict with a patch that raced to the tree. It also collects the
review tags from v4.

Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).

v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===

v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.

Additionally other feedback from v3 is addressed.

Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
  netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the tx
  binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).

v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===

Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.

RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======

RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.

Major changes:
--------------

- much improved UAPI as suggested by Stan. We now interpret the iov_base
  of the passed in iov from userspace as the offset into the dmabuf to
  send from. This removes the need to set iov.iov_base = NULL which may
  be confusing to users, and enables us to send multiple iovs in the
  same sendmsg() call. ncdevmem and the docs show a sample use of that.

- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
  this is a good improvement as it was confusing to keep track of 2
  iterators for the same sendmsg, and mistracking both iterators caused
  a couple of bugs reported in the last iteration that are now resolved
  with this streamlining.

- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
  and sending multiple iovs in the same sendmsg() is tested.

- Fixed issue where dmabuf unmapping was happening in invalid context
  (Stan).

====================================================================

The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].

Full outline on usage of the TX path is detailed in the documentation
included with this series.

Test example is available via the kselftest included in the series as
well.

The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.

Patch Overview:
---------------

1. Documentation & tests to give high level overview of the feature
   being added.

1. Add netmem refcounting needed for the TX path.

2. Devmem TX netlink API.

3. Devmem TX net stack implementation.

4. Make dma-buf unbinding scheduled work to handle TX cases where it
   gets freed from contexts where we can't sleep.

5. Add devmem TX documentation.

6. Add scaffolding enabling driver support for netmem_tx. Add helpers,
   driver feature flag, and docs to enable drivers to declare netmem_tx
   support.

7. Guard netmem_tx against being enabled against drivers that don't
   support it.

8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
   devmem.py.

Testing:
--------

Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augmented with client functionality to test TX
path.

* Test Setup:

Kernel: net-next with this RFC and memory provider API cherry-picked
locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.

Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.

[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…

Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>

Mina Almasry (8):
  netmem: add niov->type attribute to distinguish different net_iov
    types
  net: add get_netmem/put_netmem support
  net: devmem: Implement TX path
  net: add devmem TCP TX documentation
  net: enable driver support for netmem TX
  gve: add netmem TX support to GVE DQO-RDA mode
  net: check for driver support in netmem TX
  selftests: ncdevmem: Implement devmem TCP TX

Stanislav Fomichev (1):
  net: devmem: TCP tx netlink api

 Documentation/netlink/specs/netdev.yaml       |  12 +
 Documentation/networking/devmem.rst           | 150 ++++++++-
 .../networking/net_cachelines/net_device.rst  |   1 +
 Documentation/networking/netdev-features.rst  |   5 +
 Documentation/networking/netmem.rst           |  23 +-
 drivers/net/ethernet/google/gve/gve_main.c    |   3 +
 drivers/net/ethernet/google/gve/gve_tx_dqo.c  |   8 +-
 include/linux/netdevice.h                     |   2 +
 include/linux/skbuff.h                        |  17 +-
 include/linux/skbuff_ref.h                    |   4 +-
 include/net/netmem.h                          |  34 +-
 include/net/sock.h                            |   1 +
 include/uapi/linux/netdev.h                   |   1 +
 io_uring/zcrx.c                               |   3 +-
 net/core/datagram.c                           |  48 ++-
 net/core/dev.c                                |  34 +-
 net/core/devmem.c                             | 131 ++++++--
 net/core/devmem.h                             |  83 ++++-
 net/core/netdev-genl-gen.c                    |  13 +
 net/core/netdev-genl-gen.h                    |   1 +
 net/core/netdev-genl.c                        |  80 ++++-
 net/core/skbuff.c                             |  48 ++-
 net/core/sock.c                               |   6 +
 net/ipv4/ip_output.c                          |   3 +-
 net/ipv4/tcp.c                                |  50 ++-
 net/ipv6/ip6_output.c                         |   3 +-
 net/vmw_vsock/virtio_transport_common.c       |   5 +-
 tools/include/uapi/linux/netdev.h             |   1 +
 .../selftests/drivers/net/hw/devmem.py        |  26 +-
 .../selftests/drivers/net/hw/ncdevmem.c       | 300 +++++++++++++++++-
 30 files changed, 1008 insertions(+), 88 deletions(-)

base-commit: 0d15a26b247d25cd012134bf8825128fedb15cc9
--
2.49.0.901.g37484f566f-goog

7 months, 3 weeks


[PATCH v2 0/3] selftests: ublk: more misc fixes

by Uday Shankar

Fix some more minor issues in ublk selftests. The first patch is from
https://lore.kernel.org/linux-block/20250423-ublk_selftests-v1-0-7d060e260e…
with a modification requested by Jens. The others are new.

Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v2:
- Use a test-specific WERROR flag instead of reusing CONFIG_WERROR from
  the kernel build for deciding whether or not to use -Werror for the
  kublk build. The default behavior is to use -Werror (Ming Lei)
- Link to v1: https://lore.kernel.org/r/20250428-ublk_selftests-v1-0-5795f7b00cda@puresto…

---
Uday Shankar (3):
      selftests: ublk: kublk: build with -Werror iff WERROR!=0
      selftests: ublk: make test_generic_06 silent on success
      selftests: ublk: kublk: fix include path

 tools/testing/selftests/ublk/Makefile           | 6 +++++-
 tools/testing/selftests/ublk/kublk.h            | 1 -
 tools/testing/selftests/ublk/test_generic_06.sh | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)
---
base-commit: 53ec1abce79c986dc59e59d0c60d00088bcdf32a
change-id: 20250428-ublk_selftests-983240d3a325

Best regards,
-- 
Uday Shankar <ushankar(a)purestorage.com>

7 months, 3 weeks


[PATCH 0/3] selftests: ublk: more misc fixes

by Uday Shankar

Fix some more minor issues in ublk selftests. The first patch is from
https://lore.kernel.org/linux-block/20250423-ublk_selftests-v1-0-7d060e260e…
with a modification requested by Jens. The others are new.

Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Uday Shankar (3):
  selftests: ublk: kublk: build with -Werror iff CONFIG_WERROR=y
  selftests: ublk: make test_generic_06 silent on success
  selftests: ublk: kublk: fix include path

 tools/testing/selftests/ublk/Makefile | 4 +++-
 tools/testing/selftests/ublk/kublk.h | 1 -
 tools/testing/selftests/ublk/test_generic_06.sh | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)
---
base-commit: 53ec1abce79c986dc59e59d0c60d00088bcdf32a
change-id: 20250428-ublk_selftests-983240d3a325

Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>

7 months, 3 weeks


[PATCH v3 0/8] perform /proc/pid/maps read and PROCMAP_QUERY under RCU

by Suren Baghdasaryan

After a long delay I'm posting the next iteration of the lockless /proc/pid/maps reading patchset. Differences from v2 [1]:
- Add a set of tests concurrently modifying address space and checking for correct reading results;
- Use new mmap_lock_speculate_xxx APIs for concurrent change detection and retries;
- Add lockless PROCMAP_QUERY execution support;

The new tests are designed to check for any unexpected data tearing while performing some common address space modifications (vma split, resize and remap). Even before these changes, reading /proc/pid/maps might have inconsistent data because the file is read page-by-page with mmap_lock being dropped between the pages. Such tearing is expected and userspace is supposed to deal with that possibility. An example of user-visible inconsistency can be that the same vma is printed twice: once before it was modified and then after the modifications. For example, if a vma was extended, it might be found and reported twice. What is not expected is to see a gap where there should have been a vma both before and after modification. This patchset increases the chances of such tearing, therefore it's even more important now to test for unexpected inconsistencies.

Thanks to Paul McKenney who developed a benchmark to test performance of concurrent reads and updates, we also have data on performance benefits:

The test has a pair of processes scanning /proc/PID/maps, and another process unmapping and remapping 4K pages from a 128MB range of anonymous memory. At the end of each 10-second run, the latency of each mmap() or munmap() operation is measured, and for each run the maximum and mean latency is printed. (Yes, the map/unmap process is started first, its PID is passed to the scanners, and then the map/unmap process waits until both scanners are running before starting its timed test. The scanners keep scanning until the specified /proc/PID/maps file disappears.)

In summary, with stock mm, 78% of the runs had maximum latencies in excess of 0.5 milliseconds, with more than half of the runs' latencies exceeding a full millisecond. In contrast, 98% of the runs with Suren's patch series applied had maximum latencies of less than 0.5 milliseconds. From a median-performance viewpoint, Suren's series also looks good, with stock mm weighing in at 13 microseconds and Suren's series at 10 microseconds, better than a 20% improvement.

[1] https://lore.kernel.org/all/20240123231014.3801041-1-surenb@google.com/

Suren Baghdasaryan (8):
  selftests/proc: add /proc/pid/maps tearing from vma split test
  selftests/proc: extend /proc/pid/maps tearing test to include vma resizing
  selftests/proc: extend /proc/pid/maps tearing test to include vma remapping
  selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified
  selftests/proc: add verbose more for tests to facilitate debugging
  mm: make vm_area_struct anon_name field RCU-safe
  mm/maps: read proc/pid/maps under RCU
  mm/maps: execute PROCMAP_QUERY ioctl under RCU

 fs/proc/internal.h | 6 +
 fs/proc/task_mmu.c | 233 +++++-
 include/linux/mm_inline.h | 28 +-
 include/linux/mm_types.h | 3 +-
 mm/madvise.c | 30 +-
 tools/testing/selftests/proc/proc-pid-vm.c | 793 ++++++++++++++++++++-
 6 files changed, 1061 insertions(+), 32 deletions(-)

base-commit: 79f35c4125a9a3fd98efeed4cce1cd7ce5311a44
--
2.49.0.805.g082f7c87e0-goog
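The distinction drawn in the cover letter, between a vma that shows up twice (expected tearing) and a gap where a vma should have been (unexpected), can be illustrated with a small checker. This is only a sketch under stated assumptions: it is not the selftest code from proc-pid-vm.c, and the helper names `parse_ranges` and `classify` are hypothetical. It works on the text of one /proc/<pid>/maps snapshot and a range that the caller knows was mapped both before and after the modification.

```python
import re

# Matches the leading "start-end" address range of a /proc/<pid>/maps line.
_RANGE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def parse_ranges(maps_text):
    """Return [(start, end), ...] for every line of a maps snapshot."""
    ranges = []
    for line in maps_text.splitlines():
        m = _RANGE.match(line)
        if m:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def classify(ranges, lo, hi):
    """Classify how the range [lo, hi) appears in one snapshot.

    'gap'       - some address in [lo, hi) is covered by no entry (unexpected)
    'duplicate' - fully covered, but entries overlap (expected tearing)
    'clean'     - fully covered by non-overlapping entries
    """
    # Keep only entries that intersect [lo, hi), ordered by start address.
    hits = sorted(r for r in ranges if r[0] < hi and r[1] > lo)
    covered_to = lo      # everything in [lo, covered_to) is accounted for
    prev_end = None      # highest end address seen so far
    overlap = False
    for start, end in hits:
        if start > covered_to:
            return "gap"
        if prev_end is not None and start < prev_end:
            overlap = True
        covered_to = max(covered_to, end)
        prev_end = end if prev_end is None else max(prev_end, end)
    if covered_to < hi:
        return "gap"
    return "duplicate" if overlap else "clean"
```

For instance, a snapshot that lists both `1000-3000` and `1000-4000` classifies as `duplicate` for the range 0x1000-0x3000, while a snapshot listing only `1000-2000` and `3000-4000` classifies as `gap` for 0x1000-0x4000.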

7 months, 3 weeks


[PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

by Valentin Schneider

Context
=======

We've observed within Red Hat that isolated, NOHZ_FULL CPUs running a pure-userspace application get regularly interrupted by IPIs sent from housekeeping CPUs. Those IPIs are caused by activity on the housekeeping CPUs leading to various on_each_cpu() calls, e.g.:

  64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
    smp_call_function_many_cond+0x1
    smp_call_function+0x39
    on_each_cpu+0x2a
    flush_tlb_kernel_range+0x7b
    __purge_vmap_area_lazy+0x70
    _vm_unmap_aliases.part.42+0xdf
    change_page_attr_set_clr+0x16a
    set_memory_ro+0x26
    bpf_int_jit_compile+0x2f9
    bpf_prog_select_runtime+0xc6
    bpf_prepare_filter+0x523
    sk_attach_filter+0x13
    sock_setsockopt+0x92c
    __sys_setsockopt+0x16a
    __x64_sys_setsockopt+0x20
    do_syscall_64+0x87
    entry_SYSCALL_64_after_hwframe+0x65

The heart of this series is the thought that while we cannot remove NOHZ_FULL CPUs from the list of CPUs targeted by these IPIs, they may not have to execute the callbacks immediately. Anything that only affects kernelspace can wait until the next user->kernel transition, providing it can be executed "early enough" in the entry code.

The original implementation is from Peter [1]. Nicolas then added kernel TLB invalidation deferral to that [2], and I picked it up from there.

Deferral approach
=================

Storing each and every callback, like a secondary call_single_queue, turned out to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in userspace for as long as possible - no signal of any form would be sent when deferring an IPI. This means that any form of queuing for deferred callbacks would end up as a convoluted memory leak.

Deferred IPIs must thus be coalesced, which this series achieves by assigning IPIs a "type" and having a mapping of IPI type to callback, leveraged upon kernel entry.

What about IPIs whose callbacks take a parameter, you may ask?

Peter suggested during OSPM23 [3] that since on_each_cpu() targets housekeeping CPUs *and* isolated CPUs, isolated CPUs can access either global or housekeeping-CPU-local state to "reconstruct" the data that would have been sent via the IPI.

This series does not affect any IPI callback that requires an argument, but the approach would remain the same (one coalescable callback executed on kernel entry).

Kernel entry vs execution of the deferred operation
===================================================

This is what I've referred to as the "Danger Zone" during my LPC24 talk [4].

There is a non-zero length of code that is executed upon kernel entry before the deferred operation can be itself executed (before we start getting into context_tracking.c proper), i.e.:

  idtentry_func_foo()                <--- we're in the kernel
    irqentry_enter()
      enter_from_user_mode()
        __ct_user_exit()
          ct_kernel_enter_state()
            ct_work_flush()          <--- deferred operation is executed here

This means one must take extra care about what can happen in the early entry code, and that <bad things> cannot happen. For instance, we really don't want to hit instructions that have been modified by a remote text_poke() while we're on our way to execute a deferred sync_core(). Patches doing the actual deferral have more detail on this.

Where are we at with this whole thing?
======================================

Dave has been incredibly helpful wrt figuring out what would and wouldn't (mostly that) be safe to do for deferring kernel range TLB flush IPIs, see [5].

Long story short, there are ugly things I can still do to (safely) defer the TLB flush IPIs, but it's going to be a long session of pulling my own hair out, and I got plenty so I won't be done for a while.

In the meantime, I think everything leading up to deferring text poke IPIs is sane-ish and could get in.
I'm not the biggest fan of adding an API with a single user, but hey, I've been working on this for "a little while" now and I'll still need to get the other IPIs sorted out.

TL;DR: Text patching IPI deferral LGTM so here it is for now, I'm still working on the TLB flush thing.

Patches
=======

o Patches 1-2 are standalone objtool cleanups.
o Patches 3-4 add an RCU testing feature.
o Patches 5-6 add infrastructure for annotating static keys and static calls that may be used in noinstr code (courtesy of Josh).
o Patches 7-20 use said annotations on relevant keys / calls.
o Patch 21 enforces proper usage of said annotations (courtesy of Josh).
o Patches 22-23 deal with detecting NOINSTR text in modules
o Patches 24-25 add the actual IPI deferral faff

Patches are also available at:
https://gitlab.com/vschneid/linux.git -b redhat/isolirq/defer/v5

Testing
=======

Xeon E5-2699 system with SMT off, NOHZ_FULL, isolated CPUs. RHEL10 userspace.

Workload is using rteval (kernel compilation + hackbench) on housekeeping CPUs and a dummy stay-in-userspace loop on the isolated CPUs. The main invocation is:

  $ trace-cmd record -e "csd_queue_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \
                     -e "ipi_send_cpumask" -f "cpumask & CPUS{$ISOL_CPUS}" \
                     -e "ipi_send_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \
                     rteval --onlyload --loads-cpulist=$HK_CPUS \
                     --hackbench-runlowmem=True --duration=$DURATION

This only records IPIs sent to isolated CPUs, so any event there is interference (with a bit of fuzz at the start/end of the workload when spawning the processes). All tests were done with a duration of 3 hours.

v6.14

  # This is the actual IPI count
  $ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr
     93 callback=generic_smp_call_function_single_interrupt+0x0
     22 callback=nohz_full_kick_func+0x0

  # These are the different CSD's that caused IPIs
  $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr
   1456 func=do_flush_tlb_all
     78 func=do_sync_core
     33 func=nohz_full_kick_func
     26 func=do_kernel_range_flush

v6.14 + patches

  # This is the actual IPI count
  $ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr
     86 callback=generic_smp_call_function_single_interrupt+0x0
     41 callback=nohz_full_kick_func+0x0

  # These are the different CSD's that caused IPIs
  $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr
   1378 func=do_flush_tlb_all
     33 func=nohz_full_kick_func

So the TLB flush is still there driving most of the IPIs, but at least the instruction patching IPIs are gone. With kernel TLB flushes deferred, there are no IPIs sent to isolated CPUs in that 3hr window, but as stated above that still needs some more work.

Also note that tlb_remove_table_smp_sync() showed up during testing of v3, and has gone as mysteriously as it showed up. Yair had a series addressing this [6] which per these results would be worth revisiting.

Acknowledgements
================

Special thanks to:
o Clark Williams for listening to my ramblings about this and throwing ideas my way
o Josh Poimboeuf for all his help with everything objtool-related
o All of the folks who attended various (too many?) talks about this and provided precious feedback.
o The mm folks for pointing out what I can and can't do with TLB flushes

Links
=====

[1]: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/
[2]: https://github.com/vianpl/linux.git -b ct-work-defer-wip
[3]: https://youtu.be/0vjE6fjoVVE
[4]: https://lpc.events/event/18/contributions/1889/
[5]: http://lore.kernel.org/r/eef09bdc-7546-462b-9ac0-661a44d2ceae@intel.com
[6]: https://lore.kernel.org/lkml/20230620144618.125703-1-ypodemsk@redhat.com/

Revisions
=========

v4 -> v5
++++++++

o Rebased onto v6.15-rc3
o Collected Reviewed-by
o Annotated a few more static keys
o Added proper checking of noinstr sections that are in loadable code such as KVM early entry (Sean Christopherson)
o Switched to checking for CT_RCU_WATCHING instead of CT_STATE_KERNEL or CT_STATE_IDLE, which means deferral is now behaving sanely for IRQ/NMI entry from idle (thanks to Frederic!)
o Ditched the vmap TLB flush deferral (for now)

RFCv3 -> v4
+++++++++++

o Rebased onto v6.13-rc6
o New objtool patches from Josh
o More .noinstr static key/call patches
o Static calls now handled as well (again thanks to Josh)
o Fixed clearing the work bits on kernel exit
o Messed with IRQ hitting an idle CPU vs context tracking
o Various comment and naming cleanups
o Made RCU_DYNTICKS_TORTURE depend on !COMPILE_TEST (PeterZ)
o Fixed the CT_STATE_KERNEL check when setting a deferred work (Frederic)
o Cleaned up the __flush_tlb_all() mess thanks to PeterZ

RFCv2 -> RFCv3
++++++++++++++

o Rebased onto v6.12-rc6
o Added objtool documentation for the new warning (Josh)
o Added low-size RCU watching counter to TREE04 torture scenario (Paul)
o Added FORCEFUL jump label and static key types
o Added noinstr-compliant helpers for tlb flush deferral

RFCv1 -> RFCv2
++++++++++++++

o Rebased onto v6.5-rc1
o Updated the trace filter patches (Steven)
o Fixed __ro_after_init keys used in modules (Peter)
o Dropped the extra context_tracking atomic, squashed the new bits in the existing .state field (Peter, Frederic)
o Added an RCU_EXPERT config for the RCU dynticks counter size, and added an rcutorture case for a low-size counter (Paul)
o Fixed flush_tlb_kernel_range_deferrable() definition

Josh Poimboeuf (3):
  jump_label: Add annotations for validating noinstr usage
  static_call: Add read-only-after-init static calls
  objtool: Add noinstr validation for static branches/calls

Valentin Schneider (22):
  objtool: Make validate_call() recognize indirect calls to pv_ops[]
  objtool: Flesh out warning related to pv_ops[] calls
  rcu: Add a small-width RCU watching counter debug option
  rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
  x86/paravirt: Mark pv_sched_clock static call as __ro_after_init
  x86/idle: Mark x86_idle static call as __ro_after_init
  x86/paravirt: Mark pv_steal_clock static call as __ro_after_init
  riscv/paravirt: Mark pv_steal_clock static call as __ro_after_init
  loongarch/paravirt: Mark pv_steal_clock static call as __ro_after_init
  arm64/paravirt: Mark pv_steal_clock static call as __ro_after_init
  arm/paravirt: Mark pv_steal_clock static call as __ro_after_init
  perf/x86/amd: Mark perf_lopwr_cb static call as __ro_after_init
  sched/clock: Mark sched_clock_running key as __ro_after_init
  KVM: VMX: Mark __kvm_is_using_evmcs static key as __ro_after_init
  x86/speculation/mds: Mark mds_idle_clear key as allowed in .noinstr
  sched/clock, x86: Mark __sched_clock_stable key as allowed in .noinstr
  KVM: VMX: Mark vmx_l1d_should flush and vmx_l1d_flush_cond keys as allowed in .noinstr
  stackleack: Mark stack_erasing_bypass key as allowed in .noinstr
  module: Remove outdated comment about text_size
  module: Add MOD_NOINSTR_TEXT mem_type
  context-tracking: Introduce work deferral infrastructure
  context_tracking,x86: Defer kernel text patching IPIs

 arch/Kconfig | 9 ++
 arch/arm/kernel/paravirt.c | 2 +-
 arch/arm64/kernel/paravirt.c | 2 +-
 arch/loongarch/kernel/paravirt.c | 2 +-
 arch/riscv/kernel/paravirt.c | 2 +-
 arch/x86/Kconfig | 1 +
 arch/x86/events/amd/brs.c | 2 +-
 arch/x86/include/asm/context_tracking_work.h | 18 +++
 arch/x86/include/asm/text-patching.h | 1 +
 arch/x86/kernel/alternative.c | 39 ++++++-
 arch/x86/kernel/cpu/bugs.c | 2 +-
 arch/x86/kernel/kprobes/core.c | 4 +-
 arch/x86/kernel/kprobes/opt.c | 4 +-
 arch/x86/kernel/module.c | 2 +-
 arch/x86/kernel/paravirt.c | 4 +-
 arch/x86/kernel/process.c | 2 +-
 arch/x86/kvm/vmx/vmx.c | 11 +-
 arch/x86/kvm/vmx/vmx_onhyperv.c | 2 +-
 include/asm-generic/sections.h | 15 +++
 include/linux/context_tracking.h | 21 ++++
 include/linux/context_tracking_state.h | 54 +++++++--
 include/linux/context_tracking_work.h | 26 +++++
 include/linux/jump_label.h | 30 ++++-
 include/linux/module.h | 6 +-
 include/linux/objtool.h | 7 ++
 include/linux/static_call.h | 19 ++++
 kernel/context_tracking.c | 69 +++++++++++-
 kernel/kprobes.c | 8 +-
 kernel/module/main.c | 85 ++++++++++----
 kernel/rcu/Kconfig.debug | 15 +++
 kernel/sched/clock.c | 7 +-
 kernel/stackleak.c | 6 +-
 kernel/time/Kconfig | 5 +
 tools/objtool/Documentation/objtool.txt | 34 ++++++
 tools/objtool/check.c | 106 +++++++++++++++---
 tools/objtool/include/objtool/check.h | 1 +
 tools/objtool/include/objtool/elf.h | 1 +
 tools/objtool/include/objtool/special.h | 1 +
 tools/objtool/special.c | 15 ++-
 .../selftests/rcutorture/configs/rcu/TREE04 | 1 +
 40 files changed, 557 insertions(+), 84 deletions(-)
 create mode 100644 arch/x86/include/asm/context_tracking_work.h
 create mode 100644 include/linux/context_tracking_work.h

--
2.49.0
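The "Deferral approach" described in this cover letter (coalesce deferred IPIs by type instead of queuing callbacks, then run the mapped callbacks on the next kernel entry) can be modelled in a few lines. This is a toy simulation under loud assumptions, not the kernel implementation: the names `Work`, `Cpu`, `send` and `kernel_entry` are hypothetical, the callbacks only append to a log, and the atomicity and early-entry ordering concerns the series deals with are ignored entirely.

```python
from enum import IntFlag


class Work(IntFlag):
    """Hypothetical deferred-work types: one bit per coalescable IPI type."""
    SYNC_CORE = 1
    TLB_FLUSH = 2


class Cpu:
    """Toy model of one CPU; real kernels keep this state per-CPU and atomic."""

    def __init__(self, in_userspace):
        self.in_userspace = in_userspace
        self.pending = Work(0)   # a bitmask, not a queue: repeats coalesce
        self.log = []            # records which callbacks actually ran


# Mapping of work type to callback, consulted on kernel entry.
CALLBACKS = {
    Work.SYNC_CORE: lambda cpu: cpu.log.append("sync_core"),
    Work.TLB_FLUSH: lambda cpu: cpu.log.append("tlb_flush"),
}


def send(cpu, work):
    """Sender side: return True if an 'IPI' had to interrupt the target."""
    if cpu.in_userspace:
        cpu.pending |= work      # defer: set the bit, do not disturb the CPU
        return False
    CALLBACKS[work](cpu)         # target is in the kernel: run immediately
    return True


def kernel_entry(cpu):
    """Target side: flush all pending work once, early on kernel entry."""
    cpu.in_userspace = False
    pending, cpu.pending = cpu.pending, Work(0)
    for work, callback in CALLBACKS.items():
        if pending & work:
            callback(cpu)
```

Calling `send(cpu, Work.SYNC_CORE)` three times on a CPU that is in userspace leaves a single bit set, so `kernel_entry(cpu)` runs the callback once. That is the point of coalescing: the pending state stays bounded however long the CPU remains in userspace, which a queue of callbacks would not.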

7 months, 3 weeks


[PATCH v3 00/32] kselftest harness and nolibc compatibility

by Thomas Weißschuh

Nolibc is useful for selftests as the test programs can be very small, and compiled with just a kernel crosscompiler, without userspace support. Currently nolibc is only usable with kselftest.h, not the more convenient to use kselftest_harness.h. This series provides this compatibility by adding new features to nolibc and removing the usage of problematic features from the harness.

The first half of the series are changes to the harness, the second one are for nolibc. Both parts are very independent and should go through different trees. The last patch is not meant to be applied and serves as a test that everything works together correctly.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Send patches to correct kselftest harness maintainers
- Move harness selftest to dedicated directory
- Add harness selftest to MAINTAINERS
- Integrate harness selftest cleanup with the selftest framework
- Consistently use "kselftest harness" in commit messages
- Properly propagate kselftest harness failure
- Link to v2: https://lore.kernel.org/r/20250407-nolibc-kselftest-harness-v2-0-f8812f76e9…

Changes in v2:
- Rebase onto v6.15-rc1
- Rename internal nolibc symbols
- Handle edge case of waitpid(INT_MIN) == ESRCH
- Fix arm configurations for final testing patch
- Clean up global getopt.h variable declarations
- Add Acks from Willy
- Link to v1: https://lore.kernel.org/r/20250304-nolibc-kselftest-harness-v1-0-adca7cd231…

---
Thomas Weißschuh (32):
  selftests: harness: Add kselftest harness selftest
  selftests: harness: Use C89 comment style
  selftests: harness: Ignore unused variant argument warning
  selftests: harness: Mark functions without prototypes static
  selftests: harness: Remove inline qualifier for wrappers
  selftests: harness: Remove dependency on libatomic
  selftests: harness: Implement test timeouts through pidfd
  selftests: harness: Don't set setup_completed for fixtureless tests
  selftests: harness: Always provide "self" and "variant"
  selftests: harness: Move teardown conditional into test metadata
  selftests: harness: Add teardown callback to test metadata
  selftests: harness: Stop using setjmp()/longjmp()
  selftests: harness: Guard includes on nolibc
  tools/nolibc: handle intmax_t/uintmax_t in printf
  tools/nolibc: use intmax definitions from compiler
  tools/nolibc: use pselect6_time64 if available
  tools/nolibc: use ppoll_time64 if available
  tools/nolibc: add tolower() and toupper()
  tools/nolibc: add _exit()
  tools/nolibc: add setpgrp()
  tools/nolibc: implement waitpid() in terms of waitid()
  Revert "selftests/nolibc: use waitid() over waitpid()"
  tools/nolibc: add dprintf() and vdprintf()
  tools/nolibc: add getopt()
  tools/nolibc: allow different write callbacks in printf
  tools/nolibc: allow limiting of printf destination size
  tools/nolibc: add snprintf() and friends
  selftests/nolibc: use snprintf() for printf tests
  selftests/nolibc: rename vfprintf test suite
  selftests/nolibc: add test for snprintf() truncation
  tools/nolibc: implement width padding in printf()
  HACK: selftests/nolibc: demonstrate usage of the kselftest harness

 MAINTAINERS | 1 +
 tools/include/nolibc/Makefile | 1 +
 tools/include/nolibc/getopt.h | 101 ++
 tools/include/nolibc/nolibc.h | 1 +
 tools/include/nolibc/stdint.h | 4 +-
 tools/include/nolibc/stdio.h | 127 +-
 tools/include/nolibc/string.h | 17 +
 tools/include/nolibc/sys.h | 105 +-
 tools/testing/selftests/Makefile | 1 +
 tools/testing/selftests/kselftest_harness.h | 181 +-
 .../testing/selftests/kselftest_harness/.gitignore | 2 +
 tools/testing/selftests/kselftest_harness/Makefile | 7 +
 .../selftests/kselftest_harness/harness-selftest.c | 129 ++
 .../kselftest_harness/harness-selftest.expected | 62 +
 .../kselftest_harness/harness-selftest.sh | 13 +
 tools/testing/selftests/nolibc/Makefile | 13 +-
 tools/testing/selftests/nolibc/harness-selftest.c | 1 +
 tools/testing/selftests/nolibc/nolibc-test.c | 1729 +-------------------
 tools/testing/selftests/nolibc/run-tests.sh | 2 +-
 19 files changed, 637 insertions(+), 1860 deletions(-)
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250130-nolibc-kselftest-harness-8b2c8cac43bf

Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
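Two items in this series, "implement waitpid() in terms of waitid()" and the v2 changelog entry about waitpid(INT_MIN) returning ESRCH, come down to translating the pid argument of waitpid() into the (idtype, id) pair that waitid() expects. The sketch below shows one way that translation could look. It is an illustration only, not the nolibc code: the function name `waitpid_to_waitid_args` is hypothetical, a 32-bit `int` is assumed for INT_MIN, and the pid == 0 case relies on Linux 5.4+ semantics where P_PGID with id 0 means the caller's own process group.

```python
import errno
import os

# Assumption: pid_t is a 32-bit signed int, as on common Linux ABIs.
INT_MIN = -2**31


def waitpid_to_waitid_args(pid):
    """Map a waitpid() pid argument to an (idtype, id) pair for waitid()."""
    if pid == INT_MIN:
        # -INT_MIN is not representable as an int, so the process group
        # being asked for cannot exist: report "no such process".
        raise OSError(errno.ESRCH, os.strerror(errno.ESRCH))
    if pid < -1:
        return os.P_PGID, -pid   # any child in process group |pid|
    if pid == -1:
        return os.P_ALL, 0       # any child at all
    if pid == 0:
        return os.P_PGID, 0      # any child in the caller's process group
    return os.P_PID, pid         # exactly this child
```

Handling INT_MIN before negating matters in C, where `-pid` would overflow; Python integers do not overflow, so the explicit check here only mirrors the edge case named in the changelog.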

7 months, 3 weeks

