- Linux-kselftest-mirror - lists.linaro.org

[PATCH 0/2] Exposing nice CPU usage to userspace

by Joshua＠web.codeaurora.org

From: Joshua Hahn <joshua.hahn6(a)gmail.com>

Niced CPU usage is a metric reported in host-level /proc/stat, but is not reported in cgroup-level statistics in cpu.stat. However, when a host contains multiple tasks across different workloads, it becomes difficult to gauge how much CPU time is being spent on niced processes based on /proc/stat alone, since host-level metrics do not provide this cgroup-level granularity.

Exposing this metric will allow load balancers to correctly probe the niced CPU metric for each workload, and make more informed decisions when directing higher priority tasks.

Joshua Hahn (2):
  Tracking cgroup-level niced CPU time
  Selftests for niced CPU statistics

 include/linux/cgroup-defs.h | 1 +
 kernel/cgroup/rstat.c | 16 ++++-
 tools/testing/selftests/cgroup/test_cpu.c | 72 +++++++++++++++++++++++
 3 files changed, 86 insertions(+), 3 deletions(-)

--
2.43.5
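As a rough illustration of how a load balancer might consume such a metric, the sketch below parses the "key value" layout of a cgroup v2 cpu.stat file and computes the share of CPU time spent on niced tasks. The field name `nice_usec` and the sample values are assumptions made for illustration; the cover letter does not name the new field.

```python
def parse_cpu_stat(text):
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def niced_fraction(stats, nice_key="nice_usec"):
    """Share of the cgroup's CPU time spent on niced tasks (0.0 if unknown).

    nice_key is an assumed field name, not one confirmed by the cover letter.
    """
    usage = stats.get("usage_usec", 0)
    if usage == 0 or nice_key not in stats:
        return 0.0
    return stats[nice_key] / usage


# Hypothetical sample contents; real values would be read from the
# cpu.stat file of a cgroup on a kernel carrying the patch.
SAMPLE = """usage_usec 4000000
user_usec 3000000
system_usec 1000000
nice_usec 1000000
"""

stats = parse_cpu_stat(SAMPLE)
print(f"niced share: {niced_fraction(stats):.2%}")
```

Returning 0.0 when the field is absent keeps the helper usable on kernels without the patch, at the cost of not distinguishing "no niced time" from "not reported".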

1 year, 4 months


[PATCH v2 0/6] kunit: Add macros to help write more complex tests

by Michal Wajdeczko

v1: https://groups.google.com/g/kunit-dev/c/f4LIMLyofj8

v2: make it more complex and attempt to be thread safe
    s/FIXED_STUB/GLOBAL_STUB (David, Lucas)
    make it little more thread safe (Rae, David)
    wait until stub call finishes before test end (David)
    wait until stub call finishes before changing stub (David)
    allow stub deactivation (Rae)
    prefer kunit log (David)
    add simple selftest (Michal)
    also introduce ONLY_IF_KUNIT macro (Michal)

Sample output from the tests:

$ tools/testing/kunit/kunit.py run *example*.*global* \
  --kunitconfig lib/kunit/.kunitconfig --raw_output

KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
# module: kunit_example_test
1..1
# example_global_stub_test: initializing
# example_global_stub_test: add_two: redirecting to subtract_one
# example_global_stub_test: add_two: redirecting to subtract_one
# example_global_stub_test: cleaning up
ok 1 example_global_stub_test
# example: exiting suite
ok 1 example

$ tools/testing/kunit/kunit.py run *global*.*global* \
  --kunitconfig lib/kunit/.kunitconfig --raw_output

KTAP version 1
1..1
KTAP version 1
# Subtest: kunit_global_stub
# module: kunit_test
1..4
# kunit_global_stub_test_activate: real_void_func: redirecting to replacement_void_func
# kunit_global_stub_test_activate: real_func: redirecting to replacement_func
# kunit_global_stub_test_activate: real_func: redirecting to replacement_func
# kunit_global_stub_test_activate: real_func: redirecting to other_replacement_func
# kunit_global_stub_test_activate: real_func: redirecting to other_replacement_func
# kunit_global_stub_test_activate: real_func: redirecting to super_replacement_func
# kunit_global_stub_test_activate: real_func: redirecting to super_replacement_func
ok 1 kunit_global_stub_test_activate
ok 2 kunit_global_stub_test_deactivate
# kunit_global_stub_test_slow_deactivate: real_func: redirecting to slow_replacement_func
# kunit_global_stub_test_slow_deactivate: real_func: redirecting to slow_replacement_func
# kunit_global_stub_test_slow_deactivate: waiting for slow_replacement_func
# kunit_global_stub_test_slow_deactivate.speed: slow
ok 3 kunit_global_stub_test_slow_deactivate
# kunit_global_stub_test_slow_replace: real_func: redirecting to slow_replacement_func
# kunit_global_stub_test_slow_replace: real_func: redirecting to slow_replacement_func
# kunit_global_stub_test_slow_replace: waiting for slow_replacement_func
# kunit_global_stub_test_slow_replace: real_func: redirecting to other_replacement_func
# kunit_global_stub_test_slow_replace.speed: slow
ok 4 kunit_global_stub_test_slow_replace
# kunit_global_stub: pass:4 fail:0 skip:0 total:4
# Totals: pass:4 fail:0 skip:0 total:4
ok 1 kunit_global_stub

Cc: Rae Moar <rmoar(a)google.com>
Cc: David Gow <davidgow(a)google.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>

Michal Wajdeczko (6):
  kunit: Introduce kunit_is_running()
  kunit: Add macro to conditionally expose declarations to tests
  kunit: Add macro to conditionally expose expressions to tests
  kunit: Allow function redirection outside of the KUnit thread
  kunit: Add example with alternate function redirection method
  kunit: Add some selftests for global stub redirection macros

 include/kunit/static_stub.h | 158 ++++++++++++++++++++
 include/kunit/test-bug.h | 12 +-
 include/kunit/visibility.h | 16 +++
 lib/kunit/kunit-example-test.c | 67 +++++++++
 lib/kunit/kunit-test.c | 254 ++++++++++++++++++++++++++++++++-
 lib/kunit/static_stub.c | 49 +++++++
 6 files changed, 553 insertions(+), 3 deletions(-)

--
2.43.0
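For readers who want to check raw output like the sample above in a script, here is a minimal sketch that pulls the "ok"/"not ok" result lines and the "# Totals:" summary out of KTAP text. It is an illustrative helper, not part of kunit.py, and it only understands the two line shapes that appear in the sample; the excerpt passed to it is abbreviated and its indentation is a guess.

```python
import re

# Matches result lines such as "ok 3 some_test" or "not ok 1 other_test".
RESULT_RE = re.compile(r"^\s*(ok|not ok) (\d+) (\S+)")
# Matches the summary line printed at the end of a suite.
TOTALS_RE = re.compile(r"# Totals: pass:(\d+) fail:(\d+) skip:(\d+) total:(\d+)")


def summarize_ktap(output):
    """Return ([(name, passed)], totals_dict_or_None) from raw KTAP text."""
    results = []
    totals = None
    for line in output.splitlines():
        match = RESULT_RE.match(line)
        if match:
            results.append((match.group(3), match.group(1) == "ok"))
            continue
        match = TOTALS_RE.search(line)
        if match:
            keys = ("pass", "fail", "skip", "total")
            totals = dict(zip(keys, map(int, match.groups())))
    return results, totals


# Abbreviated excerpt of the second run shown in the cover letter.
EXCERPT = """KTAP version 1
1..1
    KTAP version 1
    # Subtest: kunit_global_stub
    1..4
    ok 1 kunit_global_stub_test_activate
    ok 2 kunit_global_stub_test_deactivate
    ok 3 kunit_global_stub_test_slow_deactivate
    ok 4 kunit_global_stub_test_slow_replace
# kunit_global_stub: pass:4 fail:0 skip:0 total:4
# Totals: pass:4 fail:0 skip:0 total:4
ok 1 kunit_global_stub
"""

results, totals = summarize_ktap(EXCERPT)
print(totals, all(passed for _, passed in results))
```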

1 year, 4 months


[PATCH nf-next v3 1/2] netfilter: Make IP_NF_IPTABLES_LEGACY selectable

by Breno Leitao

This option makes IP_NF_IPTABLES_LEGACY user selectable, giving users the option to configure iptables without enabling any other config. Suggested-by: Florian Westphal <fw(a)strlen.de> Signed-off-by: Breno Leitao <leitao(a)debian.org> --- net/ipv4/netfilter/Kconfig | 19 +++++++++++-------- tools/testing/selftests/net/config | 8 ++++++++ 2 files changed, 19 insertions(+), 8 deletions(-) diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig index 1b991b889506..a06c1903183f 100644 --- a/net/ipv4/netfilter/Kconfig +++ b/net/ipv4/netfilter/Kconfig @@ -12,7 +12,12 @@ config NF_DEFRAG_IPV4 # old sockopt interface and eval loop config IP_NF_IPTABLES_LEGACY - tristate + tristate "Legacy IP tables support" + default n + select NETFILTER_XTABLES + help + iptables is a general, extensible packet identification legacy framework. + This is not needed if you are using iptables over nftables (iptables-nft). config NF_SOCKET_IPV4 tristate "IPv4 socket lookup support" @@ -177,7 +182,7 @@ config IP_NF_MATCH_TTL config IP_NF_FILTER tristate "Packet filtering" default m if NETFILTER_ADVANCED=n - select IP_NF_IPTABLES_LEGACY + depends on IP_NF_IPTABLES_LEGACY help Packet filtering defines a table `filter', which has a series of rules for simple packet filtering at local input, forwarding and @@ -217,7 +222,7 @@ config IP_NF_NAT default m if NETFILTER_ADVANCED=n select NF_NAT select NETFILTER_XT_NAT - select IP_NF_IPTABLES_LEGACY + depends on IP_NF_IPTABLES_LEGACY help This enables the `nat' table in iptables. This allows masquerading, port forwarding and other forms of full Network Address Port @@ -258,7 +263,7 @@ endif # IP_NF_NAT config IP_NF_MANGLE tristate "Packet mangling" default m if NETFILTER_ADVANCED=n - select IP_NF_IPTABLES_LEGACY + depends on IP_NF_IPTABLES_LEGACY help This option adds a `mangle' table to iptables: see the man page for iptables(8). 
This table is used for various packet alterations @@ -293,7 +298,7 @@ config IP_NF_TARGET_TTL # raw + specific targets config IP_NF_RAW tristate 'raw table support (required for NOTRACK/TRACE)' - select IP_NF_IPTABLES_LEGACY + depends on IP_NF_IPTABLES_LEGACY help This option adds a `raw' table to iptables. This table is the very first in the netfilter framework and hooks in at the PREROUTING @@ -305,9 +310,7 @@ config IP_NF_RAW # security table for MAC policy config IP_NF_SECURITY tristate "Security table" - depends on SECURITY - depends on NETFILTER_ADVANCED - select IP_NF_IPTABLES_LEGACY + depends on SECURITY && NETFILTER_ADVANCED && IP_NF_IPTABLES_LEGACY help This option adds a `security' table to iptables, for use with Mandatory Access Control (MAC) policy. diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 5b9baf708950..90e997cfa12e 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -28,6 +28,7 @@ CONFIG_NET_FOU=y CONFIG_NET_FOU_IP_TUNNELS=y CONFIG_NETFILTER=y CONFIG_NETFILTER_ADVANCED=y +CONFIG_NETFILTER_XT_TARGET_HL=m CONFIG_NF_CONNTRACK=m CONFIG_IPV6_MROUTE=y CONFIG_IPV6_SIT=y @@ -35,6 +36,11 @@ CONFIG_IP_DCCP=m CONFIG_NF_NAT=m CONFIG_IP6_NF_IPTABLES=m CONFIG_IP_NF_IPTABLES=m +CONFIG_IP_NF_IPTABLES_LEGACY=m +CONFIG_IP_NF_FILTER=m +CONFIG_IP_NF_TARGET_REJECT=m +CONFIG_IP_NF_TARGET_MASQUERADE=m +CONFIG_IP_NF_MANGLE=m CONFIG_IP6_NF_NAT=m CONFIG_IP6_NF_RAW=m CONFIG_IP_NF_NAT=m @@ -54,6 +60,7 @@ CONFIG_MPTCP=y CONFIG_NF_TABLES=m CONFIG_NF_TABLES_IPV6=y CONFIG_NF_TABLES_IPV4=y +CONFIG_NF_REJECT_IPV4=y CONFIG_NFT_NAT=m CONFIG_NETFILTER_XT_MATCH_LENGTH=m CONFIG_NET_ACT_CSUM=m @@ -106,4 +113,5 @@ CONFIG_CRYPTO_ARIA=y CONFIG_XFRM_INTERFACE=m CONFIG_XFRM_USER=m CONFIG_IP_NF_MATCH_RPFILTER=m +CONFIG_IP_NF_TARGET_MASQUERADE=m CONFIG_IP6_NF_MATCH_RPFILTER=m -- 2.43.5
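The practical effect of switching the tables from "select" to "depends on" can be pictured with a toy model. The sketch below is a deliberately simplified resolver, not the real Kconfig algorithm: "select" force-enables its target, while "depends on" blocks a symbol until its prerequisite is enabled. The `resolve` function and its parameters are hypothetical names introduced for illustration.

```python
def resolve(requested, selects=None, depends=None):
    """Toy model of Kconfig: return the set of symbols that end up enabled.

    requested: symbols the user asked for
    selects:   {symbol: [symbols it force-enables]}
    depends:   {symbol: [symbols that must already be enabled]}
    """
    selects = selects or {}
    depends = depends or {}
    enabled = set()
    changed = True
    while changed:
        changed = False
        # A requested symbol is enabled only once all its dependencies are.
        for sym in requested:
            if sym not in enabled and all(d in enabled for d in depends.get(sym, [])):
                enabled.add(sym)
                changed = True
        # An enabled symbol force-enables everything it selects.
        for sym in list(enabled):
            for target in selects.get(sym, []):
                if target not in enabled:
                    enabled.add(target)
                    changed = True
    return enabled


# Before the patch: IP_NF_FILTER selects the legacy core.
before = resolve({"IP_NF_FILTER"},
                 selects={"IP_NF_FILTER": ["IP_NF_IPTABLES_LEGACY"]})

# After the patch: IP_NF_FILTER depends on the legacy core, which itself
# selects NETFILTER_XTABLES.
new_selects = {"IP_NF_IPTABLES_LEGACY": ["NETFILTER_XTABLES"]}
new_depends = {"IP_NF_FILTER": ["IP_NF_IPTABLES_LEGACY"]}
after_alone = resolve({"IP_NF_FILTER"}, new_selects, new_depends)
after_both = resolve({"IP_NF_FILTER", "IP_NF_IPTABLES_LEGACY"},
                     new_selects, new_depends)

print(sorted(before), sorted(after_alone), sorted(after_both))
```

In this model, requesting only IP_NF_FILTER after the patch enables nothing, which is consistent with the selftests config in the diff now listing CONFIG_IP_NF_IPTABLES_LEGACY=m explicitly.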

1 year, 4 months


[PATCH bpf-next V1] enable virtFS(9p virtio) for sharing directory on VM to optimize debugging

by Lin Yikai

[Problem]
Sometimes only an x86_64 server is available for compiling BPF with a target ARCH of arm64, so the only way to debug BPF is to cross-compile and run it under qemu. Unfortunately, debugging online on the VM is very inconvenient when test_progs fails. For example:
1. We are unable to directly replace the old test object and still need to quit the VM and restart it, which consumes valuable time.
2. We also want to share other tools or binaries online for execution on the VM, which the VM does not support.

[Optimization]
I notice that CONFIG_9P_FS is enabled in "config.vm", so virtFS (9p virtio) is available on the VM. To make use of it, I add a new init file on qemu, which only exists when the '-v' option is appended.

root@(none):/# cat /etc/rcS.d/S20-testDebug
#!/bin/sh
set -x
rm -rf /mnt/shared
mkdir -p /mnt/shared
/bin/mount -t 9p -o trans=virtio,version=9p2000.L host0 /mnt/shared

[Usage]
Append the option '-v' to enable it. For instance:

LDLIBS=-static ./vmtest.sh -v -s -- ./test_progs -t d_path

This shares the VM's "/mnt/shared" directory with the host's *${OUTPUT_DIR}/${MOUNT_DIR}/shared*.
On host: $ mv ./test_progs ~/workplace/bpf/arm64/.bpf_selftests/mnt/shared/ On VM(you can directly move it into /root/bpf): root@(none):/# ls /mnt/shared/ test_progs Signed-off-by: Lin Yikai <yikai.lin(a)vivo.com> --- tools/testing/selftests/bpf/vmtest.sh | 75 ++++++++++++++++++++++++++- 1 file changed, 73 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/vmtest.sh b/tools/testing/selftests/bpf/vmtest.sh index c7461ed496ab..82afadde50da 100755 --- a/tools/testing/selftests/bpf/vmtest.sh +++ b/tools/testing/selftests/bpf/vmtest.sh @@ -70,10 +70,15 @@ LOG_FILE_BASE="$(date +"bpf_selftests.%Y-%m-%d_%H-%M-%S")" LOG_FILE="${LOG_FILE_BASE}.log" EXIT_STATUS_FILE="${LOG_FILE_BASE}.exit_status" +DEBUG_CMD_INIT="" +DEBUG_FILE_INIT="S20-testDebug" +QEMU_FLAG_VIRTFS="" + + usage() { cat <<EOF -Usage: $0 [-i] [-s] [-d <output_dir>] -- [<command>] +Usage: $0 [-i] [-s] [-v] [-d <output_dir>] -- [<command>] <command> is the command you would normally run when you are in tools/testing/selftests/bpf. e.g: @@ -101,6 +106,8 @@ Options: -s) Instead of powering off the VM, start an interactive shell. If <command> is specified, the shell runs after the command finishes executing + -v) enable virtFS (9p virtio) for sharing directory + of "/mnt/shared" on the VM EOF } @@ -275,6 +282,7 @@ EOF -serial mon:stdio \ "${QEMU_FLAGS[@]}" \ -enable-kvm \ + ${QEMU_FLAG_VIRTFS} \ -m 4G \ -drive file="${rootfs_img}",format=raw,index=1,media=disk,if=virtio,cache=none \ -kernel "${kernel_bzimage}" \ @@ -354,6 +362,60 @@ catch() exit ${exit_code} } +update_debug_init() +{ + #You can do something else just for debuging on qemu. + #The init script will be reset every time before vm running on host, + #and be executed on qemu before test_progs. 
+ local init_script_dir="${OUTPUT_DIR}/${MOUNT_DIR}/etc/rcS.d" + local init_script_file="${init_script_dir}/${DEBUG_FILE_INIT}" + + mount_image + if [[ "${DEBUG_CMD_INIT}" == "" ]]; then + sudo rm -rf ${init_script_file} + unmount_image + return + fi + + if [[ ! -d "${init_script_dir}" ]]; then + cat <<EOF +Could not find ${init_script_dir} in the mounted image. +This likely indicates a bad or not default rootfs image, +You need to change debug init manually +according to the actual situation of the rootfs image. +EOF + unmount_image + exit 1 + fi + + sudo bash -c "cat > ${init_script_file}" <<EOF +#!/bin/sh +set -x +${DEBUG_CMD_INIT} +EOF + sudo chmod 755 "${init_script_file}" + unmount_image +} + +#Establish shared dir access by 9p virtfs +#between "/mnt/shared" on qemu with *${OUTPUT_DIR}/${MOUNT_DIR}/shared* on local host. +debug_by_virtfs_shared() +{ + local qemu_shared_dir="/mnt/shared" + local host_shared_dir="${OUTPUT_DIR}/${MOUNT_DIR}/shared" + + #append virtfs shared flag for qemu + local flag="-virtfs local,mount_tag=host0,security_model=passthrough,id=host0,path=${host_shared_dir}" + mkdir -p "${host_shared_dir}" + QEMU_FLAG_VIRTFS="${QEMU_FLAG_VIRTFS} ${flag}" + + #append mount cmd into init + DEBUG_CMD_INIT="${DEBUG_CMD_INIT}\ +rm -rf ${qemu_shared_dir} +mkdir -p ${qemu_shared_dir} +/bin/mount -t 9p -o trans=virtio,version=9p2000.L host0 ${qemu_shared_dir}" +} + main() { local script_dir="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" @@ -365,8 +427,9 @@ main() local update_image="no" local exit_command="poweroff -f" local debug_shell="no" + local enable_virtfs_shared="no" - while getopts ':hskid:j:' opt; do + while getopts ':vhskid:j:' opt; do case ${opt} in i) update_image="yes" @@ -382,6 +445,9 @@ main() debug_shell="yes" exit_command="bash" ;; + v) + enable_virtfs_shared="yes" + ;; h) usage exit 0 @@ -449,6 +515,11 @@ main() create_vm_image fi + if [[ "${enable_virtfs_shared}" == "yes" ]]; then + debug_by_virtfs_shared + fi + 
update_debug_init + update_selftests "${kernel_checkout}" "${make_command}" update_init_script "${command}" "${exit_command}" run_vm "${kernel_bzimage}" -- 2.34.1
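The two halves of the sharing setup (the QEMU flag on the host and the mount commands in the guest init script) can be sketched side by side. The snippet below rebuilds the same strings the patch uses; the function names are hypothetical, and returning the flag as an argument list instead of one string is a design choice of this sketch, not something the patch does.

```python
import shlex


def virtfs_qemu_args(host_shared_dir, mount_tag="host0"):
    """QEMU arguments exporting host_shared_dir over 9p virtio."""
    spec = (f"local,mount_tag={mount_tag},security_model=passthrough,"
            f"id={mount_tag},path={host_shared_dir}")
    return ["-virtfs", spec]


def guest_init_script(guest_dir="/mnt/shared", mount_tag="host0"):
    """Init script text that mounts the 9p share inside the guest."""
    quoted = shlex.quote(guest_dir)
    lines = [
        "#!/bin/sh",
        "set -x",
        f"rm -rf {quoted}",
        f"mkdir -p {quoted}",
        f"/bin/mount -t 9p -o trans=virtio,version=9p2000.L {mount_tag} {quoted}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical host path, used only for the demonstration.
print(virtfs_qemu_args("/tmp/bpf_selftests/mnt/shared"))
print(guest_init_script(), end="")
```

The mount tag must match on both sides: the guest's mount command names the same "host0" tag that the `-virtfs` flag declares. Keeping the flag as a list would also avoid word-splitting surprises if the host path ever contained spaces.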

1 year, 4 months


[PATCH] selftests/arm64: Fix build warnings for abi

by Dev Jain

A "%s" is missing in ksft_exit_fail_msg(); instead, use the newly introduced ksft_exit_fail_perror(). Also, uint64_t corresponds to unsigned 64-bit integer, so use %lx instead of %llx. Signed-off-by: Dev Jain <dev.jain(a)arm.com> --- The changes in ptrace.c were earlier a part of the following: https://lore.kernel.org/all/20240625122408.1439097-6-dev.jain@arm.com/ which were reviewed by Mark. tools/testing/selftests/arm64/abi/ptrace.c | 4 ++-- tools/testing/selftests/arm64/abi/syscall-abi.c | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/arm64/abi/ptrace.c b/tools/testing/selftests/arm64/abi/ptrace.c index e4fa507cbdd0..b51d21f78cf9 100644 --- a/tools/testing/selftests/arm64/abi/ptrace.c +++ b/tools/testing/selftests/arm64/abi/ptrace.c @@ -163,10 +163,10 @@ static void test_hw_debug(pid_t child, int type, const char *type_name) static int do_child(void) { if (ptrace(PTRACE_TRACEME, -1, NULL, NULL)) - ksft_exit_fail_msg("PTRACE_TRACEME", strerror(errno)); + ksft_exit_fail_perror("PTRACE_TRACEME"); if (raise(SIGSTOP)) - ksft_exit_fail_msg("raise(SIGSTOP)", strerror(errno)); + ksft_exit_fail_perror("raise(SIGSTOP)"); return EXIT_SUCCESS; } diff --git a/tools/testing/selftests/arm64/abi/syscall-abi.c b/tools/testing/selftests/arm64/abi/syscall-abi.c index d704511a0955..5ec9a18ec802 100644 --- a/tools/testing/selftests/arm64/abi/syscall-abi.c +++ b/tools/testing/selftests/arm64/abi/syscall-abi.c @@ -81,7 +81,7 @@ static int check_gpr(struct syscall_cfg *cfg, int sve_vl, int sme_vl, uint64_t s */ for (i = 9; i < ARRAY_SIZE(gpr_in); i++) { if (gpr_in[i] != gpr_out[i]) { - ksft_print_msg("%s SVE VL %d mismatch in GPR %d: %llx != %llx\n", + ksft_print_msg("%s SVE VL %d mismatch in GPR %d: %lx != %lx\n", cfg->name, sve_vl, i, gpr_in[i], gpr_out[i]); errors++; @@ -112,7 +112,7 @@ static int check_fpr(struct syscall_cfg *cfg, int sve_vl, int sme_vl, if (!sve_vl && !(svcr & SVCR_SM_MASK)) { for (i = 0; i < ARRAY_SIZE(fpr_in); 
i++) { if (fpr_in[i] != fpr_out[i]) { - ksft_print_msg("%s Q%d/%d mismatch %llx != %llx\n", + ksft_print_msg("%s Q%d/%d mismatch %lx != %lx\n", cfg->name, i / 2, i % 2, fpr_in[i], fpr_out[i]); @@ -294,13 +294,13 @@ static int check_svcr(struct syscall_cfg *cfg, int sve_vl, int sme_vl, int errors = 0; if (svcr_out & SVCR_SM_MASK) { - ksft_print_msg("%s Still in SM, SVCR %llx\n", + ksft_print_msg("%s Still in SM, SVCR %lx\n", cfg->name, svcr_out); errors++; } if ((svcr_in & SVCR_ZA_MASK) != (svcr_out & SVCR_ZA_MASK)) { - ksft_print_msg("%s PSTATE.ZA changed, SVCR %llx != %llx\n", + ksft_print_msg("%s PSTATE.ZA changed, SVCR %lx != %lx\n", cfg->name, svcr_in, svcr_out); errors++; } -- 2.30.2

1 year, 4 months


[PATCH net v2 00/15] mptcp: more fixes for the in-kernel PM

by Matthieu Baerts (NGI0)

Here is a new batch of fixes for the MPTCP in-kernel path-manager: Patch 1 ensures the address ID is set to 0 when the path-manager sends an ADD_ADDR for the address of the initial subflow. The same fix is applied when a new subflow is created re-using this special address. A fix for v6.0. Patch 2 is similar, but for the case where an endpoint is removed: if this endpoint was used for the initial address, it is important to send a RM_ADDR with this ID set to 0, and look for existing subflows with the ID set to 0. A fix for v6.0 as well. Patch 3 validates the two previous patches. Patch 4 makes the PM selecting an "active" path to send an address notification in an ACK, instead of taking the first path in the list. A fix for v5.11. Patch 5 fixes skipping the establishment of a new subflow if a previous subflow using the same pair of addresses is being closed. A fix for v5.13. Patch 6 resets the ID linked to the initial subflow when the linked endpoint is re-added, possibly with a different ID. A fix for v6.0. Patch 7 validates the three previous patches. Patch 8 is a small fix for the MPTCP Join selftest, when being used with older subflows not supporting all MIB counters. A fix for a commit introduced in v6.4, but backported up to v5.10. Patch 9 avoids the PM to try to close the initial subflow multiple times, and increment counters while nothing happened. A fix for v5.10. Patch 10 stops incrementing local_addr_used and add_addr_accepted counters when dealing with the address ID 0, because these counters are not taking into account the initial subflow, and are then not decremented when the linked addresses are removed. A fix for v6.0. Patch 11 validates the previous patch. Patch 12 avoids the PM to send multiple SUB_CLOSED events for the initial subflow. A fix for v5.12. Patch 13 validates the previous patch. 
Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it in order to re-create the initial subflow if it has been closed, even if the limit for *new* addresses -- not taking into account the address of the initial subflow -- has been reached. A fix for v5.10. Patch 15 validates the previous patch. Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org> --- Changes in v2: - Patches 11,15/15: allow the connection to run for longer, should fix the issue seen on the Netdev CI, with a debug kconfig. - Link to v1: https://lore.kernel.org/r/20240826-net-mptcp-more-pm-fix-v1-0-8cd6c87d1d6d@… --- Matthieu Baerts (NGI0) (15): mptcp: pm: reuse ID 0 after delete and re-add mptcp: pm: fix RM_ADDR ID for the initial subflow selftests: mptcp: join: check removing ID 0 endpoint mptcp: pm: send ACK on an active subflow mptcp: pm: skip connecting to already established sf mptcp: pm: reset MPC endp ID when re-added selftests: mptcp: join: check re-adding init endp with != id selftests: mptcp: join: no extra msg if no counter mptcp: pm: do not remove already closed subflows mptcp: pm: fix ID 0 endp usage after multiple re-creations selftests: mptcp: join: check re-re-adding ID 0 endp mptcp: avoid duplicated SUB_CLOSED events selftests: mptcp: join: validate event numbers mptcp: pm: ADD_ADDR 0 is not a new address selftests: mptcp: join: check re-re-adding ID 0 signal net/mptcp/pm.c | 4 +- net/mptcp/pm_netlink.c | 87 ++++++++++---- net/mptcp/protocol.c | 6 + net/mptcp/protocol.h | 5 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 153 ++++++++++++++++++++---- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 + 6 files changed, 209 insertions(+), 50 deletions(-) --- base-commit: 3a0504d54b3b57f0d7bf3d9184a00c9f8887f6d7 change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817 Best regards, -- Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

1 year, 4 months


[PATCH v5 0/4] HID: hidraw: HIDIOCREVOKE introduction

by bentiss＠kernel.org

This is the v5 of the HIDIOCREVOKE patches.

After a small discussion with Peter, we decided to:
- drop the BPF hooks that are problematic (Linus doesn't want "ALLOW_ERROR_INJECTION" to be used as "normal" fmodret bpf hooks)
- punt those BPF hooks later once we get the API right
- I'll be the one sending that new version, given that it's easier for me ATM

For testing the patch, and for convenience, I added a new selftest program that can test this new ioctl. This will also allow us to integrate the (future) BPF hooks and show how this should be used.

Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v5:
- check for ENODEV when required in selftests
- create new common header for the HID tests that can be reused in other HID selftests
- Link to v4: https://lore.kernel.org/r/20240827-hidraw-revoke-v4-0-88c6795bf867@kernel.o…

Link to v3: https://lore.kernel.org/all/20240812052753.GA478917@quokka/

---
Benjamin Tissoires (3):
  selftests/hid: extract the utility part of hid_bpf.c into its own header
  selftests/hid: Add initial hidraw tests skeleton
  selftests/hid: Add HIDIOCREVOKE tests

Peter Hutterer (1):
  HID: hidraw: add HIDIOCREVOKE ioctl

 drivers/hid/hidraw.c | 39 ++-
 include/linux/hidraw.h | 1 +
 include/uapi/linux/hidraw.h | 1 +
 tools/testing/selftests/hid/.gitignore | 1 +
 tools/testing/selftests/hid/Makefile | 2 +-
 tools/testing/selftests/hid/hid_bpf.c | 437 +------------------------------
 tools/testing/selftests/hid/hid_common.h | 436 ++++++++++++++++++++++++++++++
 tools/testing/selftests/hid/hidraw.c | 237 +++++++++++++++++
 8 files changed, 714 insertions(+), 440 deletions(-)
---
base-commit: 6e4436539ae182dc86d57d13849862bcafaa4709
change-id: 20240826-hidraw-revoke-0a02ebb21743

Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>

1 year, 4 months


[PATCH bpf-next v3 0/8] libbpf, selftests/bpf: Support cross-endian usage

by Tony Ambardar

Hello all, This patch series targets a long-standing BPF usability issue - the lack of general cross-compilation support - by enabling cross-endian usage of libbpf and bpftool, as well as supporting cross-endian build targets for selftests/bpf. Benefits include improved BPF development and testing for embedded systems based on e.g. big-endian MIPS, more build options e.g for s390x systems, and better accessibility to the very latest test tools e.g. 'test_progs'. Initial development and testing used mips64, since this arch makes switching the build byte-order trivial and is thus very handy for A/B testing. However, it lacks some key features (bpf2bpf call, kfuncs, etc) making for poor selftests/bpf coverage. Final testing takes the kernel and selftests/bpf cross-built from x86_64 to s390x, and runs the result under QEMU/s390x. That same configuration could also be used on kernel-patches/bpf CI for regression testing endian support or perhaps load-sharing s390x builds across x86_64 systems. This thread includes some background regarding testing on QEMU/s390x and the generally favourable results: https://lore.kernel.org/bpf/ZsEcsaa3juxxQBUf@kodidev-ubuntu/ Feedback and suggestions are welcome! 
Best regards, Tony Changelog: --------- v2 -> v3: (feedback from Andrii) - improve some log and commit message formatting - restructure BTF.ext endianness safety checks and byte-swapping - use BTF.ext info record definitions for swapping, require BTF v1 - follow BTF API implementation more closely for BTF.ext - explicitly reject loading non-native endianness program into kernel - simplify linker output byte-order setting - drop redundant safety checks during linking - simplify endianness macro and improve blob setup code for light skel - no unexpected test failures after cross-compiling x86_64 -> s390x v1 -> v2: - fixed a light skeleton bug causing test_progs 'map_ptr' failure - simplified some BTF.ext related endianness logic - remove an 'inline' usage related to CI checkpatch failure - improve some formatting noted by checkpatch warnings - unexpected 'test_progs' failures drop 3 -> 2 (x86_64 to s390x cross) Tony Ambardar (8): libbpf: Improve log message formatting libbpf: Fix header comment typos for BTF.ext libbpf: Fix output .symtab byte-order during linking libbpf: Support BTF.ext loading and output in either endianness libbpf: Support opening bpf objects of either endianness libbpf: Support linking bpf objects of either endianness libbpf: Support creating light skeleton of either endianness selftests/bpf: Support cross-endian building tools/lib/bpf/bpf_gen_internal.h | 1 + tools/lib/bpf/btf.c | 230 ++++++++++++++++++++++++--- tools/lib/bpf/btf.h | 3 + tools/lib/bpf/btf_dump.c | 2 +- tools/lib/bpf/btf_relocate.c | 2 +- tools/lib/bpf/gen_loader.c | 185 ++++++++++++++++----- tools/lib/bpf/libbpf.c | 39 +++-- tools/lib/bpf/libbpf.map | 2 + tools/lib/bpf/libbpf_internal.h | 17 +- tools/lib/bpf/linker.c | 92 +++++++++-- tools/lib/bpf/relo_core.c | 2 +- tools/lib/bpf/skel_internal.h | 3 +- tools/testing/selftests/bpf/Makefile | 7 +- 13 files changed, 488 insertions(+), 97 deletions(-) -- 2.34.1
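To make the cross-endian idea concrete, the sketch below detects the byte order of a BTF blob from its header magic and re-encodes the header in the other byte order. It is an illustration of the general technique only and does not reproduce libbpf's implementation; the header layout and the magic value 0xeB9F follow the kernel's `struct btf_header`, while the function names are hypothetical.

```python
import struct

BTF_MAGIC = 0xEB9F
# struct btf_header: u16 magic, u8 version, u8 flags, then five u32 fields.
HDR_LAYOUT = "HBBIIIII"
HDR_FIELDS = ("magic", "version", "flags", "hdr_len",
              "type_off", "type_len", "str_off", "str_len")


def parse_btf_header(raw):
    """Return (byte_order_name, fields) by trying both byte orders."""
    for prefix, name in (("<", "little"), (">", "big")):
        values = struct.unpack_from(prefix + HDR_LAYOUT, raw)
        if values[0] == BTF_MAGIC:
            return name, dict(zip(HDR_FIELDS, values))
    raise ValueError("not a BTF blob: bad magic")


def swap_btf_header(raw):
    """Re-encode the header in the opposite byte order."""
    order, fields = parse_btf_header(raw)
    target = ">" if order == "little" else "<"
    return struct.pack(target + HDR_LAYOUT, *(fields[k] for k in HDR_FIELDS))


# Hypothetical big-endian header, as a toolchain targeting s390x might emit.
big = struct.pack(">" + HDR_LAYOUT, BTF_MAGIC, 1, 0, 24, 0, 16, 16, 8)
order, hdr = parse_btf_header(big)
print(order, hdr["hdr_len"], parse_btf_header(swap_btf_header(big))[0])
```

Only the header is handled here. The type and string sections, and BTF.ext records, need their own field-by-field swapping, which is the bulk of the work the series describes.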

1 year, 4 months


[PATCH net-next v22 00/13] Device Memory TCP

by Mina Almasry

v22: https://patchwork.kernel.org/project/netdevbpf/list/?series=881158&state=* ==== v22 aims to resolve the pending issue pointed to in v21, which is the interaction with xdp. In this series I rebase on top of the minor refactor which refactors propagating xdp configuration to slave devices: https://patchwork.kernel.org/project/netdevbpf/list/?series=881994&state=* I then disable setting xdp on devices using memory providers, and propagating xdp configuration to devices using memory providers. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v22/ v21: https://patchwork.kernel.org/project/netdevbpf/list/?series=880735&state=* ==== v20 addressed some comments and resolved a test failure, but introduced an unfortunate build error with a config edge case I wasn't testing. v21 simply resolves that error. Major Changes: - Resolve build error with CONFIG_PAGE_POOL=n && CONFIG_NET=y Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v21/ v20: https://patchwork.kernel.org/project/netdevbpf/list/?series=879373&state=* ==== v20 aims to resolve a couple of bug reports against v19, and addresses some review comments around the page_pool_check_memory_provider mechanism. Major changes: - Test edge cases such as header split disabled in selftest. - Change `offset = 0` back to `offset = offset - start` to resolve issue found in RX path by Taehee (thanks!) - Address a few comments around page_pool_check_memory_provider() from Pavel & Jakub. - Removed some unnecessary includes across various patches in the series. - Removed unnecessary EXPORT_SYMBOL(page_pool_mem_providers) (Jakub). - Fix regression caused by incorrect dev_get_max_mp_channel check, along with rename (Jakub). 
Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v20/ v19: https://patchwork.kernel.org/project/netdevbpf/list/?series=876852&state=* ==== v18 got a thorough review (thanks!), and this iteration addresses the feedback. Major changes: - Prevent deactivating mp bound queues. - Prevent installing xdp on mp bound netdevs, or installing mps on xdp installed netdevs. - Fix corner cases in netlink API vis-a-vis missing attributes. - Iron out the unreadable netmem driver support story. To be honest, the conversation with Jakub & Pavel got a bit confusing for me. I've implemented an approach in this set that makes sense to me, and AFAICT, addresses the requirements. It may be good as-is, or it may be a conversation starter/continuer. To be honest IMO there are many ways to skin this cat and I don't see an extremely strong reason to go for one approach over another. Here is one approach you may like. - Don't reset niov dma_addr on allocation & free. - Add some tests to the selftest that catches some of the issues around missing netlink attributes or deactivating mp-bound queues. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v19/ v18: https://patchwork.kernel.org/project/netdevbpf/list/?series=874848&state=* ==== v17 got minor feedback: (a) to beef up the description on patch 1 and (b) to remove the leading underscores in the header definition. I applied (a). (b) seems to be against current conventions so I did not apply before further discussion. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v17/ v17: https://patchwork.kernel.org/project/netdevbpf/list/?series=869900&state=* ==== v16 also got a very thorough review and some testing (thanks again!). 
This version addresses all the concerns reported on v15, in terms of feedback and issues reported. Major changes: - Use ASSERT_RTNL. - Moved around some of the page_pool helpers definitions so I can hide some netmem helpers in private files as Jakub suggested. - Don't make every net_iov hold a ref on the binding as Jakub suggested. - Fix issue reported by Taehee where we access queues after they have been freed. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v17/ v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=* ==== v15 got a thorough review and some testing, and this version addresses almost all the feedback. Some more minor comments where the authors said it could be done later, I left out. Major changes: - Addition of dma-buf introspection to page-pool-get and queue-get. - Fixes to selftests suggested by Taehee. - Fixes to documentation suggested by Donald. - A couple of suggestions and fixes to TCP patches by Eric and David. - Fixes to number assignments suggested by Arnd. - Use rtnl_lock()ing to guard against queue reconfiguration while the page_pool initialization is happening. (Jakub). - Fixes to a few warnings reproduced by Taehee. - Fixes to dma-buf binding suggested by Taehee and Jakub. - Fixes to netlink UAPI suggested by Jakub - Applied a number of Reviewed-bys and Acked-bys (including ones I lost from v13+). 
Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v16/ One caveat: Taehee reproduced a KASAN warning and reported it here: https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd… I estimate the issue to be minor and easily fixable: https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW… I hope to be able to follow up with a fix to net tree as net-next closes imminently, but if this iteration doesn't make it in, I will repost with a fix squashed after net-next reopens, no problem. v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=* ==== No material changes in this version, only a fix to linking against libynl.a from the last version. Per Jakub's instructions I've pulled one of his patches into this series, and now use the new libynl.a correctly, I hope. As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v15/ v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=… ==== No material changes in this version. Only rebase and re-verification on top of net-next. v13, I think, raced with commit ebad6d0334793 ("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to net-next that caused a patchwork failure to apply. This series should apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded IP_GRE config"). I did not wait the customary 24hr as Jakub said it's OK to repost as soon as I build test the rebased version: https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/ v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=… ==== Major changes: -------------- This iteration addresses Pavel's review comments, applies his reviewed-by's, and seeks to fix the patchwork build error (sorry!). 
As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v13/ v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=* ==== Major changes: -------------- This iteration only addresses one minor comment from Pavel with regards to the trace printing of netmem, and the patchwork build error introduced in v11 because I missed doing an allmodconfig build, sorry. Other than that v11, AFAICT, received no feedback. There is one discussion about how the specifics of plugging io uring memory through the page pool, but not relevant to content in this particular patchset, AFAICT. As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v12/ v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=* ==== Major Changes: -------------- v11 addresses feedback received in v10. The major change is the removal of the memory provider ops as requested by Christoph. We still accomplish the same thing, but utilizing direct function calls with if statements rather than generic ops. Additionally address sparse warnings, bugs and review comments from folks that reviewed. As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v11/ Detailed changelog: ------------------- - Fixes in netdev_rx_queue_restart() from Pavel & David. - Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for custom page providers") from the series to address Christoph's feedback and rebased other patches on the series on this change. - Fixed build errors with CONFIG_DMA_SHARED_BUFFER && !CONFIG_GENERIC_ALLOCATOR build. - Fixed sparse warnings pointed out by Paolo. - Drop unnecessary gro_pull_from_frag0 checks. - Added Bagas reviewed-by to docs. 
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====

Major Changes:
--------------

v9 was sent right before the merge window closed (sorry!). v10 is almost a re-send of the series now that the merge window re-opened. Only rebased to latest net-next and addressed some minor iterative comments received on v9.

As usual, the full devmem TCP changes including the full GVE driver implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v10/

Detailed changelog:
-------------------

- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem-specific helper (David).
- Rename hook alloc_pages to alloc_netmems, as alloc_pages is now defined as a preprocessor macro and causes a build error.

v9:
===

Major Changes:
--------------

GVE queue API has been merged. Submitting this version as non-RFC after rebasing on top of the merged API, and dropped the out of tree queue API I was carrying on github. Addressed the little feedback v8 has received.

Detailed changelog:
------------------

- Added new patch from David Wei to this series for netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.

RFC v8:
=======

Major Changes:
--------------

- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.

RFC v7:
=======

Major Changes:
--------------

This revision largely rebases on top of net-next and addresses the feedback RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.

The series remains in RFC because the queue-API ndos defined in this series are not yet implemented. I have a GVE implementation I carry out of tree for my testing. An upstreamable GVE implementation is in the works. Aside from that, in my estimation all the patches are ready for review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v7/

Detailed changelog:

- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API implementation.
- Renamed devmem.c functions to something more appropriate for that file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.

Perf - page-pool benchmark:
---------------------------

Improved performance of bench_page_pool_simple.ko tests compared to v6:

https://pastebin.com/raw/v5dYRg8L

net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path, same as baseline.

Perf - Devmem TCP benchmark:
---------------------

Perf is about the same regardless of the changes in v7, namely the removal of the static_branch_unlikely to improve the page_pool benchmark performance:

189/200gbps bi-directional throughput with RX devmem TCP and regular TCP TX i.e. ~95% line rate.

RFC v6:
=======

Major Changes:
--------------

This revision largely rebases on top of net-next and addresses the little feedback RFCv5 received.

The series remains in RFC because the queue-API ndos defined in this series are not yet implemented. I have a GVE implementation I carry out of tree for my testing. An upstreamable GVE implementation is in the works. Aside from that, in my estimation all the patches are ready for review/merge. Please do take a look.

As usual the full devmem TCP changes including the full GVE driver implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v6/

This version also comes with some performance data recorded in the cover letter (see below changelog).

Detailed changelog:

- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel).
Pavel's original suggestion was to remove the skb->dmabuf flag entirely, but when I looked into it closely, I found that if we remove the flag we have to dereference the shinfo(skb) pointer to obtain the first frag to tell whether an skb is readable or not. This can cause a performance regression if it dirties the cache line when the shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf flag into a generic skb->readable flag which can be re-used by io_uring 0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and added some more coverage.

Perf - page-pool benchmark:
---------------------------

bench_page_pool_simple.ko tests with and without these changes:

https://pastebin.com/raw/ncHDwAbn

AFAIK the number that really matters in the perf tests is the 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8 cycles without the changes, with about 1 cycle of noise in some results.

With the patches this regresses to 9 cycles, again with occasional 1 cycle noise when running this test repeatedly.

Lastly I tried disabling the static_branch_unlikely() check in netmem_is_net_iov(). To my surprise, disabling the static_branch_unlikely() check reduces the fast path back to 8 cycles, but the 1 cycle noise remains.

Perf - Devmem TCP benchmark:
---------------------

189/200gbps bi-directional throughput with RX devmem TCP and regular TCP TX i.e. ~95% line rate.

Major changes in RFC v5:
========================

1. Rebased on top of 'Abstract page from net stack' series and used the new netmem type to refer to LSB set pointers instead of re-using struct page.

2. Downgraded this series back to RFC and called it RFC v5. This is because this series is now dependent on 'Abstract page from net stack'[1] and the queue API.
Both are removed from the series to reduce the patch # and those bits are fairly independent or pre-requisite work. 3. Reworked the page_pool devmem support to use netmem and for some more unified handling. 4. Reworked the reference counting of net_iov (renamed from page_pool_iov) to use pp_ref_count for refcounting. The full changes including the dependent series and GVE page pool support is here: https://github.com/mina/linux/commits/tcpdevmem-rfcv5/ [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774 Major changes in v1: ==================== 1. Implemented MVP queue API ndos to remove the userspace-visible driver reset. 2. Fixed issues in the napi_pp_put_page() devmem frag unref path. 3. Removed RFC tag. Many smaller addressed comments across all the patches (patches have individual change log). Full tree including the rest of the GVE driver changes: https://github.com/mina/linux/commits/tcpdevmem-v1 Changes in RFC v3: ================== 1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the series reviewable and mergeable. 2. Implemented multi-rx-queue binding which was a todo in v2. 3. Fix to cmsg handling. The sticking point in RFC v2[2] was the device reset required to refill the device rx-queues after the dmabuf bind/unbind. The solution suggested as I understand is a subset of the per-queue management ops Jakub suggested or similar: https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/ This is not addressed in this revision, because: 1. This point was discussed at netconf & netdev and there is openness to using the current approach of requiring a device reset. 2. Implementing individual queue resetting seems to be difficult for my test bed with GVE. My prototype to test this ran into issues with the rx-queues not coming back up properly if reset individually. 
At the moment I'm unsure if it's a mistake in the POC or a genuine issue in the virtualization stack behind GVE, which currently doesn't test individual rx-queue restart. 3. Our usecases are not bothered by requiring a device reset to refill the buffer queues, and we'd like to support NICs that run into this limitation with resetting individual queues. My thought is that drivers that have trouble with per-queue configs can use the support in this series, while drivers that support new netdev ops to reset individual queues can automatically reset the queue as part of the dma-buf bind/unbind. The same approach with device resets is presented again for consideration with other sticking points addressed. This proposal includes the rx devmem path only proposed for merge. For a snapshot of my entire tree which includes the GVE POC page pool support & device memory support: https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3 [1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.… [2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4… Changes in RFC v2: ================== The sticking point in RFC v1[1] was the dma-buf pages approach we used to deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept that attempts to resolve this by implementing scatterlist support in the networking stack, such that we can import the dma-buf scatterlist directly. This is the approach proposed at a high level here[2]. Detailed changes: 1. Replaced dma-buf pages approach with importing scatterlist into the page pool. 2. Replace the dma-buf pages centric API with a netlink API. 3. Removed the TX path implementation - there is no issue with implementing the TX path with scatterlist approach, but leaving out the TX path makes it easier to review. 4. Functionality is tested with this proposal, but I have not conducted perf testing yet. 
I'm not sure there are regressions, but I removed perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.

Any feedback welcome, but specifically the biggest pending questions needing feedback IMO are:

1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from the page pool (Patch 6).

[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…

==================

* TL;DR:

Device memory TCP (devmem TCP) is a proposal for transferring data to and/or from device memory efficiently, without bouncing the data to a host memory buffer.

* Problem:

A large number of data transfers have device memory as the source and/or destination. Accelerators drastically increased the volume of such transfers. Some examples include:

- ML accelerators transferring large amounts of training data from storage into GPU/TPU memory. In some cases ML training setup time can be as long as 50% of TPU compute time; improving data transfer throughput & efficiency can help improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts, exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with remote SSDs; much of this data does not require host processing.

Today, the majority of the Device-to-Device data transfers over the network are implemented as the following low level operations: Device-to-Host copy, Host-to-Host network transfer, and Host-to-Device copy.

The implementation is suboptimal, especially for bulk data transfers, and can put significant strains on system resources, such as host memory bandwidth, PCIe bandwidth, etc.
One important reason behind the current state is the kernel’s lack of semantics to express device to network transfers.

* Proposal:

In this patch series we attempt to optimize this use case by implementing socket APIs that enable the user to:

1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.

Packet _payloads_ go directly from the NIC to device memory for receive and from device memory to NIC for transmit. Packet _headers_ go to/from host memory and are processed by the TCP/IP stack normally. The NIC _must_ support header split to achieve this.

Advantages:

- Alleviate host memory bandwidth pressure, compared to existing network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level of the PCIe tree, compared to traditional path which sends data through the root complex.

* Patch overview:

** Part 1: netlink API

Gives user ability to bind dma-buf to an RX queue.

** Part 2: scatterlist support

Currently the standard for device memory sharing is DMABUF, which doesn't generate struct pages. On the other hand, the networking stack (skbs, drivers, and page pool) operates on pages. We have 2 options:

1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.

Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.

** Part 3: page pool support

We piggy back on page pool memory providers proposal:

https://github.com/kuba-moo/linux/tree/pp-providers

It allows the page pool to define a memory provider that provides the page allocation and freeing. It helps abstract most of the device memory TCP changes from the driver.

** Part 4: support for unreadable skb frags

Page pool iovs are not accessible by the host; we implement changes throughout the networking stack to correctly handle skbs with unreadable frags.

** Part 5: recvmsg() APIs

We define user APIs for the user to send and receive device memory.
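As a rough illustration of the receive side described in Part 5: payload stays in device memory, so what userspace gets back from recvmsg() is metadata describing where each fragment landed. The sketch below only shows the generic cmsg walk an application might do. The struct devmem_frag layout, the collect_devmem_frags() helper and the frag_cmsg_type parameter are hypothetical names for illustration, not the uapi defined by this series; a real application would use the definitions from the series' uapi headers and documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-fragment descriptor, assumed for this sketch only. */
struct devmem_frag {
	uint64_t frag_offset; /* where the payload sits inside the bound dma-buf */
	uint32_t frag_size;   /* payload length in bytes */
	uint32_t frag_token;  /* handle to give back once the data is consumed */
	uint32_t dmabuf_id;   /* which bound dma-buf the fragment belongs to */
	uint32_t flags;
};

/*
 * Walk the control messages of a received msghdr and copy out up to 'max'
 * fragment descriptors whose cmsg type equals 'frag_cmsg_type' (the caller
 * supplies whatever type the kernel uapi defines). Returns the number of
 * descriptors copied.
 */
static size_t collect_devmem_frags(struct msghdr *msg, int frag_cmsg_type,
				   struct devmem_frag *out, size_t max)
{
	struct cmsghdr *cm;
	size_t n = 0;

	assert(msg != NULL && out != NULL);

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level != SOL_SOCKET ||
		    cm->cmsg_type != frag_cmsg_type)
			continue;
		/* Skip anything too short to hold a full descriptor. */
		if (cm->cmsg_len < CMSG_LEN(sizeof(struct devmem_frag)))
			continue;
		if (n == max)
			break;
		memcpy(&out[n++], CMSG_DATA(cm), sizeof(struct devmem_frag));
	}
	return n;
}
```

A caller would run this on the msghdr filled in by recvmsg(), hand each (dmabuf_id, frag_offset, frag_size) triple to the device-side consumer, and release the tokens afterwards (the changelog mentions an SO_DEVMEM_DONTNEED setsockopt for that purpose).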
Not included with this series is the GVE devmem TCP support, just to simplify the review. Code available here if desired:

https://github.com/mina/linux/tree/tcpdevmem

This series is built on top of net-next with Jakub's pp-providers changes cherry-picked.

* NIC dependencies:

1. (strict) Devmem TCP requires the NIC to support header split, i.e. the capability to split incoming packets into a header + payload and to put each into a separate buffer. Devmem TCP works by using device memory for the packet payload, and host memory for the packet headers.

2. (optional) Devmem TCP works better with flow steering support & RSS support, i.e. the NIC's ability to steer flows into certain rx queues. This allows the sysadmin to enable devmem TCP on a subset of the rx queues, and steer devmem TCP traffic onto these queues and non devmem TCP elsewhere.

The NIC I have access to with these properties is the GVE with DQO support running in Google Cloud, but any NIC that supports these features would suffice. I may be able to help reviewers bring up devmem TCP on their NICs.

* Testing:

The series includes a udmabuf kselftest that shows a simple use case of devmem TCP and validates the entire data path end to end without a dependency on a specific dmabuf provider.

** Test Setup

Kernel: net-next with this series and memory provider API cherry-picked locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com> Cc: David Wei <dw(a)davidwei.uk> Cc: Jason Gunthorpe <jgg(a)ziepe.ca> Cc: Yunsheng Lin <linyunsheng(a)huawei.com> Cc: Shailend Chand <shailend(a)google.com> Cc: Harshitha Ramamurthy <hramamurthy(a)google.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Jeroen de Borst <jeroendb(a)google.com> Cc: Praveen Kaligineedi <pkaligineedi(a)google.com> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com> Cc: Steven Rostedt <rostedt(a)goodmis.org> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Nikolay Aleksandrov <razor(a)blackwall.org> Cc: Taehee Yoo <ap420073(a)gmail.com> Cc: Donald Hunter <donald.hunter(a)gmail.com> Mina Almasry (13): netdev: add netdev_rx_queue_restart() net: netdev netlink api to bind dma-buf to a net device netdev: support binding dma-buf to netdevice netdev: netdevice devmem allocator page_pool: devmem support memory-provider: dmabuf devmem memory provider net: support non paged skb frags net: add support for skbs with unreadable frags tcp: RX path for devmem TCP net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags net: add devmem TCP documentation selftests: add ncdevmem, netcat for devmem TCP netdev: add dmabuf introspection Documentation/netlink/specs/netdev.yaml | 61 +++ Documentation/networking/devmem.rst | 269 +++++++++++ Documentation/networking/index.rst | 1 + arch/alpha/include/uapi/asm/socket.h | 6 + arch/mips/include/uapi/asm/socket.h | 6 + arch/parisc/include/uapi/asm/socket.h | 6 + arch/sparc/include/uapi/asm/socket.h | 6 + include/linux/netdevice.h | 2 + include/linux/skbuff.h | 61 ++- include/linux/skbuff_ref.h | 9 +- include/linux/socket.h | 1 + include/net/devmem.h | 133 ++++++ include/net/mp_dmabuf_devmem.h | 44 ++ include/net/netdev_rx_queue.h | 5 + include/net/netmem.h | 169 ++++++- include/net/page_pool/helpers.h | 39 +- include/net/page_pool/types.h | 22 +- include/net/sock.h | 2 + include/net/tcp.h | 5 +- include/trace/events/page_pool.h | 12 +- include/uapi/asm-generic/socket.h | 6 
+ include/uapi/linux/netdev.h | 13 + include/uapi/linux/uio.h | 17 + net/core/Makefile | 3 +- net/core/datagram.c | 6 + net/core/dev.c | 24 +- net/core/devmem.c | 382 ++++++++++++++++ net/core/gro.c | 3 +- net/core/netdev-genl-gen.c | 23 + net/core/netdev-genl-gen.h | 6 + net/core/netdev-genl.c | 118 +++++ net/core/netdev_rx_queue.c | 81 ++++ net/core/netmem_priv.h | 31 ++ net/core/page_pool.c | 117 +++-- net/core/page_pool_priv.h | 46 ++ net/core/page_pool_user.c | 29 ++ net/core/skbuff.c | 77 +++- net/core/sock.c | 68 +++ net/ethtool/common.c | 8 + net/ipv4/esp4.c | 3 +- net/ipv4/tcp.c | 261 ++++++++++- net/ipv4/tcp_input.c | 13 +- net/ipv4/tcp_ipv4.c | 16 + net/ipv4/tcp_minisocks.c | 2 + net/ipv4/tcp_output.c | 5 +- net/ipv6/esp6.c | 3 +- net/packet/af_packet.c | 4 +- net/xdp/xsk_buff_pool.c | 5 + tools/include/uapi/linux/netdev.h | 13 + tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 9 + tools/testing/selftests/net/ncdevmem.c | 570 ++++++++++++++++++++++++ 52 files changed, 2701 insertions(+), 121 deletions(-) create mode 100644 Documentation/networking/devmem.rst create mode 100644 include/net/devmem.h create mode 100644 include/net/mp_dmabuf_devmem.h create mode 100644 net/core/devmem.c create mode 100644 net/core/netdev_rx_queue.c create mode 100644 net/core/netmem_priv.h create mode 100644 tools/testing/selftests/net/ncdevmem.c -- 2.46.0.295.g3b9ea8a38a-goog

1 year, 4 months


[PATCH v11 00/39] arm64/gcs: Provide support for GCS in userspace

by Mark Brown

The arm64 Guarded Control Stack (GCS) feature provides support for hardware protected stacks of return addresses, intended to provide hardening against return oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling. When GCS is active a secondary stack called the Guarded Control Stack is maintained, protected with a memory attribute which means that it can only be written with specific GCS operations. The current GCS pointer can not be directly written to by userspace. When a BL is executed the value stored in LR is also pushed onto the GCS, and when a RET is executed the top of the GCS is popped and compared to LR with a fault being raised if the values do not match. GCS operations may only be performed on GCS pages, a data abort is generated if they are not. The combination of hardware enforcement and lack of extra instructions in the function entry and exit paths should result in something which has less overhead and is more difficult to attack than a purely software implementation like clang's shadow stacks. This series implements support for use of GCS by userspace, along with support for use of GCS within KVM guests. It does not enable use of GCS by either EL1 or EL2, this will be implemented separately. Executables are started without GCS and must use a prctl() to enable it, it is expected that this will be done very early in application execution by the dynamic linker or other startup code. For dynamic linking this will be done by checking that everything in the executable is marked as GCS compatible. x86 has an equivalent feature called shadow stacks, this series depends on the x86 patches for generic memory management support for the new guarded/shadow stack page type and shares APIs as much as possible. 
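The BL/RET behaviour described above can be pictured with a small software model. This is purely illustrative and assumes nothing about the real implementation: the hardware does this without extra instructions, and the names model_gcs, model_bl() and model_ret() are invented for the sketch. A false return from model_ret() stands in for the fault raised on a mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GCS_DEPTH 64 /* arbitrary size for the model */

/* Toy stand-in for a Guarded Control Stack: return addresses only. */
struct model_gcs {
	uint64_t slots[MODEL_GCS_DEPTH];
	size_t top; /* number of valid entries */
};

/* Model of BL: the value stored in LR is also pushed onto the GCS. */
static bool model_bl(struct model_gcs *gcs, uint64_t lr)
{
	assert(gcs != NULL);
	if (gcs->top == MODEL_GCS_DEPTH)
		return false; /* model limitation, not hardware behaviour */
	gcs->slots[gcs->top++] = lr;
	return true;
}

/*
 * Model of RET: pop the top of the GCS and compare it with LR.
 * Returns true when they match; false models the fault.
 */
static bool model_ret(struct model_gcs *gcs, uint64_t lr)
{
	assert(gcs != NULL);
	if (gcs->top == 0)
		return false;
	return gcs->slots[--gcs->top] == lr;
}
```

In this model a return-oriented attack that overwrites the saved LR on the normal stack shows up as model_ret() returning false, since the attacker cannot write the separately protected copy.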
As there has been extensive discussion with the wider community around the ABI for shadow stacks I have as far as practical kept implementation decisions close to those for x86, anticipating that review would lead to similar conclusions in the absence of strong reasoning for divergence.

The main divergence I am conscious of is that x86 allows shadow stack to be enabled and disabled repeatedly, freeing the shadow stack for the thread whenever disabled, while this implementation keeps the GCS allocated after disable but refuses to reenable it. This is to avoid races with things actively walking the GCS during a disable, we do anticipate that some systems will wish to disable GCS at runtime but are not aware of any demand for subsequently reenabling it.

x86 uses an arch_prctl() to manage enable and disable, since only x86 and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a patch set for the equivalent RISC-V Zicfiss feature which I initially adopted fairly directly but following review feedback has been revised quite a bit.

We currently maintain the x86 pattern of implicitly allocating a shadow stack for threads started with shadow stack enabled, there has been some discussion of removing this support and requiring the use of clone3() with explicit allocation of shadow stacks instead. I have no strong feelings either way, implicit allocation is not really consistent with anything else we do and creates the potential for errors around thread exit but on the other hand it is existing ABI on x86 and minimises the changes needed in userspace code.

glibc and bionic changes using this ABI have been implemented and tested. Headless Android systems have been validated and Ross Burton has used this code to bring up a Yocto system with GCS enabled as standard, a test implementation of V8 support has also been done.

There is an open issue with support for CRIU, on x86 this required the ability to set the GCS mode via ptrace.
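For readers who want to poke at the generic prctl() mentioned above, here is a minimal sketch that only queries the current shadow stack mode and never changes it. The option name PR_GET_SHADOW_STACK_STATUS and the fallback value 74 are assumptions on my part, they are not spelled out in this cover letter, so prefer whatever the installed linux/prctl.h provides. The helper name query_shadow_stack_status() is hypothetical. Enabling GCS is deliberately not shown: as noted above that is expected to happen very early in startup code, and doing it from an ordinary C function would fault on the next return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Assumed fallback for older headers; take the real value from linux/prctl.h. */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif

/*
 * Ask the kernel for the calling thread's shadow stack mode bits.
 * Returns 0 and stores the bits in *status on success, or a negative
 * errno value (for example -EINVAL when the kernel or CPU has no support).
 */
static int query_shadow_stack_status(unsigned long *status)
{
	unsigned long mode = 0;

	assert(status != NULL);

	if (prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0UL, 0UL, 0UL) != 0)
		return -errno;

	*status = mode;
	return 0;
}
```

On a system without the feature this is expected to fail cleanly rather than crash, which makes it a reasonable first probe before looking at the hwcap.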
This series supports configuring mode bits other than enable/disable via ptrace but it needs to be confirmed if this is sufficient.

It is likely that we could relax some of the barriers added here with some more targeted placements, this is left for further study.

There is an in process series adding clone3() support for shadow stacks:

https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…

Previous versions of this series depended on that, this dependency has been removed in order to make merging easier.

[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v11:
- Remove the dependency on the addition of clone3() support for shadow stacks, rebasing onto v6.11-rc3.
- Make ID_AA64PFR1_EL1.GCS writeable in KVM.
- Hide GCS registers when GCS is not enabled for KVM guests.
- Require HCRX_EL2.GCSEn if booting at EL1.
- Require that GCSCR_EL1 and GCSCRE0_EL1 be initialised regardless of if we boot at EL2 or EL1.
- Remove some stray use of bit 63 in signal cap tokens.
- Warn if we see a GCS with VM_SHARED.
- Remove redundant check for VM_WRITE in fault handling.
- Cleanups and clarifications in the ABI document.
- Clean up and improve documentation of some sync placement.
- Only set the EL0 GCS mode if it's actually changed.
- Various minor fixes and tweaks.
- Link to v10: https://lore.kernel.org/r/20240801-arm64-gcs-v10-0-699e2bd2190b@kernel.org

Changes in v10:
- Fix issues with THP.
- Tighten up requirements for initialising GCSCR*.
- Only generate GCS signal frames for threads using GCS.
- Only context switch EL1 GCS registers if S1PIE is enabled.
- Move context switch of GCSCRE0_EL1 to EL0 context switch.
- Make GCS registers unconditionally visible to userspace.
- Use FHU infrastructure.
- Don't change writability of ID_AA64PFR1_EL1 for KVM.
- Remove unused arguments from alloc_gcs().
- Typo fixes.
- Link to v9: https://lore.kernel.org/r/20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org Changes in v9: - Rebase onto v6.10-rc3. - Restructure and clarify memory management fault handling. - Fix up basic-gcs for the latest clone3() changes. - Convert to newly merged KVM ID register based feature configuration. - Fixes for NV traps. - Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org Changes in v8: - Invalidate signal cap token on stack when consuming. - Typo and other trivial fixes. - Don't try to use process_vm_write() on GCS, it intentionally does not work. - Fix leak of thread GCSs. - Rebase onto latest clone3() series. - Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org Changes in v7: - Rebase onto v6.7-rc2 via the clone3() patch series. - Change the token used to cap the stack during signal handling to be compatible with GCSPOPM. - Fix flags for new page types. - Fold in support for clone3(). - Replace copy_to_user_gcs() with put_user_gcs(). - Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org Changes in v6: - Rebase onto v6.6-rc3. - Add some more gcsb_dsync() barriers following spec clarifications. - Due to ongoing discussion around clone()/clone3() I've not updated anything there, the behaviour is the same as on previous versions. - Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org Changes in v5: - Don't map any permissions for user GCSs, we always use EL0 accessors or use a separate mapping of the page. - Reduce the standard size of the GCS to RLIMIT_STACK/2. - Enforce a PAGE_SIZE alignment requirement on map_shadow_stack(). - Clarifications and fixes to documentation. - More tests. - Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org Changes in v4: - Implement flags for map_shadow_stack() allowing the cap and end of stack marker to be enabled independently or not at all. 
- Relax size and alignment requirements for map_shadow_stack(). - Add more blurb explaining the advantages of hardware enforcement. - Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org Changes in v3: - Rebase onto v6.5-rc4. - Add a GCS barrier on context switch. - Add a GCS stress test. - Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org Changes in v2: - Rebase onto v6.5-rc3. - Rework prctl() interface to allow each bit to be locked independently. - map_shadow_stack() now places the cap token based on the size requested by the caller not the actual space allocated. - Mode changes other than enable via ptrace are now supported. - Expand test coverage. - Various smaller fixes and adjustments. - Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org --- Mark Brown (39): mm: Introduce ARCH_HAS_USER_SHADOW_STACK arm64/mm: Restructure arch_validate_flags() for extensibility prctl: arch-agnostic prctl for shadow stack mman: Add map_shadow_stack() flags arm64: Document boot requirements for Guarded Control Stacks arm64/gcs: Document the ABI for Guarded Control Stacks arm64/sysreg: Add definitions for architected GCS caps arm64/gcs: Add manual encodings of GCS instructions arm64/gcs: Provide put_user_gcs() arm64/gcs: Provide basic EL2 setup to allow GCS usage at EL0 and EL1 arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS) arm64/mm: Allocate PIE slots for EL0 guarded control stack mm: Define VM_SHADOW_STACK for arm64 when we support GCS arm64/mm: Map pages for guarded control stack KVM: arm64: Manage GCS access and registers for guests arm64/idreg: Add overrride for GCS arm64/hwcap: Add hwcap for GCS arm64/traps: Handle GCS exceptions arm64/mm: Handle GCS data aborts arm64/gcs: Context switch GCS state for EL0 arm64/gcs: Ensure that new threads have a GCS arm64/gcs: Implement shadow stack prctl() interface arm64/mm: Implement map_shadow_stack() 
arm64/signal: Set up and restore the GCS context for signal handlers arm64/signal: Expose GCS state in signal frames arm64/ptrace: Expose GCS via ptrace and core files arm64: Add Kconfig for Guarded Control Stack (GCS) kselftest/arm64: Verify the GCS hwcap kselftest/arm64: Add GCS as a detected feature in the signal tests kselftest/arm64: Add framework support for GCS to signal handling tests kselftest/arm64: Allow signals tests to specify an expected si_code kselftest/arm64: Always run signals tests with GCS enabled kselftest/arm64: Add very basic GCS test program kselftest/arm64: Add a GCS test program built with the system libc kselftest/arm64: Add test coverage for GCS mode locking kselftest/arm64: Add GCS signal tests kselftest/arm64: Add a GCS stress test kselftest/arm64: Enable GCS for the FP stress tests KVM: selftests: arm64: Add GCS registers to get-reg-list Documentation/admin-guide/kernel-parameters.txt | 3 + Documentation/arch/arm64/booting.rst | 32 + Documentation/arch/arm64/elf_hwcaps.rst | 2 + Documentation/arch/arm64/gcs.rst | 230 +++++++ Documentation/arch/arm64/index.rst | 1 + Documentation/filesystems/proc.rst | 2 +- arch/arm64/Kconfig | 20 + arch/arm64/include/asm/cpufeature.h | 6 + arch/arm64/include/asm/el2_setup.h | 29 + arch/arm64/include/asm/esr.h | 28 +- arch/arm64/include/asm/exception.h | 2 + arch/arm64/include/asm/gcs.h | 107 +++ arch/arm64/include/asm/hwcap.h | 1 + arch/arm64/include/asm/kvm_host.h | 12 + arch/arm64/include/asm/mman.h | 23 +- arch/arm64/include/asm/pgtable-prot.h | 14 +- arch/arm64/include/asm/processor.h | 7 + arch/arm64/include/asm/sysreg.h | 20 + arch/arm64/include/asm/uaccess.h | 40 ++ arch/arm64/include/asm/vncr_mapping.h | 2 + arch/arm64/include/uapi/asm/hwcap.h | 1 + arch/arm64/include/uapi/asm/ptrace.h | 8 + arch/arm64/include/uapi/asm/sigcontext.h | 9 + arch/arm64/kernel/cpufeature.c | 12 + arch/arm64/kernel/cpuinfo.c | 1 + arch/arm64/kernel/entry-common.c | 23 + arch/arm64/kernel/pi/idreg-override.c | 2 + 
arch/arm64/kernel/process.c | 88 +++ arch/arm64/kernel/ptrace.c | 54 ++ arch/arm64/kernel/signal.c | 225 ++++++- arch/arm64/kernel/traps.c | 11 + arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 49 +- arch/arm64/kvm/sys_regs.c | 27 +- arch/arm64/mm/Makefile | 1 + arch/arm64/mm/fault.c | 40 ++ arch/arm64/mm/gcs.c | 252 +++++++ arch/arm64/mm/mmap.c | 10 +- arch/arm64/tools/cpucaps | 1 + arch/x86/Kconfig | 1 + arch/x86/include/uapi/asm/mman.h | 3 - fs/proc/task_mmu.c | 2 +- include/linux/mm.h | 18 +- include/uapi/asm-generic/mman.h | 4 + include/uapi/linux/elf.h | 1 + include/uapi/linux/prctl.h | 22 + kernel/sys.c | 30 + mm/Kconfig | 6 + tools/testing/selftests/arm64/Makefile | 2 +- tools/testing/selftests/arm64/abi/hwcap.c | 19 + tools/testing/selftests/arm64/fp/assembler.h | 15 + tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 + tools/testing/selftests/arm64/fp/sve-test.S | 2 + tools/testing/selftests/arm64/fp/za-test.S | 2 + tools/testing/selftests/arm64/fp/zt-test.S | 2 + tools/testing/selftests/arm64/gcs/.gitignore | 5 + tools/testing/selftests/arm64/gcs/Makefile | 24 + tools/testing/selftests/arm64/gcs/asm-offsets.h | 0 tools/testing/selftests/arm64/gcs/basic-gcs.c | 357 ++++++++++ tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++ .../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++ tools/testing/selftests/arm64/gcs/gcs-stress.c | 530 +++++++++++++++ tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++ tools/testing/selftests/arm64/gcs/libc-gcs.c | 728 +++++++++++++++++++++ tools/testing/selftests/arm64/signal/.gitignore | 1 + .../testing/selftests/arm64/signal/test_signals.c | 17 +- .../testing/selftests/arm64/signal/test_signals.h | 6 + .../selftests/arm64/signal/test_signals_utils.c | 32 +- .../selftests/arm64/signal/test_signals_utils.h | 39 ++ .../arm64/signal/testcases/gcs_exception_fault.c | 62 ++ .../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++ .../arm64/signal/testcases/gcs_write_fault.c | 67 ++ 
.../selftests/arm64/signal/testcases/testcases.c | 7 + .../selftests/arm64/signal/testcases/testcases.h | 1 + tools/testing/selftests/kvm/aarch64/get-reg-list.c | 28 + 74 files changed, 4086 insertions(+), 43 deletions(-) --- base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba change-id: 20230303-arm64-gcs-e311ab0d8729 Best regards, -- Mark Brown <broonie(a)kernel.org>

1 year, 4 months


[PATCH] MAINTAINERS: Add selftests/x86 entry

by Muhammad Usama Anjum

There are no maintainers specified for tools/testing/selftests/x86.
Shuah has mentioned [1] that the patches should go through the x86 tree
or, in special cases, directly to Shuah's tree after being acked by the
x86 maintainers. People sending patches have been confused because the
get_maintainer.pl script does not find the correct maintainers. Fix this
by adding an entry to the MAINTAINERS file.

[1] https://lore.kernel.org/all/90dc0dfc-4c67-4ea1-b705-0585d6e2ec47@linuxfound…

Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 523d84b2d6139..f3a17e5d954a3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24378,6 +24378,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core
 F:	Documentation/arch/x86/
 F:	Documentation/devicetree/bindings/x86/
 F:	arch/x86/
+F:	tools/testing/selftests/x86
 
 X86 ENTRY CODE
 M:	Andy Lutomirski <luto(a)kernel.org>
-- 
2.39.2

1 year, 4 months


[PATCH] selftests/vDSO: open code basic chacha instead of linking to libsodium

by Jason A. Donenfeld

Linking to libsodium makes building this test annoying in cross
compilation environments and is just way too much. Since this is just a
basic correctness test, simply open code a simple, unoptimized, dumb
chacha, rather than linking to libsodium.

Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
 tools/testing/selftests/vDSO/Makefile         |  7 +--
 .../testing/selftests/vDSO/vdso_test_chacha.c | 46 ++++++++++++++++++-
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
index 13a626ef64f7..93c50a462858 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -1,8 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 uname_M := $(shell uname -m 2>/dev/null || echo not)
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
-SODIUM_LIBS := $(shell pkg-config --libs libsodium 2>/dev/null)
-SODIUM_CFLAGS := $(shell pkg-config --cflags libsodium 2>/dev/null)
 
 TEST_GEN_PROGS := vdso_test_gettimeofday
 TEST_GEN_PROGS += vdso_test_getcpu
@@ -14,10 +12,8 @@ endif
 TEST_GEN_PROGS += vdso_test_correctness
 ifeq ($(uname_M),x86_64)
 TEST_GEN_PROGS += vdso_test_getrandom
-ifneq ($(SODIUM_LIBS),)
 TEST_GEN_PROGS += vdso_test_chacha
 endif
-endif
 
 CFLAGS := -std=gnu99
 
@@ -43,8 +39,7 @@ $(OUTPUT)/vdso_test_getrandom: CFLAGS += -isystem $(top_srcdir)/tools/include \
                                          -isystem $(top_srcdir)/include/uapi
 
 $(OUTPUT)/vdso_test_chacha: $(top_srcdir)/tools/arch/$(ARCH)/vdso/vgetrandom-chacha.S
-$(OUTPUT)/vdso_test_chacha: LDLIBS += $(SODIUM_LIBS)
 $(OUTPUT)/vdso_test_chacha: CFLAGS += -idirafter $(top_srcdir)/tools/include \
                                       -idirafter $(top_srcdir)/arch/$(ARCH)/include \
                                       -idirafter $(top_srcdir)/include \
-                                      -D__ASSEMBLY__ -Wa,--noexecstack $(SODIUM_CFLAGS)
+                                      -D__ASSEMBLY__ -Wa,--noexecstack
diff --git a/tools/testing/selftests/vDSO/vdso_test_chacha.c b/tools/testing/selftests/vDSO/vdso_test_chacha.c
index ca5639d02969..019e8fbdf570 100644
--- a/tools/testing/selftests/vDSO/vdso_test_chacha.c
+++ b/tools/testing/selftests/vDSO/vdso_test_chacha.c
@@ -3,7 +3,6 @@
  * Copyright (C) 2022-2024 Jason A. Donenfeld <Jason(a)zx2c4.com>. All Rights Reserved.
  */
 
-#include <sodium/crypto_stream_chacha20.h>
 #include <sys/random.h>
 #include <string.h>
 #include <stdint.h>
@@ -14,6 +13,49 @@ typedef uint8_t u8;
 typedef uint32_t u32;
 typedef uint64_t u64;
 #include <vdso/getrandom.h>
+#include <tools/le_byteshift.h>
+
+static u32 rol32(u32 word, unsigned int shift)
+{
+	return (word << (shift & 31)) | (word >> ((-shift) & 31));
+}
+
+static void reference_chacha20_blocks(u8 *dst_bytes, const u32 *key, size_t nblocks)
+{
+	u32 s[16] = {
+		0x61707865U, 0x3320646eU, 0x79622d32U, 0x6b206574U,
+		key[0], key[1], key[2], key[3], key[4], key[5], key[6], key[7]
+	};
+
+	while (nblocks--) {
+		u32 x[16];
+		memcpy(x, s, sizeof(x));
+		for (unsigned int r = 0; r < 20; r += 2) {
+		#define QR(a, b, c, d) ( \
+			x[a] += x[b], \
+			x[d] = rol32(x[d] ^ x[a], 16), \
+			x[c] += x[d], \
+			x[b] = rol32(x[b] ^ x[c], 12), \
+			x[a] += x[b], \
+			x[d] = rol32(x[d] ^ x[a], 8), \
+			x[c] += x[d], \
+			x[b] = rol32(x[b] ^ x[c], 7))
+
+			QR(0, 4, 8, 12);
+			QR(1, 5, 9, 13);
+			QR(2, 6, 10, 14);
+			QR(3, 7, 11, 15);
+			QR(0, 5, 10, 15);
+			QR(1, 6, 11, 12);
+			QR(2, 7, 8, 13);
+			QR(3, 4, 9, 14);
+		}
+		for (unsigned int i = 0; i < 16; ++i, dst_bytes += sizeof(u32))
+			put_unaligned_le32(x[i] + s[i], dst_bytes);
+		if (!++s[12])
+			++s[13];
+	}
+}
 
 int main(int argc, char *argv[])
 {
@@ -31,7 +73,7 @@ int main(int argc, char *argv[])
 		printf("getrandom() failed!\n");
 		return KSFT_SKIP;
 	}
-	crypto_stream_chacha20(output1, sizeof(output1), nonce, (uint8_t *)key);
+	reference_chacha20_blocks(output1, key, BLOCKS);
 	for (unsigned int split = 0; split < BLOCKS; ++split) {
 		memset(output2, 'X', sizeof(output2));
 		memset(counter, 0, sizeof(counter));
-- 
2.46.0
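For readers who want to sanity-check the QR() macro in the patch in isolation, here is a minimal standalone sketch of a single ChaCha quarter round. The function name chacha_quarter_round and the pointer-based interface are assumptions made for illustration and are not part of the patch; only the rol32() body mirrors the patch. The values in the usage note are the quarter-round test vector from RFC 8439, section 2.1.1.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by shift bits (shift is taken modulo 32). */
static uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/*
 * One ChaCha quarter round on four state words, updated in place.
 * Hypothetical helper: the patch expresses the same steps as the QR() macro
 * operating on indices into x[].
 */
static void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d = rol32(*d ^ *a, 16);
	*c += *d; *b = rol32(*b ^ *c, 12);
	*a += *b; *d = rol32(*d ^ *a, 8);
	*c += *d; *b = rol32(*b ^ *c, 7);
}
```

Feeding it a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567 should yield a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb, which is the result listed in the RFC.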

1 year, 4 months


[PATCH] selftests/bpf: Fix incorrect parameters in NULL pointer checking

by Hao Ge

From: Hao Ge <gehao(a)kylinos.cn>

Smatch reported the following warning:

    ./tools/testing/selftests/bpf/testing_helpers.c:455 get_xlated_program()
    warn: variable dereferenced before check 'buf' (see line 454)

The warning seems correct, so let's fix the check as it suggests.

Commit b23ed4d74c4d ("selftests/bpf: Fix invalid pointer check in
get_xlated_program()") already fixed this issue once in test_verifier.c,
but the bug has since been reintroduced.

Let's solve this issue with the minimal changes possible.

Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/1eb3732f-605a-479d-ba64-cd14250cbf91@stanley.mo…
Fixes: b4b7a4099b8c ("selftests/bpf: Factor out get_xlated_program() helper")
Signed-off-by: Hao Ge <gehao(a)kylinos.cn>
---
 tools/testing/selftests/bpf/testing_helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/testing_helpers.c b/tools/testing/selftests/bpf/testing_helpers.c
index d5379a0e6da8..34dfea295c8e 100644
--- a/tools/testing/selftests/bpf/testing_helpers.c
+++ b/tools/testing/selftests/bpf/testing_helpers.c
@@ -451,7 +451,7 @@ int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
 
 	*cnt = xlated_prog_len / buf_element_size;
 	*buf = calloc(*cnt, buf_element_size);
-	if (!buf) {
+	if (!*buf) {
 		perror("can't allocate xlated program buffer");
 		return -ENOMEM;
 	}
-- 
2.25.1
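The one-character fix in this patch comes down to the difference between an out-parameter and the pointer it refers to. The sketch below illustrates the pattern with a made-up helper; alloc_items and its signature are assumptions for illustration only and do not exist in the BPF selftests.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate cnt zeroed ints and return them through *buf.
 * buf is the caller's out-parameter; *buf is the allocation itself.
 */
static int alloc_items(int **buf, size_t cnt)
{
	if (!buf)
		return -EINVAL;	/* the out-parameter itself must be valid */

	*buf = calloc(cnt, sizeof(**buf));
	if (!*buf)		/* check the allocation result, not the out-parameter */
		return -ENOMEM;

	return 0;
}
```

Checking `!buf` after the assignment can never catch a failed calloc(), because buf was already dereferenced on the previous line; that is the situation Smatch flagged.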

1 year, 4 months


[PATCH 1/2] MAINTAINERS: Add selftest files to TPM section

by Michal Suchanek

tools/testing/selftests/tpm2/ contains TPM-specific tests.

Signed-off-by: Michal Suchanek <msuchanek(a)suse.de>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 878dcd23b331..c2ee92c7c16c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23184,6 +23184,7 @@ Q:	https://patchwork.kernel.org/project/linux-integrity/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git
 F:	Documentation/devicetree/bindings/tpm/
 F:	drivers/char/tpm/
+F:	tools/testing/selftests/tpm2/
 
 TPS546D24 DRIVER
 M:	Duke Du <dukedu83(a)gmail.com>
-- 
2.46.0

1 year, 4 months


Re: [PATCH v4 3/4] selftests/vDSO: Use KHDR_INCLUDES to locate UAPI headers for vdso_test_getrandom

by Jason A. Donenfeld

On Tue, Aug 27, 2024 at 09:20:16PM +0800, Xi Ruoyao wrote:
> Building test_vdso_getrandom currently leads to following issue:
>
> In file included from /home/xry111/git-repos/linux/tools/include/linux/compiler_types.h:36,
>                  from /home/xry111/git-repos/linux/include/uapi/linux/stddef.h:5,
>                  from /home/xry111/git-repos/linux/include/uapi/linux/posix_types.h:5,
>                  from /usr/include/asm/sigcontext.h:12,
>                  from /usr/include/bits/sigcontext.h:30,
>                  from /usr/include/signal.h:301,
>                  from vdso_test_getrandom.c:14:
> /home/xry111/git-repos/linux/tools/include/linux/compiler-gcc.h:3:2: error: #error "Please don't include <linux/compiler-gcc.h> directly, include <linux/compiler.h> instead."
>     3 | #error "Please don't include <linux/compiler-gcc.h> directly, include <linux/compiler.h> instead."
>       |  ^~~~~
>
> It's because the compiler_types.h inclusion in
> include/uapi/linux/stddef.h is expected to be removed by the
> header_install.sh script, as compiler_types.h shouldn't be used from the
> user space.

Hmm. If I run this on my current 6.10-based system, I get:

$ make
  CC       vdso_standalone_test_x86
  CC       vdso_test_getrandom
vdso_test_getrandom.c:43:41: error: field ‘params’ has incomplete type
   43 |         struct vgetrandom_opaque_params params;
      |                                         ^~~~~~

Because KHDR_INCLUDES is /usr/include instead.

Christophe, any suggestions on this one? And any idea why loongarch is
hitting it, but not x86 or ppc?

Jason

1 year, 4 months


[PATCH bpf-next v1 0/2] Enable vmtest for cross-compile arm64 on x86_64 host, and fix some related issues.

by Lin Yikai

Enable vmtest for cross-compile arm64 on x86_64 host, and fix some related issues.

I have verified the patch for x86_64 with the target arch of 'x86' or 'arm64'.

v1:
- patch 2:
  - [1/2] Update "vmtest.sh" for cross-compile arm64 on x86_64 host.
  - [2/2] Fix cross-compile issue for some files and a static compile issue for "-lzstd"

Lin Yikai (2):
  selftests/bpf: Update "vmtest.sh" for cross-compile arm64 on x86_64 host.
  selftests/bpf: Fix cross-compile issue for some files and a static
    compile issue for "-lzstd"

 tools/testing/selftests/bpf/Makefile   | 12 ++++++++-
 tools/testing/selftests/bpf/README.rst | 12 ++++++++-
 tools/testing/selftests/bpf/vmtest.sh  | 37 +++++++++++++++++++++-----
 3 files changed, 53 insertions(+), 8 deletions(-)

-- 
2.34.1

1 year, 4 months


[PATCH RFC 0/3] add support for mm-local memory allocations

by Roman Kagan

In a series posted a few years ago [1], a proposal was put forward to allow
the kernel to allocate memory local to a mm and thus push it out of reach
for current and future speculation-based cross-process attacks. We still
believe this is a nice thing to have.

However, in the time passed since that post Linux mm has grown quite a few
new goodies, so we'd like to explore possibilities to implement this
functionality with less effort and churn leveraging the now available
facilities.

Specifically, this is a proof-of-concept attempt to implement mm-local
allocations piggy-backing on memfd_secret(), using regular user addresses
but pinning the pages and flipping the user/supervisor flag on the
respective PTEs to make them directly accessible from kernel, and sealing
the VMA to prevent userland from taking over the address range. The
approach allowed us to delegate all the heavy lifting -- address management,
interactions with the direct map, cleanup on mm teardown -- to the existing
infrastructure, and required zero architecture-specific code.

Compared to the approach used in the original series, where a dedicated
kernel address range and thus a dedicated PGD was used for mm-local
allocations, the one proposed here may have certain drawbacks, in
particular

- using user addresses for kernel memory may violate assumptions in various
  parts of kernel code which we may not have identified with the smoke
  tests we did

- the allocated addresses are guessable by the userland (ATM they are even
  visible in /proc/PID/maps but that's fixable) which may weaken the
  security posture

Also included is a simple test driver and selftest to smoke test and
showcase the feature.

The code is PoC RFC and lacks a lot of checks and special case handling, but
demonstrates the idea. We'd appreciate any feedback on whether it's a
viable approach or it should better be abandoned in favor of the one with
dedicated PGD / kernel address range or yet something else.

[1] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@amazon.de/

Fares Mehanna (2):
  mseal: expose interface to seal / unseal user memory ranges
  mm/secretmem: implement mm-local kernel allocations

Roman Kagan (1):
  drivers/misc: add test driver and selftest for proclocal allocator

 drivers/misc/Makefile                         |   1 +
 tools/testing/selftests/proclocal/Makefile    |   6 +
 include/linux/secretmem.h                     |   8 +
 mm/internal.h                                 |   7 +
 drivers/misc/proclocal-test.c                 | 200 +++++++++++++++++
 mm/gup.c                                      |   4 +-
 mm/mseal.c                                    |  81 ++++---
 mm/secretmem.c                                | 208 ++++++++++++++++++
 .../selftests/proclocal/proclocal-test.c      | 150 +++++++++++++
 drivers/misc/Kconfig                          |  15 ++
 tools/testing/selftests/proclocal/.gitignore  |   1 +
 11 files changed, 649 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/proclocal/Makefile
 create mode 100644 drivers/misc/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/.gitignore

-- 
2.34.1

Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

1 year, 4 months


[PATCH net-next v4 0/8] net/selftests: TCP-AO selftests updates

by Dmitry Safonov via B4 Relay

First 3 patches are more-or-less cleanups/preparations.

Patches 4/5 are fixes for netns file descriptors leaks/open.

Patch 6 was sent to me/contributed off-list by Mohammad, who wants 32-bit
kernels to run TCP-AO.

Patch 7 is a workaround/fix for slow VMs. Although I can't reproduce
the issue, I hope it will fix netdev flakes for connect-deny-*
tests.

And the biggest change is adding TCP-AO tracepoints to selftests.
I think it's a good addition for the following reasons:
- The related tracepoints are now tested;
- It allows tcp-ao selftests to raise expectations on the kernel
  behavior - up from the syscalls exit statuses + net counters.
- Provides tracepoints usage samples.

As tracepoints are not a stable ABI, any kernel changes done to them will
be reflected to the selftests, which also will allow users to see how
to change their code. It's much better than parsing dmesg (what BGP was
doing pre-tracepoints, ugh).

Somewhat arguably, the code parses trace_pipe, rather than uses
libtraceevent (which any sane user should do). The reason behind that is
the same as for rt-netlink macros instead of libmnl: I'm trying
to minimize the library dependencies of the selftests. And the
performance of formatting text in kernel and parsing it again in a test
is not critical.

Current output sample:
> ok 73 Trace events matched expectations: 13 tcp_hash_md5_required[2] tcp_hash_md5_unexpected[4] tcp_hash_ao_required[3] tcp_ao_key_not_found[4]

Previously, tracepoints selftests were part of kernel tcp tracepoints
submission [1], but since then the code was quite changed:
- Now generic tracing setup is in lib/ftrace.c, separate from
  lib/ftrace-tcp.c which utilizes TCP trace points. This separation
  allows future selftests to trace non-TCP events, e.g. to find out
  an skb's drop reason, which was useful in the creation of TCP-CLOSE
  stress-test (not in this patch set, but used in attempt to reproduce
  the issue from [2]).
- Another change is that in the previous submission the trace events
  were used only to detect unexpected TCP-AO/TCP-MD5 events. In this
  version the selftests will fail if an expected trace event didn't
  appear.
  Let's see how reliable this is on the netdev bot - it obviously passes
  on my testing, but potentially may require a temporary XFAIL patch
  if it misbehaves on a slow VM.

[1] https://lore.kernel.org/lkml/20240224-tcp-ao-tracepoints-v1-0-15f31b7f30a7@…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3…

Signed-off-by: Dmitry Safonov <0x7f454c46(a)gmail.com>
---
In v4 mostly worked on non-appearing events on netdev test VM.
- Setting up an x86 VM with the config from netdev and running stress-ng
  didn't reproduce the issue.
- Spread more error messages if tracing pthread fails to start
- Added conditional wait for tracer thread, just before destruction,
  in case it didn't have a time slice to run and parse trace events.
- Addressed some of checkpatch.pl --strict warnings (with nits from
  Simon Horman)
- Link to v3: https://lore.kernel.org/r/20240815-tcp-ao-selftests-upd-6-12-v3-0-7bd2e22bb…

Changes in v3:
- Corrected the selftests printing of tcp header flags, parsed from
  trace points
- Fixed an issue with VRF kconfig checks (and tests)
- Made check for unexpected trace events XFAIL, yet looking into the
  reason behind the fail
- Link to v2: https://lore.kernel.org/r/20240802-tcp-ao-selftests-upd-6-12-v2-0-370c99358…

Changes in v2:
- Fixed two issues with parsing TCP-AO events: the socket state and TCP
  segment flags. Hopefully, won't fail on netdev.
- Reword patch 1 & 2 messages to be more informative and to some degree
  formal (Paolo)
- Since commit e33a02ed6a4f ("selftests: Add printf attribute to
  kselftest prints") it's possible to use __printf instead of "raw" gcc
  attribute - switch using that, as checkpatch suggests.
- Link to v1: https://lore.kernel.org/r/20240730-tcp-ao-selftests-upd-6-12-v1-0-ffd4bf15d…

---
Dmitry Safonov (7):
      selftests/net: Clean-up double assignment
      selftests/net: Provide test_snprintf() helper
      selftests/net: Be consistent in kconfig checks
      selftests/net: Open /proc/thread-self in open_netns()
      selftests/net: Don't forget to close nsfd after switch_save_ns()
      selftests/net: Synchronize client/server before counters checks
      selftests/net: Add trace events matching to tcp_ao

Mohammad Nassiri (1):
      selftests/tcp_ao: Fix printing format for uint64_t

 tools/testing/selftests/net/tcp_ao/Makefile        |   3 +-
 tools/testing/selftests/net/tcp_ao/bench-lookups.c |   2 +-
 tools/testing/selftests/net/tcp_ao/config          |   1 +
 tools/testing/selftests/net/tcp_ao/connect-deny.c  |  25 +-
 tools/testing/selftests/net/tcp_ao/connect.c       |   6 +-
 tools/testing/selftests/net/tcp_ao/icmps-discard.c |   2 +-
 .../testing/selftests/net/tcp_ao/key-management.c  |  18 +-
 tools/testing/selftests/net/tcp_ao/lib/aolib.h     | 178 ++++++-
 .../testing/selftests/net/tcp_ao/lib/ftrace-tcp.c  | 559 +++++++++++++++++++++
 tools/testing/selftests/net/tcp_ao/lib/ftrace.c    | 543 ++++++++++++++++++++
 tools/testing/selftests/net/tcp_ao/lib/kconfig.c   |  31 +-
 tools/testing/selftests/net/tcp_ao/lib/setup.c     |  17 +-
 tools/testing/selftests/net/tcp_ao/lib/sock.c      |   1 -
 tools/testing/selftests/net/tcp_ao/lib/utils.c     |  26 +
 tools/testing/selftests/net/tcp_ao/restore.c       |  30 +-
 tools/testing/selftests/net/tcp_ao/rst.c           |   2 +-
 tools/testing/selftests/net/tcp_ao/self-connect.c  |  19 +-
 tools/testing/selftests/net/tcp_ao/seq-ext.c       |  28 +-
 .../selftests/net/tcp_ao/setsockopt-closed.c       |   6 +-
 tools/testing/selftests/net/tcp_ao/unsigned-md5.c  |  35 +-
 20 files changed, 1465 insertions(+), 67 deletions(-)
---
base-commit: f9db28bb09f46087580f2a8da54bb0aab59a8024
change-id: 20240730-tcp-ao-selftests-upd-6-12-4d3e53a74f3f

Best regards,
-- 
Dmitry Safonov <0x7f454c46(a)gmail.com>

1 year, 4 months


[PATCH net 0/4] mptcp: close subflow when receiving TCP+FIN and misc.

by Matthieu Baerts (NGI0)

Here are different fixes:

Patch 1 closes the subflow after having received a FIN, instead of
leaving it half-closed until the end of the MPTCP connection. A fix for
v5.12.

Patch 2 validates the previous patch.

Patch 3 is a fix for a recent fix to check both directions for the
backup flag. It can follow the 'Fixes' commit and be backported up to
v5.7.

Patch 4 adds a missing \n at the end of pr_debug() messages. Without it,
debug messages were displayed with a delay, which is confusing when
debugging. A fix for v5.6.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Note: Peter's email address has been removed from the Cc list, because
it is bouncing.

---
Matthieu Baerts (NGI0) (4):
      mptcp: close subflow when receiving TCP+FIN
      selftests: mptcp: join: cannot rm sf if closed
      mptcp: sched: check both backup in retrans
      mptcp: pr_debug: add missing \n at the end

 net/mptcp/fastopen.c                            |  4 +-
 net/mptcp/options.c                             | 50 ++++++++++-----------
 net/mptcp/pm.c                                  | 28 ++++++------
 net/mptcp/pm_netlink.c                          | 20 ++++-----
 net/mptcp/protocol.c                            | 59 +++++++++++++------------
 net/mptcp/protocol.h                            |  4 +-
 net/mptcp/sched.c                               |  4 +-
 net/mptcp/sockopt.c                             |  4 +-
 net/mptcp/subflow.c                             | 56 ++++++++++++-----------
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 11 ++---
 10 files changed, 122 insertions(+), 118 deletions(-)
---
base-commit: 31a972959ae57691a1e4f539399b2674ae576086
change-id: 20240826-net-mptcp-close-extra-sf-fin-19d4e5aa4c9c

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

1 year, 4 months


[PATCH 1/1] Improve missing mods error message and make shell script executable

by David Hunter

Make the test executable. Currently, the shell script is not executable,
so the script file is skipped entirely. Also, the error message
describing the required modules is inaccurate. Currently, only "SKIP:
Need act_mirred module" is shown. As a result, users might only include
that module; however, three modules are actually required, and if any of
them is missing, the test is skipped with the same message.

Fix the error message to show any/all modules needed for the script file
upon failure.

Signed-off-by: David Hunter <david.hunter.linux(a)gmail.com>
---
 .../testing/selftests/net/test_ingress_egress_chaining.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
 mode change 100644 => 100755 tools/testing/selftests/net/test_ingress_egress_chaining.sh

diff --git a/tools/testing/selftests/net/test_ingress_egress_chaining.sh b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
old mode 100644
new mode 100755
index 08adff6bb3b6..b1a3d68e0ec2
--- a/tools/testing/selftests/net/test_ingress_egress_chaining.sh
+++ b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
@@ -13,8 +13,14 @@ if [ "$(id -u)" -ne 0 ];then
 fi
 
 needed_mods="act_mirred cls_flower sch_ingress"
+mods_missing=""
+
+for mod in $needed_mods; do
+	modinfo $mod &>/dev/null || mods_missing="$mods_missing$mod "
+done
+
 for mod in $needed_mods; do
-	modinfo $mod &>/dev/null || { echo "SKIP: Need act_mirred module"; exit $ksft_skip; }
+	modinfo $mod &>/dev/null || { echo "SKIP: modules needed: $mods_missing"; exit $ksft_skip; }
 done
 
 ns="ns$((RANDOM%899+100))"
-- 
2.43.0

1 year, 4 months


[PATCH 1/1 V2] Selftests: net: Improve missing modules error message

by David Hunter

The error message describing the required modules is inaccurate.
Currently, only "SKIP: Need act_mirred module" is printed when any of
the modules are missing. As a result, users might only include that
module; however, three modules are required.

Fix the error message to show any/all modules needed for the script file
to properly execute.

Signed-off-by: David Hunter <david.hunter.linux(a)gmail.com>
---
V1 --> V2
- included subject prefixes
- Split the patch into two separate patches (one for each issue)
- fixed typos in message body
- removed second, unnecessary for loop
---
 .../selftests/net/test_ingress_egress_chaining.sh | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/test_ingress_egress_chaining.sh b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
index 08adff6bb3b6..d4b97662849b 100644
--- a/tools/testing/selftests/net/test_ingress_egress_chaining.sh
+++ b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
@@ -13,10 +13,19 @@ if [ "$(id -u)" -ne 0 ];then
 fi
 
 needed_mods="act_mirred cls_flower sch_ingress"
+mods_missing=""
+numb_mods_needed=0;
+
 for mod in $needed_mods; do
-	modinfo $mod &>/dev/null || { echo "SKIP: Need act_mirred module"; exit $ksft_skip; }
+	modinfo $mod &>/dev/null ||
+	{ mods_missing="$mods_missing$mod " ; numb_mods_needed=$(expr $numb_mods_needed + 1);}
 done
 
+if [[ $numb_mods_needed>0 ]]; then
+	echo "SKIP: $numb_mods_needed modules needed: $mods_missing" ; exit $ksft_skip;
+fi
+
+
 ns="ns$((RANDOM%899+100))"
 veth1="veth1$((RANDOM%899+100))"
 veth2="veth2$((RANDOM%899+100))"
-- 
2.43.0

1 year, 4 months


[PATCH net-next] selftests: forwarding: local_termination: Down ports on cleanup

by Petr Machata

This test neglects to put ports down on cleanup. Fix it.

Fixes: 90b9566aa5cd ("selftests: forwarding: add a test for local_termination.sh")
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
 tools/testing/selftests/net/forwarding/local_termination.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/local_termination.sh b/tools/testing/selftests/net/forwarding/local_termination.sh
index c5b0cbc85b3e..9b5a63519b94 100755
--- a/tools/testing/selftests/net/forwarding/local_termination.sh
+++ b/tools/testing/selftests/net/forwarding/local_termination.sh
@@ -278,6 +278,10 @@ bridge()
 cleanup()
 {
 	pre_cleanup
+
+	ip link set $h2 down
+	ip link set $h1 down
+
 	vrf_cleanup
 }
 
-- 
2.45.0

1 year, 4 months


[PATCH net-next] selftests: forwarding: no_forwarding: Down ports on cleanup

by Petr Machata

This test neglects to put ports down on cleanup. Fix it.

Fixes: 476a4f05d9b8 ("selftests: forwarding: add a no_forwarding.sh test")
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
 tools/testing/selftests/net/forwarding/no_forwarding.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/no_forwarding.sh b/tools/testing/selftests/net/forwarding/no_forwarding.sh
index af3b398d13f0..9e677aa64a06 100755
--- a/tools/testing/selftests/net/forwarding/no_forwarding.sh
+++ b/tools/testing/selftests/net/forwarding/no_forwarding.sh
@@ -233,6 +233,9 @@ cleanup()
 {
 	pre_cleanup
 
+	ip link set dev $swp2 down
+	ip link set dev $swp1 down
+
 	h2_destroy
 	h1_destroy
 
-- 
2.45.0

1 year, 4 months


[RFC PATCH net-next 0/5] selftests: forwarding: Introduce deferred commands

by Petr Machata

Recently, a defer helper was added to Python selftests. The idea is to keep
cleanup commands close to their dirtying counterparts, thereby making it
more transparent what is cleaning up what, making it harder to miss a
cleanup, and making the whole cleanup business exception safe. All these
benefits are applicable to bash as well; exception safety can be interpreted
in terms of safety vs. a SIGINT.

This patchset therefore introduces a framework of several helpers that serve
to schedule cleanups in bash selftests.

As a personal remark: more than once I was bitten by stop_traffic not
getting invoked because I C-c'd a traffic scheduler selftest at the wrong
time. This would leave behind a running mausezahn that would break follow-up
runs of the script that I was just debugging, making me question my sanity.
("How did this one extra debug print break the full script? And when I
remove it again, _why is it still broken_?") This is an attempt at squashing
this whole class of problems.

Patch #1 has more details about the primitives being introduced.
Patches #2 to #5 then convert several selftests to give an idea of how it
looks in practice.

Petr Machata (5):
  selftests: forwarding: Introduce deferred commands
  selftests: mlxsw: sch_red_core: Use defer for test cleanup
  selftests: mlxsw: sch_red_core: Use defer for stopping traffic
  selftests: mlxsw: sch_red_*: Use defer for qdisc management
  selftests: sch_tbf_core: Use defer for stopping traffic

 .../drivers/net/mlxsw/sch_red_core.sh         | 131 +++++++-----------
 .../drivers/net/mlxsw/sch_red_ets.sh          |  32 ++---
 .../drivers/net/mlxsw/sch_red_root.sh         |  24 +++-
 tools/testing/selftests/net/forwarding/lib.sh |  83 +++++++++++
 .../selftests/net/forwarding/sch_tbf_core.sh  |   3 +-
 5 files changed, 170 insertions(+), 103 deletions(-)

-- 
2.45.0
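The patch set implements deferred cleanups as bash helpers, which are not reproduced here. Purely to illustrate the underlying idea (register each cleanup next to the action that needs it, then run the cleanups in reverse order on exit), here is a small sketch in C. Every name in it (defer_stack, defer_push, defer_run_all, log_step) is hypothetical and unrelated to the helpers the series adds to lib.sh.

```c
#include <stddef.h>

#define DEFER_MAX 16

typedef void (*defer_fn)(void *arg);

/* A fixed-size stack of pending cleanup actions. */
struct defer_stack {
	defer_fn fn[DEFER_MAX];
	void *arg[DEFER_MAX];
	size_t depth;
};

/* Schedule fn(arg) to run at cleanup time; returns -1 if the stack is full. */
static int defer_push(struct defer_stack *s, defer_fn fn, void *arg)
{
	if (s->depth == DEFER_MAX)
		return -1;
	s->fn[s->depth] = fn;
	s->arg[s->depth] = arg;
	s->depth++;
	return 0;
}

/* Run every scheduled cleanup, most recently scheduled first. */
static void defer_run_all(struct defer_stack *s)
{
	while (s->depth > 0) {
		s->depth--;
		s->fn[s->depth](s->arg[s->depth]);
	}
}

/* Demo cleanup: append one character to a log so the run order is visible. */
static char defer_log[DEFER_MAX + 1];
static size_t defer_log_len;

static void log_step(void *arg)
{
	if (defer_log_len < DEFER_MAX)
		defer_log[defer_log_len++] = *(const char *)arg;
}
```

A caller would push a cleanup right after each setup step and call defer_run_all() from its single exit path (including a signal-driven one), so that teardown happens in the reverse order of setup even when the run is interrupted partway through.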

1 year, 4 months


[PATCH nf-next v3 2/2] netfilter: Make IP6_NF_IPTABLES_LEGACY selectable

by Breno Leitao

This option makes IP6_NF_IPTABLES_LEGACY user selectable, giving
users the option to configure iptables without enabling any other
config.

Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
 net/ipv6/netfilter/Kconfig         | 22 ++++++++++++----------
 tools/testing/selftests/net/config |  5 +++++
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index f3c8e2d918e1..dad0a50d3ef4 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -8,7 +8,13 @@ menu "IPv6: Netfilter Configuration"
 
 # old sockopt interface and eval loop
 config IP6_NF_IPTABLES_LEGACY
-	tristate
+	tristate "Legacy IP6 tables support"
+	depends on INET && IPV6
+	select NETFILTER_XTABLES
+	default n
+	help
+	  ip6tables is a general, extensible packet identification legacy framework.
+	  This is not needed if you are using iptables over nftables (iptables-nft).
 
 config NF_SOCKET_IPV6
 	tristate "IPv6 socket lookup support"
@@ -190,7 +196,7 @@ config IP6_NF_TARGET_HL
 config IP6_NF_FILTER
 	tristate "Packet filtering"
 	default m if NETFILTER_ADVANCED=n
-	select IP6_NF_IPTABLES_LEGACY
+	depends on IP6_NF_IPTABLES_LEGACY
 	tristate
 	help
 	  Packet filtering defines a table `filter', which has a series of
@@ -227,7 +233,7 @@ config IP6_NF_TARGET_SYNPROXY
 config IP6_NF_MANGLE
 	tristate "Packet mangling"
 	default m if NETFILTER_ADVANCED=n
-	select IP6_NF_IPTABLES_LEGACY
+	depends on IP6_NF_IPTABLES_LEGACY
 	help
 	  This option adds a `mangle' table to iptables: see the man page for
 	  iptables(8). This table is used for various packet alterations
@@ -237,7 +243,7 @@ config IP6_NF_MANGLE
 
 config IP6_NF_RAW
 	tristate 'raw table support (required for TRACE)'
-	select IP6_NF_IPTABLES_LEGACY
+	depends on IP6_NF_IPTABLES_LEGACY
 	help
 	  This option adds a `raw' table to ip6tables. This table is the very
 	  first in the netfilter framework and hooks in at the PREROUTING
@@ -249,9 +255,7 @@ config IP6_NF_RAW
 # security table for MAC policy
 config IP6_NF_SECURITY
 	tristate "Security table"
-	depends on SECURITY
-	depends on NETFILTER_ADVANCED
-	select IP6_NF_IPTABLES_LEGACY
+	depends on SECURITY && NETFILTER_ADVANCED && IP6_NF_IPTABLES_LEGACY
 	help
 	  This option adds a `security' table to iptables, for use
 	  with Mandatory Access Control (MAC) policy.
@@ -260,10 +264,8 @@ config IP6_NF_SECURITY
 
 config IP6_NF_NAT
 	tristate "ip6tables NAT support"
-	depends on NF_CONNTRACK
-	depends on NETFILTER_ADVANCED
+	depends on NF_CONNTRACK && NETFILTER_ADVANCED && IP6_NF_IPTABLES_LEGACY
 	select NF_NAT
-	select IP6_NF_IPTABLES_LEGACY
 	select NETFILTER_XT_NAT
 	help
 	  This enables the `nat' table in ip6tables. This allows masquerading,
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 90e997cfa12e..e534144c75ea 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -35,12 +35,16 @@ CONFIG_IPV6_SIT=y
 CONFIG_IP_DCCP=m
 CONFIG_NF_NAT=m
 CONFIG_IP6_NF_IPTABLES=m
+CONFIG_IP6_NF_IPTABLES_LEGACY=m
 CONFIG_IP_NF_IPTABLES=m
 CONFIG_IP_NF_IPTABLES_LEGACY=m
 CONFIG_IP_NF_FILTER=m
 CONFIG_IP_NF_TARGET_REJECT=m
 CONFIG_IP_NF_TARGET_MASQUERADE=m
 CONFIG_IP_NF_MANGLE=m
+CONFIG_IP6_NF_MANGLE=m
+CONFIG_IP6_NF_FILTER=m
+CONFIG_IP6_NF_TARGET_REJECT=m
 CONFIG_IP6_NF_NAT=m
 CONFIG_IP6_NF_RAW=m
 CONFIG_IP_NF_NAT=m
@@ -61,6 +65,7 @@ CONFIG_NF_TABLES=m
 CONFIG_NF_TABLES_IPV6=y
 CONFIG_NF_TABLES_IPV4=y
 CONFIG_NF_REJECT_IPV4=y
+CONFIG_NF_REJECT_IPV6=y
 CONFIG_NFT_NAT=m
 CONFIG_NETFILTER_XT_MATCH_LENGTH=m
 CONFIG_NET_ACT_CSUM=m
-- 
2.43.5

1 year, 4 months


kselftest/next build: 7 builds: 2 failed, 5 passed, 1 warning (v6.11-rc1-16-g611fbeb44a777)

by kernelci.org bot

kselftest/next build: 7 builds: 2 failed, 5 passed, 1 warning (v6.11-rc1-16-g611fbeb44a777)

Full Build Summary: https://kernelci.org/build/kselftest/branch/next/kernel/v6.11-rc1-16-g611fb…

Tree: kselftest
Branch: next
Git Describe: v6.11-rc1-16-g611fbeb44a777
Git Commit: 611fbeb44a777e5ab54ab3127ec85f72147911d8
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Built: 4 unique architectures

Build Failures Detected:

arm64:
    defconfig+kselftest+arm64-chromebook: (clang-16) FAIL
    defconfig+kselftest+arm64-chromebook: (gcc-12) FAIL

Warnings Detected:

arm64:

arm:

i386:

x86_64:
    x86_64_defconfig+kselftest (clang-16): 1 warning

Warnings summary:

    1    vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x39: relocation to !ENDBR: .text+0x14ef94

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
defconfig+kselftest (arm64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
defconfig+kselftest+arm64-chromebook (arm64, gcc-12) — FAIL, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
defconfig+kselftest+arm64-chromebook (arm64, clang-16) — FAIL, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
i386_defconfig+kselftest (i386, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
multi_v7_defconfig+kselftest (arm, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
x86_64_defconfig+kselftest (x86_64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
x86_64_defconfig+kselftest (x86_64, clang-16) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x39: relocation to !ENDBR: .text+0x14ef94

---
For more info write to <info(a)kernelci.org>

1 year, 4 months


[PATCH v3] Documentation: KUnit: Update filename best practices

by Kees Cook

Based on feedback from Linus[1] and follow-up discussions, change the
suggested file naming for KUnit tests.

Link: https://lore.kernel.org/lkml/CAHk-=wgim6pNiGTBMhP8Kd3tsB7_JTAuvNJ=XYd3wPvvk… [1]
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Kees Cook <kees(a)kernel.org>
---
v3: additional clarification
v2: https://lore.kernel.org/all/20240720165441.it.320-kees@kernel.org/
Cc: David Gow <davidgow(a)google.com>
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: Rae Moar <rmoar(a)google.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: kunit-dev(a)googlegroups.com
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-hardening(a)vger.kernel.org
---
 Documentation/dev-tools/kunit/style.rst | 29 +++++++++++++++++--------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/Documentation/dev-tools/kunit/style.rst b/Documentation/dev-tools/kunit/style.rst
index b6d0d7359f00..eac81a714a29 100644
--- a/Documentation/dev-tools/kunit/style.rst
+++ b/Documentation/dev-tools/kunit/style.rst
@@ -188,15 +188,26 @@ For example, a Kconfig entry might look like:
 Test File and Module Names
 ==========================
 
-KUnit tests can often be compiled as a module. These modules should be named
-after the test suite, followed by ``_test``. If this is likely to conflict with
-non-KUnit tests, the suffix ``_kunit`` can also be used.
+KUnit tests are often compiled as a separate module. To avoid conflicting
+with regular modules, KUnit modules should be named after the test suite,
+followed by ``_kunit`` (e.g. if "foobar" is the core module, then
+"foobar_kunit" is the KUnit test module).
 
-The easiest way of achieving this is to name the file containing the test suite
-``<suite>_test.c`` (or, as above, ``<suite>_kunit.c``). This file should be
-placed next to the code under test.
+Test source files, whether compiled as a separate module or an
+``#include`` in another source file, are best kept in a ``tests/``
+subdirectory to not conflict with other source files (e.g. for
+tab-completion).
+
+Note that the ``_test`` suffix has also been used in some existing
+tests. The ``_kunit`` suffix is preferred, as it makes the distinction
+between KUnit and non-KUnit tests clearer.
+
+So for the common case, name the file containing the test suite
+``tests/<suite>_kunit.c``. The ``tests`` directory should be placed at
+the same level as the code under test. For example, tests for
+``lib/string.c`` live in ``lib/tests/string_kunit.c``.
 
 If the suite name contains some or all of the name of the test's parent
-directory, it may make sense to modify the source filename to reduce redundancy.
-For example, a ``foo_firmware`` suite could be in the ``foo/firmware_test.c``
-file.
+directory, it may make sense to modify the source filename to reduce
+redundancy. For example, a ``foo_firmware`` suite could be in the
+``foo/tests/firmware_kunit.c`` file.
-- 
2.34.1
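
The naming rule proposed in this patch can be read as a small path transformation: keep the directory of the code under test, add a tests/ component, and replace the .c suffix with _kunit.c. The user-space sketch below only illustrates that rule. The helper name kunit_test_path is hypothetical and is not part of KUnit or the kernel tree, and the sketch assumes the common case where the suite is named after the source file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): map "dir/name.c" to
 * "dir/tests/name_kunit.c", following the convention in the patch.
 * Returns 0 on success, -1 if src does not end in ".c" or if out
 * is too small to hold the result.
 */
static int kunit_test_path(const char *src, char *out, size_t outlen)
{
    const char *slash, *base;
    size_t dirlen, baselen;
    int n;

    assert(src != NULL && out != NULL);

    slash = strrchr(src, '/');
    base = slash ? slash + 1 : src;
    dirlen = (size_t)(base - src);      /* includes the trailing '/' */
    baselen = strlen(base);

    /* Require at least one character before the ".c" suffix. */
    if (baselen < 3 || strcmp(base + baselen - 2, ".c") != 0)
        return -1;

    n = snprintf(out, outlen, "%.*stests/%.*s_kunit.c",
                 (int)dirlen, src, (int)(baselen - 2), base);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called with "lib/string.c" and a large enough buffer, the helper produces "lib/tests/string_kunit.c", which matches the example given in the patch text.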

1 year, 4 months


[PATCH v1] selftests:tdx:Use min macro

by Yan Zhen

Using the min macro is usually more intuitive and readable.

Signed-off-by: Yan Zhen <yanzhen(a)vivo.com>
---
 tools/testing/selftests/tdx/tdx_guest_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/tdx/tdx_guest_test.c b/tools/testing/selftests/tdx/tdx_guest_test.c
index 81d8cb88e..d7ddf5307 100644
--- a/tools/testing/selftests/tdx/tdx_guest_test.c
+++ b/tools/testing/selftests/tdx/tdx_guest_test.c
@@ -118,7 +118,7 @@ static void print_array_hex(const char *title, const char *prefix_str,
 	printf("\t\t%s", title);
 
 	for (j = 0; j < len; j += rowsize) {
-		line_len = rowsize < (len - j) ? rowsize : (len - j);
+		line_len = min((len - j), rowsize);
 		printf("%s%.8x:", prefix_str, j);
 		for (i = 0; i < line_len; i++)
 			printf(" %.2x", ptr[j + i]);
-- 
2.34.1
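
To illustrate the row-length computation this patch touches, here is a minimal user-space sketch. ISO C has no standard min, so whichever definition the selftest uses has to come from a header it includes. The sketch therefore assumes a locally defined MIN macro as a stand-in. The helper names row_len and dump_hex are hypothetical and are not taken from tdx_guest_test.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Assumption: local stand-in for a kernel-style min(). It evaluates each
 * argument twice, so only pass expressions without side effects.
 */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Number of bytes to print on the row that starts at offset j. */
static int row_len(int len, int j, int rowsize)
{
    assert(rowsize > 0 && j >= 0 && j < len);
    return MIN(len - j, rowsize);
}

/* Dump buf in rows of rowsize bytes; returns the number of rows printed. */
static int dump_hex(const unsigned char *buf, int len, int rowsize)
{
    int rows = 0;

    for (int j = 0; j < len; j += rowsize) {
        int n = row_len(len, j, rowsize);

        printf("%.8x:", (unsigned int)j);
        for (int i = 0; i < n; i++)
            printf(" %.2x", buf[j + i]);
        printf("\n");
        rows++;
    }
    return rows;
}
```

With len = 20 and rowsize = 16, the first row gets 16 bytes and the second gets the remaining 4, which is the same clamping the original ternary expression performed.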

1 year, 4 months


[PATCH net-next v6 25/25] testing/selftest: add test tool and scripts for ovpn module

by Antonio Quartulli

The ovpn-cli tool can be compiled and used as a selftest for the ovpn
kernel module.

It implements the netlink API and can thus be integrated in any
script for more automated testing.

Along with the tool, 2 scripts are added that perform basic
functionality tests by means of network namespaces.

The scripts can be performed in sequence by running run.sh

Cc: shuah(a)kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Antonio Quartulli <antonio(a)openvpn.net>
---
 tools/testing/selftests/Makefile              |    1 +
 tools/testing/selftests/net/ovpn/.gitignore   |    2 +
 tools/testing/selftests/net/ovpn/Makefile     |   17 +
 tools/testing/selftests/net/ovpn/config       |    8 +
 .../selftests/net/ovpn/data-test-tcp.sh       |    9 +
 tools/testing/selftests/net/ovpn/data-test.sh |  150 ++
 tools/testing/selftests/net/ovpn/data64.key   |    5 +
 .../testing/selftests/net/ovpn/float-test.sh  |  115 ++
 tools/testing/selftests/net/ovpn/ovpn-cli.c   | 1820 +++++++++++++++++
 .../testing/selftests/net/ovpn/tcp_peers.txt  |    1 +
 .../testing/selftests/net/ovpn/udp_peers.txt  |    5 +
 11 files changed, 2133 insertions(+)
 create mode 100644 tools/testing/selftests/net/ovpn/.gitignore
 create mode 100644 tools/testing/selftests/net/ovpn/Makefile
 create mode 100644 tools/testing/selftests/net/ovpn/config
 create mode 100755 tools/testing/selftests/net/ovpn/data-test-tcp.sh
 create mode 100755 tools/testing/selftests/net/ovpn/data-test.sh
 create mode 100644 tools/testing/selftests/net/ovpn/data64.key
 create mode 100755 tools/testing/selftests/net/ovpn/float-test.sh
 create mode 100644 tools/testing/selftests/net/ovpn/ovpn-cli.c
 create mode 100644 tools/testing/selftests/net/ovpn/tcp_peers.txt
 create mode 100644 tools/testing/selftests/net/ovpn/udp_peers.txt

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index a5f1c0c27dff..c293f2717708 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -66,6 +66,7 @@ TARGETS += net/forwarding
 TARGETS += net/hsr
 TARGETS += net/mptcp
TARGETS += net/openvswitch +TARGETS += net/ovpn TARGETS += net/tcp_ao TARGETS += net/netfilter TARGETS += net/rds diff --git a/tools/testing/selftests/net/ovpn/.gitignore b/tools/testing/selftests/net/ovpn/.gitignore new file mode 100644 index 000000000000..ee44c081ca7c --- /dev/null +++ b/tools/testing/selftests/net/ovpn/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0+ +ovpn-cli diff --git a/tools/testing/selftests/net/ovpn/Makefile b/tools/testing/selftests/net/ovpn/Makefile new file mode 100644 index 000000000000..171cf047497c --- /dev/null +++ b/tools/testing/selftests/net/ovpn/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2020-2024 OpenVPN, Inc. +# +CFLAGS = -Wall -I../../../../../usr/include +CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0) + +LDFLAGS = -lmbedtls -lmbedcrypto +LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0) + +ovpn-cli: ovpn-cli.c + +TEST_PROGS = data-test.sh \ + data-test-tcp.sh \ + float-test.sh +TEST_GEN_FILES = ovpn-cli + +include ../../lib.mk diff --git a/tools/testing/selftests/net/ovpn/config b/tools/testing/selftests/net/ovpn/config new file mode 100644 index 000000000000..5ff47de23c12 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/config @@ -0,0 +1,8 @@ +CONFIG_NET=y +CONFIG_INET=y +CONFIG_NET_UDP_TUNNEL=y +CONFIG_DST_CACHE=y +CONFIG_CRYPTO_AES=y +CONFIG_CRYPTO_GCM=y +CONFIG_CRYPTO_CHACHA20POLY1305=y +CONFIG_OVPN=y diff --git a/tools/testing/selftests/net/ovpn/data-test-tcp.sh b/tools/testing/selftests/net/ovpn/data-test-tcp.sh new file mode 100755 index 000000000000..65f05659b5fd --- /dev/null +++ b/tools/testing/selftests/net/ovpn/data-test-tcp.sh @@ -0,0 +1,9 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 OpenVPN, Inc. 
+# +# Author: Antonio Quartulli <antonio(a)openvpn.net> + +PROTO="TCP" + +source data-test.sh diff --git a/tools/testing/selftests/net/ovpn/data-test.sh b/tools/testing/selftests/net/ovpn/data-test.sh new file mode 100755 index 000000000000..8468defb1f1c --- /dev/null +++ b/tools/testing/selftests/net/ovpn/data-test.sh @@ -0,0 +1,150 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2020-2024 OpenVPN, Inc. +# +# Author: Antonio Quartulli <antonio(a)openvpn.net> + +#set -x +set -e + +UDP_PEERS_FILE=${UDP_PEERS_FILE:-udp_peers.txt} +TCP_PEERS_FILE=${TCP_PEERS_FILE:-tcp_peers.txt} +OVPN_CLI=${OVPN_CLI:-./ovpn-cli} +ALG=${ALG:-aes} +PROTO=${PROTO:-UDP} + +create_ns() { + ip netns add peer${1} +} + +setup_ns() { + MODE="P2P" + + if [ ${1} -eq 0 ]; then + MODE="MP" + for p in $(seq 1 ${NUM_PEERS}); do + ip link add veth${p} netns peer0 type veth peer name veth${p} netns peer${p} + + ip -n peer0 addr add 10.10.${p}.1/24 dev veth${p} + ip -n peer0 link set veth${p} up + + ip -n peer${p} addr add 10.10.${p}.2/24 dev veth${p} + ip -n peer${p} link set veth${p} up + done + fi + + ip netns exec peer${1} ${OVPN_CLI} new_iface tun${1} $MODE + ip -n peer${1} addr add ${2} dev tun${1} + ip -n peer${1} link set tun${1} up +} + +add_peer() { + if [ "${PROTO}" == "UDP" ]; then + if [ ${1} -eq 0 ]; then + ip netns exec peer0 ${OVPN_CLI} new_multi_peer tun0 1 ${UDP_PEERS_FILE} + + for p in $(seq 1 ${NUM_PEERS}); do + # ip netns exec peer0 ${OVPN_CLI} new_peer tun0 ${p} ${p} 10.10.${p}.2 1 5.5.5.$((${p} + 1)) + + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 ${ALG} 0 data64.key + done + else + ip netns exec peer${1} ${OVPN_CLI} new_peer tun${1} 1 ${1} 10.10.${1}.1 1 + ip netns exec peer${1} ${OVPN_CLI} new_key tun${1} ${1} 1 0 ${ALG} 1 data64.key + fi + else + if [ ${1} -eq 0 ]; then + (ip netns exec peer0 ${OVPN_CLI} listen tun0 1 ${TCP_PEERS_FILE} && { + for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 ${ALG} 0 
data64.key + done + }) & + sleep 5 + else + ip netns exec peer${1} ${OVPN_CLI} connect tun${1} ${1} 10.10.${1}.1 1 data64.key + fi + fi +} + +cleanup() { + for p in $(seq 1 10); do + ip -n peer0 link del veth${p} 2>/dev/null || true + done + for p in $(seq 0 10); do + ip netns exec peer${p} ${OVPN_CLI} del_iface tun${p} 2>/dev/null || true + ip netns del peer${p} 2>/dev/null || true + done +} + +if [ "${PROTO}" == "UDP" ]; then + NUM_PEERS=${NUM_PEERS:-$(wc -l ${UDP_PEERS_FILE} | awk '{print $1}')} +else + NUM_PEERS=${NUM_PEERS:-$(wc -l ${TCP_PEERS_FILE} | awk '{print $1}')} +fi + +cleanup + +for p in $(seq 0 ${NUM_PEERS}); do + create_ns ${p} +done + +for p in $(seq 0 ${NUM_PEERS}); do + setup_ns ${p} 5.5.5.$((${p} + 1))/24 +done + +for p in $(seq 0 ${NUM_PEERS}); do + add_peer ${p} +done + +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 60 120 + ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} ${p} 60 120 +done + +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ping -qfc 1000 -w 5 5.5.5.$((${p} + 1)) +done + +ip netns exec peer0 iperf3 -1 -s & +sleep 1 +ip netns exec peer1 iperf3 -Z -t 3 -c 5.5.5.1 + +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 2 1 ${ALG} 0 data64.key + ip netns exec peer${p} ${OVPN_CLI} new_key tun${p} ${p} 2 1 ${ALG} 1 data64.key + ip netns exec peer${p} ${OVPN_CLI} swap_keys tun${p} ${p} +done + +sleep 1 +echo "Querying all peers:" +ip netns exec peer0 ${OVPN_CLI} get_peer tun0 +ip netns exec peer1 ${OVPN_CLI} get_peer tun1 + +echo "Querying peer 1:" +ip netns exec peer0 ${OVPN_CLI} get_peer tun0 1 + +echo "Querying non-existent peer 10:" +ip netns exec peer0 ${OVPN_CLI} get_peer tun0 10 || true + +ip netns exec peer0 ${OVPN_CLI} del_peer tun0 1 + +echo "Setting timeout to 10s MP:" +# bring ifaces down to prevent traffic being sent +for p in $(seq 0 ${NUM_PEERS}); do + ip -n peer${p} link set tun${p} down +done +# set short timeout +for p in $(seq 1 
${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 10 10 || true + ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} ${p} 0 0 +done +# wait for peers to timeout +sleep 15 + +echo "Setting timeout to 10s P2P:" +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} ${p} 10 10 +done +sleep 15 + +cleanup diff --git a/tools/testing/selftests/net/ovpn/data64.key b/tools/testing/selftests/net/ovpn/data64.key new file mode 100644 index 000000000000..a99e88c4e290 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/data64.key @@ -0,0 +1,5 @@ +jRqMACN7d7/aFQNT8S7jkrBD8uwrgHbG5OQZP2eu4R1Y7tfpS2bf5RHv06Vi163CGoaIiTX99R3B +ia9ycAH8Wz1+9PWv51dnBLur9jbShlgZ2QHLtUc4a/gfT7zZwULXuuxdLnvR21DDeMBaTbkgbai9 +uvAa7ne1liIgGFzbv+Bas4HDVrygxIxuAnP5Qgc3648IJkZ0QEXPF+O9f0n5+QIvGCxkAUVx+5K6 +KIs+SoeWXnAopELmoGSjUpFtJbagXK82HfdqpuUxT2Tnuef0/14SzVE/vNleBNu2ZbyrSAaah8tE +BofkPJUBFY+YQcfZNM5Dgrw3i+Bpmpq/gpdg5w== diff --git a/tools/testing/selftests/net/ovpn/float-test.sh b/tools/testing/selftests/net/ovpn/float-test.sh new file mode 100755 index 000000000000..5e113b767e78 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/float-test.sh @@ -0,0 +1,115 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2020-2024 OpenVPN, Inc. 
+# +# Author: Antonio Quartulli <antonio(a)openvpn.net> + +#set -x +set -e + +UDP_PEERS_FILE=${UDP_PEERS_FILE:-udp_peers.txt} +TCP_PEERS_FILE=${TCP_PEERS_FILE:-tcp_peers.txt} +OVPN_CLI=${OVPN_CLI:-./ovpn-cli} +ALG=${ALG:-aes} +PROTO=${PROTO:-UDP} + +create_ns() { + ip netns add peer${1} +} + +setup_ns() { + MODE="P2P" + + if [ ${1} -eq 0 ]; then + MODE="MP" + for p in $(seq 1 ${NUM_PEERS}); do + ip link add veth${p} netns peer0 type veth peer name veth${p} netns peer${p} + + ip -n peer0 addr add 10.10.${p}.1/24 dev veth${p} + ip -n peer0 link set veth${p} up + + ip -n peer${p} addr add 10.10.${p}.2/24 dev veth${p} + ip -n peer${p} link set veth${p} up + done + fi + + ip netns exec peer${1} ${OVPN_CLI} new_iface tun${1} $MODE + ip -n peer${1} addr add ${2} dev tun${1} + ip -n peer${1} link set tun${1} up +} + +add_peer() { + if [ "${PROTO}" == "UDP" ]; then + if [ ${1} -eq 0 ]; then + ip netns exec peer0 ${OVPN_CLI} new_multi_peer tun0 1 ${UDP_PEERS_FILE} + + for p in $(seq 1 ${NUM_PEERS}); do + # ip netns exec peer0 ${OVPN_CLI} new_peer tun0 ${p} ${p} 10.10.${p}.2 1 5.5.5.$((${p} + 1)) + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 ${ALG} 0 data64.key + done + else + ip netns exec peer${1} ${OVPN_CLI} new_peer tun${1} 1 ${1} 10.10.${1}.1 1 + ip netns exec peer${1} ${OVPN_CLI} new_key tun${1} ${1} 1 0 ${ALG} 1 data64.key + fi + else + if [ ${1} -eq 0 ]; then + (ip netns exec peer0 ${OVPN_CLI} listen tun0 1 ${TCP_PEERS_FILE} && { + for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 ${ALG} 0 data64.key + done + }) & + sleep 5 + else + ip netns exec peer${1} ${OVPN_CLI} connect tun${1} ${1} 10.10.${1}.1 1 5.5.5.1 data64.key + fi + fi +} + +cleanup() { + for p in $(seq 1 10); do + ip -n peer0 link del veth${p} 2>/dev/null || true + done + for p in $(seq 0 10); do + ip netns exec peer${p} ${OVPN_CLI} del_iface tun${p} 2>/dev/null || true + ip netns del peer${p} 2>/dev/null || true + done +} + +if [ "${PROTO}" == "UDP" ]; 
then + NUM_PEERS=${NUM_PEERS:-$(wc -l ${UDP_PEERS_FILE} | awk '{print $1}')} +else + NUM_PEERS=${NUM_PEERS:-$(wc -l ${TCP_PEERS_FILE} | awk '{print $1}')} +fi + +cleanup + +for p in $(seq 0 ${NUM_PEERS}); do + create_ns ${p} +done + +for p in $(seq 0 ${NUM_PEERS}); do + setup_ns ${p} 5.5.5.$((${p} + 1))/24 +done + +for p in $(seq 0 ${NUM_PEERS}); do + add_peer ${p} +done + +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 60 120 + ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} ${p} 60 120 +done + +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ping -qfc 1000 -w 5 5.5.5.$((${p} + 1)) +done +# make clients float.. +for p in $(seq 1 ${NUM_PEERS}); do + ip -n peer${p} addr del 10.10.${p}.2/24 dev veth${p} + ip -n peer${p} addr add 10.10.${p}.3/24 dev veth${p} +done +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer${p} ping -qfc 1000 -w 5 5.5.5.1 +done + +cleanup diff --git a/tools/testing/selftests/net/ovpn/ovpn-cli.c b/tools/testing/selftests/net/ovpn/ovpn-cli.c new file mode 100644 index 000000000000..a10ad4881f13 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/ovpn-cli.c @@ -0,0 +1,1820 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel accelerator + * + * Copyright (C) 2020-2024 OpenVPN, Inc. 
+ * + * Author: Antonio Quartulli <antonio(a)openvpn.net> + */ + +#include <stdio.h> +#include <inttypes.h> +#include <stdbool.h> +#include <string.h> +#include <errno.h> +#include <unistd.h> +#include <arpa/inet.h> +#include <net/if.h> +#include <netinet/in.h> + +#include <linux/ovpn.h> +#include <linux/types.h> +#include <linux/netlink.h> + +#include <netlink/socket.h> +#include <netlink/netlink.h> +#include <netlink/genl/genl.h> +#include <netlink/genl/family.h> +#include <netlink/genl/ctrl.h> + +#include <mbedtls/base64.h> +#include <mbedtls/error.h> + +#include <sys/socket.h> + +/* we use strscpy to make checkpatch happy */ +#define strscpy strncpy + +/* libnl < 3.5.0 does not set the NLA_F_NESTED on its own, therefore we + * have to explicitly do it to prevent the kernel from failing upon + * parsing of the message + */ +#define nla_nest_start(_msg, _type) \ + nla_nest_start(_msg, (_type) | NLA_F_NESTED) + +uint64_t nla_get_uint(struct nlattr *attr) +{ + if (nla_len(attr) == sizeof(uint32_t)) + return nla_get_u32(attr); + else + return nla_get_u64(attr); +} + +typedef int (*ovpn_nl_cb)(struct nl_msg *msg, void *arg); + +enum ovpn_key_direction { + KEY_DIR_IN = 0, + KEY_DIR_OUT, +}; + +#define KEY_LEN (256 / 8) +#define NONCE_LEN 8 + +#define PEER_ID_UNDEF 0x00FFFFFF + +struct nl_ctx { + struct nl_sock *nl_sock; + struct nl_msg *nl_msg; + struct nl_cb *nl_cb; + + int ovpn_dco_id; +}; + +struct ovpn_ctx { + __u8 key_enc[KEY_LEN]; + __u8 key_dec[KEY_LEN]; + __u8 nonce[NONCE_LEN]; + + enum ovpn_cipher_alg cipher; + + sa_family_t sa_family; + + __u32 peer_id; + __u16 lport; + + union { + struct sockaddr_in in4; + struct sockaddr_in6 in6; + } remote; + + union { + struct sockaddr_in in4; + struct sockaddr_in6 in6; + } peer_ip; + + bool peer_ip_set; + + unsigned int ifindex; + char ifname[IFNAMSIZ]; + enum ovpn_mode mode; + bool mode_set; + + int socket; + int cli_socket; + + __u32 keepalive_interval; + __u32 keepalive_timeout; + + enum ovpn_key_direction key_dir; + 
enum ovpn_key_slot key_slot; + int key_id; +}; + +static int ovpn_nl_recvmsgs(struct nl_ctx *ctx) +{ + int ret; + + ret = nl_recvmsgs(ctx->nl_sock, ctx->nl_cb); + + switch (ret) { + case -NLE_INTR: + fprintf(stderr, + "netlink received interrupt due to signal - ignoring\n"); + break; + case -NLE_NOMEM: + fprintf(stderr, "netlink out of memory error\n"); + break; + case -NLE_AGAIN: + fprintf(stderr, + "netlink reports blocking read - aborting wait\n"); + break; + default: + if (ret) + fprintf(stderr, "netlink reports error (%d): %s\n", + ret, nl_geterror(-ret)); + break; + } + + return ret; +} + +static struct nl_ctx *nl_ctx_alloc_flags(struct ovpn_ctx *ovpn, int cmd, + int flags) +{ + struct nl_ctx *ctx; + int err, ret; + + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) + return NULL; + + ctx->nl_sock = nl_socket_alloc(); + if (!ctx->nl_sock) { + fprintf(stderr, "cannot allocate netlink socket\n"); + goto err_free; + } + + nl_socket_set_buffer_size(ctx->nl_sock, 8192, 8192); + + ret = genl_connect(ctx->nl_sock); + if (ret) { + fprintf(stderr, "cannot connect to generic netlink: %s\n", + nl_geterror(ret)); + goto err_sock; + } + + /* enable Extended ACK for detailed error reporting */ + err = 1; + setsockopt(nl_socket_get_fd(ctx->nl_sock), SOL_NETLINK, NETLINK_EXT_ACK, + &err, sizeof(err)); + + ctx->ovpn_dco_id = genl_ctrl_resolve(ctx->nl_sock, OVPN_FAMILY_NAME); + if (ctx->ovpn_dco_id < 0) { + fprintf(stderr, "cannot find ovpn_dco netlink component: %d\n", + ctx->ovpn_dco_id); + goto err_free; + } + + ctx->nl_msg = nlmsg_alloc(); + if (!ctx->nl_msg) { + fprintf(stderr, "cannot allocate netlink message\n"); + goto err_sock; + } + + ctx->nl_cb = nl_cb_alloc(NL_CB_DEFAULT); + if (!ctx->nl_cb) { + fprintf(stderr, "failed to allocate netlink callback\n"); + goto err_msg; + } + + nl_socket_set_cb(ctx->nl_sock, ctx->nl_cb); + + genlmsg_put(ctx->nl_msg, 0, 0, ctx->ovpn_dco_id, 0, flags, cmd, 0); + + if (ovpn->ifindex > 0) + NLA_PUT_U32(ctx->nl_msg, OVPN_A_IFINDEX, 
ovpn->ifindex); + + return ctx; +nla_put_failure: +err_msg: + nlmsg_free(ctx->nl_msg); +err_sock: + nl_socket_free(ctx->nl_sock); +err_free: + free(ctx); + return NULL; +} + +static struct nl_ctx *nl_ctx_alloc(struct ovpn_ctx *ovpn, int cmd) +{ + return nl_ctx_alloc_flags(ovpn, cmd, 0); +} + +static void nl_ctx_free(struct nl_ctx *ctx) +{ + if (!ctx) + return; + + nl_socket_free(ctx->nl_sock); + nlmsg_free(ctx->nl_msg); + nl_cb_put(ctx->nl_cb); + free(ctx); +} + +static int ovpn_nl_cb_error(struct sockaddr_nl (*nla)__attribute__((unused)), + struct nlmsgerr *err, void *arg) +{ + struct nlmsghdr *nlh = (struct nlmsghdr *)err - 1; + struct nlattr *tb_msg[NLMSGERR_ATTR_MAX + 1]; + int len = nlh->nlmsg_len; + struct nlattr *attrs; + int *ret = arg; + int ack_len = sizeof(*nlh) + sizeof(int) + sizeof(*nlh); + + *ret = err->error; + + if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS)) + return NL_STOP; + + if (!(nlh->nlmsg_flags & NLM_F_CAPPED)) + ack_len += err->msg.nlmsg_len - sizeof(*nlh); + + if (len <= ack_len) + return NL_STOP; + + attrs = (void *)((unsigned char *)nlh + ack_len); + len -= ack_len; + + nla_parse(tb_msg, NLMSGERR_ATTR_MAX, attrs, len, NULL); + if (tb_msg[NLMSGERR_ATTR_MSG]) { + len = strnlen((char *)nla_data(tb_msg[NLMSGERR_ATTR_MSG]), + nla_len(tb_msg[NLMSGERR_ATTR_MSG])); + fprintf(stderr, "kernel error: %*s\n", len, + (char *)nla_data(tb_msg[NLMSGERR_ATTR_MSG])); + } + + if (tb_msg[NLMSGERR_ATTR_MISS_NEST]) { + fprintf(stderr, "missing required nesting type %u\n", + nla_get_u32(tb_msg[NLMSGERR_ATTR_MISS_NEST])); + } + + if (tb_msg[NLMSGERR_ATTR_MISS_TYPE]) { + fprintf(stderr, "missing required attribute type %u\n", + nla_get_u32(tb_msg[NLMSGERR_ATTR_MISS_TYPE])); + } + + return NL_STOP; +} + +static int ovpn_nl_cb_finish(struct nl_msg (*msg)__attribute__((unused)), + void *arg) +{ + int *status = arg; + + *status = 0; + return NL_SKIP; +} + +static int ovpn_nl_cb_ack(struct nl_msg (*msg)__attribute__((unused)), + void *arg) +{ + int *status = arg; + + 
*status = 0; + return NL_STOP; +} + +static int ovpn_nl_msg_send(struct nl_ctx *ctx, ovpn_nl_cb cb) +{ + int status = 1; + + nl_cb_err(ctx->nl_cb, NL_CB_CUSTOM, ovpn_nl_cb_error, &status); + nl_cb_set(ctx->nl_cb, NL_CB_FINISH, NL_CB_CUSTOM, ovpn_nl_cb_finish, + &status); + nl_cb_set(ctx->nl_cb, NL_CB_ACK, NL_CB_CUSTOM, ovpn_nl_cb_ack, &status); + + if (cb) + nl_cb_set(ctx->nl_cb, NL_CB_VALID, NL_CB_CUSTOM, cb, ctx); + + nl_send_auto_complete(ctx->nl_sock, ctx->nl_msg); + + while (status == 1) + ovpn_nl_recvmsgs(ctx); + + if (status < 0) + fprintf(stderr, "failed to send netlink message: %s (%d)\n", + strerror(-status), status); + + return status; +} + +static int ovpn_read_key(const char *file, struct ovpn_ctx *ctx) +{ + int idx_enc, idx_dec, ret = -1; + unsigned char *ckey = NULL; + __u8 *bkey = NULL; + size_t olen = 0; + long ckey_len; + FILE *fp; + + fp = fopen(file, "r"); + if (!fp) { + fprintf(stderr, "cannot open: %s\n", file); + return -1; + } + + /* get file size */ + fseek(fp, 0L, SEEK_END); + ckey_len = ftell(fp); + rewind(fp); + + /* if the file is longer, let's just read a portion */ + if (ckey_len > 256) + ckey_len = 256; + + ckey = malloc(ckey_len); + if (!ckey) + goto err; + + ret = fread(ckey, 1, ckey_len, fp); + if (ret != ckey_len) { + fprintf(stderr, + "couldn't read enough data from key file: %dbytes read\n", + ret); + goto err; + } + + olen = 0; + ret = mbedtls_base64_decode(NULL, 0, &olen, ckey, ckey_len); + if (ret != MBEDTLS_ERR_BASE64_BUFFER_TOO_SMALL) { + char buf[256]; + + mbedtls_strerror(ret, buf, sizeof(buf)); + fprintf(stderr, "unexpected base64 error1: %s (%d)\n", buf, + ret); + + goto err; + } + + bkey = malloc(olen); + if (!bkey) { + fprintf(stderr, "cannot allocate binary key buffer\n"); + goto err; + } + + ret = mbedtls_base64_decode(bkey, olen, &olen, ckey, ckey_len); + if (ret) { + char buf[256]; + + mbedtls_strerror(ret, buf, sizeof(buf)); + fprintf(stderr, "unexpected base64 error2: %s (%d)\n", buf, + ret); + + goto err; + } 
+ + if (olen < 2 * KEY_LEN + NONCE_LEN) { + fprintf(stderr, + "not enough data in key file, found %zdB but needs %dB\n", + olen, 2 * KEY_LEN + NONCE_LEN); + goto err; + } + + switch (ctx->key_dir) { + case KEY_DIR_IN: + idx_enc = 0; + idx_dec = 1; + break; + case KEY_DIR_OUT: + idx_enc = 1; + idx_dec = 0; + break; + } + + memcpy(ctx->key_enc, bkey + KEY_LEN * idx_enc, KEY_LEN); + memcpy(ctx->key_dec, bkey + KEY_LEN * idx_dec, KEY_LEN); + memcpy(ctx->nonce, bkey + 2 * KEY_LEN, NONCE_LEN); + + ret = 0; + +err: + fclose(fp); + free(bkey); + free(ckey); + + return ret; +} + +static int ovpn_read_cipher(const char *cipher, struct ovpn_ctx *ctx) +{ + if (strcmp(cipher, "aes") == 0) + ctx->cipher = OVPN_CIPHER_ALG_AES_GCM; + else if (strcmp(cipher, "chachapoly") == 0) + ctx->cipher = OVPN_CIPHER_ALG_CHACHA20_POLY1305; + else if (strcmp(cipher, "none") == 0) + ctx->cipher = OVPN_CIPHER_ALG_NONE; + else + return -ENOTSUP; + + return 0; +} + +static int ovpn_read_key_direction(const char *dir, struct ovpn_ctx *ctx) +{ + int in_dir; + + in_dir = strtoll(dir, NULL, 10); + switch (in_dir) { + case KEY_DIR_IN: + case KEY_DIR_OUT: + ctx->key_dir = in_dir; + break; + default: + fprintf(stderr, + "invalid key direction provided. 
Can be 0 or 1 only\n"); + return -1; + } + + return 0; +} + +static int ovpn_socket(struct ovpn_ctx *ctx, sa_family_t family, int proto) +{ + struct sockaddr_storage local_sock; + struct sockaddr_in6 *in6; + struct sockaddr_in *in; + int ret, s, sock_type; + size_t sock_len; + + if (proto == IPPROTO_UDP) + sock_type = SOCK_DGRAM; + else if (proto == IPPROTO_TCP) + sock_type = SOCK_STREAM; + else + return -EINVAL; + + s = socket(family, sock_type, 0); + if (s < 0) { + perror("cannot create socket"); + return -1; + } + + memset((char *)&local_sock, 0, sizeof(local_sock)); + + switch (family) { + case AF_INET: + in = (struct sockaddr_in *)&local_sock; + in->sin_family = family; + in->sin_port = htons(ctx->lport); + in->sin_addr.s_addr = htonl(INADDR_ANY); + sock_len = sizeof(*in); + break; + case AF_INET6: + in6 = (struct sockaddr_in6 *)&local_sock; + in6->sin6_family = family; + in6->sin6_port = htons(ctx->lport); + in6->sin6_addr = in6addr_any; + sock_len = sizeof(*in6); + break; + default: + return -1; + } + + int opt = 1; + + ret = setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)); + + if (ret < 0) { + perror("setsockopt for SO_REUSEADDR"); + return ret; + } + + ret = setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt)); + if (ret < 0) { + perror("setsockopt for SO_REUSEPORT"); + return ret; + } + + if (family == AF_INET6) { + opt = 0; + if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &opt, + sizeof(opt))) { + perror("failed to set IPV6_V6ONLY"); + return -1; + } + } + + ret = bind(s, (struct sockaddr *)&local_sock, sock_len); + if (ret < 0) { + perror("cannot bind socket"); + goto err_socket; + } + + ctx->socket = s; + ctx->sa_family = family; + return 0; + +err_socket: + close(s); + return -1; +} + +static int ovpn_udp_socket(struct ovpn_ctx *ctx, sa_family_t family) +{ + return ovpn_socket(ctx, family, IPPROTO_UDP); +} + +static int ovpn_listen(struct ovpn_ctx *ctx, sa_family_t family) +{ + int ret; + + ret = ovpn_socket(ctx, family, IPPROTO_TCP); + 
if (ret < 0) + return ret; + + ret = listen(ctx->socket, 10); + if (ret < 0) { + perror("listen"); + close(ctx->socket); + return -1; + } + + return 0; +} + +static int ovpn_accept(struct ovpn_ctx *ctx) +{ + socklen_t socklen; + int ret; + + socklen = sizeof(ctx->remote); + ret = accept(ctx->socket, (struct sockaddr *)&ctx->remote, &socklen); + if (ret < 0) { + perror("accept"); + goto err; + } + + fprintf(stderr, "Connection received!\n"); + + switch (socklen) { + case sizeof(struct sockaddr_in): + case sizeof(struct sockaddr_in6): + break; + default: + fprintf(stderr, "error: expecting IPv4 or IPv6 connection\n"); + close(ret); + ret = -EINVAL; + goto err; + } + + return ret; +err: + close(ctx->socket); + return ret; +} + +static int ovpn_connect(struct ovpn_ctx *ovpn) +{ + socklen_t socklen; + int s, ret; + + s = socket(ovpn->remote.in4.sin_family, SOCK_STREAM, 0); + if (s < 0) { + perror("cannot create socket"); + return -1; + } + + switch (ovpn->remote.in4.sin_family) { + case AF_INET: + socklen = sizeof(struct sockaddr_in); + break; + case AF_INET6: + socklen = sizeof(struct sockaddr_in6); + break; + default: + return -EOPNOTSUPP; + } + + ret = connect(s, (struct sockaddr *)&ovpn->remote, socklen); + if (ret < 0) { + perror("connect"); + goto err; + } + + fprintf(stderr, "connected\n"); + + ovpn->socket = s; + + return 0; +err: + close(s); + return ret; +} + +static int ovpn_new_peer(struct ovpn_ctx *ovpn, bool is_tcp) +{ + struct nlattr *attr; + struct nl_ctx *ctx; + size_t alen; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SET_PEER); + if (!ctx) + return -ENOMEM; + + attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_SOCKET, ovpn->socket); + + if (!is_tcp) { + switch (ovpn->remote.in4.sin_family) { + case AF_INET: + alen = sizeof(struct sockaddr_in); + break; + case AF_INET6: + alen = sizeof(struct sockaddr_in6); + break; + default: + fprintf(stderr, + 
"Invalid family for remote socket address\n"); + goto nla_put_failure; + } + NLA_PUT(ctx->nl_msg, OVPN_A_PEER_SOCKADDR_REMOTE, alen, + &ovpn->remote); + } + + if (ovpn->peer_ip_set) { + switch (ovpn->peer_ip.in4.sin_family) { + case AF_INET: + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_VPN_IPV4, + ovpn->peer_ip.in4.sin_addr.s_addr); + break; + case AF_INET6: + NLA_PUT(ctx->nl_msg, OVPN_A_PEER_VPN_IPV6, + sizeof(struct in6_addr), + &ovpn->peer_ip.in6.sin6_addr); + break; + default: + fprintf(stderr, "Invalid family for peer address\n"); + goto nla_put_failure; + } + } + + nla_nest_end(ctx->nl_msg, attr); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_set_peer(struct ovpn_ctx *ovpn) +{ + struct nlattr *attr; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SET_PEER); + if (!ctx) + return -ENOMEM; + + attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_KEEPALIVE_INTERVAL, + ovpn->keepalive_interval); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_KEEPALIVE_TIMEOUT, + ovpn->keepalive_timeout); + nla_nest_end(ctx->nl_msg, attr); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_del_peer(struct ovpn_ctx *ovpn) +{ + struct nlattr *attr; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_DEL_PEER); + if (!ctx) + return -ENOMEM; + + attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + nla_nest_end(ctx->nl_msg, attr); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_handle_peer(struct nl_msg *msg, void *arg) +{ + struct nlattr *attrs_peer[OVPN_A_PEER_MAX + 1]; + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *attrs[OVPN_A_MAX + 1]; + __u16 port = 0; + + nla_parse(attrs, 
OVPN_A_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL); + + if (!attrs[OVPN_A_PEER]) { + fprintf(stderr, "no packet content in netlink message\n"); + return NL_SKIP; + } + + nla_parse(attrs_peer, OVPN_A_PEER_MAX, nla_data(attrs[OVPN_A_PEER]), + nla_len(attrs[OVPN_A_PEER]), NULL); + + if (attrs_peer[OVPN_A_PEER_ID]) + fprintf(stderr, "* Peer %u\n", + nla_get_u32(attrs_peer[OVPN_A_PEER_ID])); + + if (attrs_peer[OVPN_A_PEER_VPN_IPV4]) { + char buf[INET_ADDRSTRLEN]; + + inet_ntop(AF_INET, nla_data(attrs_peer[OVPN_A_PEER_VPN_IPV4]), + buf, sizeof(buf)); + fprintf(stderr, "\tVPN IPv4: %s\n", buf); + } + + if (attrs_peer[OVPN_A_PEER_VPN_IPV6]) { + char buf[INET6_ADDRSTRLEN]; + + inet_ntop(AF_INET6, nla_data(attrs_peer[OVPN_A_PEER_VPN_IPV6]), + buf, sizeof(buf)); + fprintf(stderr, "\tVPN IPv6: %s\n", buf); + } + + if (attrs_peer[OVPN_A_PEER_LOCAL_PORT]) + port = ntohs(nla_get_u16(attrs_peer[OVPN_A_PEER_LOCAL_PORT])); + + if (attrs_peer[OVPN_A_PEER_SOCKADDR_REMOTE]) { + struct sockaddr_storage ss; + struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)&ss; + struct sockaddr_in *in = (struct sockaddr_in *)&ss; + + memcpy(&ss, nla_data(attrs_peer[OVPN_A_PEER_SOCKADDR_REMOTE]), + nla_len(attrs_peer[OVPN_A_PEER_SOCKADDR_REMOTE])); + + if (in->sin_family == AF_INET) { + char buf[INET_ADDRSTRLEN]; + + if (attrs_peer[OVPN_A_PEER_LOCAL_IP]) { + void *p = attrs_peer[OVPN_A_PEER_LOCAL_IP]; + + inet_ntop(AF_INET, nla_data(p), buf, + sizeof(buf)); + fprintf(stderr, "\tLocal: %s:%hu\n", buf, + port); + } + + inet_ntop(AF_INET, &in->sin_addr, buf, sizeof(buf)); + fprintf(stderr, "\tRemote: %s:%u\n", buf, + ntohs(in->sin_port)); + } else if (in->sin_family == AF_INET6) { + char buf[INET6_ADDRSTRLEN]; + + if (attrs_peer[OVPN_A_PEER_LOCAL_IP]) { + void *p = attrs_peer[OVPN_A_PEER_LOCAL_IP]; + + inet_ntop(AF_INET6, nla_data(p), buf, + sizeof(buf)); + fprintf(stderr, "\tLocal: %s\n", buf); + } + + inet_ntop(AF_INET6, &in6->sin6_addr, buf, sizeof(buf)); + fprintf(stderr, 
"\tRemote: %s:%u (scope-id: %u)\n", buf, + ntohs(in6->sin6_port), + ntohl(in6->sin6_scope_id)); + } + } + + if (attrs_peer[OVPN_A_PEER_KEEPALIVE_INTERVAL]) { + void *p = attrs_peer[OVPN_A_PEER_KEEPALIVE_INTERVAL]; + + fprintf(stderr, "\tKeepalive interval: %u sec\n", + nla_get_u32(p)); + } + + if (attrs_peer[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) + fprintf(stderr, "\tKeepalive timeout: %u sec\n", + nla_get_u32(attrs_peer[OVPN_A_PEER_KEEPALIVE_TIMEOUT])); + + if (attrs_peer[OVPN_A_PEER_VPN_RX_BYTES]) + fprintf(stderr, "\tVPN RX bytes: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_VPN_RX_BYTES])); + + if (attrs_peer[OVPN_A_PEER_VPN_TX_BYTES]) + fprintf(stderr, "\tVPN TX bytes: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_VPN_TX_BYTES])); + + if (attrs_peer[OVPN_A_PEER_VPN_RX_PACKETS]) + fprintf(stderr, "\tVPN RX packets: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_VPN_RX_PACKETS])); + + if (attrs_peer[OVPN_A_PEER_VPN_TX_PACKETS]) + fprintf(stderr, "\tVPN TX packets: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_VPN_TX_PACKETS])); + + if (attrs_peer[OVPN_A_PEER_LINK_RX_BYTES]) + fprintf(stderr, "\tLINK RX bytes: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_LINK_RX_BYTES])); + + if (attrs_peer[OVPN_A_PEER_LINK_TX_BYTES]) + fprintf(stderr, "\tLINK TX bytes: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_LINK_TX_BYTES])); + + if (attrs_peer[OVPN_A_PEER_LINK_RX_PACKETS]) + fprintf(stderr, "\tLINK RX packets: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_LINK_RX_PACKETS])); + + if (attrs_peer[OVPN_A_PEER_LINK_TX_PACKETS]) + fprintf(stderr, "\tLINK TX packets: %" PRIu64 "\n", + nla_get_uint(attrs_peer[OVPN_A_PEER_LINK_TX_PACKETS])); + + return NL_SKIP; +} + +static int ovpn_get_peer(struct ovpn_ctx *ovpn) +{ + int flags = 0, ret = -1; + struct nlattr *attr; + struct nl_ctx *ctx; + + if (ovpn->peer_id == PEER_ID_UNDEF) + flags = NLM_F_DUMP; + + ctx = nl_ctx_alloc_flags(ovpn, OVPN_CMD_GET_PEER, flags); + if (!ctx) + return 
-ENOMEM; + + if (ovpn->peer_id != PEER_ID_UNDEF) { + attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + nla_nest_end(ctx->nl_msg, attr); + } + + ret = ovpn_nl_msg_send(ctx, ovpn_handle_peer); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_new_key(struct ovpn_ctx *ovpn) +{ + struct nlattr *peer, *keyconf, *key_dir; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SET_KEY); + if (!ctx) + return -ENOMEM; + + peer = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + + keyconf = nla_nest_start(ctx->nl_msg, OVPN_A_PEER_KEYCONF); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_SLOT, ovpn->key_slot); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_KEY_ID, ovpn->key_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_CIPHER_ALG, ovpn->cipher); + + key_dir = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF_ENCRYPT_DIR); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_CIPHER_KEY, KEY_LEN, ovpn->key_enc); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_NONCE_TAIL, NONCE_LEN, ovpn->nonce); + nla_nest_end(ctx->nl_msg, key_dir); + + key_dir = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF_DECRYPT_DIR); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_CIPHER_KEY, KEY_LEN, ovpn->key_dec); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_NONCE_TAIL, NONCE_LEN, ovpn->nonce); + nla_nest_end(ctx->nl_msg, key_dir); + + nla_nest_end(ctx->nl_msg, keyconf); + + nla_nest_end(ctx->nl_msg, peer); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_del_key(struct ovpn_ctx *ovpn) +{ + struct nlattr *peer, *keyconf; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_DEL_KEY); + if (!ctx) + return -ENOMEM; + + peer = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + + keyconf = nla_nest_start(ctx->nl_msg, OVPN_A_PEER_KEYCONF); + NLA_PUT_U32(ctx->nl_msg, 
OVPN_A_KEYCONF_SLOT, OVPN_KEY_SLOT_PRIMARY); + nla_nest_end(ctx->nl_msg, keyconf); + + nla_nest_end(ctx->nl_msg, peer); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_swap_keys(struct ovpn_ctx *ovpn) +{ + struct nlattr *peer; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SWAP_KEYS); + if (!ctx) + return -ENOMEM; + + peer = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + nla_nest_end(ctx->nl_msg, peer); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_handle_iface(struct nl_msg *msg, void *arg) +{ + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *attrs[OVPN_A_MAX + 1]; + + nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL); + + if (!attrs[OVPN_A_IFNAME]) { + fprintf(stderr, "no ifname in netlink message\n"); + return NL_SKIP; + } + + fprintf(stderr, "Created ifname: %s\n", + (char *)nla_data(attrs[OVPN_A_IFNAME])); + + return NL_SKIP; +} + +static int ovpn_new_iface(struct ovpn_ctx *ovpn) +{ + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_NEW_IFACE); + if (!ctx) + return -ENOMEM; + + NLA_PUT(ctx->nl_msg, OVPN_A_IFNAME, strlen(ovpn->ifname) + 1, + ovpn->ifname); + + if (ovpn->mode_set) + NLA_PUT_U32(ctx->nl_msg, OVPN_A_MODE, ovpn->mode); + + fprintf(stdout, "Creating interface %s with mode %u\n", ovpn->ifname, + ovpn->mode); + + ret = ovpn_nl_msg_send(ctx, ovpn_handle_iface); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_del_iface(struct ovpn_ctx *ovpn) +{ + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_DEL_IFACE); + if (!ctx) + return -ENOMEM; + + ret = ovpn_nl_msg_send(ctx, NULL); + nl_ctx_free(ctx); + return ret; +} + +static int nl_seq_check(struct nl_msg *msg, void *arg) +{ + return NL_OK; +} + +struct 
mcast_handler_args { + const char *group; + int id; +}; + +static int mcast_family_handler(struct nl_msg *msg, void *arg) +{ + struct mcast_handler_args *grp = arg; + struct nlattr *tb[CTRL_ATTR_MAX + 1]; + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *mcgrp; + int rem_mcgrp; + + nla_parse(tb, CTRL_ATTR_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL); + + if (!tb[CTRL_ATTR_MCAST_GROUPS]) + return NL_SKIP; + + nla_for_each_nested(mcgrp, tb[CTRL_ATTR_MCAST_GROUPS], rem_mcgrp) { + struct nlattr *tb_mcgrp[CTRL_ATTR_MCAST_GRP_MAX + 1]; + + nla_parse(tb_mcgrp, CTRL_ATTR_MCAST_GRP_MAX, + nla_data(mcgrp), nla_len(mcgrp), NULL); + + if (!tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME] || + !tb_mcgrp[CTRL_ATTR_MCAST_GRP_ID]) + continue; + if (strncmp(nla_data(tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME]), + grp->group, nla_len(tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME]))) + continue; + grp->id = nla_get_u32(tb_mcgrp[CTRL_ATTR_MCAST_GRP_ID]); + break; + } + + return NL_SKIP; +} + +static int mcast_error_handler(struct sockaddr_nl *nla, struct nlmsgerr *err, + void *arg) +{ + int *ret = arg; + + *ret = err->error; + return NL_STOP; +} + +static int mcast_ack_handler(struct nl_msg *msg, void *arg) +{ + int *ret = arg; + + *ret = 0; + return NL_STOP; +} + +static int ovpn_handle_msg(struct nl_msg *msg, void *arg) +{ + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *attrs[OVPN_A_MAX + 1]; + struct nlmsghdr *nlh = nlmsg_hdr(msg); + //enum ovpn_del_peer_reason reason; + char ifname[IF_NAMESIZE]; + __u32 ifindex; + + fprintf(stderr, "received message from ovpn-dco\n"); + + if (!genlmsg_valid_hdr(nlh, 0)) { + fprintf(stderr, "invalid header\n"); + return NL_STOP; + } + + if (nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL)) { + fprintf(stderr, "received bogus data from ovpn-dco\n"); + return NL_STOP; + } + + if (!attrs[OVPN_A_IFINDEX]) { + fprintf(stderr, "no ifindex in this message\n"); + return NL_STOP; + } 
+ + ifindex = nla_get_u32(attrs[OVPN_A_IFINDEX]); + if (!if_indextoname(ifindex, ifname)) { + fprintf(stderr, "cannot resolve ifname for ifindex: %u\n", + ifindex); + return NL_STOP; + } + + switch (gnlh->cmd) { + case OVPN_CMD_DEL_PEER: + /*if (!attrs[OVPN_A_DEL_PEER_REASON]) { + * fprintf(stderr, "no reason in DEL_PEER message\n"); + * return NL_STOP; + *} + * + *reason = nla_get_u8(attrs[OVPN_A_DEL_PEER_REASON]); + *fprintf(stderr, + * "received CMD_DEL_PEER, ifname: %s reason: %d\n", + * ifname, reason); + */ + fprintf(stdout, "received CMD_DEL_PEER\n"); + break; + default: + fprintf(stderr, "received unknown command: %d\n", gnlh->cmd); + return NL_STOP; + } + + return NL_OK; +} + +static int ovpn_get_mcast_id(struct nl_sock *sock, const char *family, + const char *group) +{ + struct nl_msg *msg; + struct nl_cb *cb; + int ret, ctrlid; + struct mcast_handler_args grp = { + .group = group, + .id = -ENOENT, + }; + + msg = nlmsg_alloc(); + if (!msg) + return -ENOMEM; + + cb = nl_cb_alloc(NL_CB_DEFAULT); + if (!cb) { + ret = -ENOMEM; + goto out_fail_cb; + } + + ctrlid = genl_ctrl_resolve(sock, "nlctrl"); + + genlmsg_put(msg, 0, 0, ctrlid, 0, 0, CTRL_CMD_GETFAMILY, 0); + + ret = -ENOBUFS; + NLA_PUT_STRING(msg, CTRL_ATTR_FAMILY_NAME, family); + + ret = nl_send_auto_complete(sock, msg); + if (ret < 0) + goto nla_put_failure; + + ret = 1; + + nl_cb_err(cb, NL_CB_CUSTOM, mcast_error_handler, &ret); + nl_cb_set(cb, NL_CB_ACK, NL_CB_CUSTOM, mcast_ack_handler, &ret); + nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, mcast_family_handler, &grp); + + while (ret > 0) + nl_recvmsgs(sock, cb); + + if (ret == 0) + ret = grp.id; + nla_put_failure: + nl_cb_put(cb); + out_fail_cb: + nlmsg_free(msg); + return ret; +} + +static void ovpn_listen_mcast(void) +{ + struct nl_sock *sock; + struct nl_cb *cb; + int mcid, ret; + + sock = nl_socket_alloc(); + if (!sock) { + fprintf(stderr, "cannot allocate netlink socket\n"); + goto err_free; + } + + nl_socket_set_buffer_size(sock, 8192, 8192); + + 
ret = genl_connect(sock); + if (ret < 0) { + fprintf(stderr, "cannot connect to generic netlink: %s\n", + nl_geterror(ret)); + goto err_free; + } + + mcid = ovpn_get_mcast_id(sock, OVPN_FAMILY_NAME, OVPN_MCGRP_PEERS); + if (mcid < 0) { + fprintf(stderr, "cannot get mcast group: %s\n", + nl_geterror(mcid)); + goto err_free; + } + + ret = nl_socket_add_membership(sock, mcid); + if (ret) { + fprintf(stderr, "failed to join mcast group: %d\n", ret); + goto err_free; + } + + ret = 0; + cb = nl_cb_alloc(NL_CB_DEFAULT); + nl_cb_set(cb, NL_CB_SEQ_CHECK, NL_CB_CUSTOM, nl_seq_check, NULL); + nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, ovpn_handle_msg, &ret); + nl_cb_err(cb, NL_CB_CUSTOM, ovpn_nl_cb_error, &ret); + + while (ret != -EINTR) + ret = nl_recvmsgs(sock, cb); + + nl_cb_put(cb); +err_free: + nl_socket_free(sock); +} + +static void usage(const char *cmd) +{ + fprintf(stderr, "Error: invalid arguments.\n\n"); + fprintf(stderr, + "Usage %s <iface> <connect|listen|new_peer|new_multi_peer|set_peer|del_peer|new_key|del_key|recv|send|listen_mcast> [arguments..]\n", + cmd); + fprintf(stderr, "\tiface: tun interface name\n\n"); + + fprintf(stderr, + "* connect <peer_id> <raddr> <rport> <key file>: start connecting peer of TCP-based VPN session\n"); + fprintf(stderr, "\tpeer-id: peer ID of the connecting peer\n"); + fprintf(stderr, "\tremote-addr: peer IP address\n"); + fprintf(stderr, "\tremote-port: peer TCP port\n"); + + fprintf(stderr, + "* listen <lport> <peers_file>: listen for incoming peer TCP connections\n"); + fprintf(stderr, "\tlport: src TCP port\n"); + fprintf(stderr, + "\tpeers_file: file containing one peer per line: Line format:\n"); + fprintf(stderr, "\t\t<peer_id> <vpnaddr>\n\n"); + + fprintf(stderr, + "* new_peer <lport> <peer-id> <raddr> <rport> [vpnaddr]: add new peer\n"); + fprintf(stderr, + "\tpeer-id: peer ID to be used in data packets to/from this peer\n"); + fprintf(stderr, "\tlocal-port: local UDP port\n"); + fprintf(stderr, "\tremote-addr: peer IP 
address\n"); + fprintf(stderr, "\tremote-port: peer UDP port\n"); + fprintf(stderr, "\tvpnaddr: peer VPN IP\n\n"); + + fprintf(stderr, + "* new_multi_peer <lport> <file>: add multiple peers as listed in the file\n"); + fprintf(stderr, "\tlport: local UDP port to bind to\n"); + fprintf(stderr, + "\tfile: text file containing one peer per line. Line format:\n"); + fprintf(stderr, "\t\t<peer-id> <raddr> <rport> <vpnaddr>\n\n"); + + fprintf(stderr, + "* set_peer <peer-id> <keepalive_interval> <keepalive_timeout>: set peer attributes\n"); + fprintf(stderr, "\tpeer-id: peer ID of the peer to modify\n"); + fprintf(stderr, + "\tkeepalive_interval: interval for sending ping messages\n"); + fprintf(stderr, + "\tkeepalive_timeout: time after which a peer is timed out\n\n"); + + fprintf(stderr, "* del_peer <peer-id>: delete peer\n"); + fprintf(stderr, "\tpeer-id: peer ID of the peer to delete\n\n"); + + fprintf(stderr, + "* new_key <peer-id> <slot> <key_id> <cipher> <key_dir> <key_file>: set data channel key\n"); + fprintf(stderr, + "\tpeer-id: peer ID of the peer to configure the key for\n"); + fprintf(stderr, + "\tcipher: cipher to use, supported: aes (AES-GCM), chachapoly (CHACHA20POLY1305), none\n"); + fprintf(stderr, + "\tkey_dir: key direction, must 0 on one host and 1 on the other\n"); + fprintf(stderr, "\tkey_file: file containing the pre-shared key\n\n"); + + fprintf(stderr, + "* del_key <peer-id>: erase existing data channel key\n"); + fprintf(stderr, "\tpeer-id: peer ID of the peer to modify\n\n"); + + fprintf(stderr, + "* swap_keys <peer-id>: swap primary and seconday key slots\n"); + fprintf(stderr, "\tpeer-id: peer ID of the peer to modify\n\n"); + + fprintf(stderr, + "* listen_mcast: listen to ovpn-dco netlink multicast messages\n"); +} + +static int ovpn_parse_remote(struct ovpn_ctx *ovpn, const char *host, + const char *service, const char *vpnip) +{ + int ret; + struct addrinfo *result; + struct addrinfo hints = { + .ai_family = ovpn->sa_family, + 
.ai_socktype = SOCK_DGRAM, + .ai_protocol = IPPROTO_UDP + }; + + if (host) { + ret = getaddrinfo(host, service, &hints, &result); + if (ret == EAI_NONAME || ret == EAI_FAIL) + return -1; + + if (!(result->ai_family == AF_INET && + result->ai_addrlen == sizeof(struct sockaddr_in)) && + !(result->ai_family == AF_INET6 && + result->ai_addrlen == sizeof(struct sockaddr_in6))) { + ret = -EINVAL; + goto out; + } + + memcpy(&ovpn->remote, result->ai_addr, result->ai_addrlen); + } + + if (vpnip) { + ret = getaddrinfo(vpnip, NULL, &hints, &result); + if (ret == EAI_NONAME || ret == EAI_FAIL) + return -1; + + if (!(result->ai_family == AF_INET && + result->ai_addrlen == sizeof(struct sockaddr_in)) && + !(result->ai_family == AF_INET6 && + result->ai_addrlen == sizeof(struct sockaddr_in6))) { + ret = -EINVAL; + goto out; + } + + memcpy(&ovpn->peer_ip, result->ai_addr, result->ai_addrlen); + ovpn->sa_family = result->ai_family; + + ovpn->peer_ip_set = true; + } + + ret = 0; +out: + freeaddrinfo(result); + return ret; +} + +static int ovpn_parse_new_peer(struct ovpn_ctx *ovpn, const char *peer_id, + const char *raddr, const char *rport, + const char *vpnip) +{ + ovpn->peer_id = strtoul(peer_id, NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + return ovpn_parse_remote(ovpn, raddr, rport, vpnip); +} + +static void ovpn_send_tcp_data(int socket) +{ + uint16_t len = htons(1000); + uint8_t buf[1002]; + int ret; + + memcpy(buf, &len, sizeof(len)); + memset(buf + sizeof(len), 0x86, sizeof(buf) - sizeof(len)); + + ret = send(socket, buf, sizeof(buf), 0); + + fprintf(stdout, "Sent %u bytes over TCP socket\n", ret); +} + +static void ovpn_recv_tcp_data(int socket) +{ + uint8_t buf[1002]; + uint16_t len; + int ret; + + ret = recv(socket, buf, sizeof(buf), 0); + + if (ret < 2) { + fprintf(stderr, ">>>> Error while reading TCP data: %d\n", ret); + return; + } + + memcpy(&len, buf, sizeof(len)); + len = ntohs(len); + + 
fprintf(stdout, ">>>> Received %u bytes over TCP socket, header: %u\n", + ret, len); + +/* int i; + * for (i = 2; i < ret; i++) { + * fprintf(stdout, "0x%.2x ", buf[i]); + * if (i && !((i - 2) % 16)) + * fprintf(stdout, "\n"); + * } + * fprintf(stdout, "\n"); + */ +} + +static int ovpn_parse_set_peer(struct ovpn_ctx *ovpn, int argc, char *argv[]) +{ + if (argc < 5) { + usage(argv[0]); + return -1; + } + + ovpn->keepalive_interval = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "keepalive interval value out of range\n"); + return -1; + } + + ovpn->keepalive_timeout = strtoul(argv[4], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "keepalive interval value out of range\n"); + return -1; + } + + return 0; +} + +int main(int argc, char *argv[]) +{ + struct ovpn_ctx ovpn; +// struct nl_ctx *ctx; + int ret; + + if (argc < 2) { + usage(argv[0]); + return -1; + } + + memset(&ovpn, 0, sizeof(ovpn)); + ovpn.sa_family = AF_INET; + ovpn.cli_socket = -1; + + if (argc > 2) { + strscpy(ovpn.ifname, argv[2], IFNAMSIZ - 1); + ovpn.ifname[IFNAMSIZ - 1] = '\0'; + } + + /* all commands except new_iface expect a valid ifindex */ + if (strcmp(argv[1], "new_iface")) { + /* in this case a ifname MUST be defined */ + if (argc < 3) { + usage(argv[0]); + return -1; + } + + ovpn.ifindex = if_nametoindex(ovpn.ifname); + if (!ovpn.ifindex) { + fprintf(stderr, "cannot find interface: %s\n", + strerror(errno)); + return -1; + } + } + + if (!strcmp(argv[1], "new_iface")) { + if (argc > 3) { + if (!strcmp(argv[3], "P2P")) { + ovpn.mode = OVPN_MODE_P2P; + } else if (!strcmp(argv[3], "MP")) { + ovpn.mode = OVPN_MODE_MP; + } else { + fprintf(stderr, "Cannot parse iface mode: %s\n", + argv[3]); + return -1; + } + ovpn.mode_set = true; + } + + ret = ovpn_new_iface(&ovpn); + if (ret < 0) { + fprintf(stderr, "Cannot create interface %s: %d\n", + ovpn.ifname, ret); + return -1; + } + } else if (!strcmp(argv[1], "del_iface")) { + ret = ovpn_del_iface(&ovpn); + if (ret < 0) { 
+ fprintf(stderr, "Cannot delete interface %s: %d\n", + ovpn.ifname, ret); + return -1; + } + } else if (!strcmp(argv[1], "listen")) { + char peer_id[10], vpnip[100]; + int n; + FILE *fp; + + if (argc < 4) { + usage(argv[0]); + return -1; + } + + ovpn.lport = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn.lport > 65535) { + fprintf(stderr, "lport value out of range\n"); + return -1; + } + + if (argc > 4 && !strcmp(argv[4], "ipv6")) + ovpn.sa_family = AF_INET6; + + ret = ovpn_listen(&ovpn, ovpn.sa_family); + if (ret < 0) { + fprintf(stderr, "cannot listen on TCP socket\n"); + return ret; + } + + fp = fopen(argv[4], "r"); + if (!fp) { + fprintf(stderr, "cannot open file: %s\n", argv[4]); + return -1; + } + + while ((n = fscanf(fp, "%s %s\n", peer_id, vpnip)) == 2) { + struct ovpn_ctx peer_ctx = { 0 }; + + peer_ctx.ifindex = ovpn.ifindex; + peer_ctx.sa_family = ovpn.sa_family; + + peer_ctx.socket = ovpn_accept(&ovpn); + if (peer_ctx.socket < 0) { + fprintf(stderr, "cannot accept connection!\n"); + return -1; + } + + /* store the socket of the first peer to test TCP I/O */ + if (ovpn.cli_socket < 0) + ovpn.cli_socket = peer_ctx.socket; + + ret = ovpn_parse_new_peer(&peer_ctx, peer_id, NULL, + NULL, vpnip); + if (ret < 0) { + fprintf(stderr, "error while parsing line\n"); + return -1; + } + + ret = ovpn_new_peer(&peer_ctx, true); + if (ret < 0) { + fprintf(stderr, + "cannot add peer to VPN: %s %s\n", + peer_id, vpnip); + return ret; + } + } + + if (ovpn.cli_socket >= 0) + ovpn_recv_tcp_data(ovpn.cli_socket); + } else if (!strcmp(argv[1], "connect")) { + if (argc < 5) { + usage(argv[0]); + return -1; + } + + ovpn.sa_family = AF_INET; + + ret = ovpn_parse_new_peer(&ovpn, argv[3], argv[4], argv[5], + NULL); + if (ret < 0) { + fprintf(stderr, "Cannot parse remote peer data\n"); + return ret; + } + + ret = ovpn_connect(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot connect TCP socket\n"); + return ret; + } + + ret = ovpn_new_peer(&ovpn, true); + if (ret < 0) { 
+ fprintf(stderr, "cannot add peer to VPN\n"); + close(ovpn.socket); + return ret; + } + + if (argc > 6) { + ovpn.key_slot = OVPN_KEY_SLOT_PRIMARY; + ovpn.key_id = 0; + ovpn.cipher = OVPN_CIPHER_ALG_AES_GCM; + ovpn.key_dir = KEY_DIR_OUT; + + ret = ovpn_read_key(argv[6], &ovpn); + if (ret) + return ret; + + ret = ovpn_new_key(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot set key\n"); + return ret; + } + + ovpn_send_tcp_data(ovpn.socket); + } + } else if (!strcmp(argv[1], "new_peer")) { + if (argc < 7) { + usage(argv[0]); + return -1; + } + + ovpn.lport = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn.lport > 65535) { + fprintf(stderr, "lport value out of range\n"); + return -1; + } + + const char *vpnip = (argc > 7) ? argv[7] : NULL; + + ret = ovpn_parse_new_peer(&ovpn, argv[4], argv[5], argv[6], + vpnip); + if (ret < 0) + return ret; + + ret = ovpn_udp_socket(&ovpn, AF_INET6); //ovpn.sa_family ? + if (ret < 0) + return ret; + + ret = ovpn_new_peer(&ovpn, false); + if (ret < 0) { + fprintf(stderr, "cannot add peer to VPN\n"); + return ret; + } + } else if (!strcmp(argv[1], "new_multi_peer")) { + char peer_id[10], raddr[128], rport[10], vpnip[100]; + FILE *fp; + int n; + + if (argc < 5) { + usage(argv[0]); + return -1; + } + + ovpn.lport = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn.lport > 65535) { + fprintf(stderr, "lport value out of range\n"); + return -1; + } + + fp = fopen(argv[4], "r"); + if (!fp) { + fprintf(stderr, "cannot open file: %s\n", argv[4]); + return -1; + } + + ret = ovpn_udp_socket(&ovpn, AF_INET6); + if (ret < 0) + return ret; + + while ((n = fscanf(fp, "%s %s %s %s\n", peer_id, raddr, rport, + vpnip)) == 4) { + struct ovpn_ctx peer_ctx = { 0 }; + + peer_ctx.ifindex = ovpn.ifindex; + peer_ctx.socket = ovpn.socket; + peer_ctx.sa_family = AF_UNSPEC; + + ret = ovpn_parse_new_peer(&peer_ctx, peer_id, raddr, + rport, vpnip); + if (ret < 0) { + fprintf(stderr, "error while parsing line\n"); + return -1; + } + + ret = 
ovpn_new_peer(&peer_ctx, false); + if (ret < 0) { + fprintf(stderr, + "cannot add peer to VPN: %s %s %s %s\n", + peer_id, raddr, rport, vpnip); + return ret; + } + } + } else if (!strcmp(argv[1], "set_peer")) { + ovpn.peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + argv++; + argc--; + + ret = ovpn_parse_set_peer(&ovpn, argc, argv); + if (ret < 0) + return ret; + + ret = ovpn_set_peer(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot set peer to VPN\n"); + return ret; + } + } else if (!strcmp(argv[1], "del_peer")) { + if (argc < 4) { + usage(argv[0]); + return -1; + } + + ovpn.peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + ret = ovpn_del_peer(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot delete peer to VPN\n"); + return ret; + } + } else if (!strcmp(argv[1], "get_peer")) { + ovpn.peer_id = PEER_ID_UNDEF; + if (argc > 3) + ovpn.peer_id = strtoul(argv[3], NULL, 10); + + fprintf(stderr, "List of peers connected to: %s\n", + ovpn.ifname); + + ret = ovpn_get_peer(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot get peer(s): %d\n", ret); + return ret; + } + } else if (!strcmp(argv[1], "new_key")) { + if (argc < 8) { + usage(argv[0]); + return -1; + } + + ovpn.peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + int slot = strtoul(argv[4], NULL, 10); + + if (errno == ERANGE || slot < 1 || slot > 2) { + fprintf(stderr, "ket slot out of range\n"); + return -1; + } + + switch (slot) { + case 1: + ovpn.key_slot = OVPN_KEY_SLOT_PRIMARY; + break; + case 2: + ovpn.key_slot = OVPN_KEY_SLOT_SECONDARY; + break; + } + + ovpn.key_id = strtoul(argv[5], NULL, 10); + if (errno == ERANGE || ovpn.key_id > 2) { + fprintf(stderr, "ket ID out of range\n"); + return -1; + } + + ret = ovpn_read_cipher(argv[6], &ovpn); + if (ret < 0) + 
return ret; + + ret = ovpn_read_key_direction(argv[7], &ovpn); + if (ret < 0) + return ret; + + ret = ovpn_read_key(argv[8], &ovpn); + if (ret) + return ret; + + ret = ovpn_new_key(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot set key\n"); + return ret; + } + } else if (!strcmp(argv[1], "del_key")) { + if (argc < 3) { + usage(argv[0]); + return -1; + } + + ovpn.peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + argv++; + argc--; + + ret = ovpn_del_key(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot delete key\n"); + return ret; + } + } else if (!strcmp(argv[1], "swap_keys")) { + if (argc < 3) { + usage(argv[0]); + return -1; + } + + ovpn.peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + argv++; + argc--; + + ret = ovpn_swap_keys(&ovpn); + if (ret < 0) { + fprintf(stderr, "cannot swap keys\n"); + return ret; + } + } else if (!strcmp(argv[1], "listen_mcast")) { + ovpn_listen_mcast(); + } else { + usage(argv[0]); + return -1; + } + + return ret; +} diff --git a/tools/testing/selftests/net/ovpn/tcp_peers.txt b/tools/testing/selftests/net/ovpn/tcp_peers.txt new file mode 100644 index 000000000000..3b7f68bb7f64 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/tcp_peers.txt @@ -0,0 +1 @@ +1 5.5.5.2 diff --git a/tools/testing/selftests/net/ovpn/udp_peers.txt b/tools/testing/selftests/net/ovpn/udp_peers.txt new file mode 100644 index 000000000000..32f14bd9347a --- /dev/null +++ b/tools/testing/selftests/net/ovpn/udp_peers.txt @@ -0,0 +1,5 @@ +1 10.10.1.2 1 5.5.5.2 +2 10.10.2.2 1 5.5.5.3 +3 10.10.3.2 1 5.5.5.4 +4 10.10.4.2 1 5.5.5.5 +5 10.10.5.2 1 5.5.5.6 -- 2.44.2
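A note on the argument parsing in the tool above: the code shown tests `errno == ERANGE` after `strtoul()` without clearing `errno` first, and `strtoul()` never resets `errno` on success, so a stale value can trigger a false "out of range" error. A minimal sketch of a stricter helper follows. `parse_u32` is a hypothetical name used for illustration and is not part of the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a decimal string
 * into *out, rejecting empty input, signs, leading whitespace, trailing
 * garbage, and values above max.
 * Returns 0 on success, -EINVAL or -ERANGE on failure.
 */
int parse_u32(const char *str, uint32_t max, uint32_t *out)
{
	unsigned long val;
	char *end;

	/* strtoul() would accept whitespace and '+'/'-'; refuse them here */
	if (str[0] < '0' || str[0] > '9')
		return -EINVAL;

	/* clear errno so that ERANGE can only come from this call */
	errno = 0;
	val = strtoul(str, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	/* the whole string must have been consumed */
	if (*end != '\0')
		return -EINVAL;

	if (val > max)
		return -ERANGE;

	*out = (uint32_t)val;
	return 0;
}
```

With such a helper, a call like `parse_u32(argv[3], 65535, &port)` could replace the separate `strtoul()`, `errno` and range checks used for `lport`.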

1 year, 4 months


[PATCH v5 0/2] Add test to distinguish between thread's signal mask and ucontext_t

by Dev Jain

This patch series is motivated by the following observation:

Raise a signal, jump to signal handler. The ucontext_t structure dumped by kernel to userspace has a uc_sigmask field having the mask of blocked signals. If you run a fresh minimalistic program doing this, this field is empty, even if you block some signals while registering the handler with sigaction().

Here is what the man-pages have to say:

sigaction(2): "sa_mask specifies a mask of signals which should be blocked (i.e., added to the signal mask of the thread in which the signal handler is invoked) during execution of the signal handler. In addition, the signal which triggered the handler will be blocked, unless the SA_NODEFER flag is used."

signal(7): Under "Execution of signal handlers", (1.3) implies:

"The thread's current signal mask is accessible via the ucontext_t object that is pointed to by the third argument of the signal handler."

But, (1.4) states:

"Any signals specified in act->sa_mask when registering the handler with sigprocmask(2) are added to the thread's signal mask. The signal being delivered is also added to the signal mask, unless SA_NODEFER was specified when registering the handler. These signals are thus blocked while the handler executes."

There clearly is no distinction being made in the man pages between "Thread's signal mask" and ucontext_t; this logically should imply that a signal blocked by populating struct sigaction should be visible in ucontext_t.

Here is what the kernel code does (for AArch64):

In do_signal() -> handle_signal(), the result of sigmask_to_save(), which returns &current->blocked, is passed to setup_rt_frame() -> setup_sigframe() -> __copy_to_user(). Hence, &current->blocked is copied to the ucontext_t exposed to userspace. Returning to handle_signal(), signal_setup_done() -> signal_delivered() -> sigorsets() and set_current_blocked() are responsible for using information from struct ksignal ksig, which was populated through the sigaction() system call in kernel/signal.c: copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)), to update &current->blocked; hence, the set of blocked signals for the current thread is updated AFTER the kernel dumps ucontext_t to userspace.

Assuming that the above is indeed the intended behaviour (it makes sense semantically, since the signals blocked using sigaction() remain blocked only until the handler finishes executing, and not in the context present before jumping to the handler, although nothing can be confirmed from the man-pages), the series introduces a test for mangling with uc_sigmask. I will send a separate series to fix the man-pages.

The proposed selftest has been tested out on AArch32, AArch64 and x86_64.

v4->v5:
 - Remove a redundant print statement

v3->v4:
 - Allocate sigsets as automatic variables to avoid malloc()

v2->v3:
 - ucontext describes current state -> ucontext describes interrupted context
 - Add a comment for blockage of USR2 even after return from handler
 - Describe blockage of signals in a better way

v1->v2:
 - Replace all occurrences of SIGPIPE with SIGSEGV
 - Fixed a mismatch between code comment and ksft log
 - Add a testcase: Raise the same signal again; it must not be queued
 - Remove unneeded <assert.h>, <unistd.h>
 - Give a detailed test description in the comments; also describe the exact meaning of delivered and blocked
 - Handle errors for all libc functions/syscalls
 - Mention tests in Makefile and .gitignore in alphabetical order

v1:
 - https://lore.kernel.org/all/20240607122319.768640-1-dev.jain@arm.com/

Dev Jain (2):
  selftests: Rename sigaltstack to generic signal
  selftests: Add a test mangling with uc_sigmask

 tools/testing/selftests/Makefile | 2 +-
 .../{sigaltstack => signal}/.gitignore | 3 +-
 .../{sigaltstack => signal}/Makefile | 3 +-
 .../current_stack_pointer.h | 0
 .../selftests/signal/mangle_uc_sigmask.c | 184 ++++++++++++++++++
 .../sas.c => signal/sigaltstack.c} | 0
 6 files changed, 189 insertions(+), 3 deletions(-)
 rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
 rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
 rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
 create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
 rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)

--
2.30.2

1 year, 4 months


[PATCH 0/4] kunit: Add macros to help write more complex tests

by Michal Wajdeczko

It was suggested to promote some of the ideas introduced by [1] to be a part of the core KUnit instead of keeping them locally.

[1] https://patchwork.freedesktop.org/series/137095/

Cc: Rae Moar <rmoar(a)google.com>
Cc: David Gow <davidgow(a)google.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>

Michal Wajdeczko (4):
  kunit: Introduce kunit_is_running()
  kunit: Add macro to conditionally expose declarations to tests
  kunit: Allow function redirection outside of the KUnit thread
  kunit: Add example with alternate function redirection method

 include/kunit/static_stub.h | 80 ++++++++++++++++++++++++++++++++++
 include/kunit/test-bug.h | 12 ++++-
 include/kunit/visibility.h | 8 ++++
 lib/kunit/kunit-example-test.c | 63 ++++++++++++++++++++++++++
 lib/kunit/static_stub.c | 21 +++++++++
 5 files changed, 182 insertions(+), 2 deletions(-)

--
2.43.0

1 year, 4 months


[PATCH bpf-next v2 0/8] libbpf, selftests/bpf: Support cross-endian usage

by Tony Ambardar

From: Tony Ambardar <tony.ambardar(a)gmail.com>

Hello all,

This patch series targets a long-standing BPF usability issue - the lack of general cross-compilation support - by enabling cross-endian usage of libbpf and bpftool, as well as supporting cross-endian build targets for selftests/bpf.

Benefits include improved BPF development and testing for embedded systems based on e.g. big-endian MIPS, more build options e.g. for s390x systems, and better accessibility to the very latest test tools e.g. 'test_progs'.

Initial development and testing used mips64, since this arch makes switching the build byte-order trivial and is thus very handy for A/B testing. However, it lacks some key features (bpf2bpf call, kfuncs, etc) making for poor selftests/bpf coverage.

Final testing takes the kernel and selftests/bpf cross-built from x86_64 to s390x, and runs the result under QEMU/s390x. That same configuration could also be used on kernel-patches/bpf CI for regression testing endian support or perhaps load-sharing s390x builds across x86_64 systems.

This thread includes some background regarding testing on QEMU/s390x and the generally favourable results:

  https://lore.kernel.org/bpf/ZsEcsaa3juxxQBUf@kodidev-ubuntu/

Feedback and suggestions are welcome!

Best regards,
Tony

Changelog:
---------

v1 -> v2:
 - fixed a light skeleton bug causing test_progs 'map_ptr' failure
 - simplified some BTF.ext related endianness logic
 - remove an 'inline' usage related to CI checkpatch failure
 - improve some formatting noted by checkpatch warnings
 - unexpected 'test_progs' failures drop 3 -> 2 (x86_64 to s390x cross)

Tony Ambardar (8):
  libbpf: Improve log message formatting
  libbpf: Fix header comment typos for BTF.ext
  libbpf: Fix output .symtab byte-order during linking
  libbpf: Support BTF.ext loading and output in either endianness
  libbpf: Support opening bpf objects of either endianness
  libbpf: Support linking bpf objects of either endianness
  libbpf: Support creating light skeleton of either endianness
  selftests/bpf: Support cross-endian building

 tools/lib/bpf/bpf_gen_internal.h | 1 +
 tools/lib/bpf/btf.c | 168 ++++++++++++++++++++++--
 tools/lib/bpf/btf.h | 3 +
 tools/lib/bpf/btf_dump.c | 2 +-
 tools/lib/bpf/btf_relocate.c | 2 +-
 tools/lib/bpf/gen_loader.c | 187 ++++++++++++++++++++-------
 tools/lib/bpf/libbpf.c | 26 +++-
 tools/lib/bpf/libbpf.map | 2 +
 tools/lib/bpf/libbpf_internal.h | 17 ++-
 tools/lib/bpf/linker.c | 108 +++++++++++++---
 tools/lib/bpf/relo_core.c | 2 +-
 tools/lib/bpf/skel_internal.h | 3 +-
 tools/testing/selftests/bpf/Makefile | 7 +-
 13 files changed, 444 insertions(+), 84 deletions(-)

--
2.34.1
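As an illustration of what "either endianness" means for a loader, here is a minimal Python sketch that decides the byte order of a BTF blob from its leading 16-bit magic (0xeB9F in the kernel's UAPI header). This is not code from the series, and the helper names btf_byte_order() and is_cross_endian() are hypothetical; it only shows the general idea of comparing a magic value against both byte orders before deciding whether the rest of the data needs swapping.

```python
import struct
import sys

# 16-bit magic at the start of a BTF header (value from the kernel UAPI header).
BTF_MAGIC = 0xEB9F


def btf_byte_order(header: bytes) -> str:
    """Return "little" or "big" depending on how the magic is stored.

    Hypothetical helper for illustration only; not part of libbpf.
    """
    if len(header) < 2:
        raise ValueError("need at least 2 bytes to read the magic")
    (as_le,) = struct.unpack_from("<H", header)
    (as_be,) = struct.unpack_from(">H", header)
    if as_le == BTF_MAGIC:
        return "little"
    if as_be == BTF_MAGIC:
        return "big"
    raise ValueError("magic does not match BTF in either byte order")


def is_cross_endian(header: bytes, host: str = sys.byteorder) -> bool:
    """True when the blob's byte order differs from the (assumed) host order."""
    return btf_byte_order(header) != host


if __name__ == "__main__":
    blob = struct.pack(">H", BTF_MAGIC)  # pretend this came from a big-endian build
    print(btf_byte_order(blob), is_cross_endian(blob))
```

A loader built along these lines would byte-swap the remaining header and section fields only when is_cross_endian() reports a mismatch, and leave native-order data untouched.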

1 year, 4 months


[PATCH net 00/15] mptcp: more fixes for the in-kernel PM

by Matthieu Baerts (NGI0)

Here is a new batch of fixes for the MPTCP in-kernel path-manager:

Patch 1 ensures the address ID is set to 0 when the path-manager sends an ADD_ADDR for the address of the initial subflow. The same fix is applied when a new subflow is created re-using this special address. A fix for v6.0.

Patch 2 is similar, but for the case where an endpoint is removed: if this endpoint was used for the initial address, it is important to send a RM_ADDR with this ID set to 0, and look for existing subflows with the ID set to 0. A fix for v6.0 as well.

Patch 3 validates the two previous patches.

Patch 4 makes the PM select an "active" path to send an address notification in an ACK, instead of taking the first path in the list. A fix for v5.11.

Patch 5 fixes skipping the establishment of a new subflow if a previous subflow using the same pair of addresses is being closed. A fix for v5.13.

Patch 6 resets the ID linked to the initial subflow when the linked endpoint is re-added, possibly with a different ID. A fix for v6.0.

Patch 7 validates the three previous patches.

Patch 8 is a small fix for the MPTCP Join selftest, when being used with older subflows not supporting all MIB counters. A fix for a commit introduced in v6.4, but backported up to v5.10.

Patch 9 prevents the PM from trying to close the initial subflow multiple times, and from incrementing counters while nothing happened. A fix for v5.10.

Patch 10 stops incrementing local_addr_used and add_addr_accepted counters when dealing with the address ID 0, because these counters do not take the initial subflow into account, and are then not decremented when the linked addresses are removed. A fix for v6.0.

Patch 11 validates the previous patch.

Patch 12 prevents the PM from sending multiple SUB_CLOSED events for the initial subflow. A fix for v5.12.

Patch 13 validates the previous patch.

Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it in order to re-create the initial subflow if it has been closed, even if the limit for *new* addresses -- not taking into account the address of the initial subflow -- has been reached. A fix for v5.10.

Patch 15 validates the previous patch.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (15):
  mptcp: pm: reuse ID 0 after delete and re-add
  mptcp: pm: fix RM_ADDR ID for the initial subflow
  selftests: mptcp: join: check removing ID 0 endpoint
  mptcp: pm: send ACK on an active subflow
  mptcp: pm: skip connecting to already established sf
  mptcp: pm: reset MPC endp ID when re-added
  selftests: mptcp: join: check re-adding init endp with != id
  selftests: mptcp: join: no extra msg if no counter
  mptcp: pm: do not remove already closed subflows
  mptcp: pm: fix ID 0 endp usage after multiple re-creations
  selftests: mptcp: join: check re-re-adding ID 0 endp
  mptcp: avoid duplicated SUB_CLOSED events
  selftests: mptcp: join: validate event numbers
  mptcp: pm: ADD_ADDR 0 is not a new address
  selftests: mptcp: join: check re-re-adding ID 0 signal

 net/mptcp/pm.c | 4 +-
 net/mptcp/pm_netlink.c | 87 ++++++++++----
 net/mptcp/protocol.c | 6 +
 net/mptcp/protocol.h | 5 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 149 ++++++++++++++++++++----
 tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 +
 6 files changed, 207 insertions(+), 48 deletions(-)
---
base-commit: 8af174ea863c72f25ce31cee3baad8a301c0cf0f
change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817

Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

1 year, 4 months


[PATCH v2 0/9] SEV Kernel Selftests

by Pratik R. Sampat

This series primarily introduces SEV-SNP tests for the kernel selftest framework. It tests boot, ioctl, pre fault, and fallocate in various combinations to exercise both positive and negative launch flow paths.

Patch 1 - Adds a wrapper for the ioctl calls that decouples the ioctls from the asserts, which enables the use of negative test cases. No functional change intended.
Patch 2 - Extends the SEV smoke tests to use the SNP-specific ioctl calls and sets up memory to boot an SNP guest VM
Patch 3 - Adds SNP to shutdown testing
Patches 4, 5 - Test the ioctl path for SEV, SEV-ES and SNP
Patch 6 - Adds support for SNP in KVM_SEV_INIT2 tests
Patches 7, 8, 9 - Enable Prefault tests for SEV, SEV-ES and SNP

The patchset is rebased on top of kvm/queue and over the "KVM: selftests: Add SEV-ES shutdown test" patch.
https://lore.kernel.org/kvm/20240709182936.146487-1-pgonda@google.com/

v2:
1. Add SMT parsing check to populate SNP policy flags
2. Extend Peter Gonda's shutdown test to include SNP
3. Introduce new tests for prefault which include exercising prefault, fallocate, hole-punch in various combinations.
4. Decouple ioctl patch reworked to introduce private variants of the functions that call into the ioctl. Also reordered the patch for it to arrive first so that new APIs are not written right after their introduction.
5. General cleanups - adding comments, avoiding local booleans, better error message. Suggestions incorporated from Peter, Tom, and Sean.

RFC: https://lore.kernel.org/kvm/20240710220540.188239-1-pratikrajesh.sampat@amd…

Michael Roth (2):
  KVM: selftests: Add interface to manually flag protected/encrypted ranges
  KVM: selftests: Add a CoCo-specific test for KVM_PRE_FAULT_MEMORY

Pratik R. Sampat (7):
  KVM: selftests: Decouple SEV ioctls from asserts
  KVM: selftests: Add a basic SNP smoke test
  KVM: selftests: Add SNP to shutdown testing
  KVM: selftests: SEV IOCTL test
  KVM: selftests: SNP IOCTL test
  KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
  KVM: selftests: Interleave fallocate for KVM_PRE_FAULT_MEMORY

 tools/testing/selftests/kvm/Makefile | 1 +
 .../testing/selftests/kvm/include/kvm_util.h | 13 +
 .../selftests/kvm/include/x86_64/processor.h | 1 +
 .../selftests/kvm/include/x86_64/sev.h | 76 +++-
 tools/testing/selftests/kvm/lib/kvm_util.c | 53 ++-
 .../selftests/kvm/lib/x86_64/processor.c | 6 +-
 tools/testing/selftests/kvm/lib/x86_64/sev.c | 190 +++++++-
 .../kvm/x86_64/coco_pre_fault_memory_test.c | 421 ++++++++++++++++++
 .../selftests/kvm/x86_64/sev_init2_tests.c | 13 +
 .../selftests/kvm/x86_64/sev_smoke_test.c | 298 ++++++++++++-
 10 files changed, 1024 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/coco_pre_fault_memory_test.c

--
2.34.1

1 year, 4 months


[PATCH v4 0/3] HID: hidraw: HIDIOCREVOKE introduction

by bentiss＠kernel.org

This is the v4 of the HIDIOCREVOKE patches.

Link to v3: https://lore.kernel.org/all/20240812052753.GA478917@quokka/

After a small discussion with Peter, we decided to:
- drop the BPF hooks that are problematic (Linus doesn't want "ALLOW_ERROR_INJECTION" to be used as "normal" fmodret bpf hooks)
- punt those BPF hooks to later, once we get the API right
- I'll be the one sending that new version, given that it's easier for me ATM

For testing the patch, and for convenience, I added a new selftest program that can test this new ioctl. This will also allow us to integrate the (future) BPF hooks and show how this should be used.

Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Benjamin Tissoires (2):
  selftests/hid: Add initial hidraw tests skeleton
  selftests/hid: Add HIDIOCREVOKE tests

Peter Hutterer (1):
  HID: hidraw: add HIDIOCREVOKE ioctl

 drivers/hid/hidraw.c | 39 +-
 include/linux/hidraw.h | 1 +
 include/uapi/linux/hidraw.h | 1 +
 tools/testing/selftests/hid/.gitignore | 1 +
 tools/testing/selftests/hid/Makefile | 2 +-
 tools/testing/selftests/hid/hidraw.c | 665 +++++++++++++++++++++++++++++++++
 6 files changed, 704 insertions(+), 5 deletions(-)
---
base-commit: 6e4436539ae182dc86d57d13849862bcafaa4709
change-id: 20240826-hidraw-revoke-0a02ebb21743

Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>

1 year, 4 months


[PATCH 0/9] misc fixups for DAMON {self,kunit} tests

by SeongJae Park

This patchset is for minor fixups of DAMON selftests and kunit tests.

The first three patches make DAMON selftests more cleanly maintained (patches 1 and 2) without unnecessary warnings (patch 3).

The following six patches remove an unnecessary test case (patch 4), handle config combinations that can make tests fail (patches 5-7), reorganize the test files following the new guideline (patch 8), and add a reference kunitconfig for DAMON kunit tests (patch 9).

SeongJae Park (9):
  selftests/damon: add access_memory_even to .gitignore
  selftests/damon: cleanup __pycache__/ with 'make clean'
  selftests/damon: add execute permissions to test scripts
  mm/damon/core-test: test only vaddr case on ops registration test
  mm/damon/core-test: fix damon_test_ops_registration() for DAMON_VADDR unset case
  mm/damon/dbgfs-test: skip dbgfs_set_targets() test if PADDR is not registered
  mm/damon/dbgfs-test: skip dbgfs_set_init_regions() test if PADDR is not registered
  mm/damon: move kunit tests to tests/ subdirectory with _kunit suffix
  mm/damon/tests: add .kunitconfig file for DAMON kunit tests

 mm/damon/core.c | 2 +-
 mm/damon/dbgfs.c | 2 +-
 mm/damon/sysfs.c | 2 +-
 mm/damon/tests/.kunitconfig | 22 ++++++++++++++++++
 mm/damon/{core-test.h => tests/core-kunit.h} | 23 ++++++++++++++-----
 .../{dbgfs-test.h => tests/dbgfs-kunit.h} | 10 ++++++++
 .../{sysfs-test.h => tests/sysfs-kunit.h} | 0
 .../{vaddr-test.h => tests/vaddr-kunit.h} | 0
 mm/damon/vaddr.c | 2 +-
 tools/testing/selftests/damon/.gitignore | 1 +
 tools/testing/selftests/damon/Makefile | 2 ++
 .../selftests/damon/damon_nr_regions.py | 0
 .../selftests/damon/damos_apply_interval.py | 0
 tools/testing/selftests/damon/damos_quota.py | 0
 .../selftests/damon/damos_quota_goal.py | 0
 .../selftests/damon/damos_tried_regions.py | 0
 .../damon/debugfs_target_ids_pid_leak.sh | 0
 ...s_target_ids_read_before_terminate_race.sh | 0
 ...sysfs_update_schemes_tried_regions_hang.py | 0
 ...te_schemes_tried_regions_wss_estimation.py | 0
 20 files changed, 56 insertions(+), 10 deletions(-)
 create mode 100644 mm/damon/tests/.kunitconfig
 rename mm/damon/{core-test.h => tests/core-kunit.h} (96%)
 rename mm/damon/{dbgfs-test.h => tests/dbgfs-kunit.h} (94%)
 rename mm/damon/{sysfs-test.h => tests/sysfs-kunit.h} (100%)
 rename mm/damon/{vaddr-test.h => tests/vaddr-kunit.h} (100%)
 mode change 100644 => 100755 tools/testing/selftests/damon/damon_nr_regions.py
 mode change 100644 => 100755 tools/testing/selftests/damon/damos_apply_interval.py
 mode change 100644 => 100755 tools/testing/selftests/damon/damos_quota.py
 mode change 100644 => 100755 tools/testing/selftests/damon/damos_quota_goal.py
 mode change 100644 => 100755 tools/testing/selftests/damon/damos_tried_regions.py
 mode change 100644 => 100755 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.sh
 mode change 100644 => 100755 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.sh
 mode change 100644 => 100755 tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_hang.py
 mode change 100644 => 100755 tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py

base-commit: ac23a0bb201e9916faa357d51c387e523813b4ad
--
2.39.2

1 year, 4 months


[PATCH v3 0/7] mm: workingset reporting

by Yuanchu Xie

Changes from PATCH v2 -> v3:
- Fixed typos in commit messages and documentation (Lance Yang, Randy Dunlap)
- Split out the force_scan patch to be reviewed separately
- Added benchmarks from Ghait Ouled Amar Ben Cheikh
- Fixed reported compile error without CONFIG_MEMCG

Changes from PATCH v1 -> v2:
- Updated selftest to use ksft_test_result_code instead of switch-case (Muhammad Usama Anjum)
- Included more use cases in the cover letter (Huang, Ying)
- Added documentation for sysfs and memcg interfaces
- Added an aging-specific struct lru_gen_mm_walk in struct pglist_data to avoid allocating for each lruvec.

Changes from RFC v3 -> PATCH v1:
- Updated selftest to use ksft_print_msg instead of fprintf(stderr, ...) (Muhammad Usama Anjum)
- Included more detail in patch skipping pmd_young with force_scan (Huang, Ying)
- Deferred reaccess histogram as a followup
- Removed per-memcg page age interval configs for simplicity

Changes from RFC v2 -> RFC v3:
- Update to v6.8
- Added an aging kernel thread (gated behind config)
- Added basic selftests for sysfs interface files
- Track swapped out pages for reaccesses
- Refactoring and cleanup
- Dropped the virtio-balloon extension to make things manageable

Changes from RFC v1 -> RFC v2:
- Refactored the patches into smaller pieces
- Renamed interfaces and functions from wss to wsr (Working Set Reporting)
- Fixed build errors when CONFIG_WSR is not set
- Changed working_set_num_bins to u8 for virtio-balloon
- Added support for per-NUMA node reporting for virtio-balloon

[rfc v1] https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.co…
[rfc v2] https://lore.kernel.org/linux-mm/20230621180454.973862-1-yuanchu@google.com/
[rfc v3] https://lore.kernel.org/linux-mm/20240327213108.2384666-1-yuanchu@google.co…

This patch series provides workingset reporting of user pages in lruvecs, of which coldness can be tracked by accessed bits and fd references. However, the concept of workingset applies generically to all types of memory, which could be kernel slab caches, discardable userspace caches (databases), or CXL.mem. Therefore, data sources might come from slab shrinkers, device drivers, or the userspace. IMO, the kernel should provide a set of workingset interfaces that should be generic enough to accommodate the various use cases, and be extensible to potential future use cases. The current proposed interfaces are not sufficient in that regard, but I would like to start somewhere, solicit feedback, and iterate.

Use cases
==========
Job scheduling
On overcommitted hosts, workingset information allows the job scheduler to right-size each job and land more jobs on the same host or NUMA node, and in the case of a job with increasing workingset, policy decisions can be made to migrate other jobs off the host/NUMA node, or oom-kill the misbehaving job. If the job shape is very different from the machine shape, knowing the workingset per-node can also help inform page allocation policies.

Proactive reclaim
Workingset information allows a container manager to proactively reclaim memory while not impacting a job's performance. While PSI may provide a reactive measure of when a proactive reclaim has reclaimed too much, workingset reporting allows the policy to be more accurate and flexible.

Ballooning (similar to proactive reclaim)
While this patch series does not extend the virtio-balloon device, balloon policies benefit from workingset to more precisely determine the size of the memory balloon. On desktops/laptops/mobile devices where memory is scarce and overcommitted, the balloon sizing in multiple VMs running on the same device can be orchestrated with workingset reports from each one.

Promotion/Demotion
If different mechanisms are used for promotion and demotion, workingset information can help connect the two and avoid pages being migrated back and forth. For example, consider a promotion hot page threshold defined as a reaccess distance of N seconds (promote pages accessed more often than every N seconds). The threshold N should be set so that ~80% (e.g.) of pages on the fast memory node pass the threshold. This calculation can be done with workingset reports. To be directly useful for promotion policies, the workingset report interfaces need to be extended to report hotness and gather hotness information from the devices[1].

[1] https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…

Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main idea here is we break down the workingset per-node per-memcg into time intervals (ms), e.g.

1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892

I realize this does not generalize well to hotness information, but I lack the intuition for an abstraction that presents hotness in a useful way. Based on a recent proposal for move_phys_pages[2], it seems like userspace tiering software would like to move specific physical pages, instead of informing the kernel "move x number of hot pages to y device". Please advise.

[2] https://lore.kernel.org/lkml/20240319172609.332900-1-gregory.price@memverge…

Implementation
==========
Currently, the reporting of user pages is based off of MGLRU, and therefore requires CONFIG_LRU_GEN=y. We would benefit from more MGLRU generations for a more fine-grained workingset report. I will make the generation count configurable in the next version. The workingset reporting mechanism is gated behind CONFIG_WORKINGSET_REPORT, and the aging thread is behind CONFIG_WORKINGSET_REPORT_AGING.

Benchmarks
==========
Ghait Ouled Amar Ben Cheikh has implemented a simple "reclaim everything colder than 10 seconds every 40 seconds" policy and ran Linux compile and redis from the phoronix test suite. The results are in his repo: https://github.com/miloudi98/WMO

Yuanchu Xie (7):
  mm: aggregate working set information into histograms
  mm: use refresh interval to rate-limit workingset report aggregation
  mm: report workingset during memory pressure driven scanning
  mm: extend working set reporting to memcgs
  mm: add kernel aging thread for workingset reporting
  selftest: test system-wide workingset reporting
  Docs/admin-guide/mm/workingset_report: document sysfs and memcg interfaces

 Documentation/admin-guide/mm/index.rst | 1 +
 .../admin-guide/mm/workingset_report.rst | 105 ++++
 drivers/base/node.c | 6 +
 include/linux/memcontrol.h | 21 +
 include/linux/mmzone.h | 9 +
 include/linux/workingset_report.h | 97 +++
 mm/Kconfig | 15 +
 mm/Makefile | 2 +
 mm/internal.h | 18 +
 mm/memcontrol.c | 184 +++++-
 mm/mm_init.c | 2 +
 mm/mmzone.c | 2 +
 mm/vmscan.c | 56 +-
 mm/workingset_report.c | 561 ++++++++++++++++++
 mm/workingset_report_aging.c | 127 ++++
 tools/testing/selftests/mm/.gitignore | 1 +
 tools/testing/selftests/mm/Makefile | 3 +
 tools/testing/selftests/mm/run_vmtests.sh | 5 +
 .../testing/selftests/mm/workingset_report.c | 306 ++++++++++
 .../testing/selftests/mm/workingset_report.h | 39 ++
 .../selftests/mm/workingset_report_test.c | 330 +++++++++++
 21 files changed, 1885 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/admin-guide/mm/workingset_report.rst
 create mode 100644 include/linux/workingset_report.h
 create mode 100644 mm/workingset_report.c
 create mode 100644 mm/workingset_report_aging.c
 create mode 100644 tools/testing/selftests/mm/workingset_report.c
 create mode 100644 tools/testing/selftests/mm/workingset_report.h
 create mode 100644 tools/testing/selftests/mm/workingset_report_test.c

--
2.46.0.76.ge559c4bf1a-goog
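To make the report format and the promotion-threshold calculation from the cover letter more concrete, here is a rough userspace sketch in Python. It parses the "<interval_ms> anon=<n> file=<n>" rows quoted above and computes what fraction of the reported memory falls at or below a given interval. The names parse_report(), fraction_within() and pick_threshold() are hypothetical, and the sketch assumes each row is a separate bin whose first field is the upper bound of its interval; the patches that introduce the interfaces remain the authority on the exact semantics and units.

```python
from typing import List, NamedTuple


class Bin(NamedTuple):
    upper_ms: int  # assumed: upper bound of this bin's interval, in ms
    anon: int      # anonymous memory accounted to this bin
    file: int      # file-backed memory accounted to this bin


def parse_report(text: str) -> List[Bin]:
    """Parse rows of the form '<interval_ms> anon=<n> file=<n>'."""
    bins = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        counts = dict(field.split("=", 1) for field in fields[1:])
        bins.append(Bin(int(fields[0]), int(counts["anon"]), int(counts["file"])))
    return bins


def fraction_within(bins: List[Bin], threshold_ms: int) -> float:
    """Fraction of all reported memory in bins at or below threshold_ms."""
    total = sum(b.anon + b.file for b in bins)
    if total == 0:
        return 0.0
    hot = sum(b.anon + b.file for b in bins if b.upper_ms <= threshold_ms)
    return hot / total


def pick_threshold(bins: List[Bin], target: float) -> int:
    """Smallest bin boundary at which at least `target` of memory passes."""
    for b in sorted(bins, key=lambda b: b.upper_ms):
        if fraction_within(bins, b.upper_ms) >= target:
            return b.upper_ms
    raise ValueError("no bin boundary reaches the target fraction")


# Sample rows copied from the cover letter.
SAMPLE = """\
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
"""

if __name__ == "__main__":
    report = parse_report(SAMPLE)
    for boundary in (1000, 30000, 40000):
        print(boundary, round(fraction_within(report, boundary), 4))
```

With finer-grained bins, a policy could call pick_threshold(report, 0.8) to approximate the "~80% of pages pass" rule; with only a handful of bins, as in the sample, the answer is limited to the bin boundaries the kernel reports.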

1 year, 4 months


[PATCH net-next v7] net: netconsole: selftests: Create a new netconsole selftest

by Breno Leitao

Adds a selftest that creates two virtual interfaces, assigns one to a new namespace, and assigns IP addresses to both.

It listens on the destination interface using socat and configures a dynamic target on netconsole, pointing to the destination IP address.

The test then checks if the message was received properly on the destination interface.

Signed-off-by: Breno Leitao <leitao(a)debian.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changelog:

v7:
 * Fixed a typo (s/Skippig/Skipping) (Matthieu Baerts)

v6:
 * Check for SRC and DST ip before starting the test (Jakub)
 * Revert the printk configuration at the end of the test (Jakub)
 * Fix the modprobe stderr redirection (Jakub)
 * https://lore.kernel.org/all/20240821080826.3753521-1-leitao@debian.org/

v5:
 * Replace check_file_size() by "test -s" (Matthieu)
 * https://lore.kernel.org/all/20240819090406.1441297-1-leitao@debian.org/#t

v4:
 * Avoid sleeping in waiting for sockets and files (Matthieu Baerts)
 * Some other improvements (Matthieu Baerts)
 * Add configfs as a dependency (Jakub)
 * https://lore.kernel.org/all/20240816132450.346744-1-leitao@debian.org/

v3:
 * Defined CONFIGs in config file (Jakub)
 * Indentation fixes (Petr Machata)
 * Use setup_ns in a better way (Matthieu Baerts)
 * Add dependencies in TEST_INCLUDES (Hangbin Liu)
 * https://lore.kernel.org/all/20240815095157.3064722-1-leitao@debian.org/

v2:
 * Change the location of the path (Jakub)
 * Move from veth to netdevsim
 * Other small changes in dependency checks and cleanup
 * https://lore.kernel.org/all/20240813183825.837091-1-leitao@debian.org/

v1:
 * https://lore.kernel.org/all/ZqyUHN770pjSofTC@gmail.com/

 MAINTAINERS | 1 +
 tools/testing/selftests/drivers/net/Makefile | 5 +-
 tools/testing/selftests/drivers/net/config | 4 +
 .../selftests/drivers/net/netcons_basic.sh | 234 ++++++++++++++++++
 4 files changed, 243 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/drivers/net/netcons_basic.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 5dbf23cf11c8..9a371ddd8719 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15772,6 +15772,7 @@ M:	Breno Leitao <leitao(a)debian.org>
 S:	Maintained
 F:	Documentation/networking/netconsole.rst
 F:	drivers/net/netconsole.c
+F:	tools/testing/selftests/drivers/net/netcons_basic.sh
 
 NETDEVSIM
 M:	Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index e54f382bcb02..39fb97a8c1df 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -1,8 +1,11 @@
 # SPDX-License-Identifier: GPL-2.0
 
-TEST_INCLUDES := $(wildcard lib/py/*.py)
+TEST_INCLUDES := $(wildcard lib/py/*.py) \
+		 ../../net/net_helper.sh \
+		 ../../net/lib.sh \
 
 TEST_PROGS := \
+	netcons_basic.sh \
 	ping.py \
 	queues.py \
 	stats.py \
diff --git a/tools/testing/selftests/drivers/net/config b/tools/testing/selftests/drivers/net/config
index f6a58ce8a230..a2d8af60876d 100644
--- a/tools/testing/selftests/drivers/net/config
+++ b/tools/testing/selftests/drivers/net/config
@@ -1,2 +1,6 @@
 CONFIG_IPV6=y
 CONFIG_NETDEVSIM=m
+CONFIG_CONFIGFS_FS=y
+CONFIG_NETCONSOLE=m
+CONFIG_NETCONSOLE_DYNAMIC=y
+CONFIG_NETCONSOLE_EXTENDED_LOG=y
diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh
new file mode 100755
index 000000000000..06021b2059b7
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_basic.sh
@@ -0,0 +1,234 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test creates two netdevsim virtual interfaces, assigns one of them (the
+# "destination interface") to a new namespace, and assigns IP addresses to both
+# interfaces.
+#
+# It listens on the destination interface using socat and configures a dynamic
+# target on netconsole, pointing to the destination IP address.
+#
+# Finally, it checks whether the message was received properly on the
+# destination interface. Note that this test may pollute the kernel log buffer
+# (dmesg) and relies on dynamic configuration and namespaces being configured.
+#
+# Author: Breno Leitao <leitao(a)debian.org>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+# Simple script to test dynamic targets in netconsole
+SRCIF="" # to be populated later
+SRCIP=192.168.1.1
+DSTIF="" # to be populated later
+DSTIP=192.168.1.2
+
+PORT="6666"
+MSG="netconsole selftest"
+TARGET=$(mktemp -u netcons_XXXXX)
+DEFAULT_PRINTK_VALUES=$(cat /proc/sys/kernel/printk)
+NETCONS_CONFIGFS="/sys/kernel/config/netconsole"
+NETCONS_PATH="${NETCONS_CONFIGFS}"/"${TARGET}"
+# NAMESPACE will be populated by setup_ns with a random value
+NAMESPACE=""
+
+# IDs for netdevsim
+NSIM_DEV_1_ID=$((256 + RANDOM % 256))
+NSIM_DEV_2_ID=$((512 + RANDOM % 256))
+
+# Used to create and delete namespaces
+source "${SCRIPTDIR}"/../../net/lib.sh
+source "${SCRIPTDIR}"/../../net/net_helper.sh
+
+# Create netdevsim interfaces
+create_ifaces() {
+	local NSIM_DEV_SYS_NEW=/sys/bus/netdevsim/new_device
+
+	echo "$NSIM_DEV_2_ID" > "$NSIM_DEV_SYS_NEW"
+	echo "$NSIM_DEV_1_ID" > "$NSIM_DEV_SYS_NEW"
+	udevadm settle 2> /dev/null || true
+
+	local NSIM1=/sys/bus/netdevsim/devices/netdevsim"$NSIM_DEV_1_ID"
+	local NSIM2=/sys/bus/netdevsim/devices/netdevsim"$NSIM_DEV_2_ID"
+
+	# These are global variables
+	SRCIF=$(find "$NSIM1"/net -maxdepth 1 -type d ! \
+		-path "$NSIM1"/net -exec basename {} \;)
+	DSTIF=$(find "$NSIM2"/net -maxdepth 1 -type d ! \
+		-path "$NSIM2"/net -exec basename {} \;)
+}
+
+link_ifaces() {
+	local NSIM_DEV_SYS_LINK="/sys/bus/netdevsim/link_device"
+	local SRCIF_IFIDX=$(cat /sys/class/net/"$SRCIF"/ifindex)
+	local DSTIF_IFIDX=$(cat /sys/class/net/"$DSTIF"/ifindex)
+
+	exec {NAMESPACE_FD}</var/run/netns/"${NAMESPACE}"
+	exec {INITNS_FD}</proc/self/ns/net
+
+	# Bind the dst interface to namespace
+	ip link set "${DSTIF}" netns "${NAMESPACE}"
+
+	# Linking one device to the other one (on the other namespace)
+	if ! echo "${INITNS_FD}:$SRCIF_IFIDX $NAMESPACE_FD:$DSTIF_IFIDX" > $NSIM_DEV_SYS_LINK
+	then
+		echo "linking netdevsim1 with netdevsim2 should succeed"
+		cleanup
+		exit "${ksft_skip}"
+	fi
+}
+
+function configure_ip() {
+	# Configure the IPs for both interfaces
+	ip netns exec "${NAMESPACE}" ip addr add "${DSTIP}"/24 dev "${DSTIF}"
+	ip netns exec "${NAMESPACE}" ip link set "${DSTIF}" up
+
+	ip addr add "${SRCIP}"/24 dev "${SRCIF}"
+	ip link set "${SRCIF}" up
+}
+
+function set_network() {
+	# setup_ns function is coming from lib.sh
+	setup_ns NAMESPACE
+
+	# Create both interfaces, and assign the destination to a different
+	# namespace
+	create_ifaces
+
+	# Link both interfaces back to back
+	link_ifaces
+
+	configure_ip
+}
+
+function create_dynamic_target() {
+	DSTMAC=$(ip netns exec "${NAMESPACE}" \
+		 ip link show "${DSTIF}" | awk '/ether/ {print $2}')
+
+	# Create a dynamic target
+	mkdir "${NETCONS_PATH}"
+
+	echo "${DSTIP}" > "${NETCONS_PATH}"/remote_ip
+	echo "${SRCIP}" > "${NETCONS_PATH}"/local_ip
+	echo "${DSTMAC}" > "${NETCONS_PATH}"/remote_mac
+	echo "${SRCIF}" > "${NETCONS_PATH}"/dev_name
+
+	echo 1 > "${NETCONS_PATH}"/enabled
+}
+
+function cleanup() {
+	local NSIM_DEV_SYS_DEL="/sys/bus/netdevsim/del_device"
+
+	# delete netconsole dynamic reconfiguration
+	echo 0 > "${NETCONS_PATH}"/enabled
+	# Remove the configfs entry
+	rmdir "${NETCONS_PATH}"
+
+	# Delete netdevsim devices
+	echo "$NSIM_DEV_2_ID" > "$NSIM_DEV_SYS_DEL"
+	echo "$NSIM_DEV_1_ID" > "$NSIM_DEV_SYS_DEL"
+
+	# this is coming from lib.sh
+	cleanup_all_ns
+
+	# Restoring printk configurations
+	echo "${DEFAULT_PRINTK_VALUES}" > /proc/sys/kernel/printk
+}
+
+function listen_port_and_save_to() {
+	local OUTPUT=${1}
+	# Just wait for 2 seconds
+	timeout 2 ip netns exec "${NAMESPACE}" \
+		socat UDP-LISTEN:"${PORT}",fork "${OUTPUT}"
+}
+
+function validate_result() {
+	local TMPFILENAME="$1"
+
+	# Check if the file exists
+	if [ ! -f "$TMPFILENAME" ]; then
+		echo "FAIL: File was not generated." >&2
+		exit "${ksft_fail}"
+	fi
+
+	if ! grep -q "${MSG}" "${TMPFILENAME}"; then
+		echo "FAIL: ${MSG} not found in ${TMPFILENAME}" >&2
+		cat "${TMPFILENAME}" >&2
+		exit "${ksft_fail}"
+	fi
+
+	# Delete the file once it is validated, otherwise keep it
+	# for debugging purposes
+	rm "${TMPFILENAME}"
+	exit "${ksft_pass}"
+}
+
+function check_for_dependencies() {
+	if [ "$(id -u)" -ne 0 ]; then
+		echo "This test must be run as root" >&2
+		exit "${ksft_skip}"
+	fi
+
+	if ! which socat > /dev/null ; then
+		echo "SKIP: socat(1) is not available" >&2
+		exit "${ksft_skip}"
+	fi
+
+	if ! which ip > /dev/null ; then
+		echo "SKIP: ip(1) is not available" >&2
+		exit "${ksft_skip}"
+	fi
+
+	if ! which udevadm > /dev/null ; then
+		echo "SKIP: udevadm(1) is not available" >&2
+		exit "${ksft_skip}"
+	fi
+
+	if [ ! -d "${NETCONS_CONFIGFS}" ]; then
+		echo "SKIP: directory ${NETCONS_CONFIGFS} does not exist. Check if NETCONSOLE_DYNAMIC is enabled" >&2
+		exit "${ksft_skip}"
+	fi
+
+	if ip link show "${DSTIF}" 2> /dev/null; then
+		echo "SKIP: interface ${DSTIF} exists in the system. Not overwriting it." >&2
+		exit "${ksft_skip}"
+	fi
+
+	if ip addr list | grep -E "inet.*(${SRCIP}|${DSTIP})" 2> /dev/null; then
+		echo "SKIP: IPs already in use. Skipping it" >&2
+		exit "${ksft_skip}"
+	fi
+}
+
+# ========== #
+# Start here #
+# ========== #
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# The content of kmsg will be saved to the following file
+OUTPUT_FILE="/tmp/${TARGET}"
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Listen for netconsole port inside the namespace and destination interface
+listen_port_and_save_to "${OUTPUT_FILE}" &
+# Wait for socat to start and listen to the port.
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+# Send the message
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+# Wait until socat saves the file to disk
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+
+# Make sure the message was received in the dst part
+# and exit
+validate_result "${OUTPUT_FILE}"
--
2.43.5

1 year, 4 months


[PATCH net-next v15 01/13] mm: page_frag: add a test module for page_frag

by Yunsheng Lin

The testing is done by ensuring that the fragment allocated from a page_frag_cache instance is pushed into a ptr_ring instance in a kthread bound to a specified cpu, and a kthread bound to a specified cpu will pop the fragment from the ptr_ring and free the fragment.

CC: Alexander Duyck <alexander.duyck(a)gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
---
 tools/testing/selftests/mm/Makefile           |   2 +
 tools/testing/selftests/mm/page_frag/Makefile |  18 ++
 .../selftests/mm/page_frag/page_frag_test.c   | 170 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   9 +-
 4 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/mm/page_frag/Makefile
 create mode 100644 tools/testing/selftests/mm/page_frag/page_frag_test.c

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index cfad627e8d94..ed196901b9ca 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -36,6 +36,8 @@ MAKEFLAGS += --no-builtin-rules
 CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
 LDLIBS = -lrt -lpthread -lm
 
+TEST_GEN_MODS_DIR := page_frag
+
 TEST_GEN_FILES = cow
 TEST_GEN_FILES += compaction_test
 TEST_GEN_FILES += gup_longterm
diff --git a/tools/testing/selftests/mm/page_frag/Makefile b/tools/testing/selftests/mm/page_frag/Makefile
new file mode 100644
index 000000000000..58dda74d50a3
--- /dev/null
+++ b/tools/testing/selftests/mm/page_frag/Makefile
@@ -0,0 +1,18 @@
+PAGE_FRAG_TEST_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST)))))
+KDIR ?= $(abspath $(PAGE_FRAG_TEST_DIR)/../../../../..)
+
+ifeq ($(V),1)
+Q =
+else
+Q = @
+endif
+
+MODULES = page_frag_test.ko
+
+obj-m += page_frag_test.o
+
+all:
+	+$(Q)make -C $(KDIR) M=$(PAGE_FRAG_TEST_DIR) modules
+
+clean:
+	+$(Q)make -C $(KDIR) M=$(PAGE_FRAG_TEST_DIR) clean
diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c
new file mode 100644
index 000000000000..0e803db1ad79
--- /dev/null
+++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Test module for page_frag cache
+ *
+ * Copyright: linyunsheng(a)huawei.com
+ */
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/cpumask.h>
+#include <linux/completion.h>
+#include <linux/ptr_ring.h>
+#include <linux/kthread.h>
+
+static struct ptr_ring ptr_ring;
+static int nr_objs = 512;
+static atomic_t nthreads;
+static struct completion wait;
+static struct page_frag_cache test_frag;
+
+static int nr_test = 5120000;
+module_param(nr_test, int, 0);
+MODULE_PARM_DESC(nr_test, "number of iterations to test");
+
+static bool test_align;
+module_param(test_align, bool, 0);
+MODULE_PARM_DESC(test_align, "use align API for testing");
+
+static int test_alloc_len = 2048;
+module_param(test_alloc_len, int, 0);
+MODULE_PARM_DESC(test_alloc_len, "alloc len for testing");
+
+static int test_push_cpu;
+module_param(test_push_cpu, int, 0);
+MODULE_PARM_DESC(test_push_cpu, "test cpu for pushing fragment");
+
+static int test_pop_cpu;
+module_param(test_pop_cpu, int, 0);
+MODULE_PARM_DESC(test_pop_cpu, "test cpu for popping fragment");
+
+static int page_frag_pop_thread(void *arg)
+{
+	struct ptr_ring *ring = arg;
+	int nr = nr_test;
+
+	pr_info("page_frag pop test thread begins on cpu %d\n",
+		smp_processor_id());
+
+	while (nr > 0) {
+		void *obj = __ptr_ring_consume(ring);
+
+		if (obj) {
+			nr--;
+			page_frag_free(obj);
+		} else {
+			cond_resched();
+		}
+	}
+
+	if (atomic_dec_and_test(&nthreads))
+		complete(&wait);
+
+	pr_info("page_frag pop test thread exits on cpu %d\n",
+		smp_processor_id());
+
+	return 0;
+}
+
+static int page_frag_push_thread(void *arg)
+{
+	struct ptr_ring *ring = arg;
+	int nr = nr_test;
+
+	pr_info("page_frag push test thread begins on cpu %d\n",
+		smp_processor_id());
+
+	while (nr > 0) {
+		void *va;
+		int ret;
+
+		if (test_align) {
+			va = page_frag_alloc_align(&test_frag, test_alloc_len,
+						   GFP_KERNEL, SMP_CACHE_BYTES);
+
+			WARN_ONCE((unsigned long)va & (SMP_CACHE_BYTES - 1),
+				  "unaligned va returned\n");
+		} else {
+			va = page_frag_alloc(&test_frag, test_alloc_len, GFP_KERNEL);
+		}
+
+		if (!va)
+			continue;
+
+		ret = __ptr_ring_produce(ring, va);
+		if (ret) {
+			page_frag_free(va);
+			cond_resched();
+		} else {
+			nr--;
+		}
+	}
+
+	pr_info("page_frag push test thread exits on cpu %d\n",
+		smp_processor_id());
+
+	if (atomic_dec_and_test(&nthreads))
+		complete(&wait);
+
+	return 0;
+}
+
+static int __init page_frag_test_init(void)
+{
+	struct task_struct *tsk_push, *tsk_pop;
+	ktime_t start;
+	u64 duration;
+	int ret;
+
+	test_frag.va = NULL;
+	atomic_set(&nthreads, 2);
+	init_completion(&wait);
+
+	if (test_alloc_len > PAGE_SIZE || test_alloc_len <= 0 ||
+	    !cpu_active(test_push_cpu) || !cpu_active(test_pop_cpu))
+		return -EINVAL;
+
+	ret = ptr_ring_init(&ptr_ring, nr_objs, GFP_KERNEL);
+	if (ret)
+		return ret;
+
+	tsk_push = kthread_create_on_cpu(page_frag_push_thread, &ptr_ring,
+					 test_push_cpu, "page_frag_push");
+	if (IS_ERR(tsk_push))
+		return PTR_ERR(tsk_push);
+
+	tsk_pop = kthread_create_on_cpu(page_frag_pop_thread, &ptr_ring,
+					test_pop_cpu, "page_frag_pop");
+	if (IS_ERR(tsk_pop)) {
+		kthread_stop(tsk_push);
+		return PTR_ERR(tsk_pop);
+	}
+
+	start = ktime_get();
+	wake_up_process(tsk_push);
+	wake_up_process(tsk_pop);
+
+	pr_info("waiting for test to complete\n");
+	wait_for_completion(&wait);
+
+	duration = (u64)ktime_us_delta(ktime_get(), start);
+	pr_info("%d of iterations for %s testing took: %lluus\n", nr_test,
+		test_align ? "aligned" : "non-aligned", duration);
+
+	ptr_ring_cleanup(&ptr_ring, NULL);
+	page_frag_cache_drain(&test_frag);
+
+	return -EAGAIN;
+}
+
+static void __exit page_frag_test_exit(void)
+{
+}
+
+module_init(page_frag_test_init);
+module_exit(page_frag_test_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Yunsheng Lin <linyunsheng(a)huawei.com>");
+MODULE_DESCRIPTION("Test module for page_frag");
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 36045edb10de..9a788d5f3f28 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -75,6 +75,8 @@ separated by spaces:
 	read-only VMAs
 - mdwe
 	test prctl(PR_SET_MDWE, ...)
+- page_frag
+	test handling of page fragment allocation and freeing
 
 example: ./run_vmtests.sh -t "hmm mmap ksm"
 EOF
@@ -231,7 +233,8 @@ run_test() {
 		("$@" 2>&1) | tap_prefix
 		local ret=${PIPESTATUS[0]}
 		count_total=$(( count_total + 1 ))
-		if [ $ret -eq 0 ]; then
+		# page_frag_test.ko returns 11(EAGAIN) when insmod'ing to avoid rmmod
+		if [ $ret -eq 0 ] || [ $ret -eq 11 -a ${CATEGORY} == "page_frag" ]; then
 			count_pass=$(( count_pass + 1 ))
 			echo "[PASS]" | tap_prefix
 			echo "ok ${count_total} ${test}" | tap_output
@@ -456,6 +459,10 @@ CATEGORY="mkdirty" run_test ./mkdirty
 
 CATEGORY="mdwe" run_test ./mdwe_test
 
+CATEGORY="page_frag" run_test insmod ./page_frag/page_frag_test.ko
+
+CATEGORY="page_frag" run_test insmod ./page_frag/page_frag_test.ko test_alloc_len=12 test_align=1
+
 echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix
 echo "1..${count_total}" | tap_output
 
--
2.33.0

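The aligned path of the test module warns when the returned address has any of its low `SMP_CACHE_BYTES - 1` bits set. A minimal user-space sketch of that mask check follows; `is_aligned()` is a name chosen here for illustration and is not part of the patch, and the alignment value is passed in by the caller rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if p is aligned to `align` bytes, 0 otherwise.
 * `align` must be a non-zero power of two, so that (align - 1)
 * is a mask covering exactly the low bits that must be clear.
 */
static int is_aligned(const void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

The same expression, `addr & (align - 1)`, is what the module hands to `WARN_ONCE()`; a non-zero result means the allocator returned an unaligned fragment.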
1 year, 4 months


[PATCH v3 0/3] riscv: mm: Extend mappable memory up to hint address

by Charlie Jenkins

On riscv, mmap currently returns an address from the largest address space that can fit entirely inside of the hint address. This makes it such that the hint address is almost never returned. This patch raises the mappable area up to and including the hint address. This allows mmap to often return the hint address, which allows a performance improvement over searching for a valid address as well as making the behavior more similar to other architectures.

Note that a previous patch introduced stronger semantics compared to other architectures for riscv mmap. On riscv, mmap will not use the upper bits of the virtual address, depending on the hint address. On other architectures, a random address is returned in the address space requested. On all architectures the hint address will be returned if it is available. This allows riscv applications to configure how many bits in the virtual address should be left empty. This has two benefits: applications can request address spaces that are smaller than the default, and they do not need to know the page table layout of riscv.

Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v3:
- Add back forgotten semi-colon
- Fix test cases
- Add support for rv32
- Change cover letter name so it's not the same as patch 1
- Link to v2: https://lore.kernel.org/r/20240130-use_mmap_hint_address-v2-0-f34ebfd33053@…

Changes in v2:
- Add back forgotten "mmap_end = STACK_TOP_MAX"
- Link to v1: https://lore.kernel.org/r/20240129-use_mmap_hint_address-v1-0-4c74da813ba1@…

---
Charlie Jenkins (3):
      riscv: mm: Use hint address in mmap if available
      selftests: riscv: Generalize mm selftests
      docs: riscv: Define behavior of mmap

 Documentation/arch/riscv/vm-layout.rst           |  16 ++--
 arch/riscv/include/asm/processor.h               |  27 +++---
 tools/testing/selftests/riscv/mm/mmap_bottomup.c |  23 +----
 tools/testing/selftests/riscv/mm/mmap_default.c  |  23 +----
 tools/testing/selftests/riscv/mm/mmap_test.h     | 107 ++++++++++++++---------
 5 files changed, 83 insertions(+), 113 deletions(-)
---
base-commit: 556e2d17cae620d549c5474b1ece053430cd50bc
change-id: 20240119-use_mmap_hint_address-f9f4b1b6f5f1
--
- Charlie
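From user space, the behaviour described in this cover letter can be observed by passing a hint to mmap() without MAP_FIXED and comparing the returned address with the hint. The following is a rough sketch only; `map_with_hint()` and `hint_was_honoured()` are illustrative names, not helpers from the series or its selftests.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Request an anonymous private mapping near `hint`.  Without MAP_FIXED the
 * hint is advisory, so the kernel may return a different address.
 * Returns the mapped address, or MAP_FAILED on error.
 */
static void *map_with_hint(void *hint, size_t len)
{
	assert(len > 0);
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Return 1 if the mapping succeeded and landed exactly on the hint. */
static int hint_was_honoured(void *hint, void *addr)
{
	return addr != MAP_FAILED && addr == hint;
}
```

A caller would pick a page-aligned hint, call `map_with_hint()`, and treat a mismatch as a normal outcome rather than an error, since the hint only has to be returned when that range is available.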

1 year, 4 months


[PATCH v2 0/3] selftests: Fix cpuid / vendor checking build issues

by Ilpo Järvinen

This series first generalizes resctrl selftest non-contiguous CAT check to not assume non-AMD vendor implies Intel. Second, it improves kselftest common parts and resctrl selftest such that the use of __cpuid_count() does not lead to a build failure (happens at least on ARM).

While ARM does not currently support resctrl features, there is ongoing work to enable resctrl support for it on the kernel side as well. In any case, a common header such as kselftest.h should have a proper fallback in place for what it provides, thus it seems justified to fix this common level problem on the common level rather than e.g. disabling build for resctrl selftest for archs lacking resctrl support.

v2:
- Removed RFC from the last patch & added Fixes and tags
- Fixed the error message's line splits
- Noted down the reason for void casts in the stub

Ilpo Järvinen (3):
  selftests/resctrl: Generalize non-contiguous CAT check
  selftests/resctrl: Always initialize ecx to avoid build warnings
  kselftest: Provide __cpuid_count() stub on non-x86 archs

 tools/testing/selftests/kselftest.h        |  6 +++++
 tools/testing/selftests/lib.mk             |  4 ++++
 tools/testing/selftests/resctrl/cat_test.c | 28 +++++++++++++---------
 3 files changed, 27 insertions(+), 11 deletions(-)

--
2.39.2
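The general idea of an architecture fallback can be sketched as follows. This is not the macro added to kselftest.h: `demo_cpuid_count()` is a hypothetical function written for illustration, and it assumes a GCC or Clang toolchain whose `<cpuid.h>` provides `__get_cpuid_count()` on x86.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/*
 * Hypothetical wrapper: query a CPUID leaf/subleaf on x86, and provide a
 * harmless stub elsewhere.  The outputs are always initialized so callers
 * build without "may be used uninitialized" warnings on any architecture.
 * Returns 1 if the leaf was queried, 0 otherwise.
 */
static int demo_cpuid_count(unsigned int leaf, unsigned int subleaf,
			    unsigned int *eax, unsigned int *ebx,
			    unsigned int *ecx, unsigned int *edx)
{
	assert(eax && ebx && ecx && edx);
	*eax = *ebx = *ecx = *edx = 0;
#if defined(__i386__) || defined(__x86_64__)
	return __get_cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);
#else
	/* Casts to void silence unused-parameter warnings in the stub. */
	(void)leaf;
	(void)subleaf;
	return 0;
#endif
}
```

On a non-x86 build the caller sees zeroed registers and a return value of 0, which lets vendor or feature checks fall through to their "not supported" path instead of breaking the build.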

1 year, 4 months


[PATCH] selftests/mm: do not try to split below filesystem block size

by Pankaj Raghav (Samsung)

From: Pankaj Raghav <p.raghav(a)samsung.com>

There is no point trying to split pagecache thp below the blocksize of the filesystem as that is the minimum order that pagecache needs to maintain to support blocksizes greater than pagesize [1].

Set the lower limit for the splitting order to be the fs blocksize order. As the number of tests will now depend on the minimum splitting order, move the file preparation before calling ksft_set_plan().

[1] https://lore.kernel.org/linux-fsdevel/20240822135018.1931258-1-kernel@panka…

Signed-off-by: Pankaj Raghav <p.raghav(a)samsung.com>
---
 .../selftests/mm/split_huge_page_test.c       | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index e5e8dafc9d94..187fe9107998 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -9,11 +9,13 @@
 #include <stdlib.h>
 #include <stdarg.h>
 #include <unistd.h>
+#include <math.h>
 #include <inttypes.h>
 #include <string.h>
 #include <fcntl.h>
 #include <sys/mman.h>
 #include <sys/mount.h>
+#include <sys/stat.h>
 #include <malloc.h>
 #include <stdbool.h>
 #include <time.h>
@@ -404,9 +406,10 @@ void split_thp_in_pagecache_to_order(size_t fd_size, int order, const char *fs_l
 
 int main(int argc, char **argv)
 {
-	int i;
+	int i, min_split_order = 0;
 	size_t fd_size;
 	char *optional_xfs_path = NULL;
+	struct stat filestat;
 	char fs_loc_template[] = "/tmp/thp_fs_XXXXXX";
 	const char *fs_loc;
 	bool created_tmp;
@@ -421,8 +424,6 @@ int main(int argc, char **argv)
 	if (argc > 1)
 		optional_xfs_path = argv[1];
 
-	ksft_set_plan(3+9);
-
 	pagesize = getpagesize();
 	pageshift = ffs(pagesize) - 1;
 	pmd_pagesize = read_pmd_pagesize();
@@ -431,13 +432,19 @@ int main(int argc, char **argv)
 
 	fd_size = 2 * pmd_pagesize;
 
+	created_tmp = prepare_thp_fs(optional_xfs_path, fs_loc_template,
+			&fs_loc);
+
+	if (!stat(fs_loc, &filestat))
+		min_split_order = log2(filestat.st_blksize) - pageshift;
+
+	ksft_set_plan(3 + 9 - min_split_order);
+
 	split_pmd_thp();
 	split_pte_mapped_thp();
 	split_file_backed_thp();
 
-	created_tmp = prepare_thp_fs(optional_xfs_path, fs_loc_template,
-			&fs_loc);
-	for (i = 8; i >= 0; i--)
+	for (i = 8; i >= min_split_order; i--)
 		split_thp_in_pagecache_to_order(fd_size, i, fs_loc);
 
 	cleanup_thp_fs(fs_loc, created_tmp);

base-commit: 5771112c37523a2344b346d7fe613694a2566df9
--
2.44.1
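The arithmetic behind the new lower bound can be worked through in isolation. The sketch below uses an integer log2 instead of libm, and clamps the result at zero; the clamp is an assumption added here for block sizes smaller than the page size and is not something the patch does. Function names are illustrative.

```c
#include <assert.h>

/* Integer floor(log2(v)) for v > 0. */
static int ilog2_ul(unsigned long v)
{
	int order = 0;

	assert(v > 0);
	while (v >>= 1)
		order++;
	return order;
}

/*
 * Smallest page-cache order worth splitting to: the number of page-size
 * doublings needed to reach the filesystem block size.  Both sizes are
 * assumed to be powers of two.  Clamped to 0 (assumption, see lead-in).
 */
static int min_split_order(unsigned long blksize, unsigned long pagesize)
{
	int order = ilog2_ul(blksize) - ilog2_ul(pagesize);

	return order > 0 ? order : 0;
}

/* Number of planned tests: 3 fixed tests plus orders 8 down to min order. */
static int planned_tests(int min_order)
{
	return 3 + 9 - min_order;
}
```

For example, with 4 KiB pages and a 16 KiB block size the minimum order is 2, so the pagecache loop covers orders 8 down to 2 and the plan shrinks from 12 to 10 tests.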

1 year, 4 months


[PATCH v1] selftests/mm: fix charge_reserved_hugetlb.sh test

by David Hildenbrand

Currently, running the charge_reserved_hugetlb.sh selftest we can sometimes observe something like:

  $ ./charge_reserved_hugetlb.sh -cgroup-v2
  ...
  write_result is 0
  After write:
  hugetlb_usage=0
  reserved_usage=10485760
  killing write_to_hugetlbfs
  Received 2.
  Deleting the memory
  Detach failure: Invalid argument
  umount: /mnt/huge: target is busy.

Both cases are issues in the test.

While the unmount error seems to be racy, it will make the test fail:

  $ ./run_vmtests.sh -t hugetlb
  ...
  # [FAIL]
  not ok 10 charge_reserved_hugetlb.sh -cgroup-v2 # exit=32

The issue is that we are not waiting for the write_to_hugetlbfs process to quit. So it might still have a hugetlbfs file open, about which umount is not happy. Fix that by making "killall" wait for the process to quit.

The other error ("Detach failure: Invalid argument") does not seem to result in a test error, but is misleading. Turns out write_to_hugetlbfs.c unconditionally tries to cleanup using shmdt(), even when we only mmap()'ed a hugetlb file. Even worse, shmaddr is never even set for the SHM case. Fix that as well.

With this change it seems to work as expected.

Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Reported-by: Mario Casquero <mcasquer(a)redhat.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Mina Almasry <almasrymina(a)google.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
 .../selftests/mm/charge_reserved_hugetlb.sh   |  2 +-
 .../testing/selftests/mm/write_to_hugetlbfs.c | 21 +++++++++++--------
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index d680c00d2853a..67df7b47087f0 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -254,7 +254,7 @@ function cleanup_hugetlb_memory() {
   local cgroup="$1"
   if [[ "$(pgrep -f write_to_hugetlbfs)" != "" ]]; then
     echo killing write_to_hugetlbfs
-    killall -2 write_to_hugetlbfs
+    killall -2 --wait write_to_hugetlbfs
     wait_for_hugetlb_memory_to_get_depleted $cgroup
   fi
   set -e
diff --git a/tools/testing/selftests/mm/write_to_hugetlbfs.c b/tools/testing/selftests/mm/write_to_hugetlbfs.c
index 6a2caba19ee1d..1289d311efd70 100644
--- a/tools/testing/selftests/mm/write_to_hugetlbfs.c
+++ b/tools/testing/selftests/mm/write_to_hugetlbfs.c
@@ -28,7 +28,7 @@ enum method {
 
 /* Global variables. */
 static const char *self;
-static char *shmaddr;
+static int *shmaddr;
 static int shmid;
 
 /*
@@ -47,15 +47,17 @@ void sig_handler(int signo)
 {
 	printf("Received %d.\n", signo);
 	if (signo == SIGINT) {
-		printf("Deleting the memory\n");
-		if (shmdt((const void *)shmaddr) != 0) {
-			perror("Detach failure");
+		if (shmaddr) {
+			printf("Deleting the memory\n");
+			if (shmdt((const void *)shmaddr) != 0) {
+				perror("Detach failure");
+				shmctl(shmid, IPC_RMID, NULL);
+				exit(4);
+			}
+
 			shmctl(shmid, IPC_RMID, NULL);
-			exit(4);
+			printf("Done deleting the memory\n");
 		}
-
-		shmctl(shmid, IPC_RMID, NULL);
-		printf("Done deleting the memory\n");
 	}
 	exit(2);
 }
@@ -211,7 +213,8 @@ int main(int argc, char **argv)
 			shmctl(shmid, IPC_RMID, NULL);
 			exit(2);
 		}
-		printf("shmaddr: %p\n", ptr);
+		shmaddr = ptr;
+		printf("shmaddr: %p\n", shmaddr);
 		break;
 
 	default:
--
2.46.0
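The core of the second fix is that shmdt() should only run when a System V segment was actually attached. A small stand-alone sketch of that guard, under the assumption that the attach address is tracked in a file-scope pointer (here called `attached`, a name invented for this example), could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Stays NULL unless shmat() succeeded, e.g. on a pure mmap() code path. */
static void *attached;

/*
 * Detach the segment only if one was attached.
 * Returns 0 if there was nothing to do, 1 on a successful detach,
 * and -1 if shmdt() failed.
 */
static int detach_if_attached(void)
{
	if (!attached)
		return 0;
	if (shmdt(attached) != 0)
		return -1;
	attached = NULL;
	return 1;
}
```

Calling shmdt() on an address that was never attached fails with EINVAL, which matches the "Detach failure: Invalid argument" message quoted in the commit message.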

1 year, 4 months


[PATCH] ftrace/selftest: Test combination of function_graph tracer and function profiler

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org>

Masami reported a bug when running function graph tracing then the function profiler. The following commands would cause a kernel crash:

  # cd /sys/kernel/tracing/
  # echo function_graph > current_tracer
  # echo 1 > function_profile_enabled

In that order. Create a test for the two to make sure this does not come back as a regression.

Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devno…

Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
 .../ftrace/test.d/ftrace/fgraph-profiler.tc   | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc

diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc
new file mode 100644
index 000000000000..62d44a1395da
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc
@@ -0,0 +1,30 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: ftrace - function profiler with function graph tracing
+# requires: function_profile_enabled set_ftrace_filter function_graph:tracer
+
+# The function graph tracer can now be run along side of the function
+# profiler. But there was a bug that caused the combination of the two
+# to crash. It also required the function graph tracer to be started
+# first.
+#
+# This test triggers that bug
+#
+# We need function_graph and profiling to run this test
+
+fail() { # mesg
+    echo $1
+    exit_fail
+}
+
+echo "Enabling function graph tracer:"
+echo function_graph > current_tracer
+echo "enable profiler"
+
+# Older kernels do not allow function_profile to be enabled with
+# function graph tracer. If the below fails, mark it as unsupported
+echo 1 > function_profile_enabled || exit_unsupported
+
+sleep 1
+
+exit 0
--
2.43.0

1 year, 4 months


[PATCH] selftests/livepatch: wait for atomic replace to occur

by Ryan Sullivan

On some machines with a large number of CPUs there is a sizable delay between an atomic replace occurring and when sysfs updates accordingly. This fix uses 'loop_until' to wait for the atomic replace to unload all previous livepatches.

Signed-off-by: Ryan Sullivan <rysulliv(a)redhat.com>
---
 tools/testing/selftests/livepatch/test-livepatch.sh | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/livepatch/test-livepatch.sh b/tools/testing/selftests/livepatch/test-livepatch.sh
index 65c9c058458d..bd13257bfdfe 100755
--- a/tools/testing/selftests/livepatch/test-livepatch.sh
+++ b/tools/testing/selftests/livepatch/test-livepatch.sh
@@ -139,11 +139,8 @@ load_lp $MOD_REPLACE replace=1
 grep 'live patched' /proc/cmdline > /dev/kmsg
 grep 'live patched' /proc/meminfo > /dev/kmsg
 
-mods=(/sys/kernel/livepatch/*)
-nmods=${#mods[@]}
-if [ "$nmods" -ne 1 ]; then
-	die "Expecting only one moduled listed, found $nmods"
-fi
+loop_until 'mods=(/sys/kernel/livepatch/*); nmods=${#mods[@]}; [[ "$nmods" -eq 1 ]]' ||
+        die "Expecting only one moduled listed, found $nmods"
 
 # These modules were disabled by the atomic replace
 for mod in $MOD_LIVEPATCH3 $MOD_LIVEPATCH2 $MOD_LIVEPATCH1; do
--
2.44.0
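The change replaces a single check with a bounded retry loop. The same pattern, expressed as a generic C helper, is sketched below; `poll_until()` and `third_call()` are hypothetical names for illustration and do not correspond to the shell `loop_until` helper's actual retry count or delay.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/*
 * Call cond(arg) up to max_tries times, sleeping delay_ms milliseconds
 * after each failed attempt.  Returns 0 as soon as cond() reports success
 * (non-zero), or -1 if it never does within max_tries attempts.
 */
static int poll_until(int (*cond)(void *), void *arg,
		      int max_tries, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int i;

	assert(cond != NULL);
	for (i = 0; i < max_tries; i++) {
		if (cond(arg))
			return 0;
		nanosleep(&delay, NULL);
	}
	return -1;
}

/* Example condition: becomes true on the third call. */
static int third_call(void *arg)
{
	int *calls = arg;

	return ++(*calls) >= 3;
}
```

A condition that is true on the first attempt returns immediately, so the retry loop costs nothing on machines where sysfs is already up to date.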

1 year, 4 months


[PATCH net-next v15 04/13] mm: page_frag: avoid caller accessing 'page_frag_cache' directly

by Yunsheng Lin

Use appropriate page_frag API instead of caller accessing 'page_frag_cache' directly.

CC: Alexander Duyck <alexander.duyck(a)gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com>
Acked-by: Chuck Lever <chuck.lever(a)oracle.com>
---
 drivers/vhost/net.c                                   |  2 +-
 include/linux/page_frag_cache.h                       | 10 ++++++++++
 net/core/skbuff.c                                     |  6 +++---
 net/rxrpc/conn_object.c                               |  4 +---
 net/rxrpc/local_object.c                              |  4 +---
 net/sunrpc/svcsock.c                                  |  6 ++----
 tools/testing/selftests/mm/page_frag/page_frag_test.c |  2 +-
 7 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f16279351db5..9ad37c012189 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1325,7 +1325,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 			      vqs[VHOST_NET_VQ_RX]);
 
 	f->private_data = n;
-	n->pf_cache.va = NULL;
+	page_frag_cache_init(&n->pf_cache);
 
 	return 0;
 }
diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 67ac8626ed9b..0a52f7a179c8 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -7,6 +7,16 @@
 #include <linux/mm_types_task.h>
 #include <linux/types.h>
 
+static inline void page_frag_cache_init(struct page_frag_cache *nc)
+{
+	nc->va = NULL;
+}
+
+static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
+{
+	return !!nc->pfmemalloc;
+}
+
 void page_frag_cache_drain(struct page_frag_cache *nc);
 void __page_frag_cache_drain(struct page *page, unsigned int count);
 void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1748673e1fe0..9352fcf8cda3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -752,14 +752,14 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 	if (in_hardirq() || irqs_disabled()) {
 		nc = this_cpu_ptr(&netdev_alloc_cache);
 		data = page_frag_alloc(nc, len, gfp_mask);
-		pfmemalloc = nc->pfmemalloc;
+		pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
 	} else {
 		local_bh_disable();
 		local_lock_nested_bh(&napi_alloc_cache.bh_lock);
 
 		nc = this_cpu_ptr(&napi_alloc_cache.page);
 		data = page_frag_alloc(nc, len, gfp_mask);
-		pfmemalloc = nc->pfmemalloc;
+		pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
 
 		local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
 		local_bh_enable();
@@ -849,7 +849,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
 		len = SKB_HEAD_ALIGN(len);
 
 		data = page_frag_alloc(&nc->page, len, gfp_mask);
-		pfmemalloc = nc->page.pfmemalloc;
+		pfmemalloc = page_frag_cache_is_pfmemalloc(&nc->page);
 	}
 	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
 
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 1539d315afe7..694c4df7a1a3 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -337,9 +337,7 @@ static void rxrpc_clean_up_connection(struct work_struct *work)
 	 */
 	rxrpc_purge_queue(&conn->rx_queue);
 
-	if (conn->tx_data_alloc.va)
-		__page_frag_cache_drain(virt_to_page(conn->tx_data_alloc.va),
-					conn->tx_data_alloc.pagecnt_bias);
+	page_frag_cache_drain(&conn->tx_data_alloc);
 	call_rcu(&conn->rcu, rxrpc_rcu_free_connection);
 }
 
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 504453c688d7..a8cffe47cf01 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -452,9 +452,7 @@ void rxrpc_destroy_local(struct rxrpc_local *local)
 #endif
 	rxrpc_purge_queue(&local->rx_queue);
 	rxrpc_purge_client_connections(local);
-	if (local->tx_alloc.va)
-		__page_frag_cache_drain(virt_to_page(local->tx_alloc.va),
-					local->tx_alloc.pagecnt_bias);
+	page_frag_cache_drain(&local->tx_alloc);
 }
 
 /*
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 6b3f01beb294..dcfd84cf0694 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1609,7 +1609,6 @@ static void svc_tcp_sock_detach(struct svc_xprt *xprt)
 static void svc_sock_free(struct svc_xprt *xprt)
 {
 	struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
-	struct page_frag_cache *pfc = &svsk->sk_frag_cache;
 	struct socket *sock = svsk->sk_sock;
 
 	trace_svcsock_free(svsk, sock);
@@ -1619,8 +1618,7 @@ static void svc_sock_free(struct svc_xprt *xprt)
 		sockfd_put(sock);
 	else
 		sock_release(sock);
-	if (pfc->va)
-		__page_frag_cache_drain(virt_to_head_page(pfc->va),
-					pfc->pagecnt_bias);
+
+	page_frag_cache_drain(&svsk->sk_frag_cache);
 	kfree(svsk);
 }
diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c
index 4a009122991e..c52598eaf7e7 100644
--- a/tools/testing/selftests/mm/page_frag/page_frag_test.c
+++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c
@@ -117,7 +117,7 @@ static int __init page_frag_test_init(void)
 	u64 duration;
 	int ret;
 
-	test_frag.va = NULL;
+	page_frag_cache_init(&test_frag);
 	atomic_set(&nthreads, 2);
 	init_completion(&wait);
 
--
2.33.0

1 year, 4 months


[PATCH net-next v15 02/13] mm: move the page fragment allocator from page_alloc into its own file

by Yunsheng Lin

Inspired by [1], move the page fragment allocator from page_alloc into its own c file and header file, as we are about to make more change for it to replace another page_frag implementation in sock.c As this patchset is going to replace 'struct page_frag' with 'struct page_frag_cache' in sched.h, including page_frag_cache.h in sched.h has a compiler error caused by interdependence between mm_types.h and mm.h for asm-offsets.c, see [2]. So avoid the compiler error by moving 'struct page_frag_cache' to mm_types_task.h as suggested by Alexander, see [3]. 1. https://lore.kernel.org/all/20230411160902.4134381-3-dhowells@redhat.com/ 2. https://lore.kernel.org/all/15623dac-9358-4597-b3ee-3694a5956920@gmail.com/ 3. https://lore.kernel.org/all/CAKgT0UdH1yD=LSCXFJ=YM_aiA4OomD-2wXykO42bizaWMt… CC: David Howells <dhowells(a)redhat.com> CC: Alexander Duyck <alexander.duyck(a)gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com> Acked-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/gfp.h | 22 --- include/linux/mm_types.h | 18 --- include/linux/mm_types_task.h | 18 +++ include/linux/page_frag_cache.h | 31 ++++ include/linux/skbuff.h | 1 + mm/Makefile | 1 + mm/page_alloc.c | 136 ---------------- mm/page_frag_cache.c | 145 ++++++++++++++++++ .../selftests/mm/page_frag/page_frag_test.c | 2 +- 9 files changed, 197 insertions(+), 177 deletions(-) create mode 100644 include/linux/page_frag_cache.h create mode 100644 mm/page_frag_cache.c diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f53f76e0b17e..01a49be7c98d 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -371,28 +371,6 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas extern void __free_pages(struct page *page, unsigned int order); extern void free_pages(unsigned long addr, unsigned int order); -struct page_frag_cache; -void page_frag_cache_drain(struct page_frag_cache *nc); -extern void __page_frag_cache_drain(struct page *page, unsigned int 
count); -void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, - gfp_t gfp_mask, unsigned int align_mask); - -static inline void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align) -{ - WARN_ON_ONCE(!is_power_of_2(align)); - return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align); -} - -static inline void *page_frag_alloc(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask) -{ - return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); -} - -extern void page_frag_free(void *addr); - #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 485424979254..843d75412105 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -521,9 +521,6 @@ static_assert(sizeof(struct ptdesc) <= sizeof(struct page)); */ #define STRUCT_PAGE_MAX_SHIFT (order_base_2(sizeof(struct page))) -#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK) -#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE) - /* * page_private can be used on tail pages. However, PagePrivate is only * checked by the VM on the head page. So page_private on the tail pages @@ -542,21 +539,6 @@ static inline void *folio_get_private(struct folio *folio) return folio->private; } -struct page_frag_cache { - void * va; -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - __u16 offset; - __u16 size; -#else - __u32 offset; -#endif - /* we maintain a pagecount bias, so that we dont dirty cache line - * containing page->_refcount every time we allocate a fragment. 
- */
-	unsigned int pagecnt_bias;
-	bool pfmemalloc;
-};
-
 typedef unsigned long vm_flags_t;
 
 /*
diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index a2f6179b672b..cdc1e3696439 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -8,6 +8,7 @@
  * (These are defined separately to decouple sched.h from mm_types.h as much as possible.)
  */
 
+#include <linux/align.h>
 #include <linux/types.h>
 
 #include <asm/page.h>
@@ -46,6 +47,23 @@ struct page_frag {
 #endif
 };
 
+#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK)
+#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE)
+struct page_frag_cache {
+	void *va;
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	__u16 offset;
+	__u16 size;
+#else
+	__u32 offset;
+#endif
+	/* we maintain a pagecount bias, so that we dont dirty cache line
+	 * containing page->_refcount every time we allocate a fragment.
+	 */
+	unsigned int pagecnt_bias;
+	bool pfmemalloc;
+};
+
 /* Track pages that require TLB flushes */
 struct tlbflush_unmap_batch {
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
new file mode 100644
index 000000000000..67ac8626ed9b
--- /dev/null
+++ b/include/linux/page_frag_cache.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_PAGE_FRAG_CACHE_H
+#define _LINUX_PAGE_FRAG_CACHE_H
+
+#include <linux/log2.h>
+#include <linux/mm_types_task.h>
+#include <linux/types.h>
+
+void page_frag_cache_drain(struct page_frag_cache *nc);
+void __page_frag_cache_drain(struct page *page, unsigned int count);
+void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
+			      gfp_t gfp_mask, unsigned int align_mask);
+
+static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
+					  unsigned int fragsz, gfp_t gfp_mask,
+					  unsigned int align)
+{
+	WARN_ON_ONCE(!is_power_of_2(align));
+	return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
+}
+
+static inline void *page_frag_alloc(struct page_frag_cache *nc,
+				    unsigned int fragsz, gfp_t gfp_mask)
+{
+	return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
+}
+
+void page_frag_free(void *addr);
+
+#endif
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cf8f6ce06742..7482997c719f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -31,6 +31,7 @@
 #include <linux/in6.h>
 #include <linux/if_packet.h>
 #include <linux/llist.h>
+#include <linux/page_frag_cache.h>
 #include <net/flow.h>
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <linux/netfilter/nf_conntrack_common.h>
diff --git a/mm/Makefile b/mm/Makefile
index d2915f8c9dc0..e9d342fa8058 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ page-alloc-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o
 memory-hotplug-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 
 obj-y += page-alloc.o
+obj-y += page_frag_cache.o
 obj-y += init-mm.o
 obj-y += memblock.o
 obj-y += $(memory-hotplug-y)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c565de8f48e9..d0e88aa6eb0d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4798,142 +4798,6 @@ void free_pages(unsigned long addr, unsigned int order)
 
 EXPORT_SYMBOL(free_pages);
 
-/*
- * Page Fragment:
- * An arbitrary-length arbitrary-offset area of memory which resides
- * within a 0 or higher order page. Multiple fragments within that page
- * are individually refcounted, in the page's reference counter.
- *
- * The page_frag functions below provide a simple allocation framework for
- * page fragments. This is used by the network stack and network device
- * drivers to provide a backing region of memory for use as either an
- * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
- */
-static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
-					     gfp_t gfp_mask)
-{
-	struct page *page = NULL;
-	gfp_t gfp = gfp_mask;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
-		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
-	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
-				PAGE_FRAG_CACHE_MAX_ORDER);
-	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
-#endif
-	if (unlikely(!page))
-		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
-
-	nc->va = page ? page_address(page) : NULL;
-
-	return page;
-}
-
-void page_frag_cache_drain(struct page_frag_cache *nc)
-{
-	if (!nc->va)
-		return;
-
-	__page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias);
-	nc->va = NULL;
-}
-EXPORT_SYMBOL(page_frag_cache_drain);
-
-void __page_frag_cache_drain(struct page *page, unsigned int count)
-{
-	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
-
-	if (page_ref_sub_and_test(page, count))
-		free_unref_page(page, compound_order(page));
-}
-EXPORT_SYMBOL(__page_frag_cache_drain);
-
-void *__page_frag_alloc_align(struct page_frag_cache *nc,
-			      unsigned int fragsz, gfp_t gfp_mask,
-			      unsigned int align_mask)
-{
-	unsigned int size = PAGE_SIZE;
-	struct page *page;
-	int offset;
-
-	if (unlikely(!nc->va)) {
-refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
-		if (!page)
-			return NULL;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* Even if we own the page, we do not use atomic_set().
-		 * This would break get_page_unless_zero() users.
-		 */
-		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pfmemalloc = page_is_pfmemalloc(page);
-		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
-	}
-
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
-		page = virt_to_page(nc->va);
-
-		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
-			goto refill;
-
-		if (unlikely(nc->pfmemalloc)) {
-			free_unref_page(page, compound_order(page));
-			goto refill;
-		}
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
-			/*
-			 * The caller is trying to allocate a fragment
-			 * with fragsz > PAGE_SIZE but the cache isn't big
-			 * enough to satisfy the request, this may
-			 * happen in low memory conditions.
-			 * We don't release the cache page because
-			 * it could make memory pressure worse
-			 * so we simply return NULL here.
-			 */
-			return NULL;
-		}
-	}
-
-	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
-
-	return nc->va + offset;
-}
-EXPORT_SYMBOL(__page_frag_alloc_align);
-
-/*
- * Frees a page fragment allocated out of either a compound or order 0 page.
- */
-void page_frag_free(void *addr)
-{
-	struct page *page = virt_to_head_page(addr);
-
-	if (unlikely(put_page_testzero(page)))
-		free_unref_page(page, compound_order(page));
-}
-EXPORT_SYMBOL(page_frag_free);
-
 static void *make_alloc_exact(unsigned long addr, unsigned int order,
 		size_t size)
 {
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
new file mode 100644
index 000000000000..609a485cd02a
--- /dev/null
+++ b/mm/page_frag_cache.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Page fragment allocator
+ *
+ * Page Fragment:
+ * An arbitrary-length arbitrary-offset area of memory which resides within a
+ * 0 or higher order page. Multiple fragments within that page are
+ * individually refcounted, in the page's reference counter.
+ *
+ * The page_frag functions provide a simple allocation framework for page
+ * fragments. This is used by the network stack and network device drivers to
+ * provide a backing region of memory for use as either an sk_buff->head, or to
+ * be used in the "frags" portion of skb_shared_info.
+ */
+
+#include <linux/export.h>
+#include <linux/gfp_types.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/page_frag_cache.h>
+#include "internal.h"
+
+static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+					     gfp_t gfp_mask)
+{
+	struct page *page = NULL;
+	gfp_t gfp = gfp_mask;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
+		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
+	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
+				PAGE_FRAG_CACHE_MAX_ORDER);
+	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
+#endif
+	if (unlikely(!page))
+		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+
+	nc->va = page ? page_address(page) : NULL;
+
+	return page;
+}
+
+void page_frag_cache_drain(struct page_frag_cache *nc)
+{
+	if (!nc->va)
+		return;
+
+	__page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias);
+	nc->va = NULL;
+}
+EXPORT_SYMBOL(page_frag_cache_drain);
+
+void __page_frag_cache_drain(struct page *page, unsigned int count)
+{
+	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
+
+	if (page_ref_sub_and_test(page, count))
+		free_unref_page(page, compound_order(page));
+}
+EXPORT_SYMBOL(__page_frag_cache_drain);
+
+void *__page_frag_alloc_align(struct page_frag_cache *nc,
+			      unsigned int fragsz, gfp_t gfp_mask,
+			      unsigned int align_mask)
+{
+	unsigned int size = PAGE_SIZE;
+	struct page *page;
+	int offset;
+
+	if (unlikely(!nc->va)) {
+refill:
+		page = __page_frag_cache_refill(nc, gfp_mask);
+		if (!page)
+			return NULL;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* Even if we own the page, we do not use atomic_set().
+		 * This would break get_page_unless_zero() users.
+		 */
+		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pfmemalloc = page_is_pfmemalloc(page);
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
+		nc->offset = size;
+	}
+
+	offset = nc->offset - fragsz;
+	if (unlikely(offset < 0)) {
+		page = virt_to_page(nc->va);
+
+		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
+			goto refill;
+
+		if (unlikely(nc->pfmemalloc)) {
+			free_unref_page(page, compound_order(page));
+			goto refill;
+		}
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* OK, page count is 0, we can safely set it */
+		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
+		offset = size - fragsz;
+		if (unlikely(offset < 0)) {
+			/*
+			 * The caller is trying to allocate a fragment
+			 * with fragsz > PAGE_SIZE but the cache isn't big
+			 * enough to satisfy the request, this may
+			 * happen in low memory conditions.
+			 * We don't release the cache page because
+			 * it could make memory pressure worse
+			 * so we simply return NULL here.
+			 */
+			return NULL;
+		}
+	}
+
+	nc->pagecnt_bias--;
+	offset &= align_mask;
+	nc->offset = offset;
+
+	return nc->va + offset;
+}
+EXPORT_SYMBOL(__page_frag_alloc_align);
+
+/*
+ * Frees a page fragment allocated out of either a compound or order 0 page.
+ */
+void page_frag_free(void *addr)
+{
+	struct page *page = virt_to_head_page(addr);
+
+	if (unlikely(put_page_testzero(page)))
+		free_unref_page(page, compound_order(page));
+}
+EXPORT_SYMBOL(page_frag_free);
diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c
index 0e803db1ad79..4a009122991e 100644
--- a/tools/testing/selftests/mm/page_frag/page_frag_test.c
+++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c
@@ -6,12 +6,12 @@
  * Copyright: linyunsheng(a)huawei.com
  */
 
-#include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/cpumask.h>
 #include <linux/completion.h>
 #include <linux/ptr_ring.h>
 #include <linux/kthread.h>
+#include <linux/page_frag_cache.h>
 
 static struct ptr_ring ptr_ring;
 static int nr_objs = 512;
-- 
2.33.0
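
The allocation path moved by this patch hands out fragments from the end of the page toward its start and applies the alignment mask to the resulting offset. A minimal user-space sketch of only that offset arithmetic follows. struct frag_cache and the frag_* functions are hypothetical stand-ins and not kernel API, and the refill, refcount bias and pfmemalloc handling are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct page_frag_cache. */
struct frag_cache {
	unsigned char *va;	/* start of the backing buffer */
	unsigned int offset;	/* lowest offset handed out so far */
};

/* Point the cache at a caller-provided buffer; allocation starts at its end. */
void frag_cache_init(struct frag_cache *fc, unsigned char *buf,
		     unsigned int size)
{
	fc->va = buf;
	fc->offset = size;
}

/* Carve fragsz bytes off the top of the remaining space, moving downward. */
void *frag_alloc_mask(struct frag_cache *fc, unsigned int fragsz,
		      unsigned int align_mask)
{
	int offset = (int)fc->offset - (int)fragsz;

	if (offset < 0)
		return NULL;	/* the kernel code tries to refill here */

	/* Rounding down keeps the fragment inside the space just reserved. */
	offset &= align_mask;
	fc->offset = offset;
	return fc->va + offset;
}

/*
 * Same conversion as page_frag_alloc_align() in the patch: for a power of
 * two, -align is the mask that clears the low bits of the offset.
 */
void *frag_alloc_align(struct frag_cache *fc, unsigned int fragsz,
		       unsigned int align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return frag_alloc_mask(fc, fragsz, -align);
}
```

With a 4096-byte buffer, a 100-byte request with mask ~0u lands at offset 3996, and a following 100-byte request with align 64 lands at offset 3840 (3896 rounded down to a multiple of 64). The alignment is relative to the start of the buffer, so the buffer itself has to be at least as aligned as the largest alignment requested.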

1 year, 4 months

[PATCH rc 0/2] Fix maps created without READ or WRITE

by Jason Gunthorpe

I noticed some bugs here while working on iommupt. Fix them up.

Joerg, can you pick both of these for your -rc branch?

Thanks,
Jason

Jason Gunthorpe (2):
  iommufd: Do not allow creating areas without READ or WRITE
  iommu: Do not return 0 from map_pages if it doesn't do anything

 drivers/iommu/io-pgtable-arm-v7s.c      | 3 +--
 drivers/iommu/io-pgtable-arm.c          | 3 +--
 drivers/iommu/io-pgtable-dart.c         | 3 +--
 drivers/iommu/iommufd/ioas.c            | 8 ++++++++
 tools/testing/selftests/iommu/iommufd.c | 6 +++---
 5 files changed, 14 insertions(+), 9 deletions(-)

base-commit: 4be8b00b2b0f669989486e9f2fb9b65edb4ef8c4
-- 
2.46.0
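
The two patch titles describe two checks: refuse to create a mapping that grants neither read nor write access, and do not report success from a map operation that installed nothing. The sketch below illustrates both in plain C. The SKETCH_* flags, the function names and the chosen error codes are assumptions made for illustration; the cover letter does not show the actual code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical permission bits; the real flags are not shown above. */
#define SKETCH_MAP_WRITEABLE	(1u << 0)
#define SKETCH_MAP_READABLE	(1u << 1)

/* Front-end check: refuse to create an area that nobody could access. */
int sketch_ioas_map_check(unsigned int flags)
{
	if (!(flags & (SKETCH_MAP_WRITEABLE | SKETCH_MAP_READABLE)))
		return -EOPNOTSUPP;	/* error code chosen for illustration */
	return 0;
}

/*
 * Back-end check: a map_pages-style helper that installs nothing reports
 * an error instead of 0, so the caller never assumes a mapping exists.
 */
int sketch_map_pages(unsigned int prot, unsigned long *mapped)
{
	*mapped = 0;
	if (!(prot & (SKETCH_MAP_WRITEABLE | SKETCH_MAP_READABLE)))
		return -EINVAL;		/* error code chosen for illustration */
	*mapped = 4096;			/* pretend one 4 KiB page was mapped */
	return 0;
}
```

Returning an error together with a mapped size of 0 keeps the two outputs consistent: a return value of 0 always means that some bytes were mapped.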

1 year, 4 months

[PATCH v5 0/4] Allow userspace to change ID_AA64PFR1_EL1

by Shaoqin Huang

Hi guys,

This is another attempt to allow userspace to change ID_AA64PFR1_EL1. We want to give userspace the ability to control the feature set visible to a VM, which userspace can use to migrate VMs transparently.

The patch series has four parts:

The first patch disables the fields that KVM doesn't know how to handle, so KVM only exposes the value 0 for those fields to the guest.

The second patch checks FEAT_SSBS in the guest IDREG instead of the CPU capability.

The third patch allows userspace to change ID_AA64PFR1_EL1. It advertises only the fields known to KVM and leaves the others unadvertised.

The fourth patch adds a kselftest that checks whether userspace can change ID_AA64PFR1_EL1.

Besides, I also noticed another patch [1] which tries to make ID_AA64PFR1_EL1 writable. That patch [1] enables GCS on bare metal and adds GCS support for the guest. My understanding is that once GCS is supported on bare metal, it will be clear how to handle it in KVM. The same goes for other fields such as NMI, THE, DF2 and MTEX; at that point they can become writable.

[1] [PATCH v9 13/39] KVM: arm64: Manage GCS registers for guests
https://lore.kernel.org/all/20240625-arm64-gcs-v9-13-0f634469b8f0@kernel.or…

Changelog:
----------
v4 -> v5:
  * Only advertise fields which KVM knows how to handle to userspace, and
    leave the others unadvertised.
  * Add a new patch to check FEAT_SSBS in IDREG instead of cpu capability.
  * Tweak the kselftest writable fields.
  * Improve the commit message.

v3 -> v4:
  * Add a new patch to disable some features which KVM doesn't know how to
    handle in the register accessor.
  * Handle all the fields in the register.
  * Fix a small cnt issue in kselftest.

v2 -> v3:
  * Give more description of why only part of the fields can be writable.
  * Update the writable mask by referring to the latest ARM spec.

v1 -> v2:
  * Tackle the full register instead of a single field.
  * Change the patch title and commit message.

RFCv1 -> v1:
  * Fix the compilation error.
  * Delete the machine specific information and make the description more
    generic.

RFCv1: https://lore.kernel.org/all/20240612023553.127813-1-shahuang@redhat.com/
v1: https://lore.kernel.org/all/20240617075131.1006173-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20240618063808.1040085-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240628060454.1936886-2-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240718035017.434996-1-shahuang@redhat.com/

Shaoqin Huang (4):
  KVM: arm64: Disable fields that KVM doesn't know how to handle in ID_AA64PFR1_EL1
  KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest
  KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1
  KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1

 arch/arm64/kvm/hypercalls.c                   | 12 +++++-----
 arch/arm64/kvm/sys_regs.c                     | 22 ++++++++++++++++++-
 .../selftests/kvm/aarch64/set_id_regs.c       | 14 +++++++++---
 3 files changed, 38 insertions(+), 10 deletions(-)

-- 
2.40.1
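
The series combines two ideas: fields KVM cannot handle are forced to 0 in the value the guest sees, and userspace may only change fields that are marked writable. The sketch below shows one way to express both with bit masks. It is an illustration under assumptions, not the KVM implementation: the function names and field numbers are hypothetical, and the check that a written value does not exceed what the host supports is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* ID registers are split into 4-bit fields; build the mask for field n. */
#define FIELD_MASK(n)	(UINT64_C(0xf) << ((n) * 4))

/* Clear the fields that cannot be handled so the guest reads them as 0. */
uint64_t sketch_sanitise(uint64_t host_val, uint64_t unsupported_mask)
{
	return host_val & ~unsupported_mask;
}

/* Accept a write only if it leaves every non-writable field untouched. */
int sketch_set_id_reg(uint64_t *reg, uint64_t new_val, uint64_t writable_mask)
{
	if ((*reg ^ new_val) & ~writable_mask)
		return -EINVAL;
	*reg = new_val;
	return 0;
}
```

For example, sanitising 0x121 with field 2 unsupported yields 0x021. With only field 1 writable, a write of 0x011 is accepted, while a write of 0x111 is rejected because it also changes field 2.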

1 year, 4 months

[PATCH bpf-next 0/2] selftests: bpf: avoid duplicated UAPI headers

by Matthieu Baerts (NGI0)

If a BPF selftest program requires (recent) UAPI headers [1], these header files currently have to be duplicated into tools/include/uapi. That's not a good solution, because the duplicates need to be kept up-to-date while the required files are only a few directories away.

A solution to avoid these duplicated files is to use KHDR_INCLUDES from the kselftest infrastructure. That is what the first patch does.

The second patch removes 'if_xdp.h', which is no longer needed and was causing a warning when building the libbpf required by the BPF selftests. There could be more duplicated UAPI header files that could be removed, but I didn't spend too much time checking which ones are not used by anything else in the 'tools' directory.

Hopefully, these modifications will not cause any issues on the different CIs, because they use the recommended method for the kernel selftests. If this does cause issues on the CI side, it should be easy to fix by overriding the KHDR_INCLUDES variable, and it might be better to do that, because it likely means the CI is not following the recommended way to execute the kernel selftests. See patch 1/2 for more details about that.

Link: https://lore.kernel.org/all/08f925cd-e267-4a6b-84b1-792515c4e199@kernel.org… [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (2):
      selftests: bpf: use KHDR_INCLUDES for the UAPI headers
      selftests: bpf: remove duplicated UAPI if_xdp headers

 tools/include/uapi/linux/if_xdp.h                  | 173 ---------------------
 tools/lib/bpf/Makefile                             |   3 -
 tools/testing/selftests/bpf/Makefile               |   2 +-
 .../selftests/bpf/prog_tests/assign_reuse.c        |   2 +-
 tools/testing/selftests/bpf/prog_tests/tc_links.c  |   4 +-
 tools/testing/selftests/bpf/prog_tests/tc_netkit.c |   2 +-
 tools/testing/selftests/bpf/prog_tests/tc_opts.c   |   2 +-
 .../selftests/bpf/prog_tests/user_ringbuf.c        |   2 +-
 .../testing/selftests/bpf/prog_tests/xdp_bonding.c |   2 +-
 .../selftests/bpf/prog_tests/xdp_cpumap_attach.c   |   2 +-
 .../selftests/bpf/prog_tests/xdp_devmap_attach.c   |   2 +-
 .../selftests/bpf/prog_tests/xdp_do_redirect.c     |   2 +-
 tools/testing/selftests/bpf/prog_tests/xdp_link.c  |   2 +-
 tools/testing/selftests/bpf/xdp_features.c         |   4 +-
 14 files changed, 14 insertions(+), 190 deletions(-)
---
base-commit: fdf1c728fac541891ef1aa773bfd42728626769c
change-id: 20240816-ups-bpf-next-selftests-use-khdr-28f935c8848a

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

1 year, 4 months
