September 2022 - Linux-kselftest-mirror

[PATCH v2 net-next] selftests: forwarding: add Per-Stream Filtering and Policing test for Ocelot

by Vladimir Oltean

The Felix VSC9959 switch in NXP LS1028A supports the tc-gate action, which enforces time-based access control per stream. A stream as seen by this switch is identified by {MAC DA, VID}.

We use the standard forwarding selftest topology with 2 host interfaces and 2 switch interfaces. The host ports must support timestamping of non-IP packets and tc-etf offload for isochron to work. The isochron program monitors network sync status (ptp4l, phc2sys) and deterministically transmits packets to the switch such that the tc-gate action either (a) always accepts them based on its schedule, or (b) always drops them.

I tried to keep as much as possible of the logic that isn't specific to the NXP LS1028A in a new tsn_lib.sh, for future reuse. This covers synchronization using ptp4l and phc2sys, and isochron.

The cycle-time chosen for this selftest isn't particularly impressive (and the focus is the functionality of the switch), but I didn't really know what to do better, considering that it will mostly be run during debugging sessions, when various kernel bloatware like lockdep, KASAN, etc. would be enabled, and we certainly can't run any races with those on.

I tried to look through the kselftest framework for other real-time applications and didn't really find any, so I'm not sure how better to prepare the environment in case we want to go for a lower cycle time. At the moment, the only thing the selftest ensures is that dynamic frequency scaling is disabled on the CPU that isochron runs on. It would probably be useful to have a blacklist of kernel config options (checked through zcat /proc/config.gz) and some cyclictest scripts to run beforehand, but I saw none of those.

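
A rough sketch of the kind of config check suggested above could look like the following. It is not part of the patch: config_denylist_check and LATENCY_DENYLIST are hypothetical names, and the option list is only an illustrative guess at what would hurt a lower cycle time.

```shell
# Hypothetical helper, not part of the patch: report which options from a
# list of latency-hostile debugging options are enabled in a kernel config
# read from stdin. The list below is illustrative only.
LATENCY_DENYLIST="CONFIG_PROVE_LOCKING CONFIG_KASAN CONFIG_DEBUG_PREEMPT"

config_denylist_check()
{
	local config
	local opt
	local found=""

	config="$(cat)"

	for opt in ${LATENCY_DENYLIST}; do
		# Match both built-in (=y) and modular (=m) settings
		if echo "${config}" | grep -q -E "^${opt}=(y|m)$"; then
			found="${found} ${opt}"
		fi
	done

	# Print the enabled options, space separated (empty if none)
	echo ${found}
}

# Usage on a kernel built with CONFIG_IKCONFIG_PROC:
#   zcat /proc/config.gz | config_denylist_check
```

A test could then skip itself, or merely warn, when the function prints a non-empty list.
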
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
v1->v2:
- fix an off-by-one bug introduced at the last minute regarding which
  tc-mqprio queue was used for tc-etf and SO_TXTIME
- introduce debugging for packets incorrectly received / incorrectly
  dropped based on "isochron report"
- make the tsn_lib.sh dependency on isochron and linuxptp optional via
  REQUIRE_ISOCHRON and REQUIRE_LINUXPTP
- avoid errors when CONFIG_CPU_FREQ is disabled
- consistently use SCHED_FIFO instead of SCHED_RR for the isochron
  receiver

 .../selftests/drivers/net/ocelot/psfp.sh      | 327 ++++++++++++++++++
 .../selftests/net/forwarding/tsn_lib.sh       | 235 +++++++++++++
 2 files changed, 562 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/ocelot/psfp.sh
 create mode 100644 tools/testing/selftests/net/forwarding/tsn_lib.sh

diff --git a/tools/testing/selftests/drivers/net/ocelot/psfp.sh b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
new file mode 100755
index 000000000000..5a5cee92c665
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
@@ -0,0 +1,327 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+# Note: On LS1028A, in lack of enough user ports, this setup requires patching
+# the device tree to use the second CPU port as a user port
+
+WAIT_TIME=1
+NUM_NETIFS=4
+STABLE_MAC_ADDRS=yes
+NETIF_CREATE=no
+lib_dir=$(dirname $0)/../../../net/forwarding
+source $lib_dir/tc_common.sh
+source $lib_dir/lib.sh
+source $lib_dir/tsn_lib.sh
+
+UDS_ADDRESS_H1="/var/run/ptp4l_h1"
+UDS_ADDRESS_SWP1="/var/run/ptp4l_swp1"
+
+# Tunables
+NUM_PKTS=1000
+STREAM_VID=100
+STREAM_PRIO=6
+# Use a conservative cycle of 10 ms to allow the test to still pass when the
+# kernel has some extra overhead like lockdep etc
+CYCLE_TIME_NS=10000000
+# Create two Gate Control List entries, one OPEN and one CLOSE, of equal
+# durations
+GATE_DURATION_NS=$((${CYCLE_TIME_NS} / 2))
+# Give 2/3 of the cycle time to user space and 1/3 to the kernel
+FUDGE_FACTOR=$((${CYCLE_TIME_NS} / 3))
+# Shift the isochron base time by half the gate time, so that packets are
+# always received by swp1 close to the middle of the time slot, to minimize
+# inaccuracies due to network sync
+SHIFT_TIME_NS=$((${GATE_DURATION_NS} / 2))
+
+h1=${NETIFS[p1]}
+swp1=${NETIFS[p2]}
+swp2=${NETIFS[p3]}
+h2=${NETIFS[p4]}
+
+H1_IPV4="192.0.2.1"
+H2_IPV4="192.0.2.2"
+H1_IPV6="2001:db8:1::1"
+H2_IPV6="2001:db8:1::2"
+
+# Chain number exported by the ocelot driver for
+# Per-Stream Filtering and Policing filters
+PSFP()
+{
+	echo 30000
+}
+
+psfp_chain_create()
+{
+	local if_name=$1
+
+	tc qdisc add dev $if_name clsact
+
+	tc filter add dev $if_name ingress chain 0 pref 49152 flower \
+		skip_sw action goto chain $(PSFP)
+}
+
+psfp_chain_destroy()
+{
+	local if_name=$1
+
+	tc qdisc del dev $if_name clsact
+}
+
+psfp_filter_check()
+{
+	local expected=$1
+	local packets=""
+	local drops=""
+	local stats=""
+
+	stats=$(tc -j -s filter show dev ${swp1} ingress chain $(PSFP) pref 1)
+	packets=$(echo ${stats} | jq ".[1].options.actions[].stats.packets")
+	drops=$(echo ${stats} | jq ".[1].options.actions[].stats.drops")
+
+	if ! [ "${packets}" = "${expected}" ]; then
+		printf "Expected filter to match on %d packets but matched on %d instead\n" \
+			"${expected}" "${packets}"
+	fi
+
+	echo "Hardware filter reports ${drops} drops"
+}
+
+h1_create()
+{
+	simple_if_init $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h1_destroy()
+{
+	simple_if_fini $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h2_create()
+{
+	simple_if_init $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+h2_destroy()
+{
+	simple_if_fini $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+switch_create()
+{
+	local h2_mac_addr=$(mac_get $h2)
+
+	ip link set ${swp1} up
+	ip link set ${swp2} up
+
+	ip link add br0 type bridge vlan_filtering 1
+	ip link set ${swp1} master br0
+	ip link set ${swp2} master br0
+	ip link set br0 up
+
+	bridge vlan add dev ${swp2} vid ${STREAM_VID}
+	bridge vlan add dev ${swp1} vid ${STREAM_VID}
+	# PSFP on Ocelot requires the filter to also be added to the bridge
+	# FDB, and not be removed
+	bridge fdb add dev ${swp2} \
+		${h2_mac_addr} vlan ${STREAM_VID} static master
+
+	psfp_chain_create ${swp1}
+
+	tc filter add dev ${swp1} ingress chain $(PSFP) pref 1 \
+		protocol 802.1Q flower skip_sw \
+		dst_mac ${h2_mac_addr} vlan_id ${STREAM_VID} \
+		action gate base-time 0.000000000 \
+		sched-entry OPEN ${GATE_DURATION_NS} -1 -1 \
+		sched-entry CLOSE ${GATE_DURATION_NS} -1 -1
+}
+
+switch_destroy()
+{
+	psfp_chain_destroy ${swp1}
+	ip link del br0
+}
+
+txtime_setup()
+{
+	local if_name=$1
+
+	tc qdisc add dev ${if_name} clsact
+	# Classify PTP on TC 7 and isochron on TC 6
+	tc filter add dev ${if_name} egress protocol 0x88f7 \
+		flower action skbedit priority 7
+	tc filter add dev ${if_name} egress protocol 802.1Q \
+		flower vlan_ethtype 0xdead action skbedit priority 6
+	tc qdisc add dev ${if_name} handle 100: parent root mqprio num_tc 8 \
+		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
+		map 0 1 2 3 4 5 6 7 \
+		hw 1
+	# Set up TC 6 for SO_TXTIME. tc-mqprio queues count from 1.
+	tc qdisc replace dev ${if_name} parent 100:$((${STREAM_PRIO} + 1)) etf \
+		clockid CLOCK_TAI offload delta ${FUDGE_FACTOR}
+}
+
+txtime_cleanup()
+{
+	local if_name=$1
+
+	tc qdisc del dev ${if_name} root
+	tc qdisc del dev ${if_name} clsact
+}
+
+setup_prepare()
+{
+	vrf_prepare
+
+	h1_create
+	h2_create
+	switch_create
+
+	txtime_setup ${h1}
+
+	# Set up swp1 as a master PHC for h1, synchronized to the local
+	# CLOCK_REALTIME.
+	phc2sys_start ${swp1} ${UDS_ADDRESS_SWP1}
+
+	# Assumption true for LS1028A: h1 and h2 use the same PHC. So by
+	# synchronizing h1 to swp1 via PTP, h2 is also implicitly synchronized
+	# to swp1 (and both to CLOCK_REALTIME).
+	ptp4l_start ${h1} true ${UDS_ADDRESS_H1}
+	ptp4l_start ${swp1} false ${UDS_ADDRESS_SWP1}
+
+	# Make sure there are no filter matches at the beginning of the test
+	psfp_filter_check 0
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	ptp4l_stop ${swp1}
+	ptp4l_stop ${h1}
+	phc2sys_stop
+	isochron_recv_stop
+
+	txtime_cleanup ${h1}
+
+	h2_destroy
+	h1_destroy
+	switch_destroy
+
+	vrf_cleanup
+}
+
+debug_incorrectly_dropped_packets()
+{
+	local isochron_dat=$1
+	local dropped_seqids
+	local seqid
+
+	echo "Packets incorrectly dropped:"
+
+	dropped_seqids=$(isochron report \
+		--input-file "${isochron_dat}" \
+		--printf-format "%u RX hw %T\n" \
+		--printf-args "qR" | \
+		grep 'RX hw 0.000000000' | \
+		awk '{print $1}')
+
+	for seqid in ${dropped_seqids}; do
+		isochron report \
+			--input-file "${isochron_dat}" \
+			--start ${seqid} --stop ${seqid} \
+			--printf-format "seqid %u scheduled for %T, HW TX timestamp %T\n" \
+			--printf-args "qST"
+	done
+}
+
+debug_incorrectly_received_packets()
+{
+	local isochron_dat=$1
+
+	echo "Packets incorrectly received:"
+
+	isochron report \
+		--input-file "${isochron_dat}" \
+		--printf-format "seqid %u scheduled for %T, HW TX timestamp %T, HW RX timestamp %T\n" \
+		--printf-args "qSTR" |
+		grep -v 'HW RX timestamp 0.000000000'
+}
+
+run_test()
+{
+	local base_time=$1
+	local expected=$2
+	local test_name=$3
+	local debug=$4
+	local isochron_dat="$(mktemp)"
+	local extra_args=""
+	local received
+
+	isochron_do \
+		"${h1}" \
+		"${h2}" \
+		"${UDS_ADDRESS_H1}" \
+		"" \
+		"${base_time}" \
+		"${CYCLE_TIME_NS}" \
+		"${SHIFT_TIME_NS}" \
+		"${NUM_PKTS}" \
+		"${STREAM_VID}" \
+		"${STREAM_PRIO}" \
+		"" \
+		"${isochron_dat}"
+
+	# Count all received packets by looking at the non-zero RX timestamps
+	received=$(isochron report \
+		--input-file "${isochron_dat}" \
+		--printf-format "%u\n" --printf-args "R" | \
+		grep -w -v '0' | wc -l)
+
+	if [ "${received}" = "${expected}" ]; then
+		RET=0
+	else
+		RET=1
+		echo "Expected isochron to receive ${expected} packets but received ${received}"
+	fi
+
+	log_test "${test_name}"
+
+	if [ "$RET" = "1" ]; then
+		${debug} "${isochron_dat}"
+	fi
+
+	rm ${isochron_dat} 2> /dev/null
+}
+
+test_gate_in_band()
+{
+	# Send packets in-band with the OPEN gate entry
+	run_test 0.000000000 ${NUM_PKTS} "In band" \
+		debug_incorrectly_dropped_packets
+
+	psfp_filter_check ${NUM_PKTS}
+}
+
+test_gate_out_of_band()
+{
+	# Send packets in-band with the CLOSE gate entry
+	run_test 0.005000000 0 "Out of band" \
+		debug_incorrectly_received_packets
+
+	psfp_filter_check $((2 * ${NUM_PKTS}))
+}
+
+trap cleanup EXIT
+
+ALL_TESTS="
+	test_gate_in_band
+	test_gate_out_of_band
+"
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tsn_lib.sh b/tools/testing/selftests/net/forwarding/tsn_lib.sh
new file mode 100644
index 000000000000..60a1423e8116
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/tsn_lib.sh
@@ -0,0 +1,235 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+REQUIRE_ISOCHRON=${REQUIRE_ISOCHRON:=yes}
+REQUIRE_LINUXPTP=${REQUIRE_LINUXPTP:=yes}
+
+# Tunables
+UTC_TAI_OFFSET=37
+ISOCHRON_CPU=1
+
+if [[ "$REQUIRE_ISOCHRON" = "yes" ]]; then
+	# https://github.com/vladimiroltean/tsn-scripts
+	# WARNING: isochron versions pre-1.0 are unstable,
+	# always use the latest version
+	require_command isochron
+fi
+if [[ "$REQUIRE_LINUXPTP" = "yes" ]]; then
+	require_command phc2sys
+	require_command ptp4l
+fi
+
+phc2sys_start()
+{
+	local if_name=$1
+	local uds_address=$2
+	local extra_args=""
+
+	if ! [ -z "${uds_address}" ]; then
+		extra_args="${extra_args} -z ${uds_address}"
+	fi
+
+	phc2sys_log="$(mktemp)"
+
+	chrt -f 10 phc2sys -m \
+		-c ${if_name} \
+		-s CLOCK_REALTIME \
+		-O ${UTC_TAI_OFFSET} \
+		--step_threshold 0.00002 \
+		--first_step_threshold 0.00002 \
+		${extra_args} \
+		> "${phc2sys_log}" 2>&1 &
+	phc2sys_pid=$!
+
+	echo "phc2sys logs to ${phc2sys_log} and has pid ${phc2sys_pid}"
+
+	sleep 1
+}
+
+phc2sys_stop()
+{
+	{ kill ${phc2sys_pid} && wait ${phc2sys_pid}; } 2> /dev/null
+	rm "${phc2sys_log}" 2> /dev/null
+}
+
+ptp4l_start()
+{
+	local if_name=$1
+	local slave_only=$2
+	local uds_address=$3
+	local log="ptp4l_log_${if_name}"
+	local pid="ptp4l_pid_${if_name}"
+	local extra_args=""
+
+	if [ "${slave_only}" = true ]; then
+		extra_args="${extra_args} -s"
+	fi
+
+	# declare dynamic variables ptp4l_log_${if_name} and ptp4l_pid_${if_name}
+	# as global, so that they can be referenced later
+	declare -g "${log}=$(mktemp)"
+
+	chrt -f 10 ptp4l -m -2 -P \
+		-i ${if_name} \
+		--step_threshold 0.00002 \
+		--first_step_threshold 0.00002 \
+		--tx_timestamp_timeout 100 \
+		--uds_address="${uds_address}" \
+		${extra_args} \
+		> "${!log}" 2>&1 &
+	declare -g "${pid}=$!"
+
+	echo "ptp4l for interface ${if_name} logs to ${!log} and has pid ${!pid}"
+
+	sleep 1
+}
+
+ptp4l_stop()
+{
+	local if_name=$1
+	local log="ptp4l_log_${if_name}"
+	local pid="ptp4l_pid_${if_name}"
+
+	{ kill ${!pid} && wait ${!pid}; } 2> /dev/null
+	rm "${!log}" 2> /dev/null
+}
+
+cpufreq_max()
+{
+	local cpu=$1
+	local freq="cpu${cpu}_freq"
+	local governor="cpu${cpu}_governor"
+
+	# Kernel may be compiled with CONFIG_CPU_FREQ disabled
+	if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+		return
+	fi
+
+	# declare dynamic variables cpu${cpu}_freq and cpu${cpu}_governor as
+	# global, so they can be referenced later
+	declare -g "${freq}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq)"
+	declare -g "${governor}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor)"
+
+	cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_max_freq > \
+		/sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+	echo -n "performance" > \
+		/sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+cpufreq_restore()
+{
+	local cpu=$1
+	local freq="cpu${cpu}_freq"
+	local governor="cpu${cpu}_governor"
+
+	if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+		return
+	fi
+
+	echo "${!freq}" > /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+	echo -n "${!governor}" > \
+		/sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+isochron_recv_start()
+{
+	local if_name=$1
+	local uds=$2
+	local extra_args=$3
+
+	if ! [ -z "${uds}" ]; then
+		extra_args="--unix-domain-socket ${uds}"
+	fi
+
+	isochron rcv \
+		--interface ${if_name} \
+		--sched-priority 98 \
+		--sched-fifo \
+		--utc-tai-offset ${UTC_TAI_OFFSET} \
+		--quiet \
+		${extra_args} & \
+	isochron_pid=$!
+
+	sleep 1
+}
+
+isochron_recv_stop()
+{
+	{ kill ${isochron_pid} && wait ${isochron_pid}; } 2> /dev/null
+}
+
+isochron_do()
+{
+	local sender_if_name=$1; shift
+	local receiver_if_name=$1; shift
+	local sender_uds=$1; shift
+	local receiver_uds=$1; shift
+	local base_time=$1; shift
+	local cycle_time=$1; shift
+	local shift_time=$1; shift
+	local num_pkts=$1; shift
+	local vid=$1; shift
+	local priority=$1; shift
+	local dst_ip=$1; shift
+	local isochron_dat=$1; shift
+	local extra_args=""
+	local receiver_extra_args=""
+	local vrf="$(master_name_get ${sender_if_name})"
+	local use_l2="true"
+
+	if ! [ -z "${dst_ip}" ]; then
+		use_l2="false"
+	fi
+
+	if ! [ -z "${vrf}" ]; then
+		dst_ip="${dst_ip}%${vrf}"
+	fi
+
+	if ! [ -z "${vid}" ]; then
+		vid="--vid=${vid}"
+	fi
+
+	if [ -z "${receiver_uds}" ]; then
+		extra_args="${extra_args} --omit-remote-sync"
+	fi
+
+	if ! [ -z "${shift_time}" ]; then
+		extra_args="${extra_args} --shift-time=${shift_time}"
+	fi
+
+	if [ "${use_l2}" = "true" ]; then
+		extra_args="${extra_args} --l2 --etype=0xdead ${vid}"
+		receiver_extra_args="--l2 --etype=0xdead"
+	else
+		extra_args="${extra_args} --l4 --ip-destination=${dst_ip}"
+		receiver_extra_args="--l4"
+	fi
+
+	cpufreq_max ${ISOCHRON_CPU}
+
+	isochron_recv_start "${h2}" "${receiver_uds}" "${receiver_extra_args}"
+
+	isochron send \
+		--interface ${sender_if_name} \
+		--unix-domain-socket ${sender_uds} \
+		--priority ${priority} \
+		--base-time ${base_time} \
+		--cycle-time ${cycle_time} \
+		--num-frames ${num_pkts} \
+		--frame-size 64 \
+		--txtime \
+		--utc-tai-offset ${UTC_TAI_OFFSET} \
+		--cpu-mask $((1 << ${ISOCHRON_CPU})) \
+		--sched-fifo \
+		--sched-priority 98 \
+		--client 127.0.0.1 \
+		--sync-threshold 5000 \
+		--output-file ${isochron_dat} \
+		${extra_args} \
+		--quiet
+
+	isochron_recv_stop
+
+	cpufreq_restore ${ISOCHRON_CPU}
+}
-- 
2.25.1
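
To see why base time 0.000000000 is expected to pass every packet and 0.005000000 to drop every packet, the timing can be worked through with the tunables from psfp.sh. The sketch below is not part of the patch: gate_state_at is a hypothetical helper, and it assumes the gate schedule starts at time 0 with the OPEN entry, that isochron adds the shift time to the base time, and that propagation delay is negligible compared to the 2.5 ms margin.

```shell
# Tunables copied from psfp.sh
CYCLE_TIME_NS=10000000
GATE_DURATION_NS=$((CYCLE_TIME_NS / 2))
SHIFT_TIME_NS=$((GATE_DURATION_NS / 2))

# Hypothetical helper: print which gate entry (OPEN or CLOSE) is active when
# a packet scheduled at base_time + SHIFT_TIME_NS reaches the switch.
gate_state_at()
{
	local base_time_ns=$1
	# Position of the packet within the repeating 10 ms cycle
	local offset=$(( (base_time_ns + SHIFT_TIME_NS) % CYCLE_TIME_NS ))

	if [ "${offset}" -lt "${GATE_DURATION_NS}" ]; then
		echo OPEN
	else
		echo CLOSE
	fi
}

gate_state_at 0        # in-band test: offset 2.5 ms, inside the OPEN slot
gate_state_at 5000000  # out-of-band test: offset 7.5 ms, inside the CLOSE slot
```

Because every subsequent packet is sent exactly one cycle later, each of the 1000 packets lands at the same offset within its cycle, which is what makes the pass/drop outcome deterministic.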


[PATCH i-g-t v2 0/4] Add support for KUnit tests

by Isabella Basso

This patch series was first developed as part of the LKCamp hackathon that happened last year[1], mainly focusing on refactoring DRM tests to use KUnit.

KUnit[2][3] is a unified test framework that provides helper tools, simplifying the development and execution of tests. Using an x86-64 machine it's possible to run tests in the host's kernel natively using user-mode Linux[4] (aka UML), which simplifies usage in a wide variety of scenarios, including integration into CI.

As the tool's adoption widens into graphics testing territory, LKCamp members and I figured it would be important to support it in IGT, as it's a core tool for GPU driver maintainers.

I have then added KUnit support into IGT, mainly following the KTAP specs. It can be tested using patch 4/4 in this series together with a DRM selftests patch series available at [5].

Changes since v1:
- Major rework of parsing function structure:
  - It is no longer recursive
  - Adapt kselftests functions and structs to be used with KUnit
- Switch DRM selftests to KUnit parsing as they're updated in the kernel
- Replace AMD KUnit tests by DRM selftests

[1]: https://groups.google.com/g/kunit-dev/c/YqFR1q2uZvk/m/IbvItSfHBAAJ
[2]: https://kunit.dev
[3]: https://docs.kernel.org/dev-tools/kunit/index.html
[4]: http://user-mode-linux.sourceforge.net
[5]: https://lore.kernel.org/all/20220708203052.236290-1-maira.canal@usp.br/

Isabella Basso (4):
  lib/igt_kmod: rename kselftest functions to ktest
  lib/igt_kmod.c: check if module is builtin before attempting to unload it
  lib/igt_kmod: add compatibility for KUnit
  tests: DRM selftests: switch to KUnit

 lib/igt_kmod.c       | 315 +++++++++++++++++++++++++++++++++++++++++--
 lib/igt_kmod.h       |  14 +-
 tests/drm_buddy.c    |   7 +-
 tests/drm_mm.c       |   7 +-
 tests/kms_selftest.c |  12 +-
 5 files changed, 329 insertions(+), 26 deletions(-)

-- 
2.37.2
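
For readers unfamiliar with KTAP, the result lines that such a parser has to handle can be illustrated with a small shell sketch. This is not the IGT implementation (which lives in C in lib/igt_kmod.c); ktap_summary is a hypothetical helper that only counts top-level "ok" / "not ok" lines and relies on nested results being indented.

```shell
# Hypothetical helper: summarise the top-level KTAP result lines read from
# stdin. Indented (nested) results are ignored.
ktap_summary()
{
	local line
	local pass=0
	local fail=0
	local skip=0

	while IFS= read -r line; do
		case "${line}" in
		"ok "*"# SKIP"*)
			skip=$((skip + 1)) ;;
		"ok "*)
			pass=$((pass + 1)) ;;
		"not ok "*)
			fail=$((fail + 1)) ;;
		esac
	done

	echo "pass=${pass} fail=${fail} skip=${skip}"
}

# Example: one passing suite, one failing suite, one skipped suite
printf '%s\n' \
	'KTAP version 1' \
	'1..3' \
	'    ok 1 test_a' \
	'ok 1 suite_a' \
	'not ok 2 suite_b' \
	'ok 3 suite_c # SKIP no hardware' | ktap_summary
```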


[PATCH] selftests: fix LLVM build for i386 and x86_64

by Guillaume Tucker

Add missing cases for the i386 and x86_64 architectures when
determining the LLVM target for building kselftest.

Fixes: 795285ef2425 ("selftests: Fix clang cross compilation")
Signed-off-by: Guillaume Tucker <guillaume.tucker(a)collabora.com>
---
 tools/testing/selftests/lib.mk | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index b1c62914366b..cc4c443d5b14 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -10,12 +10,14 @@ endif
 CLANG_TARGET_FLAGS_arm          := arm-linux-gnueabi
 CLANG_TARGET_FLAGS_arm64        := aarch64-linux-gnu
 CLANG_TARGET_FLAGS_hexagon      := hexagon-linux-musl
+CLANG_TARGET_FLAGS_i386         := i386-linux-gnu
 CLANG_TARGET_FLAGS_m68k         := m68k-linux-gnu
 CLANG_TARGET_FLAGS_mips         := mipsel-linux-gnu
 CLANG_TARGET_FLAGS_powerpc      := powerpc64le-linux-gnu
 CLANG_TARGET_FLAGS_riscv        := riscv64-linux-gnu
 CLANG_TARGET_FLAGS_s390         := s390x-linux-gnu
 CLANG_TARGET_FLAGS_x86          := x86_64-linux-gnu
+CLANG_TARGET_FLAGS_x86_64       := x86_64-linux-gnu
 CLANG_TARGET_FLAGS              := $(CLANG_TARGET_FLAGS_$(ARCH))
 
 ifeq ($(CROSS_COMPILE),)
-- 
2.30.2
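
The lookup that lib.mk performs through $(CLANG_TARGET_FLAGS_$(ARCH)) can be mirrored in shell to show what the two new entries change. The function below is a hypothetical illustration covering only a few of the architectures in the table; it is not part of the kselftest build system.

```shell
# Hypothetical mirror of the lib.mk table: map an ARCH value to the clang
# target triple, returning non-zero when the architecture has no entry.
clang_target_for_arch()
{
	case "$1" in
	arm)    echo arm-linux-gnueabi ;;
	arm64)  echo aarch64-linux-gnu ;;
	i386)   echo i386-linux-gnu ;;    # added by this patch
	x86)    echo x86_64-linux-gnu ;;
	x86_64) echo x86_64-linux-gnu ;;  # added by this patch
	*)      return 1 ;;
	esac
}

clang_target_for_arch x86_64   # prints x86_64-linux-gnu
```

Without the i386 and x86_64 entries, those ARCH values would fall through to the default branch, which corresponds to an empty CLANG_TARGET_FLAGS in lib.mk.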


[PATCH] selftests: add missing ')' in lib.mk

by Guillaume Tucker

Add missing closing ')' in lib.mk in a call to $error(). This only
affects LLVM / Clang builds.

Fixes: 795285ef2425 ("selftests: Fix clang cross compilation")
Signed-off-by: Guillaume Tucker <guillaume.tucker(a)collabora.com>
---
 tools/testing/selftests/lib.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 947fc72413e9..a87f60873e5b 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -20,7 +20,7 @@ CLANG_TARGET_FLAGS := $(CLANG_TARGET_FLAGS_$(ARCH))
 
 ifeq ($(CROSS_COMPILE),)
 ifeq ($(CLANG_TARGET_FLAGS),)
-$(error Specify CROSS_COMPILE or add '--target=' option to lib.mk
+$(error Specify CROSS_COMPILE or add '--target=' option to lib.mk)
 else
 CLANG_FLAGS += --target=$(CLANG_TARGET_FLAGS)
 endif # CLANG_TARGET_FLAGS
-- 
2.30.2


[PATCH net-next v2 0/3] net: introduce rps_default_mask

by Paolo Abeni

Real-time setups try hard to ensure proper isolation between time-critical applications and e.g. network processing performed by the network stack in softirq, and RPS is used to move the softirq activity away from the isolated core.

If the network configuration is dynamic, with netns and devices routinely created at run-time, enforcing the correct RPS setting on each newly created device without allowing transient bad configurations becomes complex.

This series tries to address the above by introducing a new sysctl knob: rps_default_mask. The new sysctl entry allows configuring a system-wide RPS mask, to be enforced from receive queue creation time without any further per-device configuration required.

Additionally, a simple self-test is introduced to check the rps_default_mask behavior.

v1 -> v2:
 - fix sparse warning in patch 2/3

Paolo Abeni (3):
  net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
  net/core: introduce default_rps_mask netns attribute
  self-tests: introduce self-tests for RPS default mask

 Documentation/admin-guide/sysctl/net.rst      |  6 ++
 include/linux/netdevice.h                     |  1 +
 net/core/net-sysfs.c                          | 73 +++++++++++--------
 net/core/sysctl_net_core.c                    | 58 +++++++++++++++
 tools/testing/selftests/net/Makefile          |  1 +
 tools/testing/selftests/net/config            |  3 +
 .../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
 7 files changed, 169 insertions(+), 30 deletions(-)
 create mode 100755 tools/testing/selftests/net/rps_default_mask.sh

-- 
2.26.2
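
As an illustration of how such a mask might be produced, the sketch below builds a hexadecimal CPU mask from a list of housekeeping CPUs. cpus_to_mask is a hypothetical helper, and the net.core.rps_default_mask path in the usage comment is an assumption inferred from the files touched by the series, not something stated in the cover letter. The sketch only handles small CPU numbers; it does not produce the comma-separated groups used for masks on larger systems.

```shell
# Hypothetical helper: print the hexadecimal mask with one bit set for each
# CPU number passed as an argument.
cpus_to_mask()
{
	local mask=0
	local cpu

	for cpu in "$@"; do
		mask=$((mask | (1 << cpu)))
	done

	printf '%x\n' "${mask}"
}

# Keep RPS on housekeeping CPUs 0 and 1, away from isolated CPUs 2 and 3
cpus_to_mask 0 1    # prints 3

# Assumed usage, requires root and a kernel carrying this series:
#   sysctl -w net.core.rps_default_mask=$(cpus_to_mask 0 1)
```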


[PATCH v14 00/39] arm64/sme: Initial support for the Scalable Matrix Extension

by Mark Brown

This series provides initial support for the ARMv9 Scalable Matrix Extension (SME). SME takes the approach used for vectors in SVE and extends this to provide architectural support for matrix operations. A more detailed overview can be found in [1].

For the kernel SME can be thought of as a series of features which are intended to be used together by applications but operate mostly orthogonally:

 - The ZA matrix register.
 - Streaming mode, in which ZA can be accessed and a subset of SVE
   features are available.
 - A second vector length, used for streaming mode SVE and ZA and
   controlled using a similar interface to that for SVE.
 - TPIDR2, a new userspace controllable system register intended for use
   by the C library for storing context related to the ZA ABI.

A substantial part of the series is dedicated to refactoring the existing SVE support so that we don't need to duplicate code for handling vector lengths and the SVE registers; this involves creating an array of vector types and making the users take the vector type as a parameter. I'm not 100% happy with this but wasn't able to come up with anything better. Duplicating code definitely felt like a bad idea, so this felt like the least bad thing. If this approach makes sense to people it might make sense to split this off into a separate series and/or merge it while the rest is pending review to try to make things a little more digestible; the series is very large so it'd probably make things easier to digest if some of the preparatory refactoring could be merged before the rest is ready.

One feature of the architecture of particular note is that switching to and from streaming mode may change the size of and invalidate the contents of the SVE registers, and when in streaming mode the FFR is not accessible. This complicates aspects of the ABI like signal handling and ptrace.

This initial implementation is mainly intended to get the ABI in place. There are several areas which will be worked on going forwards; some of these will be blockers, others could be handled in followup series:

 - SME is currently not supported for KVM guests, this will be done as a
   followup series. A host system can use SME and run KVM guests but
   SME is not available in the guests.
 - The KVM host support is done in a very simplistic way, were anyone to
   attempt to use it in production there would be performance impacts on
   hosts with SME support. As part of this we also add enumeration of
   fine grained traps.
 - There is not currently ptrace or signal support for TPIDR2, this will
   be done as a followup series.
 - No support is currently provided for scheduler control of SME or SME
   applications, given the size of the SME register state the context
   switch overhead may be noticeable so this may be needed especially for
   real time applications. Similar concerns already exist for larger
   SVE vector lengths but are amplified for SME, particularly as the
   vector length increases.
 - There has been no work on optimising the performance of anything the
   kernel does.

It is not expected that any systems will be encountered that support SME but not SVE; SME is an ARMv9 feature and SVE is mandatory for ARMv9. The code attempts to handle any such systems that are encountered but this hasn't been tested extensively.

v14:
 - Rebase onto v5.18-rc3.
v13:
 - Preserve ZA in both parent and child on clone() and add a test case
   for this.
 - Fix EFI integration for FA64.
 - Minor tweaks to the ABI document following Catalin's review.
 - Add and make use of thread_get_cur_vl() helper.
 - Fix some issues with SVE/FPSIMD register type moves in streaming SVE
   ptrace.
 - Typo fixes.
 - Roll in separately posted series extending ptrace coverage in
   kselftest for better integrated testing of the series.
v12:
 - Fix some typos in the ABI document.
 - Print a message when we skip a vector length in the signal tests.
 - Add note of earliest toolchain versions with SME to manual encodings
   for future reference now that's landed.
 - Drop reference to PCS in sme.rst, it's not referenced and one of the
   links was broken.
 - Encode smstop and smstart as sysregs in the kernel.
 - Don't redundantly flush the SVE register state when loading FPSIMD
   state with SME enabled for the task, the architecture will do this
   for us.
 - Introduce and use task_get_cur_vl() to get the vector length for the
   currently active SVE registers.
 - Fix support for !FA64 mode in signal and syscall tests.
 - Simplify instruction sequence for ssve_regs signal test.
 - Actually include the ZA signal test in the patch set.
v11:
 - Rebase onto v5.17-rc3.
 - Provide a sme-inst.h to collect manual encodings in kselftest.
v10:
 - Actually do the rebase of fixups from the previous version into
   relevant patches.
v9:
 - Remove defensive programming around IS_ENABLED() and FGT in KVM code.
 - Fix naming of TPIDR2 FGT register bit.
 - Add patches making handling of floating point register bits more
   consistent (also sent as separate series).
 - Drop now unused enumeration of fine grained traps.
v8:
 - Rebase onto v5.17-rc1.
 - Support interoperation with KVM, SME is disabled for KVM guests with
   minimal handling for cleaning up SME state when entering and leaving
   the guest.
 - Document and implement that signal handlers are invoked with ZA and
   streaming mode disabled.
 - Use the RDSVL instruction introduced in EAC2 of the architecture to
   obtain the streaming mode vector length during enumeration, ZA state
   loading/saving and in test programs.
 - Store a pointer to SVCR in fpsimd_last_state and use it in
   fpsimd_save() for interoperation with KVM.
 - Add a test case sme_trap_no_sm checking that we generate a SIGILL
   when using an instruction that requires streaming mode without
   enabling it.
 - Add basic ZA context form validation to testcases helper library.
 - Move signal tests over to validating streaming VL from ZA information.
 - Pulled in patch removing ARRAY_SIZE() so that kselftest builds
   cleanly and to avoid trivial conflicts.
v7:
 - Rebase onto v5.16-rc3.
 - Reduce indentation when supporting custom triggers for signal tests
   as suggested by Catalin.
 - Change to specifying a width for all CPU features rather than adding
   single bit specific infrastructure.
 - Don't require zeroing of non-shared SVE state during syscalls.
v6:
 - Rebase onto v5.16-rc1.
 - Return to disabling TIF_SVE on kernel entry even if we have SME
   state, this avoids the need for KVM to handle the case where TIF_SVE
   is set on guest entry.
 - Add syscall-abi.h to SME updates to syscall-abi, mistakenly omitted
   from commit.
v5:
 - Rebase onto currently merged SVE and kselftest patches.
 - Add support for the FA64 option, introduced in the recently published
   EAC1 update to the specification.
 - Pull in test program for the syscall ABI previously sent separately
   with some revisions and add coverage for the SME ABI.
 - Fix checking for options with 1 bit fields in ID_AA64SMFR0_EL1.
 - Minor fixes and clarifications to the ABI documentation.
v4:
 - Rebase onto merged patches.
 - Remove an unneeded NULL check in vec_proc_do_default_vl().
 - Include patch to factor out utility routines in kselftests written in
   assembler.
 - Specify -ffreestanding when building TPIDR2 test.
v3:
 - Skip FFR rather than predicate registers in sve_flush_live().
 - Don't assume a bool is all zeros in sve_flush_live() as per AAPCS.
 - Don't redundantly specify a zero index when clearing FFR.
v2:
 - Fix several issues with !SME and !SVE configurations.
 - Preserve TPIDR2 when creating a new thread/process unless
   CLONE_SETTLS is set.
 - Report traps due to using features in an invalid mode as SIGILL.
 - Spell out streaming mode behaviour in SVE ABI documentation more
   directly.
 - Document TPIDR2 in the ABI document.
 - Use SMSTART and SMSTOP rather than read/modify/write sequences.
 - Rework logic for exiting streaming mode on syscall.
 - Don't needlessly initialise SVCR on access trap.
 - Always restore SME VL for userspace if SME traps are disabled.
 - Only yield to encourage preemption every 128 iterations in za-test,
   otherwise do a getpid(), and validate SVCR after syscall.
 - Leave streaming mode disabled except when reading the vector length
   in za-test, and disable ZA after detecting a mismatch.
 - Add SME support to vlset.
 - Clarifications and typo fixes in comments.
 - Move sme_alloc() forward declaration back a patch.

[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-…

Mark Brown (39):
  kselftest/arm64: Fix comment for ptrace_sve_get_fpsimd_data()
  kselftest/arm64: Remove assumption that tasks start FPSIMD only
  kselftest/arm64: Validate setting via FPSIMD and read via SVE regsets
  arm64/sme: Provide ABI documentation for SME
  arm64/sme: System register and exception syndrome definitions
  arm64/sme: Manually encode SME instructions
  arm64/sme: Early CPU setup for SME
  arm64/sme: Basic enumeration support
  arm64/sme: Identify supported SME vector lengths at boot
  arm64/sme: Implement sysctl to set the default vector length
  arm64/sme: Implement vector length configuration prctl()s
  arm64/sme: Implement support for TPIDR2
  arm64/sme: Implement SVCR context switching
  arm64/sme: Implement streaming SVE context switching
  arm64/sme: Implement ZA context switching
  arm64/sme: Implement traps and syscall handling for SME
  arm64/sme: Disable ZA and streaming mode when handling signals
  arm64/sme: Implement streaming SVE signal handling
  arm64/sme: Implement ZA signal handling
  arm64/sme: Implement ptrace support for streaming mode SVE registers
  arm64/sme: Add ptrace support for ZA
  arm64/sme: Disable streaming mode and ZA when flushing CPU state
  arm64/sme: Save and restore streaming mode over EFI runtime calls
  KVM: arm64: Hide SME system registers from guests
  KVM: arm64: Trap SME usage in guest
  KVM: arm64: Handle SME host state when running guests
  arm64/sme: Provide Kconfig for SME
  kselftest/arm64: Add manual encodings for SME instructions
  kselftest/arm64: sme: Add SME support to vlset
  kselftest/arm64: Add tests for TPIDR2
  kselftest/arm64: Extend vector configuration API tests to cover SME
  kselftest/arm64: sme: Provide streaming mode SVE stress test
  kselftest/arm64: signal: Handle ZA signal context in core code
  kselftest/arm64: Add stress test for SME ZA context switching
  kselftest/arm64: signal: Add SME signal handling tests
  kselftest/arm64: Add streaming SVE to SVE ptrace tests
  kselftest/arm64: Add coverage for the ZA ptrace interface
  kselftest/arm64: Add SME support to syscall ABI test
  selftests/arm64: Add a testcase for handling of ZA on clone()

 Documentation/arm64/elf_hwcaps.rst            |  33 +
 Documentation/arm64/index.rst                 |   1 +
 Documentation/arm64/sme.rst                   | 428 +++++++++++++
 Documentation/arm64/sve.rst                   |  70 ++-
 arch/arm64/Kconfig                            |  11 +
 arch/arm64/include/asm/cpu.h                  |   4 +
 arch/arm64/include/asm/cpufeature.h           |  24 +
 arch/arm64/include/asm/el2_setup.h            |  64 +-
 arch/arm64/include/asm/esr.h                  |  13 +-
 arch/arm64/include/asm/exception.h            |   1 +
 arch/arm64/include/asm/fpsimd.h               | 123 +++-
 arch/arm64/include/asm/fpsimdmacros.h         |  87 +++
 arch/arm64/include/asm/hwcap.h                |   8 +
 arch/arm64/include/asm/kvm_arm.h              |   1 +
 arch/arm64/include/asm/kvm_host.h             |   4 +
 arch/arm64/include/asm/processor.h            |  26 +-
 arch/arm64/include/asm/sysreg.h               |  67 ++
 arch/arm64/include/asm/thread_info.h          |   2 +
 arch/arm64/include/uapi/asm/hwcap.h           |   8 +
 arch/arm64/include/uapi/asm/ptrace.h          |  69 ++-
 arch/arm64/include/uapi/asm/sigcontext.h      |  55 +-
 arch/arm64/kernel/cpufeature.c                | 106 ++++
 arch/arm64/kernel/cpuinfo.c                   |  13 +
 arch/arm64/kernel/entry-common.c              |  11 +
 arch/arm64/kernel/entry-fpsimd.S              |  36 ++
 arch/arm64/kernel/fpsimd.c                    | 585 ++++++++++++++++--
 arch/arm64/kernel/process.c                   |  44 +-
 arch/arm64/kernel/ptrace.c                    | 358 +++++++++--
 arch/arm64/kernel/signal.c                    | 188 +++++-
 arch/arm64/kernel/syscall.c                   |  29 +-
 arch/arm64/kernel/traps.c                     |   1 +
 arch/arm64/kvm/fpsimd.c                       |  43 +-
 arch/arm64/kvm/hyp/nvhe/switch.c              |  30 +
 arch/arm64/kvm/hyp/vhe/switch.c               |  11 +-
 arch/arm64/kvm/sys_regs.c                     |   9 +-
 arch/arm64/tools/cpucaps                      |   2 +
 include/uapi/linux/elf.h                      |   2 +
 include/uapi/linux/prctl.h                    |   9 +
 kernel/sys.c                                  |  12 +
 tools/testing/selftests/arm64/abi/.gitignore  |   1 +
 tools/testing/selftests/arm64/abi/Makefile    |   9 +-
 .../selftests/arm64/abi/syscall-abi-asm.S     |  79 ++-
 .../testing/selftests/arm64/abi/syscall-abi.c | 204 +++++-
 .../testing/selftests/arm64/abi/syscall-abi.h |  15 +
 tools/testing/selftests/arm64/abi/tpidr2.c    | 298 +++++++++
 tools/testing/selftests/arm64/fp/.gitignore   |   5 +
 tools/testing/selftests/arm64/fp/Makefile     |  19 +-
 tools/testing/selftests/arm64/fp/rdvl-sme.c   |  14 +
 tools/testing/selftests/arm64/fp/rdvl.S       |  10 +
 tools/testing/selftests/arm64/fp/rdvl.h       |   1 +
 tools/testing/selftests/arm64/fp/sme-inst.h   |  51 ++
 tools/testing/selftests/arm64/fp/ssve-stress  |  59 ++
 tools/testing/selftests/arm64/fp/sve-ptrace.c | 175 +++++-
 tools/testing/selftests/arm64/fp/sve-test.S   |  20 +
 tools/testing/selftests/arm64/fp/vec-syscfg.c |  10 +
 tools/testing/selftests/arm64/fp/vlset.c      |  10 +-
 .../testing/selftests/arm64/fp/za-fork-asm.S  |  61 ++
 tools/testing/selftests/arm64/fp/za-fork.c    | 156 +++++
 tools/testing/selftests/arm64/fp/za-ptrace.c  | 356 +++++++++++
 tools/testing/selftests/arm64/fp/za-stress    |  59 ++
 tools/testing/selftests/arm64/fp/za-test.S    | 388 ++++++++++++
 .../testing/selftests/arm64/signal/.gitignore |   3 +
 .../selftests/arm64/signal/test_signals.h     |   4 +
 .../arm64/signal/test_signals_utils.c         |   6 +
 .../testcases/fake_sigreturn_sme_change_vl.c  |  92 +++
 .../arm64/signal/testcases/sme_trap_no_sm.c   |  38 ++
 .../signal/testcases/sme_trap_non_streaming.c |  45 ++
 .../arm64/signal/testcases/sme_trap_za.c      |  36 ++
 .../selftests/arm64/signal/testcases/sme_vl.c |  68 ++
 .../arm64/signal/testcases/ssve_regs.c        | 135 ++++
 .../arm64/signal/testcases/testcases.c        |  36 ++
 .../arm64/signal/testcases/testcases.h        |   3 +-
 .../arm64/signal/testcases/za_regs.c          | 128 ++++
 73 files changed, 4991 insertions(+), 191 deletions(-)
 create mode 100644 Documentation/arm64/sme.rst
 create mode 100644 tools/testing/selftests/arm64/abi/syscall-abi.h
 create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
 create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
 create mode 100644 tools/testing/selftests/arm64/fp/sme-inst.h
 create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-fork-asm.S
 create mode 100644 tools/testing/selftests/arm64/fp/za-fork.c
 create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
 create mode 100644 tools/testing/selftests/arm64/fp/za-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_no_sm.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_non_streaming.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_za.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/za_regs.c

base-commit: b2d229d4ddb17db541098b83524d901257e93845
-- 
2.30.2


[PATCH v2 0/2] livepatch: Move tests from lib/livepatch to selftests/livepatch

by Marcos Paulo de Souza

Hi there,

this is the v2 of the patchset. The v1 can be found at [1]. There is only one change in patch 1, which changed the target directory to build the test modules. All other changes happen in patch 2.

Thanks for reviewing!

Changes from v1:

# test_modules/Makefile
  * Build the test modules targeting /lib/modules, instead of ksrc when building from the kernel source.

# test_modules/test_klp_syscall.c
  * Added a parameter array to receive the pids that should transition to the new system call. (suggested by Joe)
  * Create a new sysfs file /sys/kernel/test_klp_syscall/npids to show how many pids from the argument need to transition to the new state. (suggested by Joe)
  * Fix the PPC32 support by adding the syscall wrapper for archs that select it by default, without erroring out. PPC does not set SYSCALL_WRAPPER, so having it set in v1 was a mistake. (suggested by Joe)
  * The aarch64 syscall prefix was added too, since the livepatch support will come soon.

# test_binaries/test_klp-call_getpid.c
  * Change %d/%u in printf (suggested by Joe)
  * Change run -> stop variable name, and inverted the assignments (suggested by Joe).

# File test-syscall.sh
  * Fixed test-syscall.sh to call test_klp-call_getpid in test_binaries dir
  * Load test_klp_syscall, passing the pids of the test_klp-call_getpid instances. Check the sysfs file from the test_klp_syscall module to verify that all pids transitioned correctly. (suggested by Joe)
  * Simplified the loop that calls test_klp-call_getpid. (suggested by Joe)
  * Removed the "success" comment from the script, as it's implicit that it succeeded. Otherwise load_lp would error out. (suggested by Joe)
  * Changed the commit message of patch 2 to further detail what "tricky" means when livepatching syscalls. (suggested by Joe)

[1]: 20220603143242.870-1-mpdesouza(a)suse.com

Marcos Paulo de Souza (2):
  livepatch: Move tests from lib/livepatch to selftests/livepatch
  selftests: livepatch: Test livepatching a heavily called syscall

 arch/s390/configs/debug_defconfig | 1 -
 arch/s390/configs/defconfig | 1 -
 lib/Kconfig.debug | 22 ---
 lib/Makefile | 2 -
 lib/livepatch/Makefile | 14 --
 tools/testing/selftests/livepatch/Makefile | 35 +++-
 tools/testing/selftests/livepatch/README | 5 +-
 tools/testing/selftests/livepatch/config | 1 -
 .../testing/selftests/livepatch/functions.sh | 34 ++--
 .../selftests/livepatch/test-callbacks.sh | 50 +++---
 .../selftests/livepatch/test-ftrace.sh | 6 +-
 .../selftests/livepatch/test-livepatch.sh | 10 +-
 .../selftests/livepatch/test-shadow-vars.sh | 2 +-
 .../testing/selftests/livepatch/test-state.sh | 18 +--
 .../selftests/livepatch/test-syscall.sh | 52 ++++++
 .../test_binaries/test_klp-call_getpid.c | 48 ++++++
 .../selftests/livepatch/test_modules/Makefile | 20 +++
 .../test_modules}/test_klp_atomic_replace.c | 0
 .../test_modules}/test_klp_callbacks_busy.c | 0
 .../test_modules}/test_klp_callbacks_demo.c | 0
 .../test_modules}/test_klp_callbacks_demo2.c | 0
 .../test_modules}/test_klp_callbacks_mod.c | 0
 .../test_modules}/test_klp_livepatch.c | 0
 .../test_modules}/test_klp_shadow_vars.c | 0
 .../livepatch/test_modules}/test_klp_state.c | 0
 .../livepatch/test_modules}/test_klp_state2.c | 0
 .../livepatch/test_modules}/test_klp_state3.c | 0
 .../livepatch/test_modules/test_klp_syscall.c | 150 ++++++++++++++++++
 28 files changed, 360 insertions(+), 111 deletions(-)
 delete mode 100644 lib/livepatch/Makefile
 create mode 100755 tools/testing/selftests/livepatch/test-syscall.sh
 create mode 100644 tools/testing/selftests/livepatch/test_binaries/test_klp-call_getpid.c
 create mode 100644 tools/testing/selftests/livepatch/test_modules/Makefile
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_atomic_replace.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_busy.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_demo.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_demo2.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_mod.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_livepatch.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_shadow_vars.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state2.c (100%)
 rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state3.c (100%)
 create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c

--
2.35.3
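The changelog above mentions a userspace helper that calls getpid() heavily and is controlled by a "stop" variable. A minimal sketch of that idea follows; it is not the code from test_klp-call_getpid.c, and the function names install_stop_handlers() and call_getpid_loop() as well as the max_calls bound are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Set from the signal handler; the loop below polls it. */
static volatile sig_atomic_t stop;

static void on_signal(int sig)
{
	(void)sig;
	stop = 1;
}

/* Hypothetical helper: arrange for SIGINT/SIGTERM to end the loop.
 * Returns 0 on success, -1 on failure. */
int install_stop_handlers(void)
{
	struct sigaction sa = { .sa_handler = on_signal };

	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) < 0)
		return -1;
	return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical helper: issue the raw getpid syscall until "stop" is set
 * or max_calls is reached. Returns the number of calls made, or -1 if
 * the kernel ever reports a pid different from the first one seen. */
long call_getpid_loop(long max_calls)
{
	long expected = syscall(SYS_getpid);
	long n = 0;

	while (!stop && n < max_calls) {
		if (syscall(SYS_getpid) != expected)
			return -1;
		n++;
	}
	return n;
}
```

A caller would install the handlers first, run the loop while the livepatch is loaded, and send SIGTERM (or SIGINT) to make the process exit the loop cleanly.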

3 years


[PATCH] selftests/bpf: Fix conflicts with built-in functions in bpf_iter_ksym

by James Hilliard

Both tolower and toupper are built-in C functions; we should not redefine them, as this can result in a build error.

Fixes the following errors:

progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
   10 | static inline char tolower(char c)
      |                    ^~~~~~~
progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>'
    4 | #include <bpf/bpf_helpers.h>
  +++ |+#include <ctype.h>
    5 |
progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
   17 | static inline char toupper(char c)
      |                    ^~~~~~~
progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>'

Signed-off-by: James Hilliard <james.hilliard1(a)gmail.com>
---
 tools/testing/selftests/bpf/progs/bpf_iter_ksym.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_ksym.c b/tools/testing/selftests/bpf/progs/bpf_iter_ksym.c
index 285c008cbf9c..9ba14c37bbcc 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_ksym.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_ksym.c
@@ -7,14 +7,14 @@ char _license[] SEC("license") = "GPL";
 
 unsigned long last_sym_value = 0;
 
-static inline char tolower(char c)
+static inline char to_lower(char c)
 {
 	if (c >= 'A' && c <= 'Z')
 		c += ('a' - 'A');
 	return c;
 }
 
-static inline char toupper(char c)
+static inline char to_upper(char c)
 {
 	if (c >= 'a' && c <= 'z')
 		c -= ('a' - 'A');
@@ -54,7 +54,7 @@ int dump_ksym(struct bpf_iter__ksym *ctx)
 	type = iter->type;
 
 	if (iter->module_name[0]) {
-		type = iter->exported ? toupper(type) : tolower(type);
+		type = iter->exported ? to_upper(type) : to_lower(type);
 		BPF_SEQ_PRINTF(seq, "0x%llx %c %s [ %s ] ",
 			       value, type, iter->name, iter->module_name);
 	} else {
--
2.34.1
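The rename in the patch above can be tried outside of BPF as a plain C sketch. The two helpers are taken from the diff; ksym_type() is a hypothetical wrapper added here only to mirror the "exported ? upper : lower" selection in dump_ksym(), and is not part of the patch.

```c
#include <assert.h>

/* Names chosen so they cannot clash with the compiler's built-in
 * int tolower(int) and int toupper(int) declarations. */
static inline char to_lower(char c)
{
	if (c >= 'A' && c <= 'Z')
		c += ('a' - 'A');
	return c;
}

static inline char to_upper(char c)
{
	if (c >= 'a' && c <= 'z')
		c -= ('a' - 'A');
	return c;
}

/* Hypothetical wrapper: upper-case symbol types mark exported symbols,
 * lower-case ones mark non-exported symbols. */
char ksym_type(char type, int exported)
{
	return exported ? to_upper(type) : to_lower(type);
}
```

Because the helpers keep the char-in, char-out signature but no longer use a reserved library name, GCC has no built-in prototype to compare them against and the -Werror=builtin-declaration-mismatch diagnostic does not apply.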

3 years


[PATCH v8 00/26] tcp: Initial support for RFC5925 auth option

by Leonard Crestez

This is similar to TCP-MD5 in functionality but it's sufficiently different that packet formats and interfaces are incompatible. Compared to TCP-MD5 more algorithms are supported and multiple keys can be used on the same connection but there is still no negotiation mechanism.

Expected use-case is protecting long-duration BGP/LDP connections between routers using pre-shared keys. The goal of this series is to allow routers using the Linux TCP stack to interoperate with vendors such as Cisco and Juniper. A fully featured userspace implementation using this patchset exists but it is not open.

A completely unrelated series that implements the same features was posted recently: https://lore.kernel.org/netdev/20220818170005.747015-1-dima@arista.com/

The biggest difference is that this series puts TCP-AO keys on a global instead of a per-socket list and that it attempts to make kernel-mode key selection decisions instead of very strictly requiring userspace to make all decisions. I believe my approach greatly simplifies userspace implementation.

The biggest difference in this iteration of the patch series is adding per-key lifetime values based on RFC8177 in order to implement kernel-mode key rollover. Older versions still required userspace to tweak the NOSEND/NORECV flags and always pick rnextkeyid explicitly, but now no active "key management" should be required on established sockets: just set the correct flags and expiration dates and the kernel can perform key rollover itself. You can see a (simple) test of that behavior here: https://github.com/cdleonard/tcp-authopt-test/blob/main/tcp_authopt_test/te…

The main implementation of this behavior is patch 17.

Very old versions of this series had per-socket keys but that approach was prone to an issue where a key change made on a listen socket between "synack" and "accept" did not affect the new socket. My solution was to make keys global; the Arista solution is to require userspace to query the key list on accepted sockets and update them. This offloads responsibility for an ABI race to userspace. It can be made to work.

Here are some known flaws and limitations:

* Crypto API is used with buffers on the stack and inside struct sock, this might not work on all arches. I'm currently only testing x64 VMs
* Interaction with FASTOPEN not tested.
* Traffic key is not cached (reducing performance).
* All lookups examine all keys, ignoring optimization opportunities
* Overlapping MKTs can be configured despite what RFC5925 says. This is considered "misconfiguration by userspace" and it would make sense for the kernel to be more aggressive here.

Some testing support is included in nettest and fcnal-test.sh, similar to the current level of tcp-md5 testing.

A more elaborate test suite using pytest and scapy is available out of tree: https://github.com/cdleonard/tcp-authopt-test

There is an automatic system that runs that test suite in vagrant in gitlab-ci: https://gitlab.com/cdleonard/vagrantcpao
That test suite fully covers the ABI of this patchset.

Changes for frr (obsolete): https://github.com/FRRouting/frr/pull/9442
That PR was made early for ABI feedback, it has many issues.

Changes for yabgp (obsolete): https://github.com/cdleonard/yabgp/commits/tcp_authopt
This was used for interoperability testing with Cisco. Would need updates for global keys to avoid leaks.

Changes since PATCH v7:
* Add lifetime fields to struct tcp_authopt_key
* Fix not checking MD5 after unexpected AO.
Link to v7: https://lore.kernel.org/netdev/cover.1660852705.git.cdleonard@gmail.com/

Changes since PATCH v6:
* Squash "remove unused noops" patch (forgot to do this before v5 send).
* Make TCP_REPAIR_AUTHOPT fail if (!tp->repair)
* Add {snd,rcv}_seq to struct tcp_repair_authopt next to {snd,rcv}_sne. The fact that internally snd_sne is maintained as a 64-bit extension of snd_nxt is a problem for TCP_REPAIR implementation in userspace which might not have access to snd_nxt during live traffic. By exposing a full 64-bit "recent sequence number" to userspace it's possible to ignore which exact SEQ number the SNE value is an extension of.
* Fix ipv6_addr_is_prefix helper; it was incorrect and dependent on uninitialized stack memory. This was caught by test suite after many rebases.
* Implement ipv4-mapped-ipv6 support, requested by Eric Dumazet
Link: https://lore.kernel.org/netdev/cover.1658815925.git.cdleonard@gmail.com/

Changes since PATCH v5:
* Rebased on recent net-next, including recent changes refactoring md5
* Use skb_drop_reason
* Fix using sock_kmalloc for key alloc but regular kfree for free. Use kmalloc because keys are global
* Fix mentioning non-existent copy_from_sockopt in doc for _copy_from_sockptr_tolerant
* If no valid keys are available for a destination then report a socket error instead of sending unsigned traffic
* Remove several noop implementations which are always called from ifdef
* Fix build issues in all scenarios, including -Werror at every point.
* Split "tcp: Refactor tcp_inbound_md5_hash into tcp_inbound_sig_hash" into a separate commit.
* Add TCP_AUTHOPT_FLAG_ACTIVE to distinguish between "keys configured for socket" and "connection authenticated". A listen socket with authentication enabled will return other sockets with authentication enabled on accept() but if no key is configured for the peer then authentication will be inactive.
* Add support for TCP_REPAIR_AUTHOPT new sockopts which loads/saves the AO-specific information.
Link: https://lore.kernel.org/netdev/cover.1643026076.git.cdleonard@gmail.com/

Changes since PATCH v4:
* Move the traffic_key context_bytes header to stack. If it's a constant string then ahash can fail unexpectedly.
* Fix allowing unsigned traffic if all keys are marked norecv.
* Fix crashing in __tcp_authopt_alg_init on failure.
* Try to respect the rnextkeyid from SYN on SYNACK (new patch)
* Fix incorrect check for TCP_AUTHOPT_KEY_DEL in __tcp_authopt_select_key
* Improve docs on __tcp_authopt_select_key
* Fix build with CONFIG_PROC_FS=n (kernel build robot)
* Fix build with CONFIG_IPV6=n (kernel build robot)
Link: https://lore.kernel.org/netdev/cover.1640273966.git.cdleonard@gmail.com/

Changes since PATCH v3:
* Made keys global (per-netns rather than per-sock).
* Add /proc/net/tcp_authopt with a table of keys (not sockets).
* Fix part of the shash/ahash conversion having slipped from patch 3 to patch 5
* Fix tcp_parse_sig_options assigning NULL incorrectly when both MD5 and AO are disabled (kernel build robot)
* Fix sparse endianness warnings in prefix match (kernel build robot)
* Fix several incorrect RCU annotations reported by sparse (kernel build robot)
Link: https://lore.kernel.org/netdev/cover.1638962992.git.cdleonard@gmail.com/

Changes since PATCH v2:
* Protect tcp_authopt_alg_get/put_tfm with local_bh_disable instead of preempt_disable. This caused signature corruption when send path executing with BH enabled was interrupted by recv.
* Fix accepted keyids not configured locally as "unexpected". If any key is configured that matches the peer then traffic MUST be signed.
* Fix issues related to sne rollover during handshake itself. (Francesco)
* Implement and test prefixlen (David)
* Replace shash with ahash and reuse some of the MD5 code (Dmitry)
* Parse md5+ao options only once in the same function (Dmitry)
* Pass tcp_authopt_info into inbound check path, this avoids second rcu dereference for same packet.
* Pass tcp_request_socket into inbound check path instead of just listen socket. This is required for SNE rollover during handshake and clarifies ISN handling.
* Do not allow disabling via sysctl after enabling once, this is difficult to support well (David)
* Verbose check for sysctl_tcp_authopt (Dmitry)
* Use netif_index_is_l3_master (David)
* Cleanup ipvx_addr_match (David)
* Add a #define tcp_authopt_needed to wrap static key usage because it looks nicer.
* Replace rcu_read_lock with rcu_dereference_protected in SNE updates (Eric)
* Remove test suite
Link: https://lore.kernel.org/netdev/cover.1635784253.git.cdleonard@gmail.com/

Changes since PATCH v1:
* Implement Sequence Number Extension
* Implement l3index for vrf: TCP_AUTHOPT_KEY_IFINDEX as equivalent of TCP_MD5SIG_FLAG_IFINDEX
* Expand TCP-AO tests in fcnal-test.sh to near-parity with md5.
* Show addr/port on failure similar to md5
* Remove tox dependency from test suite (create venv directly)
* Switch default pytest output format to TAP (kselftest standard)
* Fix _copy_from_sockptr_tolerant stack corruption on short sockopts. This was covered in test but error was invisible without STACKPROTECTOR=y
* Fix sysctl_tcp_authopt check in tcp_get_authopt_val before memset. This was harmless because error code is checked in getsockopt anyway.
* Fix dropping md5 packets on all sockets with AO enabled
* Fix checking (key->recv_id & TCP_AUTHOPT_KEY_ADDR_BIND) instead of key->flags in tcp_authopt_key_match_exact
* Fix PATCH 1/19 not compiling due to missing "int err" declaration
* Add ratelimited message for AO and MD5 both present
* Export all symbols required by CONFIG_IPV6=m (again)
* Fix compilation with CONFIG_TCP_AUTHOPT=y CONFIG_TCP_MD5SIG=n
* Fix checkpatch issues
* Pass -rrequirements.txt to tox to avoid dependency variation.
Link: https://lore.kernel.org/netdev/cover.1632240523.git.cdleonard@gmail.com/

Changes since RFCv3:
* Implement TCP_AUTHOPT handling for timewait and reset replies. Write tests to execute these paths by injecting packets with scapy
* Handle combining md5 and authopt: if both are configured use authopt.
* Fix locking issues around send_key, introduced in one of the later patches.
* Handle IPv4-mapped-IPv6 addresses: it used to be that an ipv4 SYN sent to an ipv6 socket with TCP-AO triggered WARN
* Implement an un-namespaced sysctl; the feature is disabled by default
* Allocate new key before removing any old one in setsockopt (Dmitry)
* Remove tcp_authopt_key_info.local_id because it's no longer used (Dmitry)
* Propagate errors from TCP_AUTHOPT getsockopt (Dmitry)
* Fix no-longer-correct TCP_AUTHOPT_KEY_DEL docs (Dmitry)
* Simplify crypto allocation (Eric)
* Use kzalloc instead of __GFP_ZERO (Eric)
* Add static_key_false tcp_authopt_needed (Eric)
* Clear authopt_info copied from oldsk in __tcp_authopt_openreq (Eric)
* Replace memcmp in ipv4 and ipv6 addr comparisons (Eric)
* Export symbols for CONFIG_IPV6=m (kernel test robot)
* Mark more functions static (kernel test robot)
* Fix build with CONFIG_PROVE_RCU_LIST=y (kernel test robot)
Link: https://lore.kernel.org/netdev/cover.1629840814.git.cdleonard@gmail.com/

Changes since RFCv2:
* Removed local_id from ABI and match on send_id/recv_id/addr
* Add all relevant out-of-tree tests to tools/testing/selftests
* Return an error instead of ignoring unknown flags, hopefully this makes it easier to extend.
* Check sk_family before __tcp_authopt_info_get_or_create in tcp_set_authopt_key
* Use sock_owned_by_me instead of WARN_ON(!lockdep_sock_is_held(sk))
* Fix some intermediate build failures reported by kbuild robot
* Improve documentation
Link: https://lore.kernel.org/netdev/cover.1628544649.git.cdleonard@gmail.com/

Changes since RFC:
* Split into per-topic commits for ease of review. The intermediate commits compile with a few "unused function" warnings and don't do anything useful by themselves.
* Add ABI documentation including kernel-doc on uapi
* Fix lockdep warnings from crypto by creating pools with one shash for each cpu
* Accept short options to setsockopt by padding with zeros; this approach allows increasing the size of the structs in the future.
* Support for aes-128-cmac-96
* Support for binding addresses to keys in a way similar to old tcp_md5
* Add support for retrieving received keyid/rnextkeyid and controlling the keyid/rnextkeyid being sent.
Link: https://lore.kernel.org/netdev/01383a8751e97ef826ef2adf93bfde3a08195a43.162…

Leonard Crestez (26):
  tcp: authopt: Initial support and key management
  docs: Add user documentation for tcp_authopt
  tcp: authopt: Add crypto initialization
  tcp: Refactor tcp_sig_hash_skb_data for AO
  tcp: authopt: Compute packet signatures
  tcp: Refactor tcp_inbound_md5_hash into tcp_inbound_sig_hash
  tcp: authopt: Hook into tcp core
  tcp: authopt: Disable via sysctl by default
  tcp: authopt: Implement Sequence Number Extension
  tcp: ipv6: Add AO signing for tcp_v6_send_response
  tcp: authopt: Add support for signing skb-less replies
  tcp: ipv4: Add AO signing for skb-less replies
  tcp: authopt: Add NOSEND/NORECV flags
  tcp: authopt: Add initial l3index support
  tcp: authopt: Add prefixlen support
  tcp: authopt: Add send/recv lifetime support
  tcp: authopt: Add key selection controls
  tcp: authopt: Add v4mapped ipv6 address support
  tcp: authopt: Add /proc/net/tcp_authopt listing all keys
  tcp: authopt: If no keys are valid for send report an error
  tcp: authopt: Try to respect rnextkeyid from SYN on SYNACK
  tcp: authopt: Initial support for TCP_AUTHOPT_FLAG_ACTIVE
  tcp: authopt: Initial implementation of TCP_REPAIR_AUTHOPT
  selftests: nettest: Rename md5_prefix to key_addr_prefix
  selftests: nettest: Initial tcp_authopt support
  selftests: net/fcnal: Initial tcp_authopt support

 Documentation/networking/index.rst | 1 +
 Documentation/networking/ip-sysctl.rst | 6 +
 Documentation/networking/tcp_authopt.rst | 95 +
 include/linux/tcp.h | 15 +
 include/net/dropreason.h | 16 +
 include/net/net_namespace.h | 4 +
 include/net/netns/tcp_authopt.h | 12 +
 include/net/tcp.h | 55 +-
 include/net/tcp_authopt.h | 269 +++
 include/uapi/linux/snmp.h | 1 +
 include/uapi/linux/tcp.h | 188 ++
 net/ipv4/Kconfig | 14 +
 net/ipv4/Makefile | 1 +
 net/ipv4/proc.c | 1 +
 net/ipv4/sysctl_net_ipv4.c | 39 +
 net/ipv4/tcp.c | 126 +-
 net/ipv4/tcp_authopt.c | 2044 +++++++++++++++++++++
 net/ipv4/tcp_input.c | 55 +-
 net/ipv4/tcp_ipv4.c | 100 +-
 net/ipv4/tcp_minisocks.c | 12 +
 net/ipv4/tcp_output.c | 106 +-
 net/ipv6/tcp_ipv6.c | 70 +-
 tools/testing/selftests/net/fcnal-test.sh | 329 +++-
 tools/testing/selftests/net/nettest.c | 204 +-
 24 files changed, 3675 insertions(+), 88 deletions(-)
 create mode 100644 Documentation/networking/tcp_authopt.rst
 create mode 100644 include/net/netns/tcp_authopt.h
 create mode 100644 include/net/tcp_authopt.h
 create mode 100644 net/ipv4/tcp_authopt.c

--
2.25.1
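The v6 changelog entry about exposing a full 64-bit "recent sequence number" can be illustrated with a small userspace sketch. It is not code from this series: extend_seq() and seq_sne() are hypothetical names, and the only assumption is that the 32-bit SEQ being extended lies within 2^31 of the recent 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extend a 32-bit TCP sequence number to 64 bits by
 * picking the value closest to a recently seen 64-bit sequence number.
 * Assumes seq is within 2^31 of the low 32 bits of "recent". */
uint64_t extend_seq(uint64_t recent, uint32_t seq)
{
	/* Signed distance from the low 32 bits of "recent" to seq. */
	int32_t delta = (int32_t)(seq - (uint32_t)recent);

	return recent + (uint64_t)(int64_t)delta;
}

/* Hypothetical helper: the Sequence Number Extension is the high half. */
uint32_t seq_sne(uint64_t seq64)
{
	return (uint32_t)(seq64 >> 32);
}
```

With a 64-bit reference available, userspace does not need to know which exact 32-bit SEQ the kernel's SNE was derived from: any SEQ near the reference extends to the same 64-bit neighbourhood, including across a wrap of the low 32 bits.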

3 years


[PATCH v2 0/5] mm/memfd: MFD_NOEXEC for memfd_create

by jeffxu＠google.com

From: Jeff Xu <jeffxu(a)chromium.org>

Hi,

This is v2 of the MFD_NOEXEC series. It includes:
1> address comments in V1
2> add sysctl (vm.mfd_noexec) to change the default file permissions of memfd_create to be non-executable.

Below is the cover letter for v1:

The default file permissions on a memfd include execute bits, which means that such a memfd can be filled with an executable and passed to the exec() family of functions. This is undesirable on systems where all code is verified and all filesystems are intended to be mounted noexec, since an attacker may be able to use a memfd to load unverified code and execute it.

Additionally, execution via memfd is a common way to avoid scrutiny for malicious code, since it allows execution of a program without a file ever appearing on disk. This attack vector is not totally mitigated with this new flag, since the default memfd file permissions must remain executable to avoid breaking existing legitimate uses, but it should be possible to use other security mechanisms to prevent memfd_create calls without MFD_NOEXEC on systems where it is known that executable memfds are not necessary.

This patch series adds a new MFD_NOEXEC flag for memfd_create(), which allows creation of non-executable memfds, and as part of the implementation of this new flag, it also adds a new F_SEAL_EXEC seal, which will prevent modification of any of the execute bits of a sealed memfd.

I am not sure if this is the best way to implement the desired behavior (for example, the F_SEAL_EXEC seal is really more of an implementation detail and feels a bit clunky to expose), so suggestions are welcome for alternate approaches.

v1: https://lwn.net/Articles/890096/

Daniel Verkamp (4):
  mm/memfd: add F_SEAL_EXEC
  mm/memfd: add MFD_NOEXEC flag to memfd_create
  selftests/memfd: add tests for F_SEAL_EXEC
  selftests/memfd: add tests for MFD_NOEXEC

Jeff Xu (1):
  sysctl: add support for mfd_noexec

 include/linux/mm.h | 4 +
 include/uapi/linux/fcntl.h | 1 +
 include/uapi/linux/memfd.h | 1 +
 kernel/sysctl.c | 9 ++
 mm/memfd.c | 39 ++++-
 mm/shmem.c | 6 +
 tools/testing/selftests/memfd/memfd_test.c | 163 ++++++++++++++++++++-
 7 files changed, 221 insertions(+), 2 deletions(-)

base-commit: 9e2f40233670c70c25e0681cb66d50d1e2742829
--
2.37.1.559.g78731f0fdb-goog
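The permission problem described in the cover letter can be observed from userspace with a short sketch. It deliberately does not use the MFD_NOEXEC flag or the F_SEAL_EXEC seal proposed in this series, since their final names and values are not assumed here; it relies only on memfd_create(2), fstat(2) and fchmod(2). The helper names fd_exec_bits() and memfd_create_noexec() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return the execute bits (mask 0111) set on fd,
 * or -1 if fstat() fails. */
int fd_exec_bits(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (int)(st.st_mode & 0111);
}

/* Hypothetical helper: create a memfd and clear its execute bits with
 * fchmod(). Unlike the proposed flag and seal, nothing here stops a
 * later fchmod() from setting the bits again. Returns the fd or -1. */
int memfd_create_noexec(const char *name)
{
	int fd = memfd_create(name, MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (fchmod(fd, 0600) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Calling fd_exec_bits() on a memfd created with plain memfd_create() shows whether the running kernel hands out executable memfds by default, while memfd_create_noexec() shows the unsealed workaround that the proposed seal is meant to make tamper-proof.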

3 years, 1 month

